Network Working Group | S. Leonard |
Internet-Draft | Penango, Inc. |
Intended status: Standards Track | October 13, 2015 |
Expires: April 15, 2016 |
Alternative Local-Part Synthesis for DANE
draft-seantek-dane-alps-00
This document describes how to synthesize alternative local-part productions when looking up email addresses with DANE for S/MIME and OpenPGP.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 15, 2016.
Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
DNS-Based Authentication of Named Entities (DANE) [RFC6698] is used to store signed assertions about names related to domain names directly in the DNS. When storing assertions related to email addresses, [SMIMEA] and [OPENPGPKEY] transform the local-part into a terminal label that is stored in a DNS zone that is associated with the domain portion of the email address.
Tension exists because local-parts and DNS labels do not share the same syntax or semantics. The local-part production of [RFC5322] is formally case-sensitive and can technically represent any ASCII character, i.e., octets in the range 00-7F (although only a much smaller range is in common use, and only certain characters can be used to ensure deliverability via SMTP [RFC5321]). In contrast, DNS labels are comprised of up to 63 octets, but are compared case-insensitively in the ASCII range [RFC1035].
Several solutions have been proposed. These include constructing case-sensitive versus case-insensitive queries, each option exclusive of the other. [DNSMBOX] adds two proposals: online DNS servers for the zone that compute (and sign) responses on-the-fly, and regular expressions that can be converted into Deterministic Finite Automata (DFA), which are then stored piecemeal in DNS records.
This document proposes a rule-based mechanism for clients to synthesize alternative local-parts (ALPs) that are likely to be used as inputs to the DNS-mailbox record lookup process, obviating the need for servers to compute (or precompute) every possible permutation of a domain's local-part mailbox strings.
Readers should be familiar with [RFC6698], [SMIMEA], and [OPENPGPKEY]. Readers who plan on implementing email address internationalization [RFC6530] should also be familiar with equivalence issues in [UNICODE], such as case conversion, case folding, and normalization forms.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
All integers are encoded in network byte order.
The bare term character, without qualification, refers to a Unicode scalar value [UNICODE].
The ALPR resource record (RR) is an ordered list of rules to synthesize alternative local-part (ALP) strings. This document uses the term “alternative” rather than “canonical” deliberately. There is no notion of a canonical mailbox form in [RFC5321] or [RFC5322]. If two different strings happen to deliver to the same mailbox, that is a fortunate coincidence but by no means a guaranteed outcome. Furthermore, email’s recent foray into internationalization [RFC6530] means that many characters can be represented with different code points; choosing the mailbox to which such addressed messages are delivered is firmly in the purview of the receiving mail server.
The type value for the ALPR RRtype is defined in Section 6.1. The ALPR resource record is class independent. The ALPR resource record has no special TTL requirements.
There are two purposes of this protocol. The first purpose is to generate alternative local-parts, not equivalent ones. Local-parts are considered alternative to each other if they deliver to the same mailbox [RFC5322]. In contrast, local-parts are equivalent to each other if they deliver to the same place and represent the same recipient. In conforming mail systems, postMaster@example.com and POSTmaster@example.com are equivalent: mail systems are expected to treat them exactly the same [RFC5321]. In contrast, postmaster.john@example.com and postmaster.george@example.com might be alternatives to postmaster@example.com, but even if they deliver to the same mailbox, mail may not end up at the same place or be read by the same recipient. (George and John may share a mailbox, but have different folders, even though they are jointly responsible for postmaster mail.) [[NB: this definition really needs more discussion. Specifically shouldn't DANE suggest that two local-parts are the same or very similar if they share the same PUBLIC KEY?]]
The second purpose is to ease implementation of DANE mailbox-related technologies such as SMIMEA and OPENPGPKEY, by making it more convenient for DNS operators to publish static lists in their zones. “Convenience” for DNS operators necessarily implies more work for clients. Without a protocol such as the one described in this document, the onus would be on the DNS server to provide matching public key records for diverse inputs on the fly (see [DNSMBOX]).
Conformance is as follows:
An implementation MAY optimize network retrieval by querying for the highest-priority item first, i.e., the original local-part. (Due to interleaving, the priority order of the ALPs is not determined until all rules are applied.)
The results of the DANE mailbox lookup protocols are considered to be in the ALP priority order. The “best” (most faithful) association is the one that most closely matches the input, and the “worst” (least faithful) association is the one that least closely matches the input, i.e., the ALP derived from the application of the greatest number of rules. Whether an application accepts, rejects, or takes other action (e.g., presents the inaccuracy to the user) is implementation-defined. However, to the extent that public key results differ in the priority order, an application SHOULD consider the public keys nearer the top of the priority order to be “better”. For example, if 3 rules produce 8 local-parts and 8 different public keys, encrypting a message to all 8 public keys will increase the attack surface (as some keys may be compromised) but, for a well-maintained DNS zone, is very unlikely to increase readability. As a further example, suppose a user has two addresses, user+topsecret@example.com and user@example.com, and uses a separate public key for each. The user's security intent would be thwarted if all mail encrypted to the key at user+topsecret@example.com were also encrypted to the key at user@example.com. Perhaps the user only wishes to expose his or her private key for user+topsecret@example.com to trustworthy computing devices, thus making a convenience tradeoff that is up to the user, not the protocol, to dictate.
The ALPR wire format is comprised of a 16-bit unsigned integer indicating the number of rules, followed by a sequence of rules. Each rule is comprised of:
a 16-bit unsigned integer for the rule identifier;
a 16-bit specifier for the rule parameters, including the length of the rule parameters (if any);
a variable number of octets comprising the rule parameters.
Rule identifiers are registered with IANA in a sub-registry for this RRtype.
The specifier has the following meanings: When the high-order bit is 0, the specifier is a 16-bit unsigned integer indicating the length, in octets, of the subsequent rule parameters. Therefore, the parameters cannot exceed 32767 octets. The rule parameters are well-formed UTF-8 sequences, delimited by the octet 0xFF (0b11111111). Conveniently, 0xFF does not appear in UTF-8.
When the high-order quartet is 0x8 (0b1000), the specifier is a 12-bit unsigned integer comprised of the remaining three quartets, representing the number of 32-bit signed integers present in the subsequent rule parameters. (Therefore, each incremental value represents an additional four octets in the rule parameters.) The rule parameters are 32-bit signed integers.
When the three high-order quartets are 0xFFF (0b111111111111), the specifier is the rule parameter itself, as indicated by the low-order quartet. There are no octets that follow that comprise the rule parameter.
The following special values are defined:
0xC (0b1100) means false, or =0.
0xD (0b1101) means true, or >0.
0xE (0b1110) means null, or <0.
Additionally:
0xF (0b1111) means undefined or "no parameters given".
All other bit combinations are reserved.
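The specifier cases above can be sketched as a small decoder. This is a minimal, non-normative illustration: the names decode_alpr, INLINE_VALUES, and NO_PARAMS are hypothetical, the integer-count case is taken to be the high-order quartet 0b1000, and the 0xFF octet is assumed to act as a separator between multiple string parameters (the text says only that strings are "delimited" by it).

```python
import struct

NO_PARAMS = "no-parameters"  # hypothetical sentinel for the 0xF inline value
INLINE_VALUES = {0xC: False, 0xD: True, 0xE: None, 0xF: NO_PARAMS}


def decode_alpr(wire: bytes):
    """Return a list of (rule_id, params) tuples decoded from ALPR RDATA."""
    (count,) = struct.unpack_from("!H", wire, 0)  # network byte order
    offset = 2
    rules = []
    for _ in range(count):
        rule_id, spec = struct.unpack_from("!HH", wire, offset)
        offset += 4
        if (spec & 0x8000) == 0:
            # High-order bit 0: spec is the octet length of UTF-8 parameters.
            end = offset + spec
            if end > len(wire):
                raise ValueError("truncated string parameters")
            # Assumption: 0xFF separates consecutive strings.
            params = [p.decode("utf-8") for p in wire[offset:end].split(b"\xff")]
            offset = end
        elif (spec >> 12) == 0x8:
            # High-order quartet 0b1000: low 12 bits count 32-bit signed integers.
            n = spec & 0x0FFF
            params = list(struct.unpack_from("!%di" % n, wire, offset))
            offset += 4 * n
        elif (spec >> 4) == 0xFFF:
            # Three high-order quartets 0xFFF: the low quartet is the parameter.
            low = spec & 0xF
            if low not in INLINE_VALUES:
                raise ValueError("reserved inline value")
            params = INLINE_VALUES[low]
        else:
            raise ValueError("reserved specifier 0x%04X" % spec)
        rules.append((rule_id, params))
    return rules
```

A rule with no parameters would carry the specifier 0xFFFF under this reading, and a rule with two integer parameters would carry 0x8002 followed by eight octets.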
The ALPR presentation format is an ordered sequence of rules. Rules are delimited by one or more newlines.
Each rule is comprised of:
The algorithms for formulating the domain names for the records are modified under this document, compared to the SMIMEA and OPENPGPKEY records, as follows:
The following rules are the initial rules that ALP implementations MUST implement. If the rule or rule range makes no remarks about parameters, then parameters MUST NOT be given.
The rule identifier (0) is reserved; assignment requires future Standards Action. Otherwise, it has no special meaning.
This rule makes a local-part case-insensitive by mapping the uppercase ASCII range (41-5A) into the lowercase ASCII range (61-7A). Only ASCII characters are affected.
This rule makes a local-part case-insensitive by mapping the lowercase ASCII range (61-7A) into the uppercase ASCII range (41-5A). Only ASCII characters are affected.
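The two ASCII case rules can be sketched as follows. The function names are hypothetical. The point of the sketch is that only code points 41-5A or 61-7A are touched, so a general-purpose Unicode lowercasing routine would not be a correct substitute.

```python
def ascii_lowercase(local_part: str) -> str:
    """Map A-Z (41-5A) to a-z (61-7A); leave every other character alone."""
    return "".join(
        chr(ord(c) + 0x20) if 0x41 <= ord(c) <= 0x5A else c for c in local_part
    )


def ascii_uppercase(local_part: str) -> str:
    """Map a-z (61-7A) to A-Z (41-5A); leave every other character alone."""
    return "".join(
        chr(ord(c) - 0x20) if 0x61 <= ord(c) <= 0x7A else c for c in local_part
    )
```

For instance, ascii_lowercase leaves U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA unchanged, whereas it does lowercase the base letter of the decomposed sequence U+0043 U+0327.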
This rule facilitates "dot removal" and the removal of other characters more generally. The single parameter is a string containing a list of characters to be removed.
This rule eliminates ranges of irrelevant characters. The single parameter is a string containing pairs of characters; each pair is inclusive and is comprised of the first (lowest valued), followed by the last (highest valued), characters to be removed. If the first character has a code point value greater than the last character, the pair has no effect. If the string has an odd number of characters, then the last character is paired with U+10FFFF, i.e., the end of the Unicode range.
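A minimal sketch of the character removal and range removal rules follows. The function names are hypothetical, and Python code points stand in for Unicode scalar values.

```python
def remove_chars(local_part: str, chars: str) -> str:
    """Drop every character of local_part that is listed in chars."""
    return "".join(c for c in local_part if c not in chars)


def remove_ranges(local_part: str, pairs: str) -> str:
    """Drop every character that falls in one of the inclusive ranges."""
    bounds = [ord(c) for c in pairs]
    if len(bounds) % 2:
        # An odd trailing character is paired with the end of the Unicode range.
        bounds.append(0x10FFFF)
    ranges = list(zip(bounds[0::2], bounds[1::2]))
    # A pair whose first bound exceeds its last matches nothing, so it has no effect.
    return "".join(
        c for c in local_part if not any(lo <= ord(c) <= hi for lo, hi in ranges)
    )
```

With the parameter "09", remove_ranges strips ASCII digits; with the single-character parameter U+0080 it strips everything beyond ASCII.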
This rule eliminates sub-addresses. The single parameter is a string containing a list of characters that delimit the sub-address. The first character from the list that appears in the local-part delimits the sub-address portion; both that character and all subsequent characters are removed.
This rule eliminates sub-addresses. The single parameter is a string containing a list of characters that delimit the sub-address. The first character from the list that appears in the local-part delimits the sub-address portion; that character is kept but all subsequent characters are removed.
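The two sub-address rules differ only in whether the delimiter survives, so one hypothetical function with a flag can illustrate both. This sketch assumes that "the first character from the list" means the delimiters are tried in parameter order, and that truncation happens at the first occurrence of the chosen delimiter; the text does not settle either point.

```python
def truncate_subaddress(local_part: str, delimiters: str,
                        keep_delimiter: bool = False) -> str:
    """Cut the local-part at the first listed delimiter that occurs in it."""
    for delimiter in delimiters:  # assumption: parameter order decides
        cut = local_part.find(delimiter)
        if cut != -1:
            return local_part[:cut + 1] if keep_delimiter else local_part[:cut]
    return local_part  # no delimiter present: nothing to remove
```

With delimiters "+-", the input user+topsecret yields user, or user+ when the delimiter is kept.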
This rule contracts the first substring in a local-part to one character, prior to the delimiter. For example, it takes input of john.smith and outputs jsmith. The single parameter is a string containing a list of characters that delimit the name. The first character from the list that appears in the local-part delimits names; both that character and all prior characters, except for the first character, are removed. (Beware that this rule operates on "characters", i.e., Unicode scalar values. It will eliminate combining characters ([UNICODE] Conformance Clause D52) that follow the first character/code point.)
This rule is identical to (7) except that the delimiter is preserved. For example, it takes input of john.smith and outputs j.smith.
This rule contracts the first substring in a local-part to one abstract character, prior to the delimiter. For example, it takes input of A'_l_m_o_s_.K=u=r=t=a'=g= and outputs A'_K=u=r=t=a'=g=. (In this example, ' is U+0301 COMBINING ACUTE ACCENT, _ is U+0332 COMBINING LOW LINE, and = is U+0333 COMBINING DOUBLE LOW LINE.) The single parameter is a string containing a list of characters that delimit the name. The first character from the list that appears in the local-part delimits names; both that character and all prior characters, except for the first extended combining character sequence ([UNICODE] Conformance Clause D56a), are removed.
This rule is identical to (9) except that the delimiter is preserved. For example, it takes input of A'_l_m_o_s_.K=u=r=t=a'=g= and outputs A'_.K=u=r=t=a'=g=.
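The four contraction rules can be sketched with one hypothetical function and two flags. As a simplification, an extended combining character sequence is approximated here as one base character followed by any characters whose General_Category is a Mark (M*); Hangul syllable sequences and other extended bases are not handled. Delimiters are again assumed to be tried in parameter order.

```python
import unicodedata


def contract_first_name(local_part: str, delimiters: str,
                        keep_delimiter: bool = False,
                        by_sequence: bool = False) -> str:
    """Shrink the substring before the delimiter to its first character,
    or to its first combining character sequence when by_sequence is set."""
    for delimiter in delimiters:
        cut = local_part.find(delimiter)
        if cut == -1:
            continue
        head_len = min(1, cut)  # nothing to keep if the delimiter comes first
        if by_sequence:
            # Extend the head over the combining marks that follow the base.
            while (head_len < cut and
                   unicodedata.category(local_part[head_len]).startswith("M")):
                head_len += 1
        tail_start = cut if keep_delimiter else cut + 1
        return local_part[:head_len] + local_part[tail_start:]
    return local_part
```

Applied to john.smith with the delimiter ".", the function returns jsmith, or j.smith when keep_delimiter is set.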
This rule preserves the first characters. The single parameter is a positive integer indicating the number of characters from the beginning of the input to preserve.
This rule preserves the last characters. The single parameter is a positive integer indicating the number of characters from the end of the input to preserve.
This rule preserves the first extended combining character sequences. The single parameter is a positive integer indicating the number of extended combining character sequences from the beginning of the input to preserve.
This rule preserves the last extended combining character sequences. The single parameter is a positive integer indicating the number of extended combining character sequences from the end of the input to preserve.
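The four preservation rules can be sketched as below. The function names are hypothetical, and the segmentation into extended combining character sequences is approximated as a base character plus any following characters of General_Category Mark, which is a simplification of [UNICODE] D56a.

```python
import unicodedata


def sequences(text: str) -> list:
    """Split text into base-plus-marks units (approximate D56a sequences)."""
    units = []
    for c in text:
        if units and unicodedata.category(c).startswith("M"):
            units[-1] += c  # attach the mark to the preceding unit
        else:
            units.append(c)
    return units


def keep_first(local_part: str, n: int, by_sequence: bool = False) -> str:
    """Preserve the first n characters (or sequences); n must be positive."""
    units = sequences(local_part) if by_sequence else list(local_part)
    return "".join(units[:n])


def keep_last(local_part: str, n: int, by_sequence: bool = False) -> str:
    """Preserve the last n characters (or sequences); n must be positive."""
    units = sequences(local_part) if by_sequence else list(local_part)
    return "".join(units[-n:])
```

The difference shows up on decomposed input: keeping the last one character of a string ending in U+0043 U+0327 yields only the combining cedilla, while keeping the last one sequence yields both code points.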
If a prefix string appears in the input, then the output of this rule is just that prefix. The parameters are strings that are the prefixes. Prefix matching is performed in parameter order; the first match wins. This rule facilitates the variable envelope return path (VERP) technique.
If a suffix string appears in the input, then the output of this rule is just that suffix. The parameters are strings that are the suffixes. Suffix matching is performed in parameter order; the first match wins. This rule facilitates the variable envelope return path (VERP) technique.
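A minimal sketch of the prefix and suffix rules follows. The function names and the sample local-parts are hypothetical. Two assumptions are made that the text does not state: a prefix "appears" when the input starts with it (and likewise a suffix when the input ends with it), and an input that matches no parameter is returned unchanged.

```python
def match_prefix(local_part: str, prefixes: list) -> str:
    """Return the first listed prefix that the local-part starts with."""
    for prefix in prefixes:  # parameter order; the first match wins
        if local_part.startswith(prefix):
            return prefix
    return local_part  # assumption: no match leaves the input unchanged


def match_suffix(local_part: str, suffixes: list) -> str:
    """Return the first listed suffix that the local-part ends with."""
    for suffix in suffixes:
        if local_part.endswith(suffix):
            return suffix
    return local_part
```

For a VERP-style local-part such as bounces-alice=example.org, the parameter list ["bounces-", "list-"] would reduce the input to bounces-.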
This rule applies Unicode Normalization Form C to the input: toNFC(X).
This rule applies Unicode Normalization Form D to the input: toNFD(X).
This rule applies Unicode Normalization Form KC to the input: toNFKC(X).
This rule applies Unicode Normalization Form KD to the input: toNFKD(X).
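The four normalization rules map directly onto Python's standard unicodedata.normalize function, as in the sketch below. The wrapper name is hypothetical. Note that the Unicode version used is whichever one the Python runtime ships with, which may differ from the version cited in [UNICODE].

```python
import unicodedata

NORMALIZATION_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_local_part(local_part: str, form: str) -> str:
    """Apply one of the four Unicode normalization forms to a local-part."""
    if form not in NORMALIZATION_FORMS:
        raise ValueError("unknown normalization form: %r" % form)
    return unicodedata.normalize(form, local_part)
```

For example, Form C composes U+0043 U+0327 into U+00C7, and Form D performs the reverse decomposition.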
These rules apply the various case algorithms defined in [UNICODE]. All rules in this section take a single parameter, which is a string identifying the language [BCP47] for tailored operations. The parameter is REQUIRED (i.e., it MUST NOT be omitted). The default casing algorithm is accessed with the empty string parameter "". A conformant implementation MUST implement the default casing algorithm, as well as the casing algorithm for en (which [UNICODE] defines as the same thing). Other tailored operations are OPTIONAL.
This rule applies Unicode Lowercase Mapping to the input: toLowercase(X).
This rule applies Unicode Uppercase Mapping to the input: toUppercase(X).
This rule applies Unicode Titlecase Mapping to the input: toTitlecase(X). A conformant implementation MAY implement this rule.
This rule applies Unicode Case Folding to the input: toCasefold(X).
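The lowercase, uppercase, and case folding rules can be sketched with Python's built-in string methods, which the Python documentation describes as following the default case algorithms of Section 3.13 of the Unicode Standard. The function name and operation labels are hypothetical. Only the default algorithm (empty string or en) is handled, and tailored languages are rejected rather than guessed at. Titlecase is left out because Python's str.title uses its own word-boundary heuristic.

```python
def unicode_case(local_part: str, operation: str, language: str = "") -> str:
    """Apply an untailored Unicode case operation to a local-part."""
    if language not in ("", "en"):
        # Tailored operations are OPTIONAL and are not sketched here.
        raise NotImplementedError("no tailoring for language %r" % language)
    operations = {
        "lower": str.lower,        # toLowercase(X)
        "upper": str.upper,        # toUppercase(X)
        "casefold": str.casefold,  # toCasefold(X)
    }
    return operations[operation](local_part)
```

Unlike the ASCII-only rules, these operations affect characters beyond ASCII and may change the length of the string.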
This rule applies Unicode NFKC Case Folding to the input: toNFKC_Casefold(X). The Unicode Standard recommends using this algorithm “when doing caseless matching of strings interpreted as identifiers.” See Section 3.13 of [UNICODE] and [UAX31]. The purpose of this rule is to facilitate “identifier caseless matching”: toNFKC_Casefold(NFD(X)). See Conformance Clause D147 of [UNICODE]. To implement identifier caseless matching, specify rule 257 [Rule_NFD] followed immediately by rule 388 (this rule).
This rule range is meant for miscellaneous standardized algorithms that apply to characters beyond ASCII. [[TODO: Complete.]]
This rule maps the following characters in the input to U+002E FULL STOP (.):
The purpose of this rule is to implement similar behavior on the local-part side as on the domain side of an address, by treating “label-separators” similarly. See Section 6 of [UTS46].
[[TODO: complete. This section is meant for PRECIS-related rules that do not fall in the rules listed above.]]
"helOMy.wo\"rld+top\!seCreT"@example.com
The C in "seCreT" actually refers to the sequence of Unicode code points U+0043 LATIN CAPITAL LETTER C and U+0327 COMBINING CEDILLA.
Figure 1: Example Email Address (addr-spec)
An example address is provided above. The UTF-8 encoding of the abstract character represented by C is 43 CC A7.
example.com. IN ALPR ( 1 5 "+-" 3 "." 4 33 47 258 )
This record is stored at example.com.
Figure 2: Example ALPR Record
In Figure 2, the first rule is ASCII lowercase mapping, the second rule is subaddress truncation with + and - as delimiters, the third rule is character removal of the period U+002E, the fourth rule ostensibly is range removal, and the fifth rule is Unicode normalization to Form KC.
From the input, the local-part string is helOMy.wo"rld+top!seCreT (note the single embedded U+0022 QUOTATION MARK). Rules 1, 5, 3, and 258 will be applied to it in that order; Rule 4 will not be applied because the parameter data is malformed (Rule 4 only accepts a single string parameter).
Rules in record order: Rule 1, Rule 5, Rule 3, Rule 4, Rule 258. Each indented line is the alternative produced by applying the named rule to the line it is nested under.
helOMy.wo"rld+top!seCreT
    Rule 258 --> helOMy.wo"rld+top!se[C]reT (C3 87)
    Rule 3 --> helOMywo"rld+top!seCreT
        Rule 258 --> helOMywo"rld+top!se[C]reT
    Rule 5 --> helOMy.wo"rld
        Rule 3 --> helOMywo"rld
    Rule 1 --> helomy.wo"rld+top!secret (63 CC A7)
        Rule 258 --> helomy.wo"rld+top!se[c]ret (C3 A7)
        Rule 3 --> helomywo"rld+top!secret
            Rule 258 --> helomywo"rld+top!se[c]ret
        Rule 5 --> helomy.wo"rld
            Rule 3 --> helomywo"rld
Alternative Local-Part Synthesis Tree, producing 12 results out of 32 theoretically possible results (16 possible results after eliminating Rule 4).
ALPS Tree
Rule 1 outputs lowercase in the ASCII range: O becomes o, M becomes m, T becomes t, and C with cedilla combining character becomes c with cedilla combining character (UTF-8: 63 CC A7). Note that if C were the precomposed character U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA, it would not have been affected by Rule 1. Rules 1 and 2 do not distinguish combining character sequences.
Rule 5 chops off the sub-address after the + plus sign. Now we have 4 local-parts (the maximum possible).
Rule 3 removes the dot. Now we have 8 local-parts (the maximum possible).
Rule 4 is skipped because it is malformed.
Rule 258 normalizes the local-parts, converting the C or c with cedilla respectively to U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA (UTF-8: C3 87) or U+00E7 LATIN SMALL LETTER C WITH CEDILLA (UTF-8: C3 A7). Because Rule 5 chopped off that sequence in its alternatives, now we have only 4 additional local-parts—12 total.
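The walk-through above can be reproduced with a short sketch. All function names are hypothetical, Rule 4 is represented by None to model a rule skipped for malformed parameters, and the list is built in discovery order, which is not claimed to be the ALP priority order.

```python
import unicodedata


def ascii_lowercase(s: str) -> str:  # Rule 1
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def truncate_subaddress(s: str, delimiters: str) -> str:  # Rule 5
    for d in delimiters:
        cut = s.find(d)
        if cut != -1:
            return s[:cut]
    return s


def remove_chars(s: str, chars: str) -> str:  # Rule 3
    return "".join(c for c in s if c not in chars)


def nfkc(s: str) -> str:  # Rule 258
    return unicodedata.normalize("NFKC", s)


def synthesize(local_part: str, rules: list) -> list:
    """Apply each rule to every local-part found so far, keeping new results."""
    alps = [local_part]
    for rule in rules:
        if rule is None:  # malformed rule: skipped entirely
            continue
        for candidate in list(alps):
            result = rule(candidate)
            if result not in alps:
                alps.append(result)
    return alps


RULES = [
    ascii_lowercase,
    lambda s: truncate_subaddress(s, "+-"),
    lambda s: remove_chars(s, "."),
    None,  # Rule 4 with integer parameters 33 47 is malformed
    nfkc,
]
ORIGINAL = 'helOMy.wo"rld+top!seC\u0327reT'  # C followed by U+0327
ALPS = synthesize(ORIGINAL, RULES)
print(len(ALPS))
```

Each applicable rule can at most double the number of local-parts, which gives the bound of 16 for four rules; normalization changes only the four local-parts that still contain the cedilla sequence, which is why the sketch arrives at 12.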
This document uses a new DNS RRtype, ALPR, whose value will be allocated by IANA from the Resource Record (RR) TYPEs subregistry of the Domain Name System (DNS) Parameters registry.
A. Submission Date: 2015-10-10
B. Submission Type: [X] New RRTYPE [ ] Modification to existing RRTYPE
C. Contact Information for submitter:
   Name: Sean Leonard
   Email Address: dev+ietf@seantek.com
   International telephone number:
   Other contact handles:
D. Motivation for the new RRTYPE application? Support for the document draft-seantek-dane-alps.
E. Description of the proposed RR type. See draft-seantek-dane-alps for a complete description.
F. What existing RRTYPE or RRTYPEs come closest to filling that need and why are they unsatisfactory? None.
G. What mnemonic is requested for the new RRTYPE (optional)? ALPR
H. Does the requested RRTYPE make use of any existing IANA Registry or require the creation of a new IANA sub-registry in DNS Parameters? Yes: a sub-registry for rules. [[TBD.]]
I. Does the proposal require/expect any changes in DNS servers/resolvers that prevent the new type from being processed as an unknown RRTYPE (see [RFC3597])? No.
J. Comments: The document is an Internet-Draft for discussion in the DANE WG.
DNS zones that are signed with DNSSEC can be used to authenticate the ALPR resource records. For auditability purposes, an auditor needs to include both the key-mailbox association records (e.g., SMIMEA and OPENPGPKEY), and ALPR resource records.
[UNICODE] | The Unicode Consortium, "The Unicode Standard, Version 8.0.0", ISBN 978-1-936213-10-8, August 2015. Mountain View, CA: The Unicode Consortium. |
[RFC1035] | Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, November 1987. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC5321] | Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, DOI 10.17487/RFC5321, October 2008. |
[RFC5322] | Resnick, P., "Internet Message Format", RFC 5322, DOI 10.17487/RFC5322, October 2008. |
[BCP47] | Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 5646, DOI 10.17487/RFC5646, September 2009. |
[RFC6698] | Hoffman, P. and J. Schlyter, "The DNS-Based Authentication of Named Entities (DANE) Transport Layer Security (TLS) Protocol: TLSA", RFC 6698, DOI 10.17487/RFC6698, August 2012. |
[SMIMEA] | Hoffman, P. and J. Schlyter, "Using Secure DNS to Associate Certificates with Domain Names For S/MIME", Internet-Draft draft-ietf-dane-smime-09, August 2015. |
[OPENPGPKEY] | Wouters, P., "Using DANE to Associate OpenPGP public keys with email addresses", Internet-Draft draft-ietf-dane-openpgpkey-05, August 2015. |
[RFC6530] | Klensin, J. and Y. Ko, "Overview and Framework for Internationalized Email", RFC 6530, DOI 10.17487/RFC6530, February 2012. |
[DNSMBOX] | Levine, J., "Encoding mailbox local-parts in the DNS", Internet-Draft draft-levine-dns-mailbox-01, September 2015. |
[UAX31] | Davis, M., "Unicode Identifier and Pattern Syntax", UAX #31, June 2015. |
[UTS46] | Davis, M. and M. Suignard, "Unicode IDNA Compatibility Processing", UTS #46, June 2015. |