Network Working Group P. Saint-Andre
Internet-Draft Cisco Systems, Inc.
Intended status: Informational July 12, 2013
Expires: January 13, 2014

Username Interoperability
draft-saintandre-username-interop-01

Abstract

Various Internet protocols define constructs for usernames, i.e., the local part of an address such as "localpart@example.com". This document describes a subset of characters to allow in usernames for maximal interoperability across Internet protocols. The subset might prove useful in cases where a provider offers multiple services using the same underlying identifier.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 13, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

Various Internet protocols define constructs for usernames, i.e., the local part of an address such as "localpart@example.com". This document describes a subset of characters to allow in usernames for maximal interoperability across Internet protocols. The subset might prove useful in cases where a provider offers multiple services using the same underlying identifier.

2. Terminology

Many important terms used in this document are defined in [I-D.ietf-precis-framework], [RFC6365], and [UNICODE].

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Subset

To define an interoperable subset of characters for local parts of application identifiers, this document subclasses the PRECIS IdentifierClass specified in [I-D.ietf-precis-framework]. In essence, the IdentifierClass restricts the allowable characters to lettersand digits from all the scripts of Unicode [UNICODE] while grandfathering in all the characters from the ASCII range [RFC20]. The subclass defined here, "UsernameIdentifierClass", further restricts the characters from the ASCII range to those known to work across multiple application protocols (as described under Appendix A).

The syntax is defined as follows using the Augmented Backus-Naur Form (ABNF) as specified in [RFC5234].

   uname = 1*1023(upoint)
         ;
         ; a "upoint" is a UTF-8 encoded Unicode code point 
         ; that conforms to the "UsernameIdentifierClass" 
         ; subclass of the PRECIS IdentifierClass
         ;
        

A uname MUST NOT be zero octets in length and MUST NOT be more than 1023 octets in length. This rule is to be enforced after any normalization and mapping of code points.

A uname MUST consist only of Unicode code points that conform to the "UsernameIdentifierClass" subclass of the "IdentifierClass" base string class defined in [I-D.ietf-precis-framework]. The UsernameIdentifierClass subclass includes all code points allowed by the IdentifierClass base class, with the exception of the following characters, which are disallowed:

U+0022 (QUOTATION MARK), i.e., '"'
U+0023 (NUMBER SIGN), i.e., '#'
U+0025 (PERCENT SIGN), i.e., '%'
U+0026 (AMPERSAND), i.e., '&'
U+0027 (APOSTROPHE), i.e., "'"
U+0028 (LEFT PARENTHESIS), i.e., '('
U+0029 (RIGHT PARENTHESIS), i.e., ')'
U+002C (COMMA), i.e., ','
U+002E (FULL STOP), i.e., '.'
U+002F (SOLIDUS), i.e., '/'
U+003A (COLON), i.e., ':'
U+003B (SEMICOLON), i.e., ';'
U+003C (LESS-THAN SIGN), i.e., '<'
U+003E (GREATER-THAN SIGN), i.e., '>'
U+003F (QUESTION MARK), i.e., '?'
U+0040 (COMMERCIAL AT), i.e., '@'
U+005B (LEFT SQUARE BRACKET), i.e., '['
U+005C (REVERSE SOLIDUS), i.e., '\'
U+005D (RIGHT SQUARE BRACKET), i.e., ']'
U+005E (CIRCUMFLEX ACCENT), i.e., '^'
U+0060 (GRAVE ACCENT), i.e., '`'
U+007B (LEFT CURLY BRACKET), i.e., '{'
U+007C (VERTICAL), i.e., '|'
U+007D (RIGHT CURLY BRACKET), i.e., '}'

The normalization and mapping rules for the UsernameIdentifierClass are as follows, where the operations specified MUST be completed in the order shown:

  1. Fullwidth and halfwidth characters MUST be mapped to their decomposition equivalents.

  2. So-called additional mappings MAY be applied, such as those defined in [I-D.ietf-precis-mappings].

  3. Uppercase and titlecase characters MUST be mapped to their lowercase equivalents.

  4. All characters MUST be mapped using Unicode Normalization Form C (NFC).

With regard to directionality, applications MUST apply the "Bidi Rule" defined in [RFC5893] (i.e., each of the six conditions of the Bidi Rule must be satisfied).

4. Security Considerations

Deploying usernames that are interoperable across multiple protocols could potentially give malicious entities multiple ways to attack an account or user.

5. IANA Considerations

The IANA shall add the following entry to the PRECIS Subclass Registry:

Subclass:
UsernameIdentifierClass.
Base Class:
IdentifierClass.
Exclusions:
24 non-alphanumeric characters in the ASCII range.
Specification:
RFC XXXX. [Note to RFC Editor: please change XXXX to the number issued for this specification.]

6. References

6.1. Normative References

[I-D.ietf-precis-framework] Saint-Andre, P. and M. Blanchet, "Precis Framework: Handling Internationalized Strings in Protocols", Internet-Draft draft-ietf-precis-framework-09, July 2013.
[I-D.ietf-precis-mappings] Yoneya, Y. and T. NEMOTO, "Mapping characters for precis classes", Internet-Draft draft-ietf-precis-mappings-02, May 2013.
[RFC20] Cerf, V., "ASCII format for network interchange", RFC 20, October 1969.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.
[RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)", RFC 5893, August 2010.
[UNICODE] The Unicode Consortium, "The Unicode Standard, Version 3.2.0", 2000.

The Unicode Standard, Version 3.2.0 is defined by The Unicode Standard, Version 3.0 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5), as amended by the Unicode Standard Annex #27: Unicode 3.1 (http://www.unicode.org/reports/tr27/) and by the Unicode Standard Annex #28: Unicode 3.2 (http://www.unicode.org/reports/tr28/).

6.2. Informative References

[I-D.ietf-appsawg-acct-uri] Saint-Andre, P., "The 'acct' URI Scheme", Internet-Draft draft-ietf-appsawg-acct-uri-06, July 2013.
[RFC821] Postel, J.B., "Simple Mail Transfer Protocol", STD 10, RFC 821, August 1982.
[RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
[RFC3856] Rosenberg, J., "A Presence Event Package for the Session Initiation Protocol (SIP)", RFC 3856, August 2004.
[RFC3860] Peterson, J., "Common Profile for Instant Messaging (CPIM)", RFC 3860, August 2004.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC4120] Neuman, C., Yu, T., Hartman, S. and K. Raeburn, "The Kerberos Network Authentication Service (V5)", RFC 4120, July 2005.
[RFC4282] Aboba, B., Beadles, M., Arkko, J. and P. Eronen, "The Network Access Identifier", RFC 4282, December 2005.
[RFC5322] Resnick, P., "Internet Message Format", RFC 5322, October 2008.
[RFC6120] Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, March 2011.
[RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in Internationalization in the IETF", BCP 166, RFC 6365, September 2011.

Appendix A. Analysis

This document takes the following username constructs into consideration:

Each of those address formats defines something that can be used as the "local part" of an address.

The local part of an email address uses either the "local-part" or the "dot-atom-text" rule in [RFC5322]. Here we make the simplifying assumption that the "dot-atom-text" rule applies:

   dot-atom-text =  1*atext *("." 1*atext)
   atext         =  ALPHA / DIGIT /    ; Any character except 
                    "!" / "#" / "$" /  ; controls, SP, and 
                    "%" / "&" / "'" /  ; specials. Used for 
                    "*" / "+" / "-" /  ; atoms.
                    "/" / "=" / "?" /
                    "^" / "_" / "`" / 
                    "{" / "|" / "}" /
                    "~"
        

We make the same simplifying assumption for im: and pres: URIs (although their specifications reference [RFC2822]).

A Kerberos Principal Name is a sequence of strings of type KerberosString, where each KerberosString is a GeneralString that is constrained to contain only characters in IA5String.

   PrincipalName   ::= SEQUENCE {
           name-type       [0] Int32,
           name-string     [1] SEQUENCE OF KerberosString
   }
   KerberosString  ::= GeneralString (IA5String)
        

A Network Address Identifier inherits from [RFC821]. Here we care only about the "username" rule:

   username    =  dot-string
   dot-string  =  string
   dot-string  =/ dot-string "." string
   string      =  char
   string      =/ string char
   char        =  c
   char        =/ "\" x
   c           =  %x21    ; '!'              allowed
                          ; '"'              not allowed
   c           =/ %x23    ; '#'              allowed
   c           =/ %x24    ; '$'              allowed
   c           =/ %x25    ; '%'              allowed
   c           =/ %x26    ; '&'              allowed
   c           =/ %x27    ; '''              allowed
                          ; '(', ')'         not allowed
   c           =/ %x2A    ; '*'              allowed
   c           =/ %x2B    ; '+'              allowed
                          ; ','              not allowed
   c           =/ %x2D    ; '-'              allowed
                          ; '.'              not allowed
   c           =/ %x2F    ; '/'              allowed
   c           =/ %x30-39 ; '0'-'9'          allowed
                          ; ';', ':', '<'    not allowed
   c           =/ %x3D    ; '='              allowed
                          ; '>'              not allowed
   c           =/ %x3F    ; '?'              allowed
                          ; '@'              not allowed
   c           =/ %x41-5a ; 'A'-'Z'          allowed
                          ; '[', '\', ']'    not allowed
   c           =/ %x5E    ; '^'              allowed
   c           =/ %x5F    ; '_'              allowed
   c           =/ %x60    ; '`'              allowed
   c           =/ %x61-7A ; 'a'-'z'          allowed
   c           =/ %x7B    ; '{'              allowed
   c           =/ %x7C    ; '|'              allowed
   c           =/ %x7D    ; '}'              allowed
   c           =/ %x7E    ; '~'              allowed
                          ; DEL              not allowed
   c           =/ %x80-FF ; UTF-8-Octet      allowed
   x           =  %x00-FF ; all 128 ASCII characters
        

The local part of a sip:/sips: URI inherits from the "userinfo" rule in [RFC3986] with several changes; here we discuss the SIP "user" rule only:

   user             =  1*( unreserved / escaped / user-unreserved )
   user-unreserved  =  "&" / "=" / "+" / "$" / "," / ";" / "?" / "/"
   unreserved       =  alphanum / mark
   mark             =  "-" / "_" / "." / "!" / "~" / "*" / "'"
                       / "(" / ")"
        

The local part of an XMPP address allows any ASCII character except space, controls, and the " & ' / : < > @ characters.

The 'acct' URI syntax borrows the 'host', 'pct-encoded', 'sub-delims', 'unreserved' rules from [RFC3986]:

acctURI      =  "acct" ":" userpart "@" host
userpart     =  unreserved / sub-delims
                0*( unreserved / pct-encoded / sub-delims )
          

To summarize the foregoing information, the following table lists the allowed and disallowed characters in the local part of identifiers for each protocol (aside from the alphanumeric, space, and control characters), in order by hexadecimal character number (where each "A" row shows the allowed characters and each "D" row shows the disallowed characters).

Table 1: Allowed and Disallowed Characters (Non-Alphanumeric)

+---+----------------------------------+
| EMAIL ADDRESSES, IM/PRES URIs        |
+---+----------------------------------+
| A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
| D |  "     ()  , . :;< > @[\]        |
+---+----------------------------------+
| KERBEROS PRINCIPAL NAMES             |
+---+----------------------------------+
| A | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
| D |                                  |
+---+----------------------------------+
| NETWORK ADDRESS IDENTIFIERS          |
+---+----------------------------------+
| A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
| D |  "     ()  , . :;< > @[\]        |
+---+----------------------------------+
| SIP/SIPS URIs                        |
+---+----------------------------------+
| A | !  $ &'()*+,-./ ; = ?     _    ~ |
| D |  "# %          : < > @[\]^ `{|}  |
+---+----------------------------------+
| XMPP ADDRESSES                       |
+---+----------------------------------+
| A | ! #$%  ()*+,-.  ; = ? [\]^_`{|}~ |
| D |  "   &'       /: < > @           |
+---+----------------------------------+
| ACCT URIs                            |
+---+----------------------------------+
| A | !  $%&'()*+,-.  ; =    \ ^_`{|}~ |
| D |  "#           /: < >?@[ ]        |
+---+----------------------------------+
       

The interoperable subset allows only characters that are allowed in all of the foregoing formats, as shown in the following table.

Table 2: Subset Characters (Non-Alphanumeric)

+---+----------------------------------+
| INTEROPERABLE SUBSET                 |
+---+----------------------------------+
| A | !  $     *+ -     =       _    ~ |
| D |  "# %&'()  , ./:;< >?@[\]^ `{|}  |
+---+----------------------------------+
       

Appendix B. Acknowledgements

Thanks to Sean Turner for inspiring the work on this document. Thanks also to Paul Hoffman, John Klensin, and Glen Zorn for their comments.

Author's Address

Peter Saint-Andre Cisco Systems, Inc. 1899 Wynkoop Street, Suite 600 Denver, CO 80202 USA Phone: +1-303-308-3282 EMail: psaintan@cisco.com