EXTRA                                                         P. Resnick
Internet-Draft                                     Qualcomm Incorporated
Intended status: Standards Track                               C. Newman
Expires: 27 January 2024                                          Oracle
                                                                 S. Shen
                                                                   CNNIC
                                                          A. Gulbrandsen
                                                                   ICANN
                                                            26 July 2023


                         IMAP Support for UTF-8
                      draft-gulbrandsen-6855bis-00

Abstract

   This specification extends the Internet Message Access Protocol
   (IMAP4rev1, RFC 3501) to support UTF-8 encoded international
   characters in user names, mail addresses, and message headers.  This
   specification replaces RFC 6855.  This specification does not extend
   IMAP4rev2 [RFC9051], since that protocol includes everything in this
   extension.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.



Resnick, et al.          Expires 27 January 2024                [Page 1]

Internet-Draft                 UTF8=ACCEPT                     July 2023


   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Requirements Language . . . . . . . . . . . . . . . . . . . .   3
   3.  "UTF8=ACCEPT" IMAP Capability and UTF-8 in IMAP
           Quoted-Strings  . . . . . . . . . . . . . . . . . . . . .   3
   4.  "APPEND" Command  . . . . . . . . . . . . . . . . . . . . . .   4
   5.  "LOGIN" Command and UTF-8 . . . . . . . . . . . . . . . . . .   4
   6.  "UTF8=ONLY" Capability  . . . . . . . . . . . . . . . . . . .   5
   7.  Dealing with Legacy Clients . . . . . . . . . . . . . . . . .   5
   8.  Issues with UTF-8 Header Mailstore  . . . . . . . . . . . . .   7
   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   7
   10. Security Considerations . . . . . . . . . . . . . . . . . . .   7
   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .   7
     11.1.  Normative References . . . . . . . . . . . . . . . . . .   7
     11.2.  Informative References . . . . . . . . . . . . . . . . .   8
   Appendix A.  Design Rationale . . . . . . . . . . . . . . . . . .   9
   Appendix B.  Acknowledgments  . . . . . . . . . . . . . . . . . .  10
   Appendix C.  Changes since RFC 6855 . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Introduction

   This specification forms part of the Email Address
   Internationalization protocols described in the Email Address
   Internationalization Framework document [RFC6530].  It extends IMAP
   [RFC3501] to permit UTF-8 [RFC3629] in headers, as described in
   "Internationalized Email Headers" [RFC6532].  It also adds a
   mechanism to support mailbox names using the UTF-8 charset.  This
   specification creates two new IMAP capabilities to allow servers to
   advertise these new extensions.

   This specification assumes that the IMAP server will be operating in
   a fully internationalized environment, i.e., one in which all clients
   accessing the server will be able to accept non-ASCII message header
   fields and other information, as specified in Section 3.  At least
   during a transition period, that assumption will not be realistic for
   many environments; the issues involved are discussed in Section 7
   below.

   This specification replaces an earlier, experimental approach to the
   same problem [RFC5738] as well as [RFC6855].

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  "UTF8=ACCEPT" IMAP Capability and UTF-8 in IMAP Quoted-Strings

   The "UTF8=ACCEPT" capability indicates that the server supports the
   ability to open mailboxes containing internationalized messages with
   the "SELECT" and "EXAMINE" commands, and the server can provide UTF-8
   responses to the "LIST" and "LSUB" commands.  This capability also
   affects other IMAP extensions that can return mailbox names or their
   prefixes, such as NAMESPACE [RFC2342] and ACL [RFC4314].

   The "UTF8=ONLY" capability, described in Section 6, implies the
   "UTF8=ACCEPT" capability.  A server is said to support "UTF8=ACCEPT"
   if it advertises either "UTF8=ACCEPT" or "UTF8=ONLY".

   A client MUST use the "ENABLE" command [RFC5161] with the
   "UTF8=ACCEPT" option (described in this section) to indicate to the
   server that the client accepts UTF-8 in quoted-strings and supports
   the "UTF8=ACCEPT" extension.  The "ENABLE UTF8=ACCEPT" command is
   only valid in the authenticated state.

   The IMAP base specification [RFC3501] forbids the use of 8-bit
   characters in atoms or quoted-strings.  Thus, a UTF-8 string can only
   be sent as a literal.  This can be inconvenient from a coding
   standpoint, and unless the server offers IMAP non-synchronizing
   literals [RFC2088], this requires an extra round trip for each UTF-8
   string sent by the client.  When the IMAP server supports
   "UTF8=ACCEPT", it supports UTF-8 in quoted-strings with the following
   syntax:

        quoted        =/ DQUOTE *uQUOTED-CHAR DQUOTE
               ; QUOTED-CHAR is not modified, as it will affect
               ; other RFC 3501 ABNF non-terminals.

        uQUOTED-CHAR  = QUOTED-CHAR / UTF8-2 / UTF8-3 / UTF8-4

        UTF8-2        =   <Defined in Section 4 of RFC 3629>

        UTF8-3        =   <Defined in Section 4 of RFC 3629>

        UTF8-4        =   <Defined in Section 4 of RFC 3629>
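
   As a non-normative illustration of the grammar above, the following
   Python sketch serializes text as a quoted string.  The function name
   imap_quote and its utf8_enabled flag are hypothetical; the sketch
   assumes the caller tracks whether "ENABLE UTF8=ACCEPT" has succeeded.

```python
def imap_quote(text: str, utf8_enabled: bool) -> bytes:
    """Serialize text as an IMAP quoted string, or raise ValueError.

    Hypothetical helper, not defined by this specification.
    """
    out = bytearray(b'"')
    for ch in text:
        if ch in "\r\n\x00":
            # CR, LF and NUL are never allowed inside a quoted string.
            raise ValueError("character not allowed in a quoted string")
        if ord(ch) > 0x7F and not utf8_enabled:
            # Without UTF8=ACCEPT, RFC 3501 permits only 7-bit characters.
            raise ValueError("non-ASCII text needs a literal or UTF8=ACCEPT")
        if ch in '"\\':
            out += b"\\"  # quoted-specials are escaped with a backslash
        out += ch.encode("utf-8")
    out += b'"'
    return bytes(out)
```

   A client that has not enabled "UTF8=ACCEPT" would catch the error
   and send the string as a literal instead.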




   When this extended quoting mechanism is used by the client, the
   server MUST reject, with a "BAD" response, any octet sequences with
   the high bit set that fail to comply with the formal syntax
   requirements of UTF-8 [RFC3629].  The IMAP server MUST NOT send UTF-8
   in quoted-strings to the client unless the client has indicated
   support for that syntax by using the "ENABLE UTF8=ACCEPT" command.
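
   The rejection rule above can be sketched in Python as follows.  This
   is an illustration only: check_quoted_octets is a hypothetical
   helper, and it relies on Python's strict UTF-8 decoder, which refuses
   overlong forms, encoded surrogates, and truncated sequences.

```python
def check_quoted_octets(octets: bytes) -> str:
    """Decode the octets of a quoted string, or raise ValueError.

    A server built on this sketch would answer the command with a
    tagged BAD response when ValueError is raised.
    """
    try:
        # errors="strict" makes the decoder raise instead of substituting.
        return octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at octet {exc.start}") from exc
```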

   If the server supports "UTF8=ACCEPT", the client MAY use extended
   quoted syntax with any IMAP argument that permits a string (including
   astring and nstring).  However, if characters outside the US-ASCII
   repertoire are used in an inappropriate place, the results would be
   the same as if other syntactically valid but semantically invalid
   characters were used.  Specific cases where UTF-8 characters are
   permitted or not permitted are described in the following paragraphs.

   All IMAP servers that support "UTF8=ACCEPT" SHOULD accept UTF-8 in
   mailbox names, and those that also support the Mailbox International
   Naming Convention described in RFC 3501, Section 5.1.3, MUST accept
   UTF8-quoted mailbox names and convert them to the appropriate
   internal format.  Mailbox names MUST comply with the Net-Unicode
   Definition ([RFC5198], Section 2) with the specific exception that
   they MUST NOT contain control characters (U+0000-U+001F and
   U+0080-U+009F), a delete character (U+007F), a line separator
   (U+2028), or a paragraph separator (U+2029).
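
   A minimal Python sketch of the character restrictions above is shown
   below.  The name mailbox_name_problems is hypothetical.  The sketch
   also reports names that are not in NFC, on the assumption that an
   implementation wants to flag the normalization that Net-Unicode
   recommends; it does not cover every Net-Unicode rule.

```python
import unicodedata

# Single code points excluded by the text above: DEL, LS, PS.
FORBIDDEN = {0x7F, 0x2028, 0x2029}


def mailbox_name_problems(name: str) -> list[str]:
    """Return a list of problems found in a mailbox name (empty if none)."""
    problems = []
    for ch in name:
        cp = ord(ch)
        # C0 controls, C1 controls, and the three single code points.
        if cp <= 0x1F or 0x80 <= cp <= 0x9F or cp in FORBIDDEN:
            problems.append(f"forbidden character U+{cp:04X}")
    # Assumption of this sketch: non-NFC names are reported, not rejected.
    if unicodedata.normalize("NFC", name) != name:
        problems.append("not in NFC")
    return problems
```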

   Once an IMAP client has enabled UTF-8 support with the "ENABLE
   UTF8=ACCEPT" command, it MUST NOT issue a "SEARCH" command that
   contains a charset specification.  If an IMAP server receives such a
   "SEARCH" command in that situation, it SHOULD reject the command with
   a "BAD" response (due to the conflicting charset labels).
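
   The check described above might look like the following Python
   sketch.  It assumes the server has already split off the command
   name and holds the remaining arguments as text; both function names
   are hypothetical.

```python
from typing import Optional


def search_uses_charset(arguments: str) -> bool:
    """True if the SEARCH arguments begin with a CHARSET specification.

    In the RFC 3501 grammar, CHARSET can only appear as the first
    argument, so inspecting the first token is sufficient.
    """
    parts = arguments.split(None, 1)
    return bool(parts) and parts[0].upper() == "CHARSET"


def check_search(arguments: str, utf8_enabled: bool) -> Optional[str]:
    """Return "BAD" if the command should be rejected, else None."""
    if utf8_enabled and search_uses_charset(arguments):
        return "BAD"
    return None
```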

4.  "APPEND" Command

   If the server supports "UTF8=ACCEPT", then the server accepts UTF-8
   headers in the "APPEND" command message argument.

   If an IMAP server supports "UTF8=ACCEPT" and the IMAP client has not
   issued the "ENABLE UTF8=ACCEPT" command, the server MUST reject, with
   a "NO" response, an "APPEND" command that includes any 8-bit
   character in message header fields.
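
   The following Python sketch illustrates the rule above.  It assumes
   the server holds the complete message argument as octets and treats
   everything before the first empty line as the header section; the
   function names are hypothetical.

```python
def header_has_8bit(message: bytes) -> bool:
    """True if any octet in the header section has the high bit set."""
    # The header section ends at the first empty line.  If there is no
    # empty line, the whole message is treated as header.
    header, _, _ = message.partition(b"\r\n\r\n")
    return any(octet >= 0x80 for octet in header)


def append_status(message: bytes, utf8_enabled: bool) -> str:
    """Return "NO" when APPEND must be rejected, otherwise "OK"."""
    if not utf8_enabled and header_has_8bit(message):
        return "NO"
    return "OK"
```

   Note that 8-bit octets in the body do not trigger the rejection; only
   the header section is examined.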

5.  "LOGIN" Command and UTF-8

   This specification does not extend the IMAP "LOGIN" command [RFC3501]
   to support UTF-8 usernames and passwords.  Whenever a client needs to
   use UTF-8 usernames or passwords, it MUST use the IMAP "AUTHENTICATE"
   command, which is already capable of passing UTF-8 usernames and
   credentials.

   Although using the IMAP "AUTHENTICATE" command in this way makes it
   syntactically legal to have a UTF-8 username or password, there is no
   guarantee that the user provisioning system utilized by the IMAP
   server will allow such identities.  This is an implementation
   decision and may depend on what identity system the IMAP server is
   configured to use.
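
   As an illustration, the Python sketch below builds the initial
   response for one SASL mechanism that can carry UTF-8 credentials.
   This document does not name a mechanism; PLAIN (RFC 4616) is used
   here only as a familiar example.  The function name is hypothetical,
   and the sketch omits the SASLprep preparation that a real client
   would apply first.

```python
import base64


def sasl_plain_response(username: str, password: str,
                        authzid: str = "") -> str:
    """Return the base64 PLAIN response: authzid NUL authcid NUL passwd."""
    for field in (authzid, username, password):
        if "\0" in field:
            # NUL is the field separator and cannot appear in a field.
            raise ValueError("NUL is not allowed in PLAIN fields")
    raw = "\0".join([authzid, username, password]).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")
```

   The resulting string would be sent as the client response of an
   "AUTHENTICATE PLAIN" exchange.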

6.  "UTF8=ONLY" Capability

   The "UTF8=ONLY" capability indicates that the server supports
   "UTF8=ACCEPT" (see Section 3) and that it requires support for UTF-8
   from clients.  In particular, this means that the server will send
   UTF-8 in quoted-strings, and it will not accept the older
   international mailbox name convention (modified UTF-7 [RFC3501]).
   Because these are incompatible changes to IMAP, explicit server
   announcement and client confirmation are necessary: clients MUST use
   the "ENABLE UTF8=ACCEPT" command before using this server.  A server
   that advertises "UTF8=ONLY" will reject, with a "NO [CANNOT]"
   response [RFC5530], any command that might require UTF-8 support and
   is not preceded by an "ENABLE UTF8=ACCEPT" command.

   IMAP clients that find it problematic to support a server that
   announces "UTF8=ONLY" are encouraged to at least detect the
   announcement and provide an informative error message to the end
   user.

   Because the "UTF8=ONLY" server capability includes support for
   "UTF8=ACCEPT", the capability string will include, at most, one of
   those and never both.  For the client, "ENABLE UTF8=ACCEPT" is always
   used -- never "ENABLE UTF8=ONLY".
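
   The capability handling described in this section can be sketched
   from the client side as follows.  The function names and the
   returned labels are hypothetical, and upper-casing the capability
   names is a defensive assumption of this sketch.

```python
from typing import Iterable, Optional


def utf8_mode(capabilities: Iterable[str]) -> Optional[str]:
    """Classify the server as "only", "accept", or None (no support)."""
    caps = {cap.upper() for cap in capabilities}
    if "UTF8=ONLY" in caps:
        return "only"
    if "UTF8=ACCEPT" in caps:
        return "accept"
    return None


def enable_command(tag: str, capabilities: Iterable[str]) -> Optional[str]:
    """Return the ENABLE command line to send, or None if unsupported."""
    if utf8_mode(capabilities) is None:
        return None
    # The option is always UTF8=ACCEPT, even for a UTF8=ONLY server.
    return f"{tag} ENABLE UTF8=ACCEPT\r\n"
```

   A client that cannot work with a "UTF8=ONLY" server could use the
   "only" result to show an informative error message instead.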

7.  Dealing with Legacy Clients

   In most situations, it will be difficult or impossible for the
   implementer or operator of an IMAP (or POP) server to know whether
   all of the clients that might access it, or the associated mail store
   more generally, will be able to support the facilities defined in
   this document.  In almost all cases, servers that conform to this
   specification will have to be prepared to deal with clients that do
   not enable the relevant capabilities.  Unfortunately, there is no
   completely satisfactory way to do so other than for systems that wish
   to receive email that requires SMTPUTF8 capabilities to be sure that
   all components of those systems -- including IMAP and other clients
   selected by users -- are upgraded appropriately.

   When a message that requires SMTPUTF8 is encountered and the client
   does not enable UTF-8 capability, choices available to the server
   include hiding the problematic message(s), creating in-band or out-
   of-band notifications or error messages, or somehow trying to create
   a surrogate of the message with the intention of providing useful
   information to that client about what has occurred.  Such surrogate
   messages cannot be actual substitutes for the original message: they
   will almost always be impossible to reply to (either at all or
   without loss of information) and the new header fields or specialized
   constructs for server-client communications may go beyond the
   requirements of current email specifications (e.g., [RFC5322]).
   Consequently, such messages may confuse some legacy mail user agents
   (including IMAP clients) or not provide expected information to
   users.  There are also trade-offs in constructing surrogates of the
   original message between accepting complexity and additional
   computation costs in order to try to preserve as much information as
   possible (for example, in "Post-Delivery Message Downgrading for
   Internationalized Email Messages" [RFC6857]) and trying to minimize
   those costs while still providing useful information (for example, in
   "Simplified POP and IMAP Downgrading for Internationalized Email"
   [RFC6858]).

   Implementations that choose to perform downgrading SHOULD use one of
   the standardized algorithms provided in RFC 6857 or RFC 6858.
   Getting downgrade algorithms right, and minimizing the risk of
   operational problems and harm to the email system, is tricky and
   requires careful engineering.  These two algorithms are well
   understood and carefully designed.

   Because such messages are really surrogates of the original ones, not
   really "downgraded" ones (although that terminology is often used for
   convenience), they inevitably have relationships to the originals
   that the IMAP specification [RFC3501] did not anticipate.  This
   brings up two concerns in particular: First, digital signatures
   computed over and intended for the original message will often not be
   applicable to the surrogate message, and will often fail signature
   verification.  (It will be possible for some digital signatures to be
   verified, if they cover only parts of the original message that are
   not affected in the creation of the surrogate.)  Second, servers that
   may be accessed by the same user with different clients or methods
   (e.g., POP or webmail systems in addition to IMAP or IMAP clients
   with different capabilities) will need to exert extreme care to be
   sure that UIDVALIDITY [RFC3501] behaves as the user would expect.
   Those issues may be especially sensitive if the server caches the
   surrogate message or computes and stores it when the message arrives
   with the intent of making either form available depending on client
   capabilities.  Additionally, in order to cope with the case when a
   server compliant with this extension returns the same UIDVALIDITY to
   both legacy and "UTF8=ACCEPT"-aware clients, a client upgraded from
   being non-"UTF8=ACCEPT"-aware MUST discard its cache of messages
   downloaded from the server.

   The best (or "least bad") approach for any given environment will
   depend on local conditions, local assumptions about user behavior,
   the degree of control the server operator has over client usage and
   upgrading, the options that are actually available, and so on.  It is
   impossible, at least at the time of publication of this
   specification, to give good advice that will apply to all situations,
   or even particular profiles of situations, other than "upgrade legacy
   clients as soon as possible".

8.  Issues with UTF-8 Header Mailstore

   When an IMAP server uses a mailbox format that supports UTF-8 headers
   and it permits selection or examination of that mailbox without
   issuing "ENABLE UTF8=ACCEPT" first, it is the responsibility of the
   server to comply with the IMAP base specification [RFC3501] and the
   Internet Message Format [RFC5322] with respect to all header
   information transmitted over the wire.  The issue of handling
   messages containing non-ASCII characters in legacy environments is
   discussed in Section 7.

9.  IANA Considerations

   The "IMAP 4 Capabilities" registry contains a number of references to
   RFC 6855.  IANA is requested to change them to point to this
   document.

10.  Security Considerations

   The security considerations of UTF-8 [RFC3629] and SASLprep [RFC4013]
   apply to this specification, particularly with respect to use of
   UTF-8 in usernames and passwords.  Otherwise, this is not believed to
   alter the security considerations of IMAP.

   Special considerations, some of them with security implications,
   occur if a server that conforms to this specification is accessed by
   a client that does not, as well as in some more complex situations in
   which a given message is accessed by multiple clients that might use
   different protocols and/or support different capabilities.  Those
   issues are discussed in Section 7.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
              4rev1", RFC 3501, DOI 10.17487/RFC3501, March 2003,
              <https://www.rfc-editor.org/rfc/rfc3501>.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003, <https://www.rfc-editor.org/rfc/rfc3629>.

   [RFC4013]  Zeilenga, K., "SASLprep: Stringprep Profile for User Names
              and Passwords", RFC 4013, DOI 10.17487/RFC4013, February
              2005, <https://www.rfc-editor.org/rfc/rfc4013>.

   [RFC5161]  Gulbrandsen, A., Ed. and A. Melnikov, Ed., "The IMAP
              ENABLE Extension", RFC 5161, DOI 10.17487/RFC5161, March
              2008, <https://www.rfc-editor.org/rfc/rfc5161>.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, DOI 10.17487/RFC5198, March 2008,
              <https://www.rfc-editor.org/rfc/rfc5198>.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              DOI 10.17487/RFC5322, October 2008,
              <https://www.rfc-editor.org/rfc/rfc5322>.

   [RFC6530]  Klensin, J. and Y. Ko, "Overview and Framework for
              Internationalized Email", RFC 6530, DOI 10.17487/RFC6530,
              February 2012, <https://www.rfc-editor.org/rfc/rfc6530>.

   [RFC6532]  Yang, A., Steele, S., and N. Freed, "Internationalized
              Email Headers", RFC 6532, DOI 10.17487/RFC6532, February
              2012, <https://www.rfc-editor.org/rfc/rfc6532>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

11.2.  Informative References

   [RFC2088]  Myers, J., "IMAP4 non-synchronizing literals", RFC 2088,
              DOI 10.17487/RFC2088, January 1997,
              <https://www.rfc-editor.org/rfc/rfc2088>.

   [RFC2342]  Gahrns, M. and C. Newman, "IMAP4 Namespace", RFC 2342,
              DOI 10.17487/RFC2342, May 1998,
              <https://www.rfc-editor.org/rfc/rfc2342>.

   [RFC4314]  Melnikov, A., "IMAP4 Access Control List (ACL) Extension",
              RFC 4314, DOI 10.17487/RFC4314, December 2005,
              <https://www.rfc-editor.org/rfc/rfc4314>.

   [RFC5530]  Gulbrandsen, A., "IMAP Response Codes", RFC 5530,
              DOI 10.17487/RFC5530, May 2009,
              <https://www.rfc-editor.org/rfc/rfc5530>.

   [RFC5738]  Resnick, P. and C. Newman, "IMAP Support for UTF-8",
              RFC 5738, DOI 10.17487/RFC5738, March 2010,
              <https://www.rfc-editor.org/rfc/rfc5738>.

   [RFC6855]  Resnick, P., Ed., Newman, C., Ed., and S. Shen, Ed., "IMAP
              Support for UTF-8", RFC 6855, DOI 10.17487/RFC6855, March
              2013, <https://www.rfc-editor.org/rfc/rfc6855>.

   [RFC6857]  Fujiwara, K., "Post-Delivery Message Downgrading for
              Internationalized Email Messages", RFC 6857,
              DOI 10.17487/RFC6857, March 2013,
              <https://www.rfc-editor.org/rfc/rfc6857>.

   [RFC6858]  Gulbrandsen, A., "Simplified POP and IMAP Downgrading for
              Internationalized Email", RFC 6858, DOI 10.17487/RFC6858,
              March 2013, <https://www.rfc-editor.org/rfc/rfc6858>.

   [RFC8620]  Jenkins, N. and C. Newman, "The JSON Meta Application
              Protocol (JMAP)", RFC 8620, DOI 10.17487/RFC8620, July
              2019, <https://www.rfc-editor.org/rfc/rfc8620>.

   [RFC9051]  Melnikov, A., Ed. and B. Leiba, Ed., "Internet Message
              Access Protocol (IMAP) - Version 4rev2", RFC 9051,
              DOI 10.17487/RFC9051, August 2021,
              <https://www.rfc-editor.org/rfc/rfc9051>.

Appendix A.  Design Rationale

   This non-normative section discusses the reasons behind some of the
   design choices in this specification.

   The "UTF8=ONLY" mechanism simplifies diagnosis of interoperability
   problems when legacy support goes away.  In the situation where
   backwards compatibility is not working anyway, the non-conforming
   "just-send-UTF-8 IMAP" has the advantage that it might work with some
   legacy clients.  However, the difficulty of diagnosing
   interoperability problems caused by a "just-send-UTF-8 IMAP"
   mechanism is the reason the "UTF8=ONLY" capability mechanism was
   chosen.




Appendix B.  Acknowledgments

   The authors wish to thank the participants of the EAI working group
   for their contributions to this document, with particular thanks to
   Harald Alvestrand, David Black, Randall Gellens, Arnt Gulbrandsen,
   Kari Hurtta, John Klensin, Xiaodong Lee, Charles Lindsey, Alexey
   Melnikov, Subramanian Moonesamy, Shawn Steele, Daniel Taharlev, and
   Joseph Yee for their specific contributions to the discussion.

Appendix C.  Changes since RFC 6855

   This non-normative section describes the changes made since
   [RFC6855].

   This document removes APPEND's UTF8 data item, making the
   UTF8-related syntax compatible with IMAP4rev2 as defined by [RFC9051]
   and making it simpler for clients to support IMAP4rev1 and IMAP4rev2
   with the same code.

   IMAP4rev2 [RFC9051] provides roughly the same abilities as [RFC6855]
   but does not include APPEND's UTF8 item.  None of [RFC6855],
   IMAP4rev2 or JMAP [RFC8620] specify any way to learn whether a
   particular message was stored using the UTF8 data item.  As of today,
   an IMAP client cannot learn whether a particular message was stored
   using the UTF8 data item, nor would it be able to trust that
   information even if IMAP4rev1/2 were extended to provide that
   information.

   In July 2023, one of the authors found only one IMAP client that uses
   the UTF8 data item, and that client uses it incorrectly (it sends the
   data item for all messages if the server supports UTF8=ACCEPT,
   without regard to whether a particular message includes any UTF8 at
   all).

   For these reasons, it was judged best to revise [RFC6855] and adopt
   the same syntax as IMAP4rev2.

Authors' Addresses

   Pete Resnick
   Qualcomm Incorporated
   5775 Morehouse Drive
   San Diego,  CA 92121-1714
   United States of America
   Email: presnick@qualcomm.com


   Chris Newman
   Oracle
   800 Royal Oaks
   Monrovia, CA 91016
   United States of America
   Email: chris.newman@oracle.com


   Sean Shen
   CNNIC
   No.4 South 4th Zhongguancun Street
   Beijing
   100190
   China
   Email: shenshuo@cnnic.cn


   Arnt Gulbrandsen
   ICANN
   6 Rond Point Schumann, Bd. 1
   1040 Brussels
   Belgium
   Email: arnt@gulbrandsen.priv.no