Internet DRAFT - draft-sullivan-dns-zone-codepoint-pples
draft-sullivan-dns-zone-codepoint-pples
Network Working Group A. Sullivan
Internet-Draft Dyn, Inc.
Intended status: Informational D. Thaler
Expires: December 7, 2012 Microsoft
O. Kolkman
NLnet Labs
June 5, 2012
Principles for Unicode Code Point Inclusion in Labels in the DNS Root
draft-sullivan-dns-zone-codepoint-pples-00
Abstract
Traditionally, the management of the DNS root zone permitted only
"alphabetic" labels. As long as the root zone included only ASCII
characters, and as long as there was only one form of a label, the
restriction plainly meant that only the letters A-Z and a-z were
permitted. The advent of internationalized labels using IDNA2008
presents some complications for the restriction. One of the
complications is the meaning of the term "alphabetic" when applied to
the Unicode code points in U-labels. This memo presents a set of
principles that can be used to determine whether a Unicode code point
may be wisely included in the repertoire of permissible code points
in a U-label in a zone.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 7, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
Sullivan, et al. Expires December 7, 2012 [Page 1]
Internet-Draft Root Zone Code Point Principles June 2012
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Background and Introduction . . . . . . . . . . . . . . . . . . 3
1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Conservatism Principle . . . . . . . . . . . . . . . . . . . . 4
3. Inclusion Principle . . . . . . . . . . . . . . . . . . . . . . 4
4. Simplicity Principle . . . . . . . . . . . . . . . . . . . . . 4
5. Predictability Principle . . . . . . . . . . . . . . . . . . . 5
6. Stability Principle . . . . . . . . . . . . . . . . . . . . . . 5
7. Letter Principle . . . . . . . . . . . . . . . . . . . . . . . 6
8. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 6
9. Security Considerations . . . . . . . . . . . . . . . . . . . . 6
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 7
12. Informative References . . . . . . . . . . . . . . . . . . . . 7
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 8
Sullivan, et al. Expires December 7, 2012 [Page 2]
Internet-Draft Root Zone Code Point Principles June 2012
1. Background and Introduction
In recent communications ([IABCOMM1] and [IABCOMM2]), the IAB has
emphasized the importance of conservatism in allocating labels
conforming to IDNA2008 ([RFC5890], [RFC5891], [RFC5892], [RFC5893],
[RFC5894], [RFC5895]) inside the root zone. Traditional LDH-labels
(see [RFC5890] for definitions of IDNA terms) in the root zone used
only alphabetic characters (i.e., ASCII a-z or A-Z). Matters are
more complicated with U-labels, however. The IAB communications
recommended that U-labels permit only code points with a
General_Category (gc) of Ll (Lowercase_Letter), Lo (Other_Letter), or
Lm (Modifier_Letter), but noted that for practical considerations
other code points might be permitted on a case-by-case basis. In
what follows we will use the Unicode notation; e.g., gc=Ll.
The IAB recommendation does, however, present some problems that need
to be addressed. First, it is by no means clear that all of the code
points with gc=Lo or gc=Lm and which are permitted under IDNA2008 are
appropriate for the root zone. To take but one example, the code
point U+02BC MODIFIER LETTER APOSTROPHE has gc=Lm. In practically
every rendering (we are unaware of an exception), U+02BC is
indistinguishable from U+2019 RIGHT SINGLE QUOTATION MARK, which has
gc=Pf (Final_Punctuation). U+02BC will also be read by large numbers
of people as being the same character as U+0027 APOSTROPHE, which has
gc=Po (Other_Punctuation). U+02BC is PROTOCOL VALID (PVALID) under
IDNA2008 (see [RFC5892]), whereas both other code points are
DISALLOWED. So, to begin with, it is plain that not every code point
with gc in {Ll, Lo, Lm} is consistent with any conservatism
principle.
To make matters worse, some languages are dependent on code points
with gc=Mc (Spacing_Mark) or gc=Mn (Nonspacing_Mark). This
dependency is particularly common in Indic languages, though not
exclusive to them. (At the risk of vastly oversimplifying, the
overarching issue is mostly the interaction of complex writing
systems and the way Unicode works.) To restrict users of those
languages only to code points with gc in {Ll, Lo, Lm} would be
extremely limiting. While DNS labels are not words, or sentences, or
phrases (as noted in [RFC4690]), they are intended as useful
mnemonics. Mnemonics that diverge wildly from the usual conventions
in a language are likely to attract strong objections, particularly
in the root. The objections might drag the discussion away from
sound management of the shared DNS root zone and towards discussions
of cultural hegemony. That sort of discussion itself might present
risks for the operation of the root zone.
For reasons of sound management, it is not desirable to decide
whether to permit a given code point only when an application
Sullivan, et al. Expires December 7, 2012 [Page 3]
Internet-Draft Root Zone Code Point Principles June 2012
containing that code point is pending. That approach reduces
predictability and is bound to appear subject to special pleas. It
is better instead to come up with a set of principles for guiding
decisions about code points. These principles can then function as
meta-rules, determining the rules for inclusion of any code point
(from those permitted by IDNA) in labels in the root. The principles
might also be adopted by other zones that are shared by much of the
Internet. Such a set of principles follows in the sections below.
Each section includes remarks on the extent to which the principle
could be wisely adopted by zones other than the root.
1.1. Terminology
Terms relevant to IDNA2008 can be found in [RFC5890]. Other relevant
internationalization terms are defined in [RFC6365].
This memo does not propose a protocol standard, and the use of words
like "should" follow the ordinary English meaning, and not that laid
out in [RFC2119].
2. Conservatism Principle
The root zone is, by definition, the one DNS zone that must be shared
by everybody. Therefore, any decision to permit a code point in the
root zone should be as conservative as practicable. Doubts should
always be resolved in favor of rejecting a code point for inclusion
rather than in favor of including it, in order to minimize risk.
This principle is easily (and wisely) adoptable by any zone. It is
also the one that is most likely to yield the safest result.
3. Inclusion Principle
Just as IDNA2008 starts from the principle that the Unicode range is
excluded, and then adds code points according to derived properties
of the code points, so the root zone should only permit inclusion of
a code point if it is known to be safe. The default treatment of a
code point should be that it is excluded.
This principle is easily (and wisely) adoptable by any zone.
4. Simplicity Principle
The rules for determining whether a code point is to be included
should be simple enough that they are readily understood by someone
Sullivan, et al. Expires December 7, 2012 [Page 4]
Internet-Draft Root Zone Code Point Principles June 2012
with a moderate background in the DNS and Unicode issues. This
principle does not mean that a completely naive person needs to be
able to understand the rationale for why a code point is included,
but it does mean that the reason for inclusion of very peculiar code
points, even if the code points are safe in themselves, will be too
difficult to understand and will therefore be rejected.
The meaning of "simple" or "readily understood" is context dependent.
For instance, the root zone has to serve everyone in the world; for
practical purposes, this means that the reasons for including a code
point need to be comprehensible even to people who cannot use the
script where the code point is found. In a zone that permits a very
small subset of Unicode characters (for instance, only those needed
to write a single language) and that supports a clearly-delineated
linguistic community (for instance, the speakers of a single language
with well-understood written conventions), more complicated rules
might be acceptable.
5. Predictability Principle
The rules for determining whether a code point is to be included
should be predictable enough that those with the requisite
understanding of DNS, IDNA, and Unicode would all generally reach the
same conclusion. This is not a requirement for algorithmic treatment
of code points (the difficulties with the Unicode Letter and Mark
categories illustrate why that would be too difficult). It is rather
to say that the consistent application of professional judgment is
likely to yield the same results; combined with the principle in
Section 2, when results are not predictable the anomalous code point
would not be included.
Just as in Section 4, this principle is not easily extended to zones
lower than the root because what is predictable within a given
language community is possibly very surprising across languages.
6. Stability Principle
Once a code point is permitted, it is at least very hard to stop
permitting that code point. In general, the list of code points to
be permitted should change very slowly, if at all, and usually only
in the direction of permitting an addition as time and experience
indicates that inclusion of such a code point is both safe and
consistent with these principles.
This principle likely extends to every delegation-centric domain: if
one delegation is permitted to use a code point, it is very hard to
Sullivan, et al. Expires December 7, 2012 [Page 5]
Internet-Draft Root Zone Code Point Principles June 2012
see why others might not.
7. Letter Principle
In keeping with the spirit of the note in [RFC1123] that top-level
labels "will be alphabetic", the rules should not include code points
that are not normally used to write words, or that are in some cases
normally used for purposes other than writing words. This is not the
same as using Unicode's General_Category to include only letters.
But it is a restriction that expands the possible class of included
code points beyond the Unicode letters, but only expands so far as to
include the things that are normally used the way letters are. Under
this principle, code points with (for example) gc=Mn might be
included -- but only those that are used to write words and not (for
instance) musical symbols. This principle should be applied as
narrowly as possible; as [RFC4690] says, "While DNS labels may
conveniently be used to express words in many circumstances, the goal
is not to express words (or sentences or phrases), but to permit the
creation of unambiguous labels with good mnemonic value."
Because the root zone must be shared by everyone, this principle is
more important in it than in zones that are intended for use by
clearly-defined linguistic communities.
8. Conclusion
The foregoing principles could be applied generally when considering
any range of Unicode code points for possible inclusion in the root
zone. It is worth observing that doing anything (especially in light
of Section 6) implicitly disadvantages communities with a writing
system not yet well understood and not represented in the technical
and policy communities involved in the discussion. That disadvantage
is to be guarded against as much as practical, but is effectively
impossible to prevent (while still taking action) in light of
imperfect human knowledge.
9. Security Considerations
The principles outlined in this memo are partly intended to reduce
the possibility of confusion among different labels. While these
principles may contribute to reduction of risk, they are not
sufficient to provide a comprehensive internationalization policy for
zone management.
Sullivan, et al. Expires December 7, 2012 [Page 6]
Internet-Draft Root Zone Code Point Principles June 2012
10. IANA Considerations
None. RFC Editor: this section may be removed on publication.
11. Acknowledgements
The authors thank the participants in the IAB Internationalization
programme for the discussion of the ideas in this memo.
12. Informative References
[IABCOMM1]
Internet Architecture Board, "IAB Statement: 'The
interpretation of rules in the ICANN gTLD Applicant
Guidebook.'", February 2012.
[IABCOMM2]
Internet Architecture Board, "Response to ICANN questions
concerning 'The interpretation of rules in the ICANN gTLD
Applicant Guidebook'", March 2012.
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application
and Support", STD 3, RFC 1123, October 1989.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
Recommendations for Internationalized Domain Names
(IDNs)", RFC 4690, September 2006.
[RFC5890] Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Definitions and Document Framework",
RFC 5890, August 2010.
[RFC5891] Klensin, J., "Internationalized Domain Names in
Applications (IDNA): Protocol", RFC 5891, August 2010.
[RFC5892] Faltstrom, P., "The Unicode Code Points and
Internationalized Domain Names for Applications (IDNA)",
RFC 5892, August 2010.
[RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
Internationalized Domain Names for Applications (IDNA)",
RFC 5893, August 2010.
Sullivan, et al. Expires December 7, 2012 [Page 7]
Internet-Draft Root Zone Code Point Principles June 2012
[RFC5894] Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Background, Explanation, and
Rationale", RFC 5894, August 2010.
[RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for
Internationalized Domain Names in Applications (IDNA)
2008", RFC 5895, September 2010.
[RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in
Internationalization in the IETF", BCP 166, RFC 6365,
September 2011.
Authors' Addresses
Andrew Sullivan
Dyn, Inc.
150 Dow St
Manchester, NH 03101
U.S.A.
Email: asullivan@dyn.com
Dave Thaler
Microsoft
One Microsoft Way
Redmond, WA 98052
U.S.A.
Email: dthaler@microsoft.com
Olaf Kolkman
NLnet Labs
Science Park 400
Amsterdam 1098 XH
The Netherlands
Email: olaf@NLnetLabs.nl
Sullivan, et al. Expires December 7, 2012 [Page 8]