Network Working Group | J. Levine |
Internet-Draft | Taughannock Networks |
Intended status: Informational | P. Hoffman |
Expires: April 02, 2013 | VPN Consortium |
October 2012 |
Variant Names in Top Level Domains
draft-levine-tld-variant-00
IDNA [RFC5890] provides a method to map a subset of names written in Unicode into the DNS. Some languages allow a particular name to be written in multiple ways that are represented differently in IDNA, known as "variants". We survey the approaches that ICANN-managed top level domains have taken to the registration and provisioning of variant names.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 02, 2013.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
IDNA [RFC5890] provides a method to map a subset of names written in Unicode into the DNS [RFC1035]. Some languages allow a particular name to be written in multiple ways that are represented differently in IDNA, known as "variants". In some cases, the variants are multiple equally valid ways of writing the same thing, such as traditional and simplified Chinese characters. Some languages written in Latin characters with accents and diacritical marks allow the marks to be omitted in some situations, such as French which often omits accents on capital letters. Due to the difficulty of representing accented characters in ASCII systems, many users have informally used unaccented characters in DNS names, even when they are not linguistically equivalent to the accented versions.
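To make "represented differently in IDNA" concrete, the following minimal sketch converts a few spellings of a label to the ASCII form that appears in the DNS. It relies on Python's built-in "idna" codec, which implements the older IDNA2003 rules rather than the IDNA2008 rules of [RFC5890]; for simple lowercase labels like these the two should agree, but that is an assumption. The labels are illustrative and are not taken from any TLD zone.

```python
def to_a_label(label: str) -> str:
    """Return the ASCII (A-label) form of a single DNS label.

    Assumption: Python's built-in "idna" codec (IDNA2003) is close
    enough for illustration; a registry would apply IDNA2008 rules.
    """
    return label.encode("idna").decode("ascii")


# Accented and unaccented spellings are unrelated names in the DNS.
accented = to_a_label("bücher")
plain = to_a_label("bucher")

# Traditional and simplified forms of the same Chinese character
# likewise encode to different A-labels.
traditional = to_a_label("國")
simplified = to_a_label("国")

print(accented, plain, traditional, simplified)
```

Nothing in the DNS itself relates these A-labels to one another; any "sameness" has to be arranged by the registry or the registrant.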
The proper handling of variant names has been a topic of extensive debate and research, with little consensus reached on how to handle them, or even on which characters are variants of each other. Many people would like variant names to behave "the same", for a diverse range of meanings of "same." In some cases it is textual similarity, such as variants having corresponding DNS records. In others it is functional similarity, such as variant names resolving to the same web server, or to the same page on a web server. In still others it is user experience similarity, such as names resolving to web pages which, while not identical, are perceived by human users as equivalent.
This document provides a snapshot of variant handling in the top level domains managed by ICANN, the so-called gTLDs (generic TLDs) and sTLDs (sponsored TLDs), as of late 2012. We chose those domains because ICANN requires each TLD to describe its IDN and variant practices, and because the TLD zone files are available for inspection, which lets us verify what actually goes into the zones.
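The kind of zone file inspection mentioned above can be sketched roughly as follows. This is not the tooling used for this survey; it is a minimal illustration that assumes one complete record per line in the form "owner [ttl] [class] type rdata", with no $ORIGIN handling, no multi-line records, and no omitted owner names. The function name and the sample records are hypothetical.

```python
from collections import Counter

DNS_CLASSES = {"IN", "CH", "HS"}


def summarize_zone(zone_text: str) -> dict:
    """Count owner names, IDN owner names, and record types."""
    owners, idn_owners, types = set(), set(), Counter()
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("$"):
            continue  # skip blank lines and directives
        fields = line.split()
        owner = fields[0].lower()
        # Skip the optional TTL and class to find the record type.
        i = 1
        while i < len(fields) and (
            fields[i].isdigit() or fields[i].upper() in DNS_CLASSES
        ):
            i += 1
        if i >= len(fields):
            continue  # malformed line, ignore
        types[fields[i].upper()] += 1
        owners.add(owner)
        # An owner is an IDN if any label is an A-label ("xn--" prefix).
        if any(label.startswith("xn--") for label in owner.split(".")):
            idn_owners.add(owner)
    return {"owners": len(owners), "idn_owners": len(idn_owners), "types": types}


sample = """
example.tld.        86400 IN NS    ns1.example.net.
xn--bcher-kva.tld.  86400 IN NS    ns1.example.net.
xn--bcher-kva.tld.  86400 IN NS    ns2.example.net.
bucher.tld.         86400 IN DNAME other.tld.   ; hypothetical record
"""
summary = summarize_zone(sample)
print(summary)
```

Counting record types this way is enough to tell whether a zone contains any DNAME records at all, which matters for several of the TLDs below.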
We use some terminology that has become fairly well agreed when discussing variant names.
ICANN has published a variety of documents on variant management. The most important are the "Guidelines for the Implementation of Internationalized Domain Names" issued in Version 1.0 [G1] and Version 3.0 [G3].
TLDs are supposed to register an IDN practices document with IANA for each language in which the TLD accepts IDN registrations, to be entered in an IANA registry [IANAIDN]. The practices document lists the Unicode characters allowed in names in the language, which characters are considered equivalent, and which of an equivalent group is preferred. Some TLDs have been more diligent than others at keeping the registry up to date.
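The three parts of a practices document (allowed characters, equivalent characters, and the preferred member of each group) can be modeled with a small data structure. The sketch below is a toy model under stated assumptions: the class name, the three-character repertoire, and the choice of the simplified character as "preferred" are all hypothetical and are not copied from any table in the IANA registry.

```python
from dataclasses import dataclass, field


@dataclass
class IdnTable:
    """A toy model of an IDN practices table for one language."""

    allowed: set
    # Maps each non-preferred character to the preferred member of
    # its equivalence group; characters not listed are their own
    # preferred form.
    preferred: dict = field(default_factory=dict)

    def is_allowed(self, label: str) -> bool:
        """True if every character of the label is in the repertoire."""
        return all(ch in self.allowed for ch in label)

    def preferred_form(self, label: str) -> str:
        """Replace each character with its preferred equivalent."""
        return "".join(self.preferred.get(ch, ch) for ch in label)

    def are_variants(self, a: str, b: str) -> bool:
        """True if two distinct labels share a preferred form."""
        return a != b and self.preferred_form(a) == self.preferred_form(b)


# Hypothetical table: one equivalence group, simplified form preferred.
table = IdnTable(allowed={"中", "国", "國"}, preferred={"國": "国"})

print(table.preferred_form("中國"))
print(table.are_variants("中國", "中国"))
```

A character-by-character mapping like this is a simplification; real tables may list several variants per character and need not treat equivalence so uniformly.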
A few of ICANN's agreements with the TLDs [ICANNAGREE] describe the TLD's IDN practices, but most do not.
The .AERO TLD has no IDNs, and no rules or practices for them.
The .ASIA domain accepts registrations in many Asian languages. The registry has IANA tables for Japanese, Korean, and Chinese. The IANA tables refer to the registry's CJK IDN policies [ASIACJK], which say that applied-for and preferred IDN variants are "active and included in the zone." No IDN publication mechanism is described in the documentation, but from inspection of the zone file, it is clear that the zone is using parallel NS for variants.
ICANN gave the registry (Neustar) non-specific permission to register IDNs in a letter in 2004 [TWOMEY04A]. The IDN rules were apparently discussed with ICANN, but not defined; see Appendix 9 of the registry agreement [ICANNBIZ9].
They have about a dozen IANA tables. No IDN publication mechanism is described, but from inspection it appears that variants are blocked.
The IDN rules are described in Appendix S Part VII.2 [ICANNCATS] of the ICANN agreement. "Registry will take a very cautious approach in its IDN offerings. IDNs will be bundled with the equivalent ASCII domains." The only language is Catalan. No IDN publication mechanism is described.
Bundles consist of names that differ in accented and unaccented vowels, and in "ll" and the Catalan letter written as two L's with a dot in between.
When a registrant registers an IDN, the registry also includes the ASCII version. From inspection of the zone file, the ASCII version is provisioned with NS, and the IDN is a DNAME pointing to the ASCII version.
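The bundling arrangement just described can be sketched as a function that emits the records for one bundle. This is an illustration under stated assumptions, not the registry's actual procedure: the function names, the folding rules (strip combining marks, turn "l·l" into "ll"), the "example" TLD, and the name server are all hypothetical, and the A-label is produced with Python's IDNA2003 "idna" codec.

```python
import unicodedata


def ascii_fold(label: str) -> str:
    """Strip diacritics and the Catalan middle dot (a simplification)."""
    label = label.replace("l·l", "ll")
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bundle_records(idn_label: str, tld: str, nameservers: list) -> list:
    """Return zone lines: NS for the ASCII name, DNAME for the IDN."""
    ascii_name = f"{ascii_fold(idn_label)}.{tld}."
    idn_name = f"{idn_label.encode('idna').decode('ascii')}.{tld}."
    records = [f"{ascii_name} IN NS {ns}" for ns in nameservers]
    if idn_name != ascii_name:
        # The IDN carries no delegation of its own; it points at the
        # ASCII member of the bundle.
        records.append(f"{idn_name} IN DNAME {ascii_name}")
    return records


records = bundle_records("música", "example", ["ns1.example.net."])
for line in records:
    print(line)
```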
ICANN and Verisign have extensive correspondence about IDNs and variants, including letters from Ben Turner [TURNER03] and Ed Lewis [LEWIS03].
The IANA registry has tables for several dozen languages, including archaic languages such as hieroglyphics and Aramaic. Verisign publishes documents describing Scripts and Languages [VRSNLANG], Character Variants [VRSNCHAR], Registration Rules [VRSNRULES], and additional registration logic [VRSNADDL].
In Chinese, variants are blocked (see [VRSNADDL]). In other languages there appears to be no bundling or blocking.
The .COOP TLD has no IDNs, and no rules or practices for them.
The IANA registry has tables for Danish, Hungarian, Lithuanian, Latvian, and Swedish from 2005. The domain also has names in Greek, Russian, Arabic, and other languages but no IANA tables.
The registry agreement Appendix 9 [ICANNINFO9] refers to a 2003 letter from Paul Twomey [TWOMEY03] that refers to blocking variants.
The .JOBS TLD has no IDNs, and no rules or practices for them.
The zone file has about 22,000 IDNs. The domain has no tables at IANA. The registry agreement Appendix S [ICANNMOBIS] says that IDNs are provisioned according to [G1].
The zone file has many IDNs; spot checks find that many are lame or dead. A 2004 letter from Paul Twomey [TWOMEY04] refers to [G1].
The registry has a detailed policy page [MUSEUMIDN]. IDNs are accepted in Latin and Hebrew scripts, with plans for Arabic, Chinese, Japanese, Korean, Cyrillic, and Greek. They do no bundling or blocking, but names that may be confusable due to visual similarity are not allowed, apparently determined by manual inspection, which is practical due to the very small size of the domain.
The .NAME domain is now owned by Verisign, and has the same long list of scripts as .COM and .NET. The letter at http://www.icann.org/correspondence/twomey-to-rasmussen-15aug04.pdf refers to Appendix K of the agreement, but the appendices are numbered. Appendix 11 [ICANNNAME11] is about restrictions on names, but says nothing about IDNs. The letter also refers to [G1].
The domain is managed the same as .COM.
A 2003 letter from Paul Twomey [TWOMEY03A] refers to [G1]. The registry has a list of IDN languages [PIRIDN], all written in Latin script. The practices for some but not all are registered with IANA. Since none of the languages do bundling, there is presumably no blocking.
The .POST TLD appears to have no registrations at all yet.
The .PRO TLD has no IDNs, and no rules or practices for them.
The zone has many IDNs. It is probably operating according to a 2004 letter from Paul Twomey [TWOMEY04A] which did not mention specific TLDs. Its policy page [TELPOLICY] has links to IDN practices for 17 languages, all but one of which are registered with IANA. None of the Latin scripts do bundling or blocking. The Japanese practices say that variants are blocked. The Chinese table says:
The zone has no DNAME records, so the second paragraph strongly suggests parallel NS.
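The reasoning used here and for other TLDs (a DNAME at the variant, matching NS sets, or no records at all) can be written down as a small classifier. The sketch below is only an illustration of that inference; the function name, the result strings, and the sample zone contents are hypothetical, and an absent variant cannot by itself distinguish blocking from a name that was simply never registered.

```python
def variant_mechanism(zone: dict, name: str, variant: str) -> str:
    """Guess how `variant` is provisioned relative to `name`.

    `zone` maps owner names to {record type: set of rdata strings}.
    """
    primary = zone.get(name, {})
    other = zone.get(variant)
    if other is None:
        return "absent"  # blocked, or never registered
    if "DNAME" in other:
        return "dname"
    if other.get("NS") and other.get("NS") == primary.get("NS"):
        return "parallel-ns"  # same delegation published twice
    return "independent"


# Hypothetical zone contents for illustration only.
zone = {
    "a.tld.": {"NS": {"ns1.example.net.", "ns2.example.net."}},
    "a-variant.tld.": {"NS": {"ns1.example.net.", "ns2.example.net."}},
    "b.tld.": {"NS": {"ns1.example.org."}},
    "b-variant.tld.": {"DNAME": {"b.tld."}},
    "c.tld.": {"NS": {"ns1.example.com."}},
}

print(variant_mechanism(zone, "a.tld.", "a-variant.tld."))
print(variant_mechanism(zone, "b.tld.", "b-variant.tld."))
print(variant_mechanism(zone, "c.tld.", "c-variant.tld."))
```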
The .TEL TLD, intended as an online directory, does not allow registrants to enter arbitrary RRs in the zone. Nearly all names have NS records pointing to Telnic's own name servers. The A records all point to Telnic's own web server, which shows directory information. A NAPTR record provides the telephone number for registrants for whom they have one. Users can only directly provision MX records. The exception is 16 domains, none of them IDNs, that point to random other name servers and mostly appear to be parked.
The .TRAVEL TLD has no IDNs, and no rules or practices for them.
The .XXX TLD has no IDNs, and no rules or practices for them.
Many of the references may appear to be incomplete. This is due to bugs in the current version of XML2RFC. Consult the XML for full names and URLs.