Internet DRAFT - draft-klensin-iri-sri
draft-klensin-iri-sri
IETF J. Klensin
Internet-Draft S. Moonesamy
Obsoletes: 3987 (if approved) July 9, 2012
Intended status: Standards Track
Expires: January 10, 2013
An XML-based Simple Resource Identifier
draft-klensin-iri-sri-00.txt
Abstract
While the URI specification has been widely deployed, it has long
been recognized that many valid URIs, especially those that contain
extensive information in the "tail" are unsuitable for user
presentation, especially for internationalized environments. IRIs
have been proposed as a solution for that problem but inherit (and
are constrained by) the complex and sometimes method-dependent syntax
model of URIs as well as positional and ordering assumptions that
make them more difficult to localize than is desirable.
This specification illustrates a way to define an "above URI" model
for a localization-friendly simple reference identifier (SRI) that
explicitly identifies fields and is more appropriate than IRIs to
support localization. The current version is intended simply to
initiate a discussion. In particular, while it is written to use an
XML element syntax model, variations using JSON or some other system
with explicitly-labeled data fields might be as, or more,
appropriate.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 10, 2013.
Copyright Notice
Klensin & Moonesamy Expires January 10, 2013 [Page 1]
Internet-Draft SRI July 2012
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
1.2. Status and Discussion . . . . . . . . . . . . . . . . . . 3
2. Tagged Elements . . . . . . . . . . . . . . . . . . . . . . . 4
3. Data Element Description . . . . . . . . . . . . . . . . . . . 4
3.1. scheme Element . . . . . . . . . . . . . . . . . . . . . . 5
3.2. authority Element . . . . . . . . . . . . . . . . . . . . 5
3.2.1. user-info Element . . . . . . . . . . . . . . . . . . 5
3.2.2. host Element . . . . . . . . . . . . . . . . . . . . . 5
3.2.3. port Element . . . . . . . . . . . . . . . . . . . . . 5
3.3. path Element . . . . . . . . . . . . . . . . . . . . . . . 5
3.4. query Element . . . . . . . . . . . . . . . . . . . . . . 5
3.5. fragment Element . . . . . . . . . . . . . . . . . . . . . 6
4. Internationalization and Escapes . . . . . . . . . . . . . . . 6
5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 7
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7
8. Security Considerations . . . . . . . . . . . . . . . . . . . 7
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7
9.1. Normative References . . . . . . . . . . . . . . . . . . . 7
9.2. Informative References . . . . . . . . . . . . . . . . . . 8
Appendix A. This Specification and the IRI Approach . . . . . . . 8
Appendix B. XML DTD . . . . . . . . . . . . . . . . . . . . . . . 9
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10
Klensin & Moonesamy Expires January 10, 2013 [Page 2]
Internet-Draft SRI July 2012
1. Introduction
While the URI specification [RFC3986] has been widely deployed, it
has long been recognized that many valid URIs, especially those that
contain extensive information in the "tail" are unsuitable for user
presentation, especially for internationalized environments. IRIs
[RFC3987] have been proposed as a solution for that problem but
inherit (and are constrained by) the complex and sometimes method-
dependent syntax model of URIs as well as positional and ordering
assumptions that make them more difficult to localize than is
desirable.
This specification illustrates a way to define a localization-
friendly "above URI" simple syntax (a "SRI") that explicitly
identifies fields and is more appropriate than IRIs to support
localization.
[[anchor2: Note in Draft: "Simple" is chosen in the grand tradition
of "simple" protocols like SMTP and SIP". Certainly the parsing of
the compound identifier into components is simpler than the URI
model. But suggestions for alternate terms would be welcome if
"simple" turns into flame-bait.]]
This specification obviates most, if not all, of the perceived need
for IRIs and hence obsoletes the specification of them in RFC 3087.
A discussion of the reasons for that action appears in Appendix A.
1.1. Terminology
The terms "i18n" and "l10n" are liberally used as abbreviations for
"internationalization" and "localization", respectively, in this
specification.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
1.2. Status and Discussion
[[anchor5: RFC Editor: Please remove this subsection.]]
This draft is a pre-proposal to stimulate discussion of the IRI
approach and alternatives to it. While it is deliberately
incomplete, the path to an actual proposal should be clear. Also,
the choice of an XML element syntax model [XML] structure was fairly
arbitrary. It would probably be equally reasonable to support a JSON
[RFC4627] or other structure instead (or additionally) as long as the
basic syntax chosen supports clear identification of data elements
Klensin & Moonesamy Expires January 10, 2013 [Page 3]
Internet-Draft SRI July 2012
and a very precise and context-independent syntax for element values.
Discussion of this draft should occur on the IRI WG mailing list.
Details about subscription and archives for the list may be found at
XXXXX.
2. Tagged Elements
Much of the complexity in the URI specification lies in trying to
identify and extract the various parts of a URI. That process is
complicated by scheme-dependent elements and the associated
delimiters which may be reserved or not depending on the scheme.
That work may be appropriate if some system actually needs to parse
and execute a URI -- an activity that requires understanding the
scheme in any event-- but may be less appropriate for an i18n / l10n
overlay.
This specification overcomes that problem and the associated
complexities introduced by characters outside the ASCII repertoire,
URI escaping conventions, and so on by eliminating the constraint of
forward compatibility with URIs in favor of a more international
format that can be easily localized and equally easily be mapped into
that URI syntax.
3. Data Element Description
This section maps the various components of URIs into XML elements.
For purposes of this specification, the URI syntax is discarded; only
the data elements are retained. The mapping from an XML-structured
document using these elements to URI syntax should be fairly obvious
[[anchor8: ...and possibly covered in more detail in a future version
of this spec]]. It is obviously possible to specify a collection of
elements with this specification that, when mapped back into URI
syntax, will be invalid or confusing for a particular scheme. If
that is perceived as an issue, specific lists of what elements are
valid for which schemes should be easy to compile.
The basic structure starts with a localization-friendly element that
contains all other elements (and has no direct textual content):
<SRI>
[[anchor9: Note in Draft: Each of the subsections that follow can
probably benefit from some fleshing-out. For this version, the
general intent should be clear. It is likely that several more
subsidiary elements are needed, but that is a topic for future
discussion.]]
Klensin & Moonesamy Expires January 10, 2013 [Page 4]
Internet-Draft SRI July 2012
3.1. scheme Element
<scheme> SchemeName </scheme>
The Scheme element has no subsidiary elements.
3.2. authority Element
<authority>
Authority elements as below.
</authority>
The Authority element has the subsidiary elements listed in the
subsections below.
3.2.1. user-info Element
3.2.2. host Element
Domain names are subject to special rules because of IDNA
considerations, so the normal content of the host element is a domain
element. [Domain-]relative URIs do not use the domain element.
3.2.2.1. domain Element
<domain> Fully-qualified-domain-name </domain>
3.2.3. port Element
<port> NN <port>
NN is a numeric port number.
3.3. path Element
<path> PathString </path>
[[anchor16: Subsidiary elements here, including <domain> and/or <SRI>
when appropriate.]]
3.4. query Element
<query> QueryString </query>
[[anchor18: Subsidiary elements here, including <domain> and/or <SRI>
when appropriate.]]
Klensin & Moonesamy Expires January 10, 2013 [Page 5]
Internet-Draft SRI July 2012
3.5. fragment Element
<fragment> FragmentName or other identifier </fragment>
The Fragment element has no subsidiary elements.
4. Internationalization and Escapes
Part of the goal for the format specified here is to express the
abstract components of a URI as naturally as possible. Consequently,
any text component of any element can be expressed in UTF-8 in
normalization form NFC. Escapes ("%" or otherwise) are prohibited
except as required by XML. If "%" appears, it must be doubled in
mapping to URI syntax.
5. Examples
[[anchor22: There should be several of these, each showing a URI and
the matching XRI form.]]
The URI that would appear as
http://example.com/test?sri=http://example.net/
Would appear in this form as:
<uri>
<sri>
<scheme>http</scheme>
<authority>
<host>
<domain>example.com</domain>
</host>
</authority>
<path>/test</path>
<query>
<sri>
<scheme>http</scheme>
<authority>
<host>
<domain>example.net</domain>
</host>
</authority>
</sri>
</query>
</sri>
</uri>
Klensin & Moonesamy Expires January 10, 2013 [Page 6]
Internet-Draft SRI July 2012
[[anchor23: Note in draft: RFC (and I-D) constraints prohibit showing
one of these data structures with characters in it outside the ASCII
repertoire. If the document ever progresses to RFC, an alternate
form that can show such examples including such characters should be
a requirement.]]
6. Acknowledgements
Some of the structuring information for this document was derived
from a W3C working draft on URLs [W3C-URL] as well as the URI
specification. The thinking that led to this work started with a
discussion many years ago with James Seng in which he pointed out
that the "natural" ordering of components of compound identifiers
differed by culture.
7. IANA Considerations
[[anchor24: RFC Editor: Please remove this section before
publication.]]
This memo includes no requests to or actions for IANA.
8. Security Considerations
The model introduced in this specification does not raise any
security issues not already present in the URI specification that
would not be caught by a URI processor. Because it is less subtle
and complex than the URI specification, it may actually lead to a
reduction in vunerabilities.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005.
Klensin & Moonesamy Expires January 10, 2013 [Page 7]
Internet-Draft SRI July 2012
[XML] Bray, T., Ed., Paoli, J., Ed., Sperberg-McQueen, C., Ed.,
and E. Maler, Ed., "Extensible Markup Language (XML) 1.0
(Second Edition), W3C=20 Recommendation", October 2000,
<http://www.w3.org/TR/REC-xml>.
9.2. Informative References
[IRI-Charter]
IETF, "Internationalized Resource Identifiers (iri)",
Captured 2012-07-05, 2019,
<http://datatracker.ietf.org/wg/iri/charter/>.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003.
[RFC4627] Crockford, D., "The application/json Media Type for
JavaScript Object Notation (JSON)", RFC 4627, July 2006.
[RFC5890] Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Definitions and Document Framework",
RFC 5890, August 2010.
[RFC6055] Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on
Encodings for Internationalized Domain Names", RFC 6055,
February 2011.
[W3C-URL] W3C, "URL", Captured 2012-07-03, 2012,
<http://www.w3.org/TR/url/>.
Appendix A. This Specification and the IRI Approach
The original IRI specification [RFC3987] was intended as a strict
superset of the URI syntax [RFC3986] with all URI forms being
permitted but with the use of non-escaped UTF-8 strings also being
allowed. IRIs were not separate protocol identifiers or intended for
use "on the wire". Instead, they were intended as an overlay for
URIs that was more convenient for users. In part because of the
interaction with the original [RFC3490] and revised [RFC5890]
versions of the IDNA specification, the mapping from IRIs to URIs was
not unique: one could map a domain name expressed as a UTF-8 string
into either a URI escape sequence or into a set of IDNA A-labels.
That choice interacted badly with the domain name encoding
considerations discussed by the IAB [RFC6055] and, more importantly,
with URI comparisons in caches and similar contexts.
Based on those and other considerations, an IETF WG charged with IRI
Klensin & Moonesamy Expires January 10, 2013 [Page 8]
Internet-Draft SRI July 2012
revision [IRI-Charter] concluded that IRIs should be treated as a
separate protocol identifier, primarily for use in new protocols,
rather than as a strictly-forward-compatible URI overlay. That
decision immediately raised the question of whether it was more
valuable to preserve a URI-like syntax or depart from it entirely.
This specification resulted from the desire to explore the
possibilities that would be opened up by abandoning the constraint of
apparent similarity to the URI syntax. But, just as the decision to
move to a separate protocol identifier essentially recognizes that
the IRIs defined in RFC 3987 was not feasible and an IRI variation
that defined a new protocol element while retaining the general form
of the URI syntax would obsolete 3987, this specification does as
well: whether the underlying syntax model is changed or not, the WG
has concluded that IRIs as defined in RFC 3987 are inappropriate for
general use on the public Internet.
Appendix B. XML DTD
<!ELEMENT uri (sri)>
<!-- Simple Resource Identifier -->
<!ELEMENT sri (scheme, authority, path?, query?,
fragment?)>
<!ELEMENT authority ( user-info?, host, port?)>
<!ELEMENT scheme (#PCDATA)>
<!ELEMENT user-info (#PCDATA)>
<!ELEMENT host (domain | ip-address)>
<!ELEMENT port ((#PCDATA)>
<!ELEMENT path (#PCDATA | domain | sri)*>
<!ELEMENT query (#PCDATA | domain | sri)*>
<!ELEMENT fragment (#PCDATA)*>
<!-- This contains a FQDN -->
<!ELEMENT domain (#PCDATA)>
<!ELEMENT ip-address (#PCDATA)>
Klensin & Moonesamy Expires January 10, 2013 [Page 9]
Internet-Draft SRI July 2012
Authors' Addresses
John C Klensin
1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140
USA
Phone: +1 617 491 5735
Email: john-ietf@jck.com
Subramanian Moonesamy
76, Ylang Ylang Avenue
Quatre Bornes
Mauritius
Email: sm+ietf@elandsys.com
Klensin & Moonesamy Expires January 10, 2013 [Page 10]