Uniform Resource Names (urnbis) J. Klensin
Internet-Draft June 8, 2016
Updates: 3986 (if approved)
Intended status: Standards Track
Expires: December 10, 2016

URN Semantics Clarification
draft-ietf-urnbis-semantics-clarif-04

Abstract

Experience has shown that identifiers associated with persistent names have properties and requirements that may be somewhat different from identifiers associated with the locations of objects. This is especially true when such names are expected to be stable for a very long time or when they identify large and complex entities. In order to allow Uniform Resource Names (URNs) to evolve to meet the needs of the Library, Museum, Publisher, and Information Science communities and other users, this specification separates URNs from the semantic constraints that many people believe are part of the specification for Uniform Resource Identifiers (URIs) in RFC 3986, updating that document accordingly. The syntax of URNs is still constrained to that of RFC 3986, so generic URI parsers are unaffected by this change.

Advice to RFC Editor and WG

Note to RFC Editor: Various comments in drafts of this document were written to describe the situation in, and perspective of, the WG. They will need careful checking for tense if the document is queued for publication as an RFC.
WG participants: please do not spend time reporting or discussing those types of obvious editorial issues or, e.g., the amount of white space after periods -- the RFC Production Center are really very good at their jobs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 10, 2016.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

The Generic URI Syntax specification [RFC3986] covers both locators and names and mixtures of the two (see its Section 1.1.3) and describes Uniform Resource Locators (URLs) -- first documented in the IETF in RFC 1738 [RFC1738] -- as an embodiment of the locator concept and Uniform Resource Names (URNs), specifically those using the "urn" scheme [RFC2141], as an embodiment of the names that do not directly provide for resource location. This specification is concerned only with URNs of the variety described in RFC 2141 [RFC2141] and its successors [RFC2141bis] (i.e., those that use the "urn" scheme). URLs, other types of names, and any URI types that may not fall into one of the above categories are out of its scope and unaffected by it.

Experience with URNs since the publication of RFC 3986 has identified several ways in which their inclusion under the 3986 scope has hampered understanding, adoption, and especially extension (specifically types of extensions that were anticipated, but not defined, in RFC 2141). The need for extensions to the URN concept is now being felt in some communities, especially those that include libraries, museums, publishers, and other information scientists.

In particular, the Generic URI Syntax specification goes beyond syntax to specify the meaning and interpretation of various fields, especially the "query" and "fragment" ones and the various syntax forms and interpretations it allows for <hier-part>. There is disagreement in the community about whether some of the statements in RFC 3986 are normative requirements or discussion of possible options and, if the latter, whether the options given are an exclusive list. Sequences of statements in that document that can be read in different ways reinforce those disagreements. As one example, the 3986 discussion of fragments (see Section 3.5, especially the first two paragraphs) has been read as leaving the interpretation of strings in fragment syntax that are not associated with retrievable objects and media types as undefined and unconstrained and hence available for other uses. Others have read the second paragraph as prohibiting any interpretation or use of fragments on a per-scheme basis, essentially prohibiting them when the URI does not resolve to an object with a media type.

This document does not attempt to resolve those disagreements. Doing so does not seem to be necessary and would be far out of scope for the WG that produced it, and would mire URN work in controversies that might never be resolved. Instead, it provides what might best be thought of as an interpretation rule: if someone reads a statement about the meaning or interpretation of a particular field, or non-syntactic restrictions on it, as inconsistent between RFC 3986 and this document and/or [RFC2141bis], these URN-specific documents prevail (again, only for the "urn:" scheme; any extension to other types of names would be the subject of other work).

In other words, this specification excludes URNs from the RFC 3986 definitions of meaning and interpretation so that RFC 3986 applies, for URNs, to their syntax only. The meaning --and any more specific syntax rules-- for those fields for URNs are now defined in a URN-specific document [RFC2141bis]. URNs remain members of the URI family and parsers for generic URI syntax are not affected by this specification although parsers that make assumptions based on other URI schemes obviously might be.
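As a non-normative illustration of the point that generic URI parsers are unaffected, the sketch below applies the regular expression given in Appendix B of RFC 3986 to one "urn"-scheme identifier and one locator. The identifiers and the function name split_uri are hypothetical and chosen only for this example. The split is purely syntactic: nothing in it assigns a meaning to the strings it returns.

```python
import re

# Regular expression from RFC 3986, Appendix B, which splits a URI
# reference into its five generic components.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI into the generic RFC 3986 components (syntax only)."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Hypothetical identifiers, used only for illustration.
urn_parts = split_uri("urn:example:a123?x=1#part2")
url_parts = split_uri("http://www.example.com/a123?x=1#part2")
```

Both calls go through the same code path. For the URN, what the strings in the "query" and "fragment" positions mean is left to the URN-specific specification [RFC2141bis].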

Neither this specification nor the successor to RFC 2141 [RFC2141bis] discusses DDDS [RFC3401] resolution or conversion to (and interpretation of) URCs [RFC2483] or, with the exception of providing some syntax to cover some specific cases, URN "resolution" more generally. Any of those topics that do need to be addressed should be covered in other documents. The document also does not discuss alternatives to URNs, either those that might use a different scheme name within the RFC 3986 URI framework or those that might use a different framework entirely. In particular, some externally-defined content or object identification systems could be represented either by a URN namespace or through separate URI schemes. This specification does not offer advice on that choice other than to suggest that the two options not be confused (or both used in a way that would be confusing).

In the long term, as the expanded syntax and uses of URNs become commonplace and RFC 3986 is updated, this specification is likely to become of historical interest only, providing an extended rationale for decisions made and adjustment of the boundary between URN specifications and generic URI ones, especially those that are used as locators rather than names.

2. Pragmatic Goals

Despite the important background and rationale in other sections of this document, the change made (or clarification provided) by this specification is driven by a desire to avoid philosophical debates about terminology, ultimate truths, or even different interpretations of RFC 3986. Instead, it is motivated by three very pragmatic principles and goals:

  1. Accommodate the communities who think URNs are necessary, i.e., that they can and should be usefully distinguished from other URIs, at least location-oriented ones (including URI schemes defined prior to the time work started on this document in August 2014). In particular, provide a foundation for extensions to the URN syntax (as allowed by and partially defined in RFC 2141) to support requirements encountered by some of those communities.
  2. Provide a path to avoid getting bogged down in declarative statements about definitions and debates about what is and is not abstractly correct.
  3. Avoid a fork in the standard that would be likely to lead to multiple, conflicting, definitions or criteria for URNs.

In addition, this document is intended to move past debates about whether or not URNs are intended to be parsed at all (i.e., whether a "urn"-scheme URI is simply opaque to a URI parser once the scheme name is identified) and, if not, how much of it is actually expected to be understood and broken into identifiable parts by such a parser. It establishes a principle that, for the "urn" scheme, parsing into the components identified in RFC 3986 may be performed but that any meanings or interpretation assigned to those components (including application of the normal English meanings of such terms as "query" or "fragment") are a matter for URN-specific specifications. That principle and its application provide a foundation for the distinguishing terms "q-component", "r-component", and "f-component" that are developed in the accompanying URN definition specification [RFC2141bis].

3. The role of queries and fragments in URNs

[CREF1]Note in Draft to WG: Given what is now above and material in 2141bis, I suggest removing this section entirely (if needed, transposing it into 2141bis). If we do retain it, it almost certainly needs more work. Comments, specifically about whether it should be removed and, if so, what changes (if any) are needed to 2141bis, would be appreciated. --JcK (-04)

Part of the concern that led to this document was a desire to accommodate URN components that would be analogous to the query and fragment components of generalized URIs but that might have different properties. For many cases, the analogy cannot be exact. For example, RFC 3986 ties the interpretation of fragments to media types. Since media type is a function of specific content, URNs that are never resolved cannot have an associated media type, nor can URNs that resolve to, for example, other URIs that may then not be resolved further. Similarly, while the RFC 3986 syntax for queries (and fragments) may be entirely appropriate for URN use, terminology like "Service Request" (see Appendix B of the predecessor "URNs are not..." draft [ServiceRequests] for additional discussion) may be more suitable to the URN context than "query" (if, indeed, the portion of the URN that is syntactically equivalent to a URI query is where those requests belong).

4. Changes to RFC 3986

The interpretation rule discussed in Section 1 notwithstanding, this document alters ("updates") RFC 3986 itself only by specifying that the interpretation of URNs of the "urn:" scheme may vary from that for other types of URIs. That might be implemented by, for example, inserting text at the end of Section 1.1.3 (of RFC 3986) similar to:

[CREF2]Note in Draft: The above example suggested text opens the door to unbinding _all_ name-type URIs from the semantics of 3986 despite assertions elsewhere in this document that anything other than the "urn:" scheme is out of scope. In reviewing 3986, the example location and text seemed the best and most consistent way to modify the relevant section. That section definitely does not talk about individual schemes. WG participants, in reviewing this section and text, should note that IETF procedures have never required that a specification that "updates" a document provide modifying text nor that an author or WG working on a revision to the document thereby updated use that text. I've included it here because of discussion on the mailing list to the effect that this document was unclear about just how it updated 3986 -- the above is intended to make that crystal-clear. Alternate text that would not have those issues would be welcome. --JcK (-04)

The effect of the above is to remove URN semantics from the scope of RFC 3986. It makes no changes to the generic URI syntax, nor to the semantics of any other URI scheme. The 3986 syntax still applies to URNs as well as to other URI types. Even with regard to semantics, it has no practical effect for URNs defined in strict conformance to the prior URN specification [RFC2141] or the associated registration specification [RFC3406].

[CREF3]I think the paragraph that follows can safely be dropped and will do so in future versions (if any) unless there are comments to the contrary that explain what purpose it serves and any needed changes. --JcK (-04)

In particular (but without altering RFC 3986 in any way), the generic URI syntax for "queries" (strings starting with "?" and continuing to the end of the URI or to a "#"), and for "fragments" (strings starting with "#" and continuing to the end of the URI) is unchanged. For URNs, additional syntax is introduced to divide the URI "query" into two parts, referred to as "q-components" and "r-components". The syntax and general semantics of "fragments" (specified in RFC 3986 as scheme-independent) are unchanged, but a somewhat liberal interpretation may be needed in the context of URNs, so a fragment is referred to as an "f-component" as a term of convenience to highlight that distinction [RFC2141bis].
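The division described above can be sketched as follows. This document does not state the delimiters that separate the two parts of the URI "query", so the sketch assumes the ones that appear in RFC 8141 ("?+" before the r-component and "?=" before the q-component); the syntax in [RFC2141bis] at the time of this draft may differ. The function name and the sample URN are hypothetical, and the sketch is non-normative.

```python
def split_urn_components(urn):
    """Split a URN into assigned name, r-, q-, and f-component.

    Assumption (not stated in this document): "?+" introduces the
    r-component and "?=" introduces the q-component, as in RFC 8141.
    """
    # Generic URI step: the fragment starts at the first "#".
    rest, hash_found, fragment = urn.partition("#")
    # Generic URI step: the query starts at the first "?".
    assigned_name, _, query = rest.partition("?")

    r_component = None
    q_component = None
    if query.startswith("+"):
        # "?+" opens the r-component, which runs up to "?=" if present.
        r_component, eq_found, tail = query[1:].partition("?=")
        if eq_found:
            q_component = tail
    elif query.startswith("="):
        # "?=" directly after the assigned name opens the q-component.
        q_component = query[1:]
    # Any other query string is left unclassified by this sketch.

    return {
        "assigned_name": assigned_name,
        "r_component": r_component,
        "q_component": q_component,
        "f_component": fragment if hash_found else None,
    }


# Hypothetical URN, used only for illustration.
parts = split_urn_components("urn:example:a123?+res=x?=q=y#frag")
```

A generic URI parser sees the whole string "+res=x?=q=y" as a single query. Only the URN-specific step divides it further, which is why the generic syntax can remain unchanged.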

5. Actions Occurring in Parallel with this Specification

The basic URN syntax specification [RFC2141] was published well before RFC 3986 and therefore does not depend on it. The successor to that specification [RFC2141bis] fully spells out, or references documents that spell out, the semantics and any required within-field syntax of URNs. It takes great care with generic or implicit references to any URI specification and delegates further details to specific namespaces.

[CREF4]Note in Draft: Perhaps this section can be dropped entirely. --JcK (-04)

6. Acknowledgments

This specification was inspired by a search in the IETF URNBIS WG for an approach that would both satisfy the needs of persistent name-type identifiers and still fully conform to various readings and understandings of the specifications and intent of RFC 3986. That search lasted several years and considered many alternatives. Discussions with Leslie Daigle, Juha Hakala, Barry Leiba, Keith Moore, Andrew Newton, and Peter Saint-Andre during the last quarter of 2013 and the first quarter of 2014 were particularly helpful in arriving at the conclusion that a conceptual separation of notions of location-based identifiers (e.g., URLs) and the types of persistent identifiers represented by URNs was necessary. Juha Hakala provided useful explanations and significant working text about the needs of the library community and their perception of identifiers and consequent implications for URN structure. Peter Saint-Andre provided significant text in a pre-publication review. The author also appreciates the efforts of several people, notably Tim Berners-Lee, Leslie Daigle, Juha Hakala, Sean Leonard, Larry Masinter, Keith Moore, Julian Reschke, Lars Svensson, Henry S. Thompson, and Dale Worley, to challenge text and ideas and demand answers to hard questions. Whether they agree with the results or not, their insights have contributed significantly to whatever clarity and precision appear in the present document.

The specification was changed considerably and its focus narrowed after an extended discussion at the WG meeting during IETF 90 in July 2014 [IETF90-URNBISWG] and subsequent comments and clarifications on the mailing list [URNBIS-MailingList]. The contributions of all of the participants in those discussions, only some of whose names appear above, are gratefully acknowledged.

7. Contributors

Juha Hakala contributed considerable text, some of which was removed from later versions of the document to streamline it.

8. IANA Considerations

[CREF5]RFC Editor: Please remove the first paragraph below before publication.

This memo is not believed to require any action on IANA's part.

There is an existing (i.e., prior to the publication of this document) registry for "Uniform Resource Identifier (URI) Schemes" that already includes the "urn" scheme itself and a separate existing URN Namespace registry. None of the registrations that predate this specification have any specific dependencies on generic URI specifications. More information on this subject appears in [RFC2141bis] and documents referenced from it.

9. Security Considerations

As discussed in Section 1 above, this document is largely precautionary, providing an interpretation rule for the URI definition [RFC3986] when URNs are concerned. Some members of the community believe that rule (and hence this document) are unnecessary, at most reinforcing provisions already in that definition. Others believe that it restores the original URN definition [RFC2141], produced before RFC 3986 was adopted and not updated by it. Still others see this specification as making a necessary change to allow the semantics of URNs to be self-contained (as specified in other documents), relying on the generic URI syntax specification for syntax only.

Independent of which of those models is applicable, the specification should have no effect on Internet security unless the use of a clearer definition, syntax, and semantics reduces the potential for confusion and consequent vulnerabilities.

10. References

10.1. Normative References

[RFC2141] Moats, R., "URN Syntax", RFC 2141, DOI 10.17487/RFC2141, May 1997.
[RFC2141bis] Saint-Andre, P. and J. Klensin, "Uniform Resource Name (URN) Syntax", February 2016.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005.

10.2. Informative References

[DeterministicURI] Mazahir, O., Thaler, D. and G. Montenegro, "Deterministic URI Encoding", February 2014.

This is an expired document, cited for historical context only.

[IETF90-URNBISWG] IETF, "URN BIS Working Group Minutes", July 2014.
[RFC1738] Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, DOI 10.17487/RFC1738, December 1994.
[RFC2483] Mealling, M. and R. Daniel, "URI Resolution Services Necessary for URN Resolution", RFC 2483, DOI 10.17487/RFC2483, January 1999.
[RFC3401] Mealling, M., "Dynamic Delegation Discovery System (DDDS) Part One: The Comprehensive DDDS", RFC 3401, DOI 10.17487/RFC3401, October 2002.
[RFC3406] Daigle, L., van Gulik, D., Iannella, R. and P. Faltstrom, "Uniform Resource Names (URN) Namespace Definition Mechanisms", BCP 66, RFC 3406, DOI 10.17487/RFC3406, October 2002.
[ServiceRequests] Klensin, J., "Names are Not Locators and URNs are Not URIs, Appendix B", July 2014.

This is an expired document, cited for historical context only.

[URN-transition] Klensin, J. and J. Hakala, "Uniform Resource Name (URN) Namespace Registration Transition", February 2016.
[URNBIS-MailingList] IETF, "IETF URN Mailing list", 2014.

Appendix A. Background on the URN - URI relationship

[CREF6]Note in Draft: I've slightly rewritten this Appendix, but suspect it may still be controversial. If it is, the WG should discuss whether the advantages of having the explanation justify the energy to figure out exactly what it should say or whether those advantages are few enough that it should just be dropped. --JcK (-04).
The Internet community now has many years of experience with both name-type identifiers and location-based identifiers (or "references" for those who are sensitive to the term "identifier", a group that includes many members of the library and information science communities). The primary examples of these two categories are Uniform Resource Names (URNs) [RFC2141] [RFC2141bis] and Uniform Resource Locators (URLs) [RFC1738]. That experience leads to the conclusion that it is impractical to constrain URNs to the high-level semantics of URLs. The generic syntax for URIs [RFC3986] is adequately flexible to accommodate the perceived needs of URNs, but the specific semantics associated with the URI syntax definition -- what particular constructions "mean" and how and where they are constrained or interpreted -- appear not to be. Generalization from URLs to generic Uniform Resource Identifiers (URIs) [RFC3986], especially to name-based, high-stability, long-persistence identifiers such as many URN namespaces, has failed because the assumed similarities do not adequately extend to all forms of, and requirements for, URNs.

Ultimately, locators, which typically depend on particular accessing protocols (protocols that are typically linked to the particular URI scheme) and a specification relative to some physical space or network topology, are simply different creatures from long-persistence, location-independent, object identifiers or abstract designators. Many of the constraints and interpretation rules that are appropriate for locators are either irrelevant to or interfere with the needs of resource names (at least of the "urn:" scheme) as a class. That was tolerable as long as the URN system didn't need additional capabilities (over those specified in RFC 2141) but experience since RFC 2141 was published has shown that they are, in fact, needed.

Appendix B. Change Log

[CREF7]RFC Editor: Please remove this appendix before publication.

B.1. Changes from draft-ietf-urnbis-urns-are-not-uris-00 (2014-04-07) to -01 (2014-07-03)

B.2. Changes from draft-ietf-urnbis-urns-are-not-uris-01 to draft-ietf-urnbis-semantics-clarif-00 (2014-08-25)

B.3. Changes from draft-ietf-urnbis-semantics-clarif-00 (2014-08-25) to -01

B.4. Changes from draft-ietf-urnbis-semantics-clarif-01 (2015-02-14) to -02

B.5. Changes from draft-ietf-urnbis-semantics-clarif-02 (2015-08-10) to -03

B.6. Changes from draft-ietf-urnbis-semantics-clarif-03 (2016-02-07) to -04

Author's Address

John C Klensin
1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140
USA

Phone: +1 617 245 1457
EMail: john-ietf@jck.com