Network Working Group                                   H. Van de Sompel
Internet-Draft                            Los Alamos National Laboratory
Intended status: Informational                                 M. Nelson
Expires: December 27, 2018                       Old Dominion University
                                                               G. Bilder
                                                                Crossref
                                                                J. Kunze
                                              California Digital Library
                                                               S. Warner
                                                      Cornell University
                                                           June 25, 2018
cite-as: A Link Relation to Convey a Preferred URI for Referencing
draft-vandesompel-citeas-03
A web resource is routinely referenced by means of the URI with which it is directly accessed. But cases exist where referencing a resource by means of a URI different from that access URI is preferred. This specification defines a link relation type that can be used to convey such a preference.
Please discuss this draft on the ART mailing list (<https://www.ietf.org/mailman/listinfo/art>).
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 27, 2018.
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
A web resource is routinely referenced (e.g. linked, bookmarked) by means of the URI with which it is directly accessed. But cases exist where referencing a resource by means of a different URI is preferred, for example because the latter URI is intended to be more persistent over time. Currently, there is no link relation type to convey such an alternative referencing preference; this specification addresses this deficit by introducing a link relation type intended for that purpose.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This specification uses the terms "link context" and "link target" as defined in [RFC8288]. These terms respectively correspond with "Context IRI" and "Target IRI" as used in [RFC5988]. Although defined as IRIs, in common scenarios they are also URIs.
Additionally, this specification uses the following terms:
By interacting with the access URI, the user agent may discover typed links. For such links, the access URI is the link context.
Despite sound advice regarding the design of Cool URIs [CoolURIs], link rot ("HTTP 404 Not Found") is a common phenomenon when following links on the web. Certain communities of practice have introduced solutions to combat this problem that typically consist of:
This approach is, for example, used by:
In order for the investments in infrastructure involved in these approaches to pay off, and hence for links to effectively remain operational as intended, it is crucial that a resource be referenced by means of its reference URI. However, the access URI is where a user agent actually accesses the resource (e.g., it is the URI in the browser's address bar). As such, there is a considerable risk that the access URI instead of the reference URI is used for referencing [PIDs-must-be-used].
The link relation type defined in this specification makes it possible to convey to user agents that the reference URI is the preferred URI for referencing.
Resource versioning systems often use a naming approach whereby:
For example, Wikipedia uses generic URIs of the form http://en.wikipedia.org/wiki/John_Doe and version URIs of the form https://en.wikipedia.org/w/index.php?title=John_Doe&oldid=776253882.
While the current version of a resource is accessed at the generic URI, some versioning systems adhere to a policy that favors linking and referencing a specific version URI. To express this using the terminology of Section 2, these policies intend that the generic URI is the access URI, and that the version URI is the reference URI. These policies are informed by the understanding that the content at the generic URI is likely to evolve over time, and that accurate links or references should lead to the content as it was at the time of referencing. To that end, Wikipedia's "Permanent link" and "Cite this page" functionalities promote the version URI, not the generic URI.
The link relation type defined in this specification makes it possible to convey to user agents that the version URI is preferred over the generic URI for referencing.
A web user commonly has multiple profiles on the web, for example, one per social network, a personal homepage, a professional homepage, a FOAF profile [FOAF], etc. Each of these profiles is accessible at a distinct URI. But the user may have a preference for one of those profiles, for example, because it is most complete, kept up-to-date, or expected to be long-lived. As an example, the first author of this document has, among others, the following profile URIs:
Of these, from the perspective of the person described by these profiles, the first URI may be the preferred profile URI for the purpose of referencing because the domain is not under the custodianship of a third party. When an agent accesses another profile URI, such as http://public.lanl.gov/herbertv/, this preference for referencing by means of the first URI could be expressed.
The link relation type defined in this specification makes it possible to convey to user agents that a profile URI (the reference URI) other than the one the agent is accessing (the access URI) is preferred for referencing.
When publishing on the web, it is not uncommon to make distinct components of a publication available as different web resources, each with their own URI. For example:
While each of these components is accessible at its distinct URI (the access URI), they often also share a URI assigned to the intellectual publication of which they are components (the reference URI).
The link relation type defined in this specification makes it possible to convey to user agents that, for the purpose of referencing, the reference URI of the intellectual publication is preferred over an access URI of a component of the publication.
A link with the "cite-as" relation type indicates that, for referencing the link context, use of the URI of the link target is preferred over use of the URI of the link context. It allows the resource identified by the access URI (link context) to unambiguously link to its corresponding reference URI (link target), thereby expressing that the link target is preferred over the link context for the purpose of permanent citation.
The link target of a "cite-as" link SHOULD support protocol-based access as a means to ensure that applications that store them can effectively re-use them for access.
The link target of a "cite-as" link SHOULD provide the ability for a user agent to follow its nose back to the context of the link, e.g. by following redirects and/or links. This helps a user agent to establish trust in the target URI.
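The "follow your nose" check described above could be approximated as in the following Python sketch. It is an illustration under stated assumptions, not a prescribed algorithm: the function name leads_back_to, the next_hop callable (assumed to return the redirect target of a URI, or None when there is none), and the max_hops limit are hypothetical. The sketch follows redirects only and does not follow links; the example redirect chain is the one described for the PLOS ONE article in this document.

```python
def leads_back_to(reference_uri, access_uri, next_hop, max_hops=10):
    """Return True if following redirects from reference_uri reaches access_uri.

    next_hop is a hypothetical callable: given a URI, it returns the URI
    that the server redirects to, or None if there is no redirect.
    """
    current = reference_uri
    seen = set()
    hops = 0
    while current is not None and current not in seen:
        if current == access_uri:
            return True
        if hops == max_hops:
            return False  # give up rather than follow an overly long chain
        seen.add(current)  # remember visited URIs to detect redirect loops
        current = next_hop(current)
        hops += 1
    return False


# Redirect chain of the scholarly article example, expressed as a lookup table.
PLOS_CHAIN = {
    "https://doi.org/10.1371/journal.pone.0171057":
        "http://journals.plos.org/plosone/doi?id=10.1371/journal.pone.0171057",
    "http://journals.plos.org/plosone/doi?id=10.1371/journal.pone.0171057":
        "http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171057",
}

trusted = leads_back_to(
    "https://doi.org/10.1371/journal.pone.0171057",
    "http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171057",
    PLOS_CHAIN.get,
)
```

In a real user agent, next_hop would issue an HTTP request and read the Location header of a 3xx response; a lookup table is used here so that the sketch runs without network access.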
Because a link with the "cite-as" relation type expresses a preferred URI for the purpose of referencing, the access URI SHOULD only provide one link with that relation type. If more than one "cite-as" link is provided, the user agent may decide to select one (e.g. an HTTP URI over a mailto URI), for example, based on the purpose that the reference URI will serve.
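As an illustration of how a user agent might handle one or more "cite-as" links, the following Python sketch extracts the targets of "cite-as" links from the value of an HTTP Link header and, when several are present, prefers an HTTP(S) URI. The function names, the scheme preference order, and the mailto URI in the example are assumptions made for illustration. The parser is deliberately simplified: it does not handle commas or semicolons inside quoted parameter values.

```python
import re
from urllib.parse import urlsplit

# One link-value: <target> followed by its ";param" list (simplified grammar).
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)')
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def cite_as_targets(link_header):
    """Return the targets of all links whose rel includes "cite-as"."""
    targets = []
    for match in _LINK_VALUE.finditer(link_header):
        target, params = match.group(1), match.group(2)
        rel = _REL_PARAM.search(params)
        if rel is None:
            continue
        # rel may list several space-separated, case-insensitive relation types.
        rel_types = (rel.group(1) or rel.group(2) or "").lower().split()
        if "cite-as" in rel_types:
            targets.append(target)
    return targets


def choose_reference_uri(targets, preferred_schemes=("https", "http")):
    """Pick one target, preferring the given URI schemes in order."""
    for scheme in preferred_schemes:
        for target in targets:
            if urlsplit(target).scheme.lower() == scheme:
                return target
    # No preferred scheme found: fall back to the first target, if any.
    return targets[0] if targets else None


header = ('<mailto:jd@example.org>; rel="cite-as", '
          '<https://doi.org/10.5061/dryad.5d23f>; rel="cite-as"')
reference_uri = choose_reference_uri(cite_as_targets(header))
```

Here reference_uri ends up as the HTTPS URI, mirroring the "HTTP URI over a mailto URI" example; an application with a different purpose could pass another preference order.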
Providing a link with the "cite-as" relation type does not prevent using the access URI for the purpose of referencing if such specificity is needed for the application at hand. For example, in the scenario of Section 3.4, the access URI is likely required for the purpose of annotating a specific component of an intellectual publication. Yet, the annotation application may also want to appropriately include the reference URI in the annotation.
Applications can leverage the information provided by a "cite-as" link in a variety of ways, for example:
Some existing IANA-registered relationships intuitively resemble the relationship that "cite-as" is intended to convey. But a closer inspection of these candidates provided in the blog posts [identifier-blog], [canonical-blog], and [bookmark-blog] shows that they are not appropriate for various reasons and that a new relation type is required. The remainder of this section provides a summary of the detailed explanations provided in the referenced blog posts.
It can readily be seen that the following relation types are not fit for purpose:
Two existing IANA-registered relationships deserve closer attention and are discussed in the remainder of this section.
"bookmark" [W3C.REC-html5-20151028]: The link target provides a URI for the purpose of bookmarking the link context.
The intent of "bookmark" is closest to that of "cite-as" in that the link target is intended to be a permalink for the link context, for bookmarking purposes. The relation type dates back to the earliest days of news syndication, before blogs and news feeds had permalinks to identify individual resources that were aggregated into a single page. As such, its intent is to provide permalinks for different sections of an HTML document. It was originally used with HTML elements such as <div>, <h1>, <h2>, etc. and, more recently, HTML5 revised it to be exclusively used with the <article> element. Moreover, it is explicitly excluded from use in the <link> element in HTML <head>, and, as a consequence, in the HTTP Link header that is semantically equivalent. For these technical and semantic reasons, the use of "bookmark" to convey the relationship intended by "cite-as" is not appropriate.
A more detailed justification regarding the inappropriateness of "bookmark", including a thorough overview of its turbulent history, is provided in [bookmark-blog].
"canonical" [RFC6596]: The meaning of "canonical" is commonly misunderstood on the basis of its brief definition as being "the preferred version of a resource." The description in the abstract of [RFC6596] is more helpful and states that "canonical" is intended to link to a resource that is preferred over resources with duplicative content. A more detailed reading of [RFC6596] clarifies that the intended meaning is preferred for the purpose of content indexing. A typical use case is linking from each page in a multi-page magazine article to a single page version of the article provided for indexing by search engines: the former pages provide content that is duplicative to the superset content that is available at the latter page.
The semantics intended by "canonical" as preferred for the purpose of content indexing differ from the semantics intended by "cite-as" as preferred for the purpose of referencing. A further exploration of the various scenarios shows that the use of "canonical" is not appropriate to convey the semantics intended by "cite-as":
A more detailed justification regarding the inappropriateness of "canonical", including examples, is provided in [canonical-blog].
Sections 6.1 through 6.4 show examples of the use of links with the "cite-as" relation type. They illustrate how the typed links can be used in a response header and/or response body.
PLOS ONE is one of many scholarly publishers that assign DOIs to the articles they publish. For example, https://doi.org/10.1371/journal.pone.0171057 is the persistent identifier for such an article. Via the DOI resolver, this persistent identifier redirects to http://journals.plos.org/plosone/doi?id=10.1371/journal.pone.0171057 in the plos.org domain. This URI itself redirects to http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171057, which delivers the actual article in HTML.
The HTML article contains a <link> element with the "canonical" relation type pointing at itself, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171057. As per Section 5.2, this indicates that the article content at that URI should be indexed by search engines.
PLOS ONE can additionally provide a link with the "cite-as" relation type pointing at the persistent identifier to indicate it is the preferred URI for permanent citation of the article. Figure 1 shows the addition of the "cite-as" link both in the HTTP header and the HTML that results from an HTTP GET on the article URI http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171057.
HTTP/1.1 200 OK
Link: <https://doi.org/10.1371/journal.pone.0171057> ; rel="cite-as"
Content-Type: text/html;charset=utf-8

<html>
  <head>
  ...
    <link rel="cite-as" href="https://doi.org/10.1371/journal.pone.0171057" />
    <link rel="canonical" href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171057" />
  ...
  </head>
  <body>
    ...
  </body>
</html>
Figure 1: Response to HTTP GET on the URI of a scholarly article
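A user agent that only has the response body at hand could discover the "cite-as" link from the HTML <link> elements. The following Python sketch shows one way to do so with the standard library's html.parser module; the class and function names are hypothetical, and the sketch collects every <link> element with a "cite-as" relation type regardless of where it appears in the document.

```python
from html.parser import HTMLParser


class CiteAsLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes "cite-as"."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated, case-insensitive relation types.
        rel_types = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "cite-as" in rel_types and href:
            self.targets.append(href)


def cite_as_from_html(html_text):
    """Return the targets of all "cite-as" links found in html_text."""
    finder = CiteAsLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.targets


page = """<html>
  <head>
    <link rel="cite-as" href="https://doi.org/10.1371/journal.pone.0171057" />
    <link rel="canonical" href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171057" />
  </head>
  <body></body>
</html>"""
targets = cite_as_from_html(page)
```

Only the "cite-as" target is returned; the "canonical" link is ignored because it serves content indexing rather than referencing.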
The preprint server arXiv.org has a versioning approach like the one described in Section 3.2:
A reader who accessed https://arxiv.org/abs/1711.03787 between 10 November 2017 and 23 January 2018 obtained the first version of the preprint. Starting 24 January 2018, the second version was served at that URI. In order to support accurate referencing, arXiv.org could implement the "cite-as" link to point from the generic URI to the most recent version URI. In doing so, assuming the existence of reference manager tools that consume "cite-as" links:
Figure 2 shows the header that arXiv.org would have returned in the first case, in response to an HTTP HEAD on the generic URI https://arxiv.org/abs/1711.03787.
HTTP/1.1 200 OK
Date: Sun, 24 Dec 2017 16:12:43 GMT
Content-Type: text/html; charset=utf-8
Link: <https://arxiv.org/abs/1711.03787v1> ; rel="cite-as"
Vary: Accept-Encoding,User-Agent
Figure 2: Response to HTTP HEAD on the generic URI of the landing page of an arXiv.org preprint
If the access URI is the home page of John Doe, John can add a link with the "cite-as" relation type to it, as a means to convey that he would preferably be referenced by means of the URI of his FOAF profile. Figure 3 shows the response to an HTTP GET on the URI of John's home page.
HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8

<html>
  <head>
  ...
    <link rel="cite-as" href="http://johndoe.example.com/foaf" type="text/turtle"/>
  ...
  </head>
  <body>
    ...
  </body>
</html>
Figure 3: Response to HTTP GET on the URI of John Doe's home page
The Dryad Digital Repository at datadryad.org specializes in hosting and preserving scientific datasets. Each dataset typically consists of multiple resources. For example, the dataset "Data from: Climate, demography, and lek stability in an Amazonian bird" consists of an Excel spreadsheet, a csv file, and a zip file. Each of these resources has different content and is accessible at its own URI. In addition, the dataset has a landing page at https://datadryad.org/resource/doi:10.5061/dryad.5d23f.
Each of these resources should be permanently cited by means of the persistent identifier that was assigned to the entire dataset as an intellectual publication, i.e. https://doi.org/10.5061/dryad.5d23f. To that end, the Dryad Digital Repository can add "cite-as" links pointing from the URIs of each of these resources to https://doi.org/10.5061/dryad.5d23f. This is shown in Figure 4 for the csv file that is a component resource of the dataset, through use of the HTTP Link header.
HTTP/1.1 200 OK
Date: Tue, 12 Jun 2018 19:19:22 GMT
Last-Modified: Wed, 17 Feb 2016 18:37:02 GMT
Content-Type: text/csv;charset=ISO-8859-1
Content-Length: 25414
Link: <https://doi.org/10.5061/dryad.5d23f> ; rel="cite-as"

DATE,Year,PLOT/TRAIL,LOCATION,SPECIES CODE,BAND NUM,COLOR,SEX,AGE,TAIL,WING,
TARSUS,NARES,DEPTH,WIDTH,WEIGHT
6/26/02,2002,DANTA,325,PIPFIL,969,B/O,M,AHY,80,63,16,7.3,3.9,4.1,14.4
...
2/3/13,2013,LAGO,,PIPFIL,BR-5095,O/YPI,M,SCB,78,65.5,14.2,7.5,3.8,3.7,14.3
Figure 4: Response to HTTP GET on the URI of a csv file that is a component of a scientific dataset
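On the publishing side, a repository could attach the same "cite-as" link to the response for every component resource of a dataset. The following Python sketch shows one possible way to build the Link header value; the function names and the representation of response headers as a dictionary are assumptions for illustration and do not describe how the Dryad Digital Repository is implemented.

```python
def cite_as_link(reference_uri):
    """Format a Link header value carrying a single "cite-as" link."""
    # Reject characters that would break the <...> delimiters or the header line.
    if any(ch in reference_uri for ch in '<>" \r\n'):
        raise ValueError("reference URI contains characters not allowed in a URI")
    return '<{}>; rel="cite-as"'.format(reference_uri)


def add_cite_as(headers, reference_uri):
    """Return a copy of headers with a "cite-as" link added to Link."""
    updated = dict(headers)
    link = cite_as_link(reference_uri)
    existing = updated.get("Link")
    # Several links may share one Link header, separated by commas.
    updated["Link"] = "{}, {}".format(existing, link) if existing else link
    return updated


# Headers of the csv component resource, before the "cite-as" link is added.
csv_headers = {
    "Content-Type": "text/csv;charset=ISO-8859-1",
    "Content-Length": "25414",
}
response_headers = add_cite_as(csv_headers, "https://doi.org/10.5061/dryad.5d23f")
```

Calling add_cite_as with the same reference URI for the spreadsheet, the csv file, the zip file, and the landing page would make all of them point at the persistent identifier of the dataset.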
The link relation type below has been registered by IANA per Section 2.1.1 of [RFC8288]:
In cases where there is no way for the agent to automatically verify the correctness of the reference URI (cf. Section 4), out-of-band mechanisms might be required to establish trust.
If a trusted site is compromised, the "cite-as" link relation could be used with malicious intent to supply misleading URIs for referencing. Use of these links might direct user agents to an attacker's site, break the referencing record they are intended to support, or corrupt algorithmic interpretation of referencing data.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC4287]  Nottingham, M. and R. Sayre, "The Atom Syndication Format", RFC 4287, DOI 10.17487/RFC4287, December 2005.
[RFC5988]  Nottingham, M., "Web Linking", RFC 5988, DOI 10.17487/RFC5988, October 2010.
[RFC6249]  Bryan, A., McNab, N., Tsujikawa, T., Poeml, P. and H. Nordstrom, "Metalink/HTTP: Mirrors and Hashes", RFC 6249, DOI 10.17487/RFC6249, June 2011.
[RFC6596]  Ohye, M. and J. Kupke, "The Canonical Link Relation", RFC 6596, DOI 10.17487/RFC6596, April 2012.
[RFC8288]  Nottingham, M., "Web Linking", RFC 8288, DOI 10.17487/RFC8288, October 2017.
[W3C.REC-html5-20151028]  Hickson, I., Berjon, R., Faulkner, S., Leithead, T., Doyle Navara, E., O'Connor, E. and S. Pfeiffer, "HTML5", World Wide Web Consortium Recommendation REC-HTML5-20141028, October 2014.
Thanks for comments and suggestions provided by Martin Klein, Harihar Shankar, Peter Williams, John Howard, Mark Nottingham, and Graham Klyne.