Network Working Group | J. Levine |
Internet-Draft | Taughannock Networks |
Intended status: Informational | June 11, 2015 |
Expires: December 13, 2015 |
Assigning Digital Object Identifiers to RFCs
draft-iab-doi-04
The Digital Object Identifier (DOI) is a widely used system that assigns unique identifiers to digital documents that can be queried and managed in a consistent fashion. We describe the way that DOIs are assigned to past and future RFCs.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 13, 2015.
Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The Digital Object Identifier (DOI) is a widely used system that assigns unique identifiers to digital documents that can be queried and managed in a consistent fashion. The structure of DOIs is defined by ISO 26324:2012 [ISO-DOI] and is implemented by a group of registration agencies coordinated by the International DOI Foundation.
Each DOI is associated with bibliographic metadata about the object, including one or more URIs where the object can be found. The DOI system also provides many features not relevant to RFCs, such as the ability to update the metadata after the DOI is assigned, and the ability for organizations to maintain local caches of metadata, e.g., a university or corporate library that tracks its copies of purchased documents so subsequent users don't buy them again.
The wide use of DOIs suggests that even though RFCs can be downloaded directly from the IETF for free, organizations that use DOIs can have trouble locating documents that don't have DOIs. DOIs with metadata that points to the existing free online RFCs make RFCs easier to find and use. Some scholarly publishers accept DOIs as references in published documents, and some versions of BibTeX can automatically retrieve the bibliographic data for a DOI and format it. Hence DOIs also make RFCs easier to cite.
The benefits of DOIs apply equally to documents from all of the RFC submission streams, so all RFCs are assigned DOIs.
DOIs are an application of the handle system defined by RFCs [RFC3650], [RFC3651], and [RFC3652]. A DOI for an RFC might be
10.17487/rfc1149
The first part of a DOI is the number 10, which identifies a DOI within the handle system, followed by a dot and a unique number assigned to a publisher, in this case 17487. Together these form the DOI prefix. Following the prefix is a slash and a text string assigned by the publisher, called the DOI suffix.
Since the RFC Editor's series already have numbers, it is straightforward to use suffixes based on the existing numbers: DOIs use the familiar series names and numbers, e.g., rfc1149. (DOIs are case-insensitive.) DOIs are treated as opaque identifiers, so the reliable way to find the DOI for an RFC is not to guess, but to look it up in the RFC index.
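The prefix and suffix structure can be illustrated with a short sketch. The names Doi, parse_doi, and same_doi are hypothetical and not part of any DOI tooling; the sketch only checks the shape described above (a "10." prefix, a slash, a non-empty suffix) and compares DOIs without regard to case.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # e.g. "10.17487": the number 10, a dot, and the publisher number
    suffix: str  # e.g. "rfc1149": text string assigned by the publisher


def parse_doi(text: str) -> Doi:
    """Split a DOI into prefix and suffix at the first slash."""
    prefix, slash, suffix = text.strip().partition("/")
    if not slash or not suffix:
        raise ValueError(f"not a DOI (no suffix): {text!r}")
    directory, dot, publisher = prefix.partition(".")
    if directory != "10" or not dot or not publisher:
        raise ValueError(f"not a DOI (bad prefix): {text!r}")
    return Doi(prefix, suffix)


def same_doi(a: str, b: str) -> bool:
    """Compare two DOIs case-insensitively."""
    return parse_doi(a.lower()) == parse_doi(b.lower())


doi = parse_doi("10.17487/rfc1149")
print(doi.prefix, doi.suffix)
```

Because DOIs are opaque, a parser like this is useful for validation and comparison only; it is not a way to derive the DOI of an RFC from its number.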
Although the handle system has its own protocol described in [RFC3652], DOIs are usually looked up over the web. A proposed "doi:" URN was never widely implemented, so the standard way to look up a DOI is to use the public HTTP proxy at http://dx.doi.org. The sample DOI above could be looked up at:
http://dx.doi.org/10.17487/rfc1149
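As a minimal sketch, the proxy URL can be built by appending the DOI to the proxy address. The helper name doi_to_url is hypothetical, and the percent-encoding step is an assumption added for suffixes that contain characters that are not safe in a URL path; it has no effect on RFC DOIs.

```python
from urllib.parse import quote

DOI_PROXY = "http://dx.doi.org/"  # public HTTP proxy named in the text


def doi_to_url(doi: str) -> str:
    """Return the proxy URL for a DOI, percent-encoding unsafe characters."""
    # Keep "/" literal so the prefix/suffix separator stays visible.
    return DOI_PROXY + quote(doi, safe="/")


print(doi_to_url("10.17487/rfc1149"))
```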
Whenever a publisher assigns a DOI, it provides the bibliographic metadata for the object (henceforth called a document, since that is what they are in this context) to its registration agency, which then makes it available to clients that look up DOIs. The document's metadata is typically uploaded to the registration agency in XML using an HTTP-based API. Users can retrieve the metadata by fetching the DOI's URL and using standard HTTP content negotiation to request application/citeproc+json, application/rdf+xml, or other bibliographic formats.
Publishers have considerable flexibility as to what resides at the URI(s) that a DOI refers to. Sometimes it's the document itself, while for commercial publishers it's typically a page with the abstract and bibliographic information, and some way to buy the actual document. Since some RFCs are in multiple formats (e.g., PostScript and text), an appropriate URI is that of the RFC Editor's info page that has the document's abstract and links to the document(s) in various formats. Hence the URI above, when requested as text/html, redirects to:
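Content negotiation here means sending an ordinary HTTP GET for the DOI's URL with an Accept header that names the desired format. The following sketch uses only the Python standard library; the function names metadata_request and fetch_metadata are hypothetical, and the response is assumed to be a JSON object when application/citeproc+json is requested.

```python
import json
import urllib.request

DOI_PROXY = "http://dx.doi.org/"  # public HTTP proxy


def metadata_request(doi: str,
                     media_type: str = "application/citeproc+json"
                     ) -> urllib.request.Request:
    """Build a GET request for a DOI that asks for a bibliographic format."""
    return urllib.request.Request(DOI_PROXY + doi,
                                  headers={"Accept": media_type})


def fetch_metadata(doi: str) -> dict:
    """Fetch and decode citeproc JSON metadata (performs a network request)."""
    with urllib.request.urlopen(metadata_request(doi), timeout=10) as resp:
        return json.load(resp)


req = metadata_request("10.17487/rfc1149")
print(req.full_url, req.get_header("Accept"))
```

Passing "application/rdf+xml" as media_type to metadata_request requests the RDF form instead; fetch_metadata would then need an XML parser rather than json.load.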
https://www.rfc-editor.org/info/rfc1149
More information on the structure and use of DOIs is in the DOI Handbook [DOI-HB].
With DOIs assigned to each RFC, it is useful to include DOI information in the XML bibliography as a "seriesInfo" item, so that rendering engines can display it if desired. Online databases and indexes that include RFCs should be updated to include the DOI, e.g., the ACM Digital Library. (A practical advantage of this is that the DOI would link directly to the RFC Editor, rather than perhaps to a copy of an RFC behind a paywall.)
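As an illustration, a bibliography entry carrying a DOI might be generated as follows. This is a sketch only: the helper name reference_with_doi is hypothetical, the entry is reduced to a title and two "seriesInfo" items, and the use of name="DOI" is an assumption about how the item would be labeled.

```python
import xml.etree.ElementTree as ET


def reference_with_doi(anchor: str, title: str, rfc_number: int,
                       doi: str) -> ET.Element:
    """Build a minimal reference element with RFC and DOI seriesInfo items."""
    ref = ET.Element("reference", anchor=anchor)
    front = ET.SubElement(ref, "front")
    ET.SubElement(front, "title").text = title
    ET.SubElement(ref, "seriesInfo", name="RFC", value=str(rfc_number))
    # Assumed labeling: the DOI is carried as one more seriesInfo item.
    ET.SubElement(ref, "seriesInfo", name="DOI", value=doi)
    return ref


ref = reference_with_doi(
    "RFC1149",
    "A Standard for the Transmission of IP Datagrams on Avian Carriers",
    1149,
    "10.17487/rfc1149",
)
print(ET.tostring(ref, encoding="unicode"))
```

A rendering engine that does not know about DOIs can ignore the extra item, which is why "seriesInfo" is a convenient place for it.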
Since RFCs are immutable, existing RFCs do not mention their own DOIs within the RFC itself, but putting the DOIs into indexes still provides value.
There are three phases to assigning DOIs to RFCs: getting a DOI prefix, retroactively assigning DOIs to existing documents, and updating the publication process to assign DOIs as new RFCs are published.
There are ten registration agencies [DOI-RA] that assign DOI prefixes. Most of them serve specialized audiences or limited geographic areas, but there are a few that handle scholarly and technical materials. The RFC Editor chose Crossref, an agency widely used by journal publishers. All registration agencies charge for DOIs to defray the cost of maintaining the metadata databases. The prices are fairly low, on the order of $660 per year for membership, a deposit fee of 15 cents per document for a bulk upload of the backfile (the existing RFCs), and $1 per document to deposit new RFCs as they are published.
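To give a sense of scale, the fees can be combined into a first-year estimate. The fee values are those quoted above; the document counts in the example (7,500 existing RFCs and 300 new RFCs in the year) are hypothetical round numbers chosen for illustration, not actual figures. Amounts are kept in integer cents to avoid rounding problems.

```python
MEMBERSHIP_CENTS_PER_YEAR = 660 * 100  # on the order of $660/year
BACKFILE_DEPOSIT_CENTS = 15            # per existing RFC, bulk upload
NEW_DEPOSIT_CENTS = 100                # per newly published RFC


def first_year_cost_cents(backfile_docs: int, new_docs: int) -> int:
    """Membership plus backfile deposits plus deposits for new documents."""
    return (MEMBERSHIP_CENTS_PER_YEAR
            + backfile_docs * BACKFILE_DEPOSIT_CENTS
            + new_docs * NEW_DEPOSIT_CENTS)


def format_dollars(cents: int) -> str:
    """Format an amount in cents as dollars, e.g. 66000 -> "$660.00"."""
    return f"${cents // 100:,}.{cents % 100:02d}"


# Hypothetical counts: 7,500 backfile RFCs and 300 new RFCs.
print(format_dollars(first_year_cost_cents(7500, 300)))  # $2,085.00
```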
The RFC Editor's DOI prefix is 10.17487.
Other than paying the deposit fees, assigning DOIs to all of the existing RFCs was primarily a software problem. We updated the RFC Production Center's internal database to include a DOI field for each RFC, changed the schema for the XML index rfc-index.xml to include a DOI field, and updated the script that creates the index to include the DOI for each RFC. A specialized DOI submission script extracted the metadata for all of the RFCs from the XML index and submitted it to the registration agency using the agency's online API.
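The extraction step can be sketched as follows. This is not the actual submission script: the element names (rfc-entry, doc-id, title, doi) are assumptions modeled on the index, the sample data is inline rather than read from rfc-index.xml, any XML namespace is omitted, and the DOI values are illustrative. The function deposit_records is a hypothetical name; it yields one record per entry that has a DOI, pairing it with the info page URL.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for rfc-index.xml.
SAMPLE_INDEX = """\
<rfc-index>
  <rfc-entry>
    <doc-id>RFC1149</doc-id>
    <title>A Standard for the Transmission of IP Datagrams on Avian Carriers</title>
    <doi>10.17487/rfc1149</doi>
  </rfc-entry>
  <rfc-entry>
    <doc-id>RFC3650</doc-id>
    <title>Handle System Overview</title>
  </rfc-entry>
</rfc-index>
"""

INFO_BASE = "https://www.rfc-editor.org/info/"


def deposit_records(index_xml: str) -> list:
    """Collect DOI, title, and info-page URL for each entry that has a DOI."""
    records = []
    for entry in ET.fromstring(index_xml).iter("rfc-entry"):
        doi = entry.findtext("doi")
        if doi is None:
            continue  # nothing to deposit for entries without a DOI field
        doc_id = entry.findtext("doc-id")
        records.append({
            "doi": doi,
            "title": entry.findtext("title"),
            "url": INFO_BASE + doc_id.lower(),
        })
    return records


for record in deposit_records(SAMPLE_INDEX):
    print(record["doi"], record["url"])
```

The same index is the place to look up the DOI for a given RFC, since the DOI is read from the entry rather than constructed from the document number.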
As RFCs are published, the publication software assigns a DOI to each new RFC. The submission script extracts the metadata for new RFCs from the XML index and submits it to the registration agency.
The DOI agency requests that documents that are assigned DOIs in turn include DOIs when possible when referring to other organizations' documents. DOIs can be listed using the existing seriesInfo field in the xml2rfc reference entity, and authors are requested to provide DOIs for non-RFC documents when possible. The RFC Production Center might add missing DOIs when it's easy to do so, e.g., when the same reference with a DOI has appeared in a prior RFC, or a quick online search finds the DOI. With DOIs in the xml2rfc reference databases, DOIs in references from citation libraries can appear in the RFCs automatically.
The RFC Style Guide will be updated to describe the rules for including DOIs in the References sections of RFCs.
Since it is usually possible to retrieve the bibliographic information for a document from its DOI (as BibTeX can do, described above), it might also be worth adding this feature to xml2rfc, so a reference with only a DOI could be automatically fetched and expanded.
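One way such a feature might work is to convert the citeproc JSON returned for a DOI into a reference element. The sketch below is not an xml2rfc feature. The function name csl_to_reference is hypothetical, the input is an inline sample rather than a fetched record, and the field names ("title", "author" with "family" and "given", "issued" with "date-parts", "DOI") are assumed to follow the usual citeproc JSON layout. The DOI value in the sample is illustrative.

```python
import xml.etree.ElementTree as ET

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Hypothetical sample shaped like citeproc JSON for [RFC3650].
SAMPLE_ITEM = {
    "DOI": "10.17487/rfc3650",
    "title": "Handle System Overview",
    "author": [
        {"family": "Sun", "given": "S."},
        {"family": "Lannom", "given": "L."},
        {"family": "Boesch", "given": "B."},
    ],
    "issued": {"date-parts": [[2003, 11]]},
}


def csl_to_reference(anchor: str, item: dict) -> ET.Element:
    """Expand citeproc-style metadata into a minimal reference element."""
    ref = ET.Element("reference", anchor=anchor)
    front = ET.SubElement(ref, "front")
    ET.SubElement(front, "title").text = item.get("title", "")
    for person in item.get("author", []):
        # Reduce each given name to its initial, e.g. "Sam" or "S." -> "S."
        initials = "".join(p[0] + "." for p in person.get("given", "").split())
        ET.SubElement(front, "author", initials=initials,
                      surname=person.get("family", ""))
    parts = (item.get("issued", {}).get("date-parts") or [[]])[0]
    date = ET.SubElement(front, "date")
    if len(parts) >= 1:
        date.set("year", str(parts[0]))
    if len(parts) >= 2:
        date.set("month", MONTHS[int(parts[1]) - 1])
    ET.SubElement(ref, "seriesInfo", name="DOI", value=item["DOI"])
    return ref


print(ET.tostring(csl_to_reference("RFC3650", SAMPLE_ITEM), encoding="unicode"))
```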
[DOI-HB] | International DOI Foundation, "DOI Handbook", April 2012. |
[DOI-RA] | International DOI Foundation, "DOI Registration Agencies", July 2013. |
[ISO-DOI] | International Organization for Standardization (ISO), "ISO 26324:2012 Information and documentation -- Digital object identifier system", 2012. |
[RFC3650] | Sun, S., Lannom, L. and B. Boesch, "Handle System Overview", RFC 3650, November 2003. |
[RFC3651] | Sun, S., Reilly, S. and L. Lannom, "Handle System Namespace and Service Definition", RFC 3651, November 2003. |
[RFC3652] | Sun, S., Reilly, S., Lannom, L. and J. Petrone, "Handle System Protocol (ver 2.1) Specification", RFC 3652, November 2003. |
Make the rest of everything present tense. Fix typos, note that RSE style guide will include use of DOIs.
Make everything present tense, minor adjustments to reflect reality.
Clarify submission process, multi-document DOIs. Note all streams treated the same. Remove unused reference.
DOI in the xml, not necessarily in the text
Use of DOI in RFCs section.