Internet-Draft The ARK URI scheme October 2020
Castelán Castro Expires 4 April 2021
Workgroup:
Internet Engineering Task Force
Internet-Draft:
draft-ark-uri-scheme-latest
Published:
Intended Status:
Informational
Expires:
Author:
M.X. Castelán Castro
17beta

The ARK URI scheme

Abstract

This specification defines the Archival Resource Key (ARK) URI scheme that is especially suitable for persistent identifiers.

Persistent identifier for the latest version of this document: https://n2t.net/ark:21206/10015.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 4 April 2021.

1. Introduction

The ARK (Archival Resource Key) identifier scheme is flexible, dereferenceable and especially suitable for persistent identifiers. A founding principle of the design of the ARK scheme is that persistence is a matter of service and is not conferred by any particular identifier scheme; ARK is designed to ease the task of achieving persistence. This document specifies the technical details of the ARK system as a URI and IRI scheme and does not elaborate at length on the design rationale of the ARK system; for that see [Kunze_ARK].

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [req_words].

The terms "identifier", "resource", "representation", "information resource" and "non-information resource" are used as described in [webarch]. For conciseness we use the term "referent" to mean the resource identified by an identifier. Note that identifiers are strings of characters, representations are strings of octets paired with an interpretation and resources are abstract concepts like the book "Alice's Adventures in Wonderland" by Lewis Carroll or Zermelo-Fraenkel set theory.

The notation used to describe syntax is that described in [ABNF], extended as follows: A literal preceded by "~" matches any string that is equivalent when corresponding uppercase and lowercase codepoints in the range U+0000 to U+007F are taken as equivalent. The syntax is augmented with set difference indicated by the operator "-" whose precedence is between Alternative and Concatenation. All the syntactic terms defined in [ABNF] are referenced here.

3. Concepts

"ARK" stands for Archival Resource Key. The URI scheme defined in this document is named "ARK scheme". Every identifier that uses this scheme is called an "ARK". The ARK scheme is designed to ease the creation and maintenance of persistent and dereferenceable resource identifiers. An ARK may be used either to identify an information resource or a non-information resource. There are 3 forms of ARKs.

The following is an example of a Basic ARK: ark:12345/ax20315. Here "ark" is the URI scheme, "12345" is the NAAN and "ax20315" is the Name. The Embedded ARK https://n2t.net/ark:12345/ax20315 corresponds to the above Basic ARK; being a web URI, it can potentially be accessed by any web browser without need for specific support for the ARK scheme.

A founding principle of the design of ARK is that persistence is a matter of the service provided by the resolver servicing a persistent identifier and is not conferred by the identifier scheme itself. Users MUST NOT automatically assume that any published ARK is a persistent identifier. Publishers of ARKs that are committed to keeping an ARK persistent SHOULD make this clear to the reader. For example, a publisher MAY state "Please use the persistent identifier ark:12345/ax20315 to reference this page".

Every piece of information included in an identifier is liable to become invalid or obsolete with time. An opaque identifier is one that includes no manifest information about what resource it identifies. When a NAA allocates ARKs that are intended as persistent identifiers, those ARKs are RECOMMENDED to be opaque. The URI that an ARK resolves to (if any) MAY be non-opaque.

3.1. Name Assigning Authority (NAA)

ARKs are assigned by Name Assigning Authorities (NAAs). Each NAA has a unique NAAN (Name Assigning Authority Number), a string of characters that is included within all the ARKs it allocates. The NAAN is included in ARKs to partition the ARK namespace and avoid collisions between ARKs assigned by different NAAs. The NAA assigns the Name of an ARK that identifies an individual resource. An agent that wants to obtain a NAAN in order to assign ARK identifiers MUST register with the ARK Maintenance Agency [ARK_agency]. A NAA MAY transfer control over its NAAN space to a successor organization to take care of the ARKs it assigned.

The NAAN 12345 is reserved in perpetuity to be used in examples and NAAN 99999 is reserved in perpetuity for invalid ARKs. Applications that handle ARKs SHOULD NOT handle 12345 in any special way and SHOULD recognize NAAN 99999 as being invalid; the same rationale as in [resrv_domains] applies.

The ARK Maintenance Agency keeps the authoritative public registry of all registered NAAs along with relevant associated data; see [registry]. As of the time of writing of the present document, the official registry is in a simple plain-text-based format described in a comment at the beginning of the registry itself.

Once a NAAN is assigned to a real organization that requested it (as opposed to an assignment made through a technical mistake or a misunderstanding), this assignment is permanent and MUST be kept in the registry for as long as the ARK system operates, which is to say, into the indefinite future.

Every NAA MUST contact the ARK Maintenance Agency when necessary to keep the prefix of its authoritative resolver up to date and in the same way SHOULD keep the other information in the registry up to date. A NAA MUST notify the ARK Maintenance Agency if it expects to stop existing or to stop operating a resolver for its NAAN indefinitely (that is, notification is not required for temporary downtime of its resolver).

3.2. Qualifiers

An ARK MAY include a qualifier after the Name. The qualifier can have a ComponentPath and a VariantPath. The ComponentPath is delimited by slashes and indicates hierarchical structure. For example, the ARK ark:12345/ax20315/edition1/chapter5 has ComponentPath /edition1/chapter5. Publishing an ARK with a ComponentPath has the implication that the ARK obtained by truncating the last segment of the ComponentPath and the preceding slash is a container for the untruncated ARK. This semantic implication extends recursively until the ARK with no ComponentPath. Thus in the above example the hierarchical structure (from most general resource to least general) is

  1. ark:12345/ax20315
  2. ark:12345/ax20315/edition1
  3. ark:12345/ax20315/edition1/chapter5

These implied hierarchical semantics do not extend beyond the ARK with no ComponentPath. A string with no Name part like ark:12345 is not an ARK. If a NAA wants to allocate an ARK to refer to itself, it MUST do so by allocating a Name under its NAAN as for any other resource.
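
The container relationship can be illustrated with a short Python sketch. The function name container_chain is hypothetical, and the sketch assumes its input is a Basic ARK with no VariantPath, query or fragment; it performs no validation.

```python
from typing import List


def container_chain(ark: str) -> List[str]:
    """Return the ARK and its implied containers, most general first.

    Assumes `ark` has the form ark:NAAN/Name[/segment...] with no
    VariantPath, query or fragment. No validation is performed.
    """
    scheme, rest = ark.split(":", 1)
    if rest.startswith("/"):
        rest = rest[1:]  # tolerate the optional slash in "ark:/"
    naan, name, *segments = rest.split("/")
    # The chain starts at the ARK with no ComponentPath and never goes
    # above it: "ark:NAAN" alone is not an ARK.
    chain = [f"{scheme}:{naan}/{name}"]
    for segment in segments:
        chain.append(f"{chain[-1]}/{segment}")
    return chain
```

Applied to ark:12345/ax20315/edition1/chapter5, this yields the three ARKs listed in the example above, in the same order.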

The VariantPath is delimited by periods and indicates a language version, media type version, or similar variant of the Basic ARK obtained by stripping the VariantPath. The order of components within a VariantPath is meaningless. ARKs that differ only in order of VariantPath components identify the same resource. Example: The following ARKs are variants of ark:12345/ax20315:

This specification does not define the concrete semantics of the VariantPath; a NAA SHOULD document the semantics of the ARKs it assigns and SHOULD make this documentation accessible to users that dereference the ARK (for example, by including a hyperlink that points to the policy of the NAA for assigning ARKs, which in turn describes the semantics of the VariantPath).

The use of qualifiers is entirely optional. It is RECOMMENDED that a Basic ARK without qualifiers is used to identify a generic resource (independent of media type and perhaps language). A qualifier MAY be used to identify specific variants that could be short-lived as the preferred media type and languages change in the span of decades and centuries.

3.3. Name Mapping Authority (NMA)

A Name Mapping Authority is an agent that provides a dereference service for a set of ARKs; this service MAY be an ARK resolver operated by the NMA (see below) or it MAY be any other suitable means. Example: A NMA could operate a library that lends physical copies of the books identified by the ARKs it services.

Ideally all NMAs that service a given ARK would provide the same service. This could fail to be the case in practice because of technical limitations or political reasons, for example:

3.4. Inflections

Inflections are variations of a Basic ARK obtained by adding a URI query. The question mark that introduces the URI query is part of the inflection. If there is no URI query, the inflection is considered the empty string. The inflections of an ARK are meant to provide information and services related to its referent. The ARK system reserves the inflection ?info to request metadata about the referent of an ARK and the association of the ARK with its referent, including any relevant persistence statement.

Example: If ark:12345/ax20315 is the ARK of a PDF document then ark:12345/ax20315?info and https://n2t.net/ark:12345/ax20315?info can be expected by the user to resolve to a web page with metainformation about the PDF document: title, author, date of creation and last modification, a statement of persistence (if applicable) by the NAA or NMAH, and others. The latter ARK can be entered in a web browser by a user seeking an assurance that the ARK https://n2t.net/ark:12345/ax20315 will resolve indefinitely to this PDF document in order to use the ARK to cite the document in print.

A NMAH SHOULD implement the ?info inflection with the semantics described in this document. ?info MUST NOT be used for a different purpose. If a NMAH provides additional inflections, it SHOULD publicly document what they are and their meanings and make this information available to the users of its ARKs.

? and ?? are reserved for a possible standard meaning in a future revision of this specification. In prior drafts the inflection ? was recommended to provide metadata about the referent of the ARK and the inflection ?? was recommended to provide information about the assignment of the ARK to its referent including a persistence statement (if applicable). It is RECOMMENDED that an ARK resolver gives the inflections ? and ?? the same semantics as ?info unless and until they are redefined in a future version.
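
As an illustration, the following Python sketch separates the inflection from an Extended ARK and tests whether it is one of the inflections discussed in this section. The names split_inflection, requests_metadata and METADATA_INFLECTIONS are hypothetical. Treating ? and ?? like ?info follows the recommendation above and is not a requirement.

```python
from typing import Tuple

# ?info is reserved for metadata; ? and ?? are RECOMMENDED to behave the same.
METADATA_INFLECTIONS = {"?info", "?", "??"}


def split_inflection(extended_ark: str) -> Tuple[str, str]:
    """Split an Extended ARK into (Basic ARK, inflection).

    The inflection includes its leading question mark and is the empty
    string when there is no URI query. Any fragment is discarded first.
    """
    without_fragment = extended_ark.partition("#")[0]
    basic, mark, query = without_fragment.partition("?")
    return basic, mark + query


def requests_metadata(extended_ark: str) -> bool:
    """Tell whether the ARK carries an inflection that asks for metadata."""
    return split_inflection(extended_ark)[1] in METADATA_INFLECTIONS
```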

4. Syntax

The syntax of the ARK scheme is described in the context of Internationalized Resource Identifiers (IRIs). For ARKs as URIs the only difference is that only ASCII characters are allowed.

The core of the ARK system is the set of Basic ARKs. Extended ARKs are the set of all strings allowed under the ark IRI scheme. Every Basic ARK is an Extended ARK. The syntax of the URI and IRI systems allows identifiers with the ark scheme that do not meet the Extended-ARK production rule; those character strings are not ARKs of any type; we call them pseudo-ARKs.

The production rules ifragment, iquery, iunreserved, pct-encoded and scheme are taken from [IRIs].

ARK = Extended-ARK / Embedded-ARK
Extended-ARK = Basic-ARK ["?" iquery] ["#" ifragment]
Basic-ARK = ~"ark:" ["/"] NAAN "/" Name [Qualifier]
NAAN = 1*(Base29-char / pct-encoded)
Base29-char = DIGIT / "b" / "c" / "d" / "f" /
              "g" / "h" / "j" / "k" / "m" /
              "n" / "p" / "q" / "r" / "s" /
              "t" / "v" / "w" / "x" / "z"
Name = 1*Name-char
Name-char = ARK-unreserved / "-" / pct-encoded / iunreserved
ARK-unreserved = DIGIT / ALPHA / "=" / "~" /
                 "*" / "+" / "@" / "_" / "$"
Qualifier = ComponentPath / VariantPath / ComponentPath VariantPath
ComponentPath = 1*("/" 1*Name-char)
VariantPath = 1*("." 1*Name-char)
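
The grammar above can be approximated with a regular expression. The following Python sketch is a simplification under two stated assumptions: it accepts ASCII only (ARKs as URIs), and it leaves "." out of the characters allowed in a Name or path segment so that the start of the VariantPath is unambiguous. The names BASIC_ARK and parse_basic_ark are hypothetical.

```python
import re
from typing import Dict, Optional

PCT = r"%[0-9A-Fa-f]{2}"
BASE29 = r"[0-9bcdfghjkmnpqrstvwxz]"
# ASCII subset of Name-char. Assumption: "." is excluded here so that it
# can act as the VariantPath delimiter.
NAME_CHAR = rf"(?:[0-9A-Za-z=~*+@_$-]|{PCT})"

BASIC_ARK = re.compile(
    r"(?i:ark):/?"                                   # scheme, optional slash
    rf"(?P<naan>(?:{BASE29}|{PCT})+)"                # NAAN
    rf"/(?P<name>{NAME_CHAR}+)"                      # Name
    rf"(?P<component_path>(?:/{NAME_CHAR}+)*)"       # ComponentPath, may be empty
    rf"(?P<variant_path>(?:\.{NAME_CHAR}+)*)"        # VariantPath, may be empty
)


def parse_basic_ark(text: str) -> Optional[Dict[str, str]]:
    """Return the parts of an ASCII Basic ARK, or None if it does not match."""
    match = BASIC_ARK.fullmatch(text)
    return match.groupdict() if match else None
```

A string such as ark:12345 returns None because it has no Name part.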

It is RECOMMENDED that applications do not generate Extended ARKs longer than 255 Unicode codepoints. Where a Basic ARK or an Extended ARK is expected, applications MUST NOT impose a limit on length of less than 255 codepoints (that is, Basic ARKs and Extended ARKs of 255 codepoints or shorter MUST NOT be rejected by any conforming application on the basis of length). Applications MAY support only URIs and therefore reject Extended ARKs that include non-ASCII characters.

4.1. Embedded ARKs

Extended ARKs MAY be embedded as a part of another IRI (URIs are a subset of IRIs). The main application is to couple an Extended ARK with a HTTP resolver to make the ARK dereferenceable by ordinary web tools without any additional requirement on the part of the user. Embedded ARKs MUST match the Embedded-ARK production rule below. The production rules iauthority, isegment, isegment-nz, scheme and query are taken from [IRIs].

Embedded-ARK = prefix Extended-ARK
prefix = scheme ":" ["/" isegment-nz *("/" isegment)] /
         scheme "://" iauthority *("/" isegment)

An Extended ARK combined with the prefix of an ARK resolver is an Embedded ARK. Other specifications MAY extend the set of Embedded ARKs. The set of Embedded ARKs (as defined by the aggregate of all specifications of the Internet) is thus open ended.

5. Non-ASCII characters

As an IRI scheme, the ARK scheme allows for non-ASCII Unicode characters. It is RECOMMENDED that ARKs minted for new resources use only ASCII characters. Security issues related to Unicode are mentioned in Section 8.

ARK normalization always percent-encodes non-ASCII characters, thus leaving a longer identifier. For example, ARK normalization maps the ARK ark:12345/4бф3х1 to ark:12345/4%D0%B1%D1%843%D1%851.
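
The mapping in this example can be reproduced with a few lines of Python. This is a minimal sketch that assumes UTF-8 as the encoding, as in the IRI-to-URI mapping of [IRIs]; the function name percent_encode_non_ascii is hypothetical.

```python
def percent_encode_non_ascii(text: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 octets."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters are left untouched
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode_non_ascii("ark:12345/4бф3х1"))
# prints ark:12345/4%D0%B1%D1%843%D1%851
```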

5.1. Rationale

Percent-encoded characters have long been allowed in the ARK system. Internationalized Resource Identifiers (IRIs) allow non-ASCII characters to be used transparently in IRIs via a mapping to percent-encoded characters. Applications widely implement this mapping; in particular, most web browsers do. Forbidding non-ASCII characters in ARKs would have been a moot point because browsers would still allow non-ASCII characters in pseudo-ARKs via the transparent IRI-to-URI mapping. Thus, a decision was made to allow non-ASCII characters in this specification and recommend against them. The only way to reliably disallow non-ASCII characters where ARKs are expected would have been to forbid percent-encoded characters outside ASCII so that the IRI-to-URI mapping always yields invalid ARKs. However, this would have broken backward compatibility with previous versions of the ARK scheme, which allow percent-encoded characters without restriction.

5.2. Avoiding Latin characters

It may be desirable to avoid Latin characters in a text written in a different script. In principle, resources in a fixed language that uses a script other than the Latin script could be assigned an opaque persistent identifier with characters in their native script. For example, a scientific journal that publishes articles in the Russian language could assign persistent identifiers like ark:12345/4бф3х1 to its articles. This comes with an inconvenience for users of non-Cyrillic scripts; they will have more difficulty manually entering this ARK. Therefore, it is RECOMMENDED to avoid this practice.

Instead it is RECOMMENDED that a NAA that wants to avoid Latin characters in its identifiers mints ARKs from only decimal characters ("0"-"9"). Decimal characters are present in most keyboard layouts and are more familiar to people around the world than the Latin script. The Latin characters in the "ark" substring at the start of ARKs are unavoidable as long as IRIs are used; IRIs do not allow for non-ASCII characters in the scheme. In hypertext, ARKs can be published with the "ARK" part transliterated into the native script, with the rest of the identifier linked to an Embedded ARK. For example, in Russian one can write "АРК: 12345/437719" where 12345/437719 is a hyperlink to https://n2t.net/ark:12345/437719.

6. Normalization

Normalization is defined for Extended ARKs. Given an Extended ARK, the following algorithm produces an Extended ARK in normal form. The domain of this algorithm is only Extended ARKs as described in this specification. This algorithm is explicitly undefined for strings other than Extended ARKs.

  1. Convert the scheme to lowercase.
  2. If the ARK starts with "ark:/" then replace that portion with "ark:".
  3. Transform the substring other than the initial "ark:", the query (incl. question mark) and fragment (incl. hash symbol) as follows:

    1. Decode all percent-encoded characters that after decoding would match the ARK-unreserved production rule.
    2. Delete all instances of "-" (U+002D).
    3. Percent-encode all non-ASCII characters.
    4. If there is a VariantPath then separate it into its individual components (each a period followed by one or more Name-char), sort them in lexicographical order according to codepoint without decoding any remaining percent-encoded characters, delete duplicate components and join the remaining components in that order; substitute the original VariantPath with this result.
  4. In the query and fragment: Decode all percent-encoded characters that match the unreserved rule in [URIs]. Percent-encode all non-ASCII characters.

Note that the normalization algorithm decodes all percent-encoded instances of "-" in the step of URI syntax-based normalization. Those hyphens are subsequently removed. Therefore, Basic ARKs that differ only by insertion or removal of "-" are equivalent.

ARKs are said to be equivalent if they have the same normal form.
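
The numbered steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the input is assumed to be a valid Extended ARK and is not validated, and the VariantPath is taken to begin at the first period after the last slash. The sketch follows the numbered steps literally, so a percent-encoded hyphen (%2D) stays encoded because "-" is not in ARK-unreserved. The names normalize and equivalent are hypothetical.

```python
import re
import string

ARK_UNRESERVED = set(string.ascii_letters + string.digits + "=~*+@_$")
URI_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _decode_allowed(text: str, allowed: set) -> str:
    """Decode %XX only when the decoded character is in `allowed`."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in allowed else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def _encode_non_ascii(text: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 octets."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        for ch in text
    )


def normalize(extended_ark: str) -> str:
    """Return the normal form of an Extended ARK (input is not validated)."""
    rest, hash_mark, fragment = extended_ark.partition("#")
    rest, question_mark, query = rest.partition("?")
    scheme, _, body = rest.partition(":")
    if body.startswith("/"):                       # step 2: "ark:/" -> "ark:"
        body = body[1:]
    body = _decode_allowed(body, ARK_UNRESERVED)   # step 3.1
    body = body.replace("-", "")                   # step 3.2
    body = _encode_non_ascii(body)                 # step 3.3
    head, slash, last = body.rpartition("/")       # step 3.4
    stem, *variants = last.split(".")
    if variants:
        last = ".".join([stem] + sorted(set(variants)))
    body = head + slash + last

    def tidy(part: str) -> str:                    # step 4
        return _encode_non_ascii(_decode_allowed(part, URI_UNRESERVED))

    return (scheme.lower() + ":" + body            # step 1: lowercase scheme
            + question_mark + tidy(query) + hash_mark + tidy(fragment))


def equivalent(ark_a: str, ark_b: str) -> bool:
    """Two ARKs are equivalent if they have the same normal form."""
    return normalize(ark_a) == normalize(ark_b)
```

Because equivalence is defined as equality of normal forms, an application can store the normal form as a lookup key and compare keys directly.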

Theorem 1. ARK equivalency is an equivalence relation.

Theorem 2. An extended ARK is a Basic ARK if and only if its normal form is a Basic ARK.

The Extended ARKs that have the same normal form identify exactly the same resource (this is part of the ARK system independent of any NAA-specific policy in assigning ARKs). Agents MUST NOT declare conflicting assignments for equivalent ARKs; doing so is an error.

7. Resolution

A resolver is an application accessible under a dereferenceable URI scheme that provides a suitable representation for ARKs under its scope. Resolvers MAY use any suitable URI scheme. This specification only describes HTTP resolvers. Other specifications MAY describe additional methods to resolve an ARK. A resolution request is the process of using an ARK resolver to dereference an ARK.

Every NAA MUST declare at time of registration at least 1 prefix under which it intends to run an authoritative ARK resolver for its NAAN. Every NAA MUST send a request to the ARK Maintenance Agency [ARK_agency] when necessary to keep the set of its authoritative resolvers up to date. The official list of allocated NAANs and their authoritative resolvers is [registry].

Any method of ARK resolution SHOULD be able to distinguish whether the representation obtained is a representation of the resource identified by the ARK or a representation related to the resource identified by the ARK. This distinction is made because it is necessary for resources referenced in the Semantic Web. See [cool_URIs].

7.1. HTTP resolvers

A HTTP resolver is one that is accessible through the http or https scheme. The prefix of an HTTP resolver MUST match the http-resolver-prefix production rule. A HTTP resolver MUST serve HTTP requests for URIs beginning with its corresponding prefix.

Given the semantics of the HTTP protocol, resolution is only directly applicable to Extended ARKs with no URI fragment. The fragment, if present, has semantics given by the media type of the response obtained (if any) for resolving the corresponding ARK without the fragment.

ARKs that contain non-ASCII characters must be percent-encoded before resolution because the request-uri in the HTTP protocol only allows URIs (not proper IRIs). The constraints on length limitations apply to the URI resulting after this percent-encoding.

http-resolver-prefix = (~"http" / ~"https") "://"
                       authority path-abempty
request-uri = prefix "/" request-core ["?" query]

URI queries are used for inflections; their semantics and requirements are described in Section 3.4.

Clients of the HTTP resolver MUST set request-core in the HTTP request to a Basic ARK. Servers MAY respond with an error status code for requests with a request-core that is not a Basic ARK.

A HTTP ARK resolver MUST treat equally all resolution requests for Extended ARKs with the same normal form with the exception that it MAY reject some Extended ARKs on the basis that they are too long.

The official resolver for the ARK system has prefix equal to https://n2t.net and is operated by [ARK_agency].

7.1.1. Allowable length limits

The following ARKs MUST NOT be rejected on the basis that they are too long:

  1. Any Extended ARK with a total length less than or equal to 255 characters.
  2. Any Extended ARK whose Basic ARK part has a length less than or equal to 255 characters and whose inflection is empty, ?, ?? or ?info.

When a HTTP ARK resolver declines to serve a request for resolution on the basis of length it MUST reply with the HTTP status code 414.
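
A resolver could express the two rules above as a single predicate. The following Python sketch assumes the Extended ARK has already been percent-encoded and carries no fragment; the names may_reject_for_length, PROTECTED_INFLECTIONS and LIMIT are hypothetical.

```python
PROTECTED_INFLECTIONS = {"", "?", "??", "?info"}
LIMIT = 255


def may_reject_for_length(extended_ark: str) -> bool:
    """Tell whether a resolver is allowed to answer 414 for this ARK.

    Assumes the ARK is already percent-encoded and has no fragment.
    """
    # Rule 1: short Extended ARKs must never be rejected for length.
    if len(extended_ark) <= LIMIT:
        return False
    # Rule 2: a short Basic ARK with a protected inflection is also safe.
    basic, mark, query = extended_ark.partition("?")
    if len(basic) <= LIMIT and mark + query in PROTECTED_INFLECTIONS:
        return False
    return True
```

A return value of True only means that rejection is permitted; a resolver remains free to serve longer ARKs.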

Note that the length limit is with respect to the length of Extended ARKs, not the Embedded ARKs used to query an ARK resolver. Internal processing may differ provided these constraints are satisfied. Example: A resolution request for ark:12345/c3700931 must be treated the same as if it were for ark:12345/c370-0931 or ark:/12-345/c37-009-31--. A HTTP ARK resolver MAY return an error code for requests to resolve something that is not an Extended ARK.

7.1.2. Semantics of server response

If the request is for an Embedded ARK with no inflection, the reply of the resolver is to be interpreted according to the semantics of HTTP with the considerations specific to the ARK system described in this section. Note that these considerations do not apply in the case of an inflected ARK because then the request is not for the referent of the ARK, but for associated metadata instead, as described in Section 3.4.

  • The HTTP status codes 301, 302, 307 and 308 signify that the referent of the ARK is available at the URI indicated by the Location header. If the Vary header is present in the response, then this location is specific to the parameters indicated by the semantics of the Vary header. A HTTP ARK resolver MUST use HTTP status code 302 or 307 instead of 301 or 308 because ARK resolvers provide a temporary location for the referent of the ARK, not a permanent relocation.
  • The HTTP status code 303 signifies that a resource related to the referent of the ARK is tentatively available at the URI indicated by the Location header.
  • The HTTP status code 404 signifies that this resolver does not possess a location for the referent of the ARK.
  • The resolver SHOULD reply with HTTP status code 400 if the request-core is not a Basic ARK and the server is unwilling to process it. Note that this status code is not specific to the aforementioned condition; the HTTP semantics allow it to be used for other types of errors unrelated to the ARK system.

When resolution of an ARK results in a chain of redirects (HTTP status codes 301, 302, 303, 307 and 308 MUST be recognized as redirects) followed by a success response which is not a redirect (HTTP status codes 200, 204, 206, 226 and 304 MUST be recognized as success), the interpretation depends on the redirects. If any redirection has status code 303, then the resource at the final location is considered related to the ARK resolved; otherwise the resource at the final location is the referent of the ARK resolved and the representation obtained is a representation of this referent. When a chain of redirects is followed by an error (HTTP status codes 400-599 MUST be recognized as errors) this specification does not specify any semantics; therefore, it is unspecified whether the error pertains to the referent of the ARK or to the resolution of the ARK. Additional responses MAY be recognized as redirect and success or handled the same way as HTTP status code 303 provided this is consistent with the relevant specifications.

This specification does not define any semantics for HTTP requests with a URI corresponding to a HTTP ARK resolver that is not an Embedded ARK. ARK resolvers MAY provide other services under request URIs that are not Embedded ARKs.

7.1.3. Reference resolution algorithm

The following algorithm MAY be used to resolve an ARK using a HTTP resolver. Other algorithms, whether custom or described in a specification, MAY be used instead. If a standard defines an additional resolution procedure, it SHOULD follow the same intent as the reference resolution algorithm, changing only the technical details necessary to adapt to the respective protocols it employs.

The reference algorithm presented below is designed to distinguish between information resources and non-information resources identified by an ARK by making use of HTTP status codes as described in [cool_URIs]. It takes the following inputs:

  • The Extended ARK to be resolved.
  • The prefix of the ARK resolver. If none is specified by the user, the client that resolves the ARK SHOULD default to https://n2t.net.
  • The maximum number of redirections to tolerate. MUST be at least 5.
  • The HTTP method to use for resolution. MUST be either GET or HEAD.

The description of the ARK reference resolution algorithm follows.

Set URI to the Embedded ARK formed with the prefix of the ARK resolver specified and the Extended ARK to be resolved. Set max_redirects to the number specified by the user. Set method to the symbol GET or HEAD as specified by the user. Set state to the symbol direct. Then while max_redirects is 0 or more:

  • If URI is not an address with http or https scheme the algorithm ends with success. Otherwise send a HTTP request to the resource identified by URI using HTTP method method; if sending this request fails then return failure.
  • Dispatch based on the HTTP status code obtained:

    • If the HTTP status code was 301, 302, 303, 307 or 308 then set URI to the URI indicated in the Location HTTP header. If that HTTP header is missing or not a valid URI, then return failure. If the HTTP status was 303 then set state to the symbol related.
    • If the HTTP status code is 200, 204, 206, 226 or 304 then the algorithm finishes with success. If state is direct then the ARK is located at URI and the representation obtained is a representation of the resource identified by the ARK. If state is related then URI identifies a resource related to the resource identified by the ARK and the representation obtained is related to the resource identified by the ARK.
    • If the HTTP status code is in the range 400 to 599 then return failure.
    • If the HTTP status code does not match any rule above then the behavior is implementation-defined.
  • Decrement max_redirects by 1.

If the above loop ends because max_redirects reached a negative value, return failure.
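
The reference algorithm can be sketched in Python as shown below. The transport is deliberately left abstract: send is a hypothetical callable that performs one HTTP request without following redirects and returns the status code together with the Location header (or None). The validity check on Location is a rough simplification, and status codes not covered by any rule are treated as failure, which is one possible implementation-defined choice.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit

REDIRECT = {301, 302, 303, 307, 308}
SUCCESS = {200, 204, 206, 226, 304}

# Hypothetical transport hook: send(method, uri) -> (status, Location or None).
Send = Callable[[str, str], Tuple[int, Optional[str]]]


def _looks_like_uri(value: Optional[str]) -> bool:
    """Rough validity check: the value parses and carries a scheme."""
    if not value:
        return False
    try:
        return bool(urlsplit(value).scheme)
    except ValueError:
        return False


def resolve(ark: str, send: Send, prefix: str = "https://n2t.net",
            max_redirects: int = 5, method: str = "GET") -> Tuple[str, str, str]:
    """Return (outcome, state, uri); outcome is "success" or "failure"."""
    if max_redirects < 5:
        raise ValueError("at least 5 redirections must be tolerated")
    if method not in ("GET", "HEAD"):
        raise ValueError("method must be GET or HEAD")
    uri = prefix.rstrip("/") + "/" + ark  # the Embedded ARK
    state = "direct"
    while max_redirects >= 0:
        if urlsplit(uri).scheme not in ("http", "https"):
            return ("success", state, uri)
        try:
            status, location = send(method, uri)
        except OSError:
            return ("failure", state, uri)
        if status in REDIRECT:
            if not _looks_like_uri(location):
                return ("failure", state, uri)
            uri = location
            if status == 303:
                state = "related"
        elif status in SUCCESS:
            return ("success", state, uri)
        else:
            # 400-599 is a failure; other codes are implementation-defined
            # and this sketch treats them as failure too.
            return ("failure", state, uri)
        max_redirects -= 1
    return ("failure", state, uri)
```

Injecting send keeps the sketch independent of any particular HTTP library and makes the redirect logic easy to exercise with canned responses.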

7.1.4. Official ARK HTTP resolver

The ARK Maintenance Agency [ARK_agency] operates an ARK HTTP resolver at https://n2t.net/. This resolver can resolve any ARK that is globally resolvable by redirecting to the local ARK resolver as stated in [registry].

8. Security considerations

General security considerations of communication within computer networks apply. Ideally resolvers SHOULD be reachable via a secure means. For the case of HTTP resolvers this means using HTTP over TLS. The possibility of connecting securely to an HTTP resolver SHOULD be announced by using the https URI scheme in the NMAH. If the resolver is also available under plain HTTP directly over TCP then it SHOULD use HTTP Strict Transport Security (see [HSTS]) to direct users to contact the server securely in the future.

The ARK system allows for resolution of identifiers. Many of the security implications of DNS apply. As with any resolution system, a malicious agent can operate an ARK resolver and return undesired responses. Using any ARK resolver requires trust that it will return an honest answer or error message and not a malicious answer analogous to DNS hijacking. Using the ARK system in any way requires some trust in the ARK Maintenance Agency. There is little additional trust required in using the official ARK resolver which is operated by the ARK Maintenance Agency. It is RECOMMENDED that users use the official ARK resolver to resolve ARKs for which there is no particular reason to use another resolver.

8.1. Non-ASCII characters

The ARK scheme allows non-ASCII Unicode characters in the part assigned by NAAs. See [Unicode_security] and Section 8 in [IRIs] for security implications. The NAAN is always limited to ASCII characters. If a NAA allows a non-trusted party to assign ARKs under its NAAN it SHOULD limit the character set allowed to avoid homoglyph attacks and misplaced formatting characters. An application that displays ARKs can avoid most Unicode-related security problems by displaying ARKs in normalized form which only uses ASCII characters. Applications that expect an ARK and allow non-ASCII characters MUST be prepared for inputs with control or formatting characters inserted maliciously and either reject the input or percent-encode the problematic characters. The production rules of IRIs forbid characters in the range U+0000-U+001F, U+007F-U+009F which are control characters.

8.1.1. Bidi formatting characters

The IRI specification states in prose ([IRIs], p. 18): "IRIs MUST NOT contain bidirectional formatting characters (LRM, RLM, LRE, RLE, LRO, RLO, and PDF)." The set of bidirectional formatting characters is open-ended; therefore it is not possible to forbid all future bidirectional formatting characters in a fixed syntax other than by forbidding unallocated codepoints. For example, U+2066 (left-to-right isolate) and U+2067 (right-to-left isolate) were added in Unicode 6.3.0 after the IRI standard was written. Applications MUST avoid passing characters with unknown semantics to other applications. For example, a program with a command-line interface that handles IRIs should avoid sending unescaped bidi formatting characters in IRIs to the terminal because they can garble the following text, which is unrelated to the IRI. Web software MAY place IRIs that can potentially contain formatting characters inside a bidi XHTML element to limit the effect of bidi formatting characters to the IRI.
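
One way for an application to avoid passing such characters on is to percent-encode them before display or output. The Python sketch below is an illustration only: it uses the Unicode general categories Cc (control), Cf (format) and Cn (unassigned) as an approximation of "control, formatting or unknown" characters, and the result depends on the Unicode version shipped with the Python runtime. The names escape_formatting and RISKY_CATEGORIES are hypothetical.

```python
import unicodedata

# Assumption: these general categories approximate the characters that
# should not be passed through unescaped.
RISKY_CATEGORIES = {"Cc", "Cf", "Cn"}


def escape_formatting(text: str) -> str:
    """Percent-encode control, format and unassigned characters."""
    out = []
    for ch in text:
        if unicodedata.category(ch) in RISKY_CATEGORIES:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)  # letters, digits and punctuation pass through
    return "".join(out)
```

Bidirectional controls such as U+202E and the isolates U+2066 and U+2067 all have general category Cf, so they are escaped without listing them one by one.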

9. URI scheme registration request

Scheme name: ark
Status: permanent
Applications/protocols that use this scheme name: Existing ARK resolvers including the central resolver https://n2t.net/. Existing NAAs registered in [registry].
Contact: Mario Xerxes Castelan Castro (Ksenia) regarding this specification; The ARK Maintenance Agency [ARK_agency] regarding the ARK system in general.
Change controller: ARK Maintenance Agency [ARK_agency].
References: This document.

10. References

[ABNF]
Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications", RFC 5234, DOI 10.17487/RFC5234, <https://doi.org/10.17487/RFC5234>.
[ARK_agency]
Agency, A. M., "ARK Maintenance Agency web site", <https://arks.org/>.
[cool_URIs]
W3C, "Cool URIs for the Semantic Web", <https://www.w3.org/TR/cooluris/>.
[HSTS]
Hodges, J., Jackson, C., and A. Barth, "HTTP Strict Transport Security (HSTS)", RFC 6797, DOI 10.17487/RFC6797, <https://doi.org/10.17487/RFC6797>.
[IRIs]
Dürst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987, <https://doi.org/10.17487/RFC3987>.
[Kunze_ARK]
Kunze, K., "The ARK Identifier Scheme", <https://n2t.net/ark:13030/c7cv4br18>.
[req_words]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, DOI 10.17487/RFC2119, <https://doi.org/10.17487/RFC2119>.
[registry]
Agency, A. M., "Name Assigning Authority Number (NAAN) Registry", <https://n2t.net/e/pub/naan_registry.txt>.
[resrv_domains]
Cheshire, S. and M. Krochmal, "Special-Use Domain Names", RFC 6761, DOI 10.17487/RFC6761, <https://doi.org/10.17487/RFC6761>.
[Unicode_security]
Davis, M. and M. Suignard, "Unicode Technical Report #36: Unicode Security Considerations, revision 15", <http://www.unicode.org/reports/tr36/tr36-15.html>.
[URIs]
Berners-Lee, T., Fielding, R.T., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", RFC 3986, DOI 10.17487/RFC3986, <https://doi.org/10.17487/RFC3986>.
[webarch]
W3C, "Architecture of the World Wide Web, Volume One", <https://www.w3.org/TR/webarch/>.

Author's Address

Mario Xerxes Castelán Castro (Ksenia)
17beta