Internet DRAFT - draft-seantek-sha-uris
draft-seantek-sha-uris
Network Working Group S. Leonard
Internet-Draft Penango, Inc.
Intended Status: Informational September 25, 2015
Expires: March 28, 2016
URI Schemes for SHA-1 and SHA-256
draft-seantek-sha-uris-02
Abstract
This document registers Uniform Resource Identifier schemes for use
with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and
SHA-256. The purpose is to identify data streams and content in a
simple, "drop-in" way within the URI infrastructure.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 28, 2016.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Leonard Informational [Page 1]
Internet-Draft SHA URIs September 2015
1. Introduction
This document registers Uniform Resource Identifier schemes for use
with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and
SHA-256. The purpose is to identify binary data streams in a simple,
"drop-in" way within the URI infrastructure. This document also
provides parallel means to identify Internet content or messages.
Frequently Internet-facing applications need to store or transmit
identifiers for wide-ranging types of content, including security
structures (certificates, public keys, authorization tokens),
executable code, and resource manifests (well-defined data formats
that serve to structure data streams, which may be significantly
larger and which are not self-identifiable). These applications
achieve greater interoperability by using a common syntax for these
identifiers; using URIs [RFC3986] suits their purposes well. Some of
the most important properties of URIs are that they are easy to
recognize by humans, and that they can be created using simple "copy-
and-paste" operations.
sha1:2FD4E1C6:7A2D28FC:ED849EE1:BB76E739:1B93EB12;43
sha1:2FD4E1C6-7A2D28FC~ED849EE1_BB76E739.1B93EB12
sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Figure 1: Example SHA URIs
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Data Stream Description, Syntax, and Semantics
The syntax of SHA-1 and SHA-256 URIs consists of the scheme ("sha1:"
or "sha256:"), followed by the hexadecimal encoding of the hash
value. Certain delimiters such as space ("%20"), tab ("%09"), newline
("%0D" or "%0A"), colon ":", period ".", tilde "~", hyphen "-", and
underscore "_" are permitted and do not affect equivalence. The
hexadecimal characters are case-insensitive. In spite of these
leniencies, this specification RECOMMENDS that generators emit
lowercase hexadecimal and use no delimiters. The identifier
identifies any data stream that, when hashed, produces the same hash
value.
Optionally, the scheme may be suffixed with ";" and the length of the
Leonard Informational [Page 2]
Internet-Draft SHA URIs September 2015
data stream in octets (8-bit units). If the data being identified is
a bit stream that is not a multiple of 8 bits, the extra suffix "b"
followed by "1" through "7" is to be appended. When the length suffix
is present, the identifier identifies any data stream of the
specified length that, when hashed, produces the same hash value.
[SHS] is only defined for messages (data streams) less than 2^64 bits
(2^61 octets). Thus, the largest length production that an
application would expect to encounter is ";2305843009213693951b7".
Behaviors beyond this length are undefined.
There is no way to distinguish between "bit streams" (that are
multiples of 8 bits) and "octet streams" as categories; this
ambiguity is intentional because [SHS] does not distinguish between
them. Furthermore, there is no way to truncate the hash value: a
parser that receives too few (or too many) hexadecimal digits MUST
NOT accept the hash value. The URI schemes defined in this document
are not intended specifically for human readability or speakability.
They are, however, designed for human "copy-and-paste-ability": the
ability of a human operator (or automated text-based operator, such
as a simple script) to feed such URIs into processes that can compare
them for exact equality. Based on the labeled input, a process can
generate appropriate representations for human comparison, such as
visual schemes [VISHASH]. The ABNF [RFC5234] is:
sha1uri = "sha1:" ( 40HEXDIG / 40*relaxed ) [ length ]
sha256uri = "sha256:" ( 64HEXDIG / 64*relaxed ) [ length ]
; HEXDIG is from [RFC5234]
; in the relaxed productions, there must be exactly
; 40 HEXDIG for sha1uri, and 64 HEXDIG for sha256uri
relaxed = HEXDIG / ":" / "." / "~" / "-" / "_" /
"%20" / "%0D" / "%0A" / "%09"
length = ";" wholenum [ bits ]
wholenum = "0" / ( %x31-39 *DIGIT )
; DIGIT is from [RFC5234]
bits = "b" ( %x31-37 ) ; 1 - 7
; "b" is not case-sensitive, but lowercase SHOULD be used
Figure 2: ABNF of SHA Data URIs
Other URI schemes such as ni: [RFC6920] include or incorporate
Leonard Informational [Page 3]
Internet-Draft SHA URIs September 2015
hashes. Software may be written to convert URIs between such schemes.
Although the bits of the resource may be equivalent, the URI
semantics may differ. The SHA-1 and SHA-256 URIs in this document
identify binary data streams (i.e., an ordered sequence of bits).
3. Message (Content) Description, Syntax, and Semantics
The URI schemes of Section 2 identify data streams; by design, they
lack means to include metadata such as filenames or Internet media
types. This section defines two additional URI schemes, "sha1msg:"
and "sha256msg:", to identify content that includes Internet message
headers.
The syntax of the message schemes is similar, but not identical, to
the data schemes. Length is optional; if present, it MUST be an
integral number of octets. Similar to mid: URIs [RFC2392], specific
content-parts may be identified by adding a slash "/" and the
(encoded) Content-ID of the part.
Implementations MUST generate and parse productions that conform to
[RFC5322] (Internet message), [RFC6532] (Internationalized Email
Headers), [RFC3030] (Binary MIME), and [RFC2045], [RFC2046],
[RFC2047], and [RFC2231] (collectively, MIME). Technically, they are
octet streams that include Internet message headers, specifically
MIME-conformant headers. The default Content-Transfer-Encoding is
"binary" (as the channel is binary-clean) and the default MIME-
Version is "1.0". These headers MAY be omitted from the production
unless they vary from these defaults.
While it would not be inaccurate to label the Content-Type of such a
production as "message/global", when a SHA Message URI is
dereferenced, the resource takes on the Content-Type of the Content-
Type header of the production, along with all other header metadata.
Any message type (such as HTTP response messages [RFC7230], Netnews
articles [RFC5536], or email messages [RFC5322]) is therefore
suitable for more-or-less direct identification by SHA Message URIs.
Implementers need to note that the encoding of headers is UTF-8
[RFC6532]. HTTP response messages [RFC7230], however, have
historically been ISO-8859-1. If an HTTP response message is used as
the message, characters outside the ASCII range MUST be re-encoded;
they SHOULD be encoded directly to UTF-8, but MAY be encoded via
other means, such as [RFC2047]. If the message is intended to convey
an enclosed HTTP response (as opposed to the content of an HTTP
response), it is appropriate to label the content with Content-Type:
message/http or application/http [RFC7230], where the full HTTP
response--including headers--is the content of the message. In such a
case, re-encoding SHALL NOT occur.
Leonard Informational [Page 4]
Internet-Draft SHA URIs September 2015
SHA Message URIs are similar to mid: and cid: URIs [RFC2392] in that
they uniquely identify a message or content. However, while Message-
IDs are assigned to and embedded into the content, SHA hashes are
intrinsic properties of the message (octet stream). Attempts to embed
the hash into the message would alter it. Furthermore, the semantics
of Message-ID are not defined when the top-level Content-Type is not
"message". In contrast, Content-ID is defined for all types of
content.
The ABNF [RFC5234] is:
; assumes productions in Figure 1 are defined
sha1msguri = "sha1msg:" ( 40HEXDIG / 40*relaxed )
[ msglength ] [ "/" content-id ]
sha256msguri = "sha256msg:" ( 64HEXDIG / 64*relaxed )
[ msglength ] [ "/" content-id ]
msglength = ";" wholenum
content-id = id-left "@" id-right
id-left = pchar-dot-atom-text
pchar-dot-atom-text = 1*pchar-atext *("." 1*pchar-atext)
; omits # % ? ^ ` { | } from RFC 5322 atext
; pct-encoded is from RFC 3986
; TODO: Argh, RFC 6532 extends atext to include UTF-8!
pchar-atext = ALPHA / DIGIT / "!" / "$" / "&" / "'" /
"*" / "+" / "-" / "/" / "=" / "_" / "~" /
pct-encoded
id-right = pchar-dot-atom-text / pchar-no-fold-literal
; does not include @ but this is just because it would
; look weird in the production
; TODO: Argh, should [ *dtext ] include %5B ... %5D,
; or should the dtext be interpreted (and de-quoted-paired)?
; TODO: Argh, RFC 6532 extends dtext to include UTF-8!
pchar-no-fold-literal = unreserved / sub-delims / ":" /
pct-encoded
Figure 2: ABNF of SHA Message URIs
Leonard Informational [Page 5]
Internet-Draft SHA URIs September 2015
4. Encoding and Interoperability
SHA-1 and SHA-256 URIs conform to [RFC3986]. The syntaxes described
in this document specifically conform to the path-rootless production
of the hier-part production of the URI production of Section 3 of
[RFC3986].
The characters representing the binary hash values in such URIs are
limited to hexadecimal, so no further encoding issues are raised
based on the identified content. Beware that the characters
representing a Content-ID in Message URIs reflect UTF-8 encoding
[RFC6532].
Future revisions may cover semantics of other URI-conformant
productions.
5. Security Considerations
The basic sha1 and sha256 URI schemes identify data streams, not
content in the Internet message sense. Supplementary information
about the data stream is expected to be provided by context.
If an application designer wishes to affix metadata (such as an
Internet media type or file modification date) permanently to a data
stream, the metadata and data stream should be concatenated into some
format, and hashed. Section 3 provides sha1msg and sha256msg URI
schemes that identify Internet message content.
Additional URI schemes are proposed for message content (as opposed
to using disambiguating parameters in the data URIs to indicate the
presence of parsable headers) for several reasons:
1. The data URIs are intended to be very lightweight, with minimal
room for implementation errors. The data URIs are usable without
any network access, such as when querying a local data store.
Parsing and interpreting Internet message headers carries a host
of security and interoperability ailments that are described in
the relevant standards and elsewhere; sha1 and sha256 URI
implementations do not need to worry about these hazards.
2. While the sha1 and sha256 URI schemes can (and are routinely
expected to) identify data streams that are canonicalized, the
sha1msg and sha256msg URI schemes are not designed with
canonicalization in mind. Internet message headers do not have
canonical forms.
Cryptographic hashes are designed to map variable length data streams
to fixed length outputs, with four additional properties:
Leonard Informational [Page 6]
Internet-Draft SHA URIs September 2015
1. Random distribution: A change, addition, or deletion of any bit to
the input message (data stream) will affect each and every bit of
the output (hash value) with equal probability.
2. Preimage resistance: Given a hash value, finding a corresponding
message is computationally infeasible.
3. Second-preimage resistance: Given a hash value and a first
message, finding a second message that has the same hash value is
computationally infeasible.
4. Collision resistance: Finding two messages that have the same hash
value is computationally infeasible.
As of the time of this document, reduced rounds of SHA-1 have been
cryptanalyzed [OPTJLOC], prompting the security community to migrate
to new hash algorithms [RFC6194]. SHA-256 has not yet been
cryptanalyzed.
The length qualifier is not intended to provide or augment the basic
security properties of the SHA-1 or SHA-256 hash algorithms. However,
an application SHOULD employ the length qualifier when it knows the
length in advance, because this qualifier constrains the set of
possible data streams.
Collisions exist with any hash algorithm. Consider an arbitrary data
stream of 20 octets and 1 bit (161 bits) and the SHA-1 algorithm. By
the pigeonhole principle, a collision (where two messages produce the
same hash value) must exist, because in the best case, each data
stream of 20 octets (160 bits) maps separately to each one of the
2^160 possible hash values. Therefore, the 2^160 + 1st message must
map to one of the hash values corresponding to at least one prior
message.
The probability that a collision exists for all data streams of 20
octets is virtually certain, as well as for data streams appreciably
less than 20 octets (birthday paradox). However, the probability that
the particular data stream of interest has the same hash as another
(malicious) data stream is harder to calculate. It is possible that a
trivial change to the message will result in the same hash; it is
also possible that no messages of the same length have the same hash.
The probability of the latter clearly diminishes, however, as the set
of candidate messages expands without bound.
The purpose of the optional length qualifier is not simply to reduce
the message space, since for all non-trivial messages of n bits the
message space of 2^n vastly exceeds the collision certainty
threshold. (For illustrative purposes, the author searched for SHA-1
Leonard Informational [Page 7]
Internet-Draft SHA URIs September 2015
collisions in 0-, 1-, 2-, 3-, and 4-octet data streams using a brute
force algorithm; [none were found] [actually the search is ongoing--
it looks like it will take 85 days]). Rather, the purpose is to
detect malicious underflow or overflow conditions before they happen.
If an attacker is feeding the recipient, the recipient may accept
data without bounds, waiting for the hash computation to complete. In
so doing, the attacker will waste resources in addition to
computation time, which may cause latent security errors (such as low
memory or disk conditions) to manifest.
Consider one application of these URI schemes: identifying security-
related objects such as PKIX certificates [RFC5280]. Although
[RFC5280] does not limit the size of a certificate, most certificates
are not appreciably greater than 16 kilobytes, and there are
protocol-related pressures to make certificates much smaller (such as
less than 4 kilobytes) to fit in fewer TCP segments. A certificate-
using application that accepts the URI schemes in this document might
reduce its attack surface by rejecting URIs of unspecified length
once the candidate certificate data exceeds a threshold (e.g., 64
kilobytes); for larger certificates, the application would require
that the length be specified.
A secondary purpose of specifying the length is to resist chosen-
prefix collision attacks [CHOSEN], which are attacks in which an
attacker picks two separate messages, and then appends different
values that results in the concatenated messages having the same hash
value. [CHOSEN] shows that Merkle-Damgard hash functions are
susceptible to this class of attacks. Chosen-prefix collision attacks
were successfully used against the MD5 hash algorithm [HCLASH]; both
SHA-1 and SHA-256 are Merkle-Damgard constructions.
6. IANA Considerations
IANA is requested to register the "sha1", "sha256", "sha1msg", and
"sha256msg" URI schemes in the Uniform Resource Identifier (URI)
Schemes registry using the templates below, which conform to the June
2015 URI Scheme Guidelines [RFC7595].
6.1. Assignment of sha1 URI Scheme
URI scheme name: sha1
Status: Permanent
Applications/protocols that use this URI scheme name:
General applicability. Some examples include security
applications and systems, database and forensic lookup
tools, and distributed peer-to-peer protocols.
Leonard Informational [Page 8]
Internet-Draft SHA URIs September 2015
Contact: Sean Leonard <dev+ietf@seantek.com>
Change controller: IETF
References: This document.
6.2. Assignment of sha256 URI Scheme
URI scheme name: sha256
Status: Permanent
Applications/protocols that use this URI scheme name:
General applicability. Some examples include security
applications and systems, database and forensic lookup
tools, and distributed peer-to-peer protocols.
Contact: Sean Leonard <dev+ietf@seantek.com>
Change controller: IETF
References: This document.
6.3. Assignment of sha1msg URI Scheme
URI scheme name: sha1msg
Status: Permanent
Applications/protocols that use this URI scheme name:
General applicability. Some examples include security
applications and systems, database and forensic lookup
tools, and distributed peer-to-peer protocols.
Contact: Sean Leonard <dev+ietf@seantek.com>
Change controller: IETF
References: This document.
6.4. Assignment of sha256msg URI Scheme
URI scheme name: sha256msg
Status: Permanent
Applications/protocols that use this URI scheme name:
General applicability. Some examples include security
Leonard Informational [Page 9]
Internet-Draft SHA URIs September 2015
applications and systems, database and forensic lookup
tools, and distributed peer-to-peer protocols.
Contact: Sean Leonard <dev+ietf@seantek.com>
Change controller: IETF
References: This document.
9. References
9.1. Normative References
[SHS] National Institute of Standards and Technology, "Secure
Hash Standard", Federal Information Processing Standard
(FIPS) 180-4, March 2012, <http://csrc.nist.gov/
publications/fips/fips180-4/fips-180-4.pdf>.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, November 1996.
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
November 1996.
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
Part Three: Message Header Extensions for Non-ASCII Text",
RFC 2047, November 1996.
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded
Word Extensions: Character Sets, Languages, and
Continuations", RFC 2231, November 1997.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3030] Vaudreuil, G., "SMTP Service Extensions for Transmission
of Large and Binary MIME Messages", RFC 3030, December
2000.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, RFC
3986, January 2005.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234, January 2008.
Leonard Informational [Page 10]
Internet-Draft SHA URIs September 2015
[RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
October 2008.
[RFC5536] Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews
Article Format", RFC 5536, November 2009.
[RFC6532] Yang, A., Steele, S., and N. Freed, "Internationalized
Email Headers", RFC 6532, February 2012.
[RFC7230] Fielding, R., Ed., and J. Reschke, Ed., "Hypertext
Transfer Protocol (HTTP/1.1): Message Syntax and Routing",
RFC 7230, June 2014.
[RFC7595] Thaler, D., Hansen, T., and T. Hardie, "Guidelines and
Registration Procedures for URI Schemes", BCP 35, RFC
7595, June 2015.
9.2. Informative References
[CHOSEN] Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix
Collisions for MD5 and Applications", International
Journal of Applied Cryptography Vol. 2, No. 4, 322-359,
2012,
<http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/>,
doi:10.1504/IJACT.2012.048084.
[HCLASH] Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix
Collisions for MD5 and Colliding X.509 Certificates for
Different Identities", IACR EUROCRYPT 2007, Lecture Notes
in Computer Science 4515 1-22, 2007,
<http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/>,
doi:10.1007/978-3-540-72540-4_1.
[OPTJLOC] Stevens, M., "New Collision Attacks on SHA-1 Based on
Optimal Joint Local-Collision Analysis", EUROCRYPT 2013,
Lecture Notes in Computer Science 7881 245-261, 2013,
<http://marc-stevens.nl/research/papers/EC13-S.pdf>,
doi:10.1007/978-3-642-38348-9_15.
[VISHASH] Hsiao, H., Lin, Y., Studer, A., Studer, C., Wang, K.,
Kikuchi, H., Perrig, A., Sun, H., and B. Yang, "A Study of
User-Friendly Hash Comparison Schemes," Computer Security
Applications Conference, 105-114, 2009,
<http://dl.acm.org/citation.cfm?id=1723224>,
doi:10.1109/ACSAC.2009.20.
[RFC2392] Levinson, E., "Content-ID and Message-ID Uniform Resource
Locators", RFC 2392, August 1998.
Leonard Informational [Page 11]
Internet-Draft SHA URIs September 2015
[RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
Housley, R., and W. Polk, "Internet X.509 Public Key
Infrastructure Certificate and Certificate Revocation List
(CRL) Profile", RFC 5280, May 2008.
[RFC6194] Polk, T., Chen, L., Turner, S., and P. Hoffman, "Security
Considerations for the SHA-0 and SHA-1 Message-Digest
Algorithms", RFC 6194, March 2011.
[RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
Keranen, A., and P. Hallam-Baker, "Naming Things with
Hashes", RFC 6920, April 2013.
Leonard Informational [Page 12]
Internet-Draft SHA URIs September 2015
Appendix A. Hash-Generating Implementations and Case Choice
Although the URI schemes in this document are not case-sensitive for
the hexadecimal hash component, and permit many arbitrary delimiters,
Section 2 recommends generating lowercase with no delimiters. This
recommendation balances the need for flexible parsing with the need
for consistent output for intuitive inspection.
The following implementations emit lowercase hexadecimal by default:
shasum (Perl)
sha1sum / sha256sum (GNU coreutils)
OpenSSL dgst
Python sha1().hexdigest()
Ruby
Bouncy Castle (Java)
Node.js
Microsoft CryptoAPI applications (including CertUtil and dialogs)
[[TODO: find more lowercase emission]]
The following implementations emit uppercase hexadecimal by default:
Mozilla NSS applications (including Toolkit applications)
Apple Mac OS X Keychain Access
[[TODO: find more uppercase emission]]
Author's Address
Sean Leonard
Penango, Inc.
5900 Wilshire Boulevard
21st Floor
Los Angeles, CA 90036
USA
EMail: dev+ietf@seantek.com
URI: http://www.penango.com/
Leonard Informational [Page 13]