Network Working Group | A. Rundgren
Internet-Draft | Independent
Intended status: Standards Track | B. Jordan
Expires: August 20, 2019 | Symantec Corporation
 | S. Erdtman
 | Spotify AB
 | February 16, 2019
JSON Canonicalization Scheme (JCS)
draft-rundgren-json-canonicalization-scheme-05
Cryptographic operations like hashing and signing require that the original data does not change during serialization or parsing. By applying the rules defined by the JSON Canonicalization Scheme (JCS), data provided in JSON [RFC8259] format can be exchanged "as is", while still being usable by secure cryptographic operations. JCS achieves this by building on the strict serialization formats for JSON primitives defined by ECMAScript [ES6], by constraining JSON data to the I‑JSON [RFC7493] subset, and by using a platform-independent property sorting scheme.
The intended audiences of this document are JSON tool vendors, as well as designers of JSON based cryptographic solutions.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 20, 2019.
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Cryptographic operations like hashing and signing require that the original data does not change during serialization or parsing. One way of accomplishing this is converting the data into a format that has a simple and fixed representation like Base64Url [RFC4648], which is how JWS [RFC7515] addressed this issue.
Another solution is to create a canonical version of the data, similar to what was done for the XML Signature [XMLDSIG] standard. The primary advantage of a canonicalizing scheme is that data can be kept in its original form. This is the core rationale behind JCS. Put another way: by using canonicalization, a JSON Object may remain a JSON Object even after being signed, which simplifies system design, documentation and logging.
To avoid "reinventing the wheel", JCS relies on serialization of JSON primitives compatible with ECMAScript (aka JavaScript) beginning with version 6 [ES6], hereafter referred to as "ES6".
Seasoned XML developers recalling difficulties getting signatures to validate (usually due to different interpretations of the quite intricate XML canonicalization rules as well as of the equally extensive Web Services security standards), may rightfully wonder why JCS would not suffer from similar issues. The reasons are twofold:
In summary the JCS specification describes how serialization of JSON primitives compliant with ES6 combined with a deterministic property sorting scheme can be used for creating "Hashable" representations of JSON data intended for consumption by cryptographic methods.
JCS is compatible with some existing systems relying on JSON canonicalization such as JWK Thumbprint [RFC7638] and Keybase [KEYBASE].
For potential uses outside of cryptography see [JSONCOMP].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section describes the different issues related to creating a canonical JSON representation, and how they are addressed by JCS.
In order to serialize JSON data, one needs data that is adapted for JSON serialization. This is usually achieved by:
Irrespective of the method used, the data to be serialized MUST be compatible with I‑JSON [RFC7493], which implies the following:
An additional constraint is that parsed JSON String data MUST NOT be altered during subsequent serializations. For more information see Appendix E.
Note: although the Unicode standard offers the possibility of combining certain characters into one, referred to as "Unicode Normalization" (https://www.unicode.org/reports/tr15/), such functionality MUST be delegated to the application layer.
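The effect of leaving normalization to the application can be illustrated with a minimal sketch. It assumes an ES6-compatible runtime; the variable names are hypothetical, and the two strings were chosen because they render identically while consisting of different code points.

```javascript
// Precomposed "é" (U+00E9) versus "e" followed by
// COMBINING ACUTE ACCENT (U+0301).
const precomposed = "\u00e9";
const decomposed = "e\u0301";

// Serialization keeps the code points exactly as supplied, so the
// two objects yield different JSON text (and different hashes).
const a = JSON.stringify({ name: precomposed });
const b = JSON.stringify({ name: decomposed });
console.log(a === b);

// An application that wants both forms to compare equal has to
// normalize the data itself before handing it to the serializer.
const normalized = JSON.stringify({ name: decomposed.normalize("NFC") });
console.log(a === normalized);
```

Whether and when to normalize is thus an application decision taken before canonicalization, never a side effect of it.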
The following subsections describe the steps required for creating a canonical JSON representation of the data elaborated on in the previous section.
Appendix A shows sample code for an ES6 based canonicalizer, matching the JCS specification.
Whitespace between JSON elements MUST NOT be emitted.
Assume that you parse a JSON object like the following:
   {
     "numbers": [333333333.33333329, 1E30, 4.50,
                 2e-3, 0.000000000000000000000000001],
     "string": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
     "literals": [null, true, false]
   }
If you subsequently serialize the parsed data using a serializer compliant with ES6's JSON.stringify(), the result (with a line wrap added for display purposes only) differs noticeably in how the data is represented:
   {"numbers":[333333333.3333333,1e+30,4.5,0.002,1e-27],"string":
   "€$\u000f\nA'B\"\\\\\"/","literals":[null,true,false]}
The difference between the parsed data and its serialized counterpart is due to the wide tolerance on input data (as defined by JSON [RFC8259]), whereas output data (as defined by ES6) has a fixed representation. As the example shows, numbers are subject to rounding as well.
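The round trip described above can be reproduced with a short sketch, assuming an ES6-compatible runtime such as Node.js or a Web browser. The variable names are hypothetical. String.raw is used so that the backslash escapes reach JSON.parse() exactly as written in the sample.

```javascript
// The sample JSON text; String.raw keeps every backslash intact.
const input = String.raw`{
  "numbers": [333333333.33333329, 1E30, 4.50,
              2e-3, 0.000000000000000000000000001],
  "string": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
  "literals": [null, true, false]
}`;

// Parsing accepts many spellings of the same value; serializing
// emits exactly one spelling and no whitespace between elements.
const roundTripped = JSON.stringify(JSON.parse(input));
console.log(roundTripped);
```

Note that property order is unchanged at this stage; only whitespace and the representation of primitives are normalized.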
The following subsections describe serialization of primitive JSON data types according to JCS. This part is identical to that of ES6.
The JSON literals null, true, and false present no challenge since they already have a fixed definition in JSON [RFC8259].
For JSON String data (which includes JSON Object property names as well), each Unicode code point MUST be serialized as described below (also matching Section 24.3.2.2 of [ES6]):
Finally, the resulting sequence of Unicode code points MUST be enclosed in double quotes (").
Note: some JSON systems permit the use of invalid Unicode data including "lone surrogates" (e.g. U+DEAD). Since this leads to interoperability issues including broken signatures, occurrences of such data MUST cause the JCS algorithm to terminate with an error indication.
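A minimal sketch of the string handling in this section, assuming an ES6-compatible runtime. The helper name serializeString is hypothetical. The explicit surrogate check is there because JSON.stringify() does not raise an error for lone surrogates on its own.

```javascript
// Hypothetical helper: serialize one JSON String (or property name).
function serializeString(value) {
  // Scan the UTF-16 code units: a high surrogate must be followed by
  // a low surrogate, and a low surrogate must never appear on its own.
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = value.charCodeAt(i + 1); // NaN at end of string
      if (!(next >= 0xdc00 && next <= 0xdfff)) {
        throw new Error("Lone high surrogate at index " + i);
      }
      i++; // skip the low half of the valid pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      throw new Error("Lone low surrogate at index " + i);
    }
  }
  // JSON.stringify() applies the escaping rules and adds the quotes.
  return JSON.stringify(value);
}

console.log(serializeString("\u20ac$\u000f\nA'B\"\\\\\"/"));
```

A call such as serializeString("\udead") terminates with an error instead of producing output.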
JSON Number data MUST be serialized according to Section 7.1.12.1 of [ES6] including the "Note 2" enhancement.
Due to the relative complexity of this part, the algorithm itself is not included in this document. However, the specification is fully implemented by, for example, Google's V8 [V8]. The open source Java implementation mentioned in Appendix G uses a recently developed number serialization algorithm called Ryu [RYU].
ES6 builds on the IEEE-754 [IEEE754] double precision standard for representing JSON Number data. Appendix B holds a set of IEEE-754 sample values and their corresponding JSON serialization.
Note: since NaN (Not a Number) and Infinity are not permitted in JSON, occurrences of such values MUST cause the JCS algorithm to terminate with an error indication.
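On an ES6-compatible runtime the number rules can be sketched as follows. The helper name serializeNumber is hypothetical. The explicit check is needed because JSON.stringify() silently turns NaN and Infinity into null rather than signaling an error.

```javascript
// Hypothetical helper: serialize one JSON Number.
function serializeNumber(value) {
  if (typeof value !== "number") {
    throw new TypeError("Not a number: " + typeof value);
  }
  if (!Number.isFinite(value)) {
    // NaN, Infinity and -Infinity have no JSON representation.
    throw new RangeError("NaN and Infinity are not permitted in JSON");
  }
  // The runtime's built-in number-to-string conversion does the work.
  return JSON.stringify(value);
}

console.log([333333333.33333329, 1e30, 4.50, 2e-3, -0].map(serializeNumber));
```

On platforms without an ES6-compatible number formatter, the conversion itself has to be supplied by a dedicated algorithm; this sketch only covers the error handling and the delegation.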
Although the previous step indeed normalized the representation of primitive JSON data types, the result would not qualify as "canonical" since JSON Object properties are not in lexicographic (alphabetical) order.
Applied to the sample in Section 3.2.2, a properly canonicalized version should (with a line wrap added for display purposes only) read as:
   {"literals":[null,true,false],"numbers":[333333333.3333333,
   1e+30,4.5,0.002,1e-27],"string":"€$\u000f\nA'B\"\\\\\"/"}
The rules for lexicographic sorting of JSON Object properties according to JCS are as follows:
When a JSON Object is about to have its properties sorted, the following measures MUST be adhered to:
"" "a" "aa" "ab"
The rationale for basing the sorting algorithm on UTF-16 code units is that it maps directly to the string type in ECMAScript (featured in Web browsers and Node.js), Java and .NET. Systems using another internal representation of string data will need to convert JSON property name strings into arrays of UTF-16 code units before sorting. The conversion from UTF-8 or UTF-32 to UTF-16 is defined by the Unicode [UNICODE] standard.
Note: for the purpose of obtaining a deterministic property order, sorting on UTF-8 or UTF-32 encoded data would also work, but the result would differ and thus be incompatible with this specification. However, in practice property names rarely go outside of 7-bit ASCII, making it possible to sort at the UTF-8 byte level and still be compatible with JCS. Whether this is a viable option depends on the environment in which JCS is supposed to be used.
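The difference between the two sort orders can be observed with a small sketch. It assumes a runtime providing TextEncoder (current Web browsers and Node.js). The comparator compareUtf8 and the sample property names are hypothetical. The two orders only disagree when a character outside the Basic Multilingual Plane is compared with one in the range U+E000 to U+FFFF.

```javascript
const encoder = new TextEncoder();

// Compare two strings by their UTF-8 bytes (NOT the JCS order).
function compareUtf8(a, b) {
  const x = encoder.encode(a);
  const y = encoder.encode(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return x.length - y.length;
}

// U+FB33 is one UTF-16 code unit (0xFB33); U+1F600 is the
// surrogate pair 0xD83D 0xDE00.
const names = ["\ufb33", "\ud83d\ude00", "ab", "", "aa", "a"];

// Without a comparator, sort() orders strings by UTF-16 code units.
const jcsOrder = [...names].sort();
const utf8Order = [...names].sort(compareUtf8);

console.log(jcsOrder.map((s) => JSON.stringify(s)).join(" "));
console.log(utf8Order.map((s) => JSON.stringify(s)).join(" "));
```

For the ASCII-only names both orders agree; only the last two entries swap places.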
Finally, in order to create a platform independent representation, the result of the preceding step MUST be encoded in UTF-8.
Applied to the sample in Section 3.2.3 this should yield the following bytes here shown in hexadecimal notation:
7b 22 6c 69 74 65 72 61 6c 73 22 3a 5b 6e 75 6c 6c 2c 74 72 75 65 2c 66 61 6c 73 65 5d 2c 22 6e 75 6d 62 65 72 73 22 3a 5b 33 33 33 33 33 33 33 33 33 2e 33 33 33 33 33 33 33 2c 31 65 2b 33 30 2c 34 2e 35 2c 30 2e 30 30 32 2c 31 65 2d 32 37 5d 2c 22 73 74 72 69 6e 67 22 3a 22 e2 82 ac 24 5c 75 30 30 30 66 5c 6e 41 27 42 5c 22 5c 5c 5c 5c 5c 22 2f 22 7d
This data is intended to be usable as input to cryptographic methods.
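The final encoding step can be sketched as below, assuming a runtime providing TextEncoder. The helper name toUtf8Hex is hypothetical and exists only to print bytes in the hexadecimal notation used above.

```javascript
// Encode a canonicalized JSON string as UTF-8 and render the bytes
// as space-separated, two-digit hexadecimal values.
function toUtf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// The Euro sign (U+20AC) becomes the three bytes e2 82 ac.
console.log(toUtf8Hex('{"string":"\u20ac$"}'));
```

It is the resulting byte array, not the ECMAScript string, that is passed on to a hash or signature function.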
This document has no IANA actions.
It is vital to perform "sanity" checks on input data to avoid buffer overflows and similar issues that could affect the integrity of the system.
Building on ES6 Number serialization was originally proposed by James Manger. This ultimately led to the adoption of the entire ES6 serialization scheme for JSON primitives.
Other people who have contributed with valuable input to this specification include Scott Ananian, Richard Gibson, Bron Gondwana, John-Mark Gurney, Mike Jones, Mike Miller, Mark Nottingham, Mike Samuel, Jim Schaad, Robert Tupelo-Schneck and Michal Wadas.
[ES6]       Ecma International, "ECMAScript 2015 Language Specification".
[IEEE754]   IEEE, "IEEE Standard for Floating-Point Arithmetic", August 2008.
[RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC7493]   Bray, T., "The I-JSON Message Format", RFC 7493, DOI 10.17487/RFC7493, March 2015.
[RFC8174]   Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.
[RFC8259]   Bray, T., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, DOI 10.17487/RFC8259, December 2017.
[UNICODE]   The Unicode Consortium, "The Unicode Standard, Version 10.0.0".
[JSONCOMP]  A. Rundgren, ""Comparable" JSON - Work in progress".
[KEYBASE]   "Keybase".
[NODEJS]    "Node.js".
[OPENAPI]   "The OpenAPI Initiative".
[RFC4648]   Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.
[RFC7515]   Jones, M., Bradley, J. and N. Sakimura, "JSON Web Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, May 2015.
[RFC7638]   Jones, M. and N. Sakimura, "JSON Web Key (JWK) Thumbprint", RFC 7638, DOI 10.17487/RFC7638, September 2015.
[RYU]       Ulf Adams, "Ryu floating point number serializing algorithm".
[V8]        Google LLC, "Chrome V8 Open Source JavaScript Engine".
[XMLDSIG]   W3C, "XML Signature Syntax and Processing Version 1.1".
Below is an example of a JCS canonicalizer for usage with ES6 based systems:
   ////////////////////////////////////////////////////////////
   // Since the primary purpose of this code is highlighting //
   // the core of the JCS algorithm, error handling and      //
   // UTF-8 generation were not implemented                  //
   ////////////////////////////////////////////////////////////
   var canonicalize = function(object) {

       var buffer = '';
       serialize(object);
       return buffer;

       function serialize(object) {
           if (object === null || typeof object !== 'object' ||
               object.toJSON != null) {
               /////////////////////////////////////////////////
               // Primitive type or toJSON - Use ES6/JSON     //
               /////////////////////////////////////////////////
               buffer += JSON.stringify(object);

           } else if (Array.isArray(object)) {
               /////////////////////////////////////////////////
               // Array - Maintain element order              //
               /////////////////////////////////////////////////
               buffer += '[';
               let next = false;
               object.forEach((element) => {
                   if (next) {
                       buffer += ',';
                   }
                   next = true;
                   /////////////////////////////////////////
                   // Array element - Recursive expansion //
                   /////////////////////////////////////////
                   serialize(element);
               });
               buffer += ']';

           } else {
               /////////////////////////////////////////////////
               // Object - Sort properties before serializing //
               /////////////////////////////////////////////////
               buffer += '{';
               let next = false;
               Object.keys(object).sort().forEach((property) => {
                   if (next) {
                       buffer += ',';
                   }
                   next = true;
                   ///////////////////////////////////////////////
                   // Property names are strings - Use ES6/JSON //
                   ///////////////////////////////////////////////
                   buffer += JSON.stringify(property);
                   buffer += ':';
                   //////////////////////////////////////////
                   // Property value - Recursive expansion //
                   //////////////////////////////////////////
                   serialize(object[property]);
               });
               buffer += '}';
           }
       }
   };
The following table holds a set of ES6 compatible Number serialization samples, including some edge cases. The column "IEEE‑754" refers to the internal ES6 representation of the Number data type which is based on the IEEE-754 [IEEE754] standard using 64-bit (double precision) values, here expressed in hexadecimal.
|====================================================================|
|     IEEE-754     |    JSON Representation    |       Comment       |
|====================================================================|
| 0000000000000000 | 0                         | Zero                |
| 8000000000000000 | 0                         | Minus zero          |
| 0000000000000001 | 5e-324                    | Min pos number      |
| 8000000000000001 | -5e-324                   | Min neg number      |
| 7fefffffffffffff | 1.7976931348623157e+308   | Max pos number      |
| ffefffffffffffff | -1.7976931348623157e+308  | Max neg number      |
| 4340000000000000 | 9007199254740992          | Max pos integer (1) |
| c340000000000000 | -9007199254740992         | Max neg integer (1) |
| 4430000000000000 | 295147905179352830000     | ~2**68 (2)          |
| 7fffffffffffffff |                           | NaN (3)             |
| 7ff0000000000000 |                           | Infinity (3)        |
| 44b52d02c7e14af5 | 9.999999999999997e+22     |                     |
| 44b52d02c7e14af6 | 1e+23                     |                     |
| 44b52d02c7e14af7 | 1.0000000000000001e+23    |                     |
| 444b1ae4d6e2ef4e | 999999999999999700000     |                     |
| 444b1ae4d6e2ef4f | 999999999999999900000     |                     |
| 444b1ae4d6e2ef50 | 1e+21                     |                     |
| 3eb0c6f7a0b5ed8c | 9.999999999999997e-7      |                     |
| 3eb0c6f7a0b5ed8d | 0.000001                  |                     |
| 41b3de4355555553 | 333333333.3333332         |                     |
| 41b3de4355555554 | 333333333.33333325        |                     |
| 41b3de4355555555 | 333333333.3333333         |                     |
| 41b3de4355555556 | 333333333.3333334         |                     |
| 41b3de4355555557 | 333333333.33333343        |                     |
| becbf647612f3696 | -0.0000033333333333333333 |                     |
|====================================================================|
Notes:
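Rows of the table can be spot-checked with a short sketch, assuming a runtime with BigInt and DataView support for 64-bit integers (ES2020 or later). The helper name ieee754HexToJson is hypothetical.

```javascript
// Turn a 16-digit hexadecimal IEEE-754 bit pattern into the JSON
// text an ES6-compatible serializer produces for that value.
function ieee754HexToJson(hex) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, BigInt("0x" + hex)); // big-endian by default
  const value = view.getFloat64(0);
  if (!Number.isFinite(value)) {
    throw new RangeError("NaN and Infinity are not permitted in JSON");
  }
  return JSON.stringify(value);
}

for (const hex of ["0000000000000001", "4340000000000000",
                   "44b52d02c7e14af6", "3eb0c6f7a0b5ed8d"]) {
  console.log(hex + " -> " + ieee754HexToJson(hex));
}
```

The rows for NaN and Infinity have no JSON representation, so the helper raises an error for those bit patterns.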
Since the result of the canonicalization process (see Section 3.2.4) is fully valid JSON, it can also be used as "Wire Format". However, this is just an option, since cryptographic schemes based on JCS in most cases do not depend on externally supplied JSON data already being canonicalized.
In fact, the ES6 standard way of serializing objects using JSON.stringify() produces a more "logical" format, where properties are kept in the order they were created or received. The example below shows an address record which could benefit from ES6 standard serialization:
   {
     "name": "John Doe",
     "address": "2000 Sunset Boulevard",
     "city": "Los Angeles",
     "zip": "90001",
     "state": "CA"
   }
Using canonicalization, the properties above would be output in the order "address", "city", "name", "state" and "zip", which makes the data harder to follow from a human (developer or technical support) perspective. Canonicalization also converts JSON data into a single line of text, which may be less than ideal for debugging and logging.
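The two orderings can be compared with a short sketch, assuming an ES6-compatible runtime. The variable names are hypothetical.

```javascript
const record = {
  name: "John Doe",
  address: "2000 Sunset Boulevard",
  city: "Los Angeles",
  zip: "90001",
  state: "CA"
};

// Standard serialization keeps the order in which the (string-named)
// properties were created.
const creationOrder = Object.keys(record);

// Canonicalization sorts the property names instead.
const canonicalOrder = Object.keys(record).sort();

console.log(creationOrder.join(", "));
console.log(canonicalOrder.join(", "));

// For logs, an indented rendering in creation order is often easier
// to read than the canonical single-line form.
console.log(JSON.stringify(record, null, 2));
```

An application can therefore keep the readable form on the wire and in logs, and produce the canonical form only as input to the cryptographic operation.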
There are several issues associated with the JSON Number type, here illustrated by the following sample object:
   {
     "giantNumber": 1.4e+9999,
     "payMeThis": 26000.33,
     "int64Max": 9223372036854775807
   }
Although the sample above conforms to JSON [RFC8259], applications would normally use different native data types for storing "giantNumber" and "int64Max". In addition, monetary data like "payMeThis" would presumably not rely on floating point data types due to rounding issues with respect to decimal arithmetic.
The established way of handling this kind of "overloading" of the JSON Number type (at least in an extensible manner) is through mapping mechanisms, instructing parsers what to do with different properties based on their name. However, this greatly limits the value of using the JSON Number type outside of its original, somewhat constrained JavaScript context. The ES6 JSON object does not support mappings to JSON Number either.
Due to the above, numbers that do not have a natural place in the current JSON ecosystem MUST be wrapped using the JSON String type. This is close to a de-facto standard for open systems. This is also applicable for other data types that do not have direct support in JSON, like "DateTime" objects as described in Appendix E.
Aided by a system using the JSON String type; be it programmatic like
   var obj = JSON.parse('{"giantNumber": "1.4e+9999"}');
   var biggie = new BigNumber(obj.giantNumber);
or declarative schemes like OpenAPI [OPENAPI], JCS imposes no limits on applications, including when using ES6.
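What happens to the sample values in an ES6 parser, and how string wrapping avoids it, can be sketched as follows. The sketch assumes a runtime with the built-in BigInt type (ES2020 or later), used here in place of a third-party big number library. The variable names are hypothetical.

```javascript
// Parsed as JSON Numbers, two of the three values are silently altered.
const lossy = JSON.parse(
  '{"giantNumber": 1.4e+9999, "payMeThis": 26000.33,' +
  ' "int64Max": 9223372036854775807}');
console.log(lossy.giantNumber);          // exceeds the double range
console.log(lossy.int64Max === 2 ** 63); // rounded to the nearest double

// Wrapped in a JSON String, the digits survive parsing untouched and
// can be converted to a suitable native type afterwards.
const wrapped = JSON.parse('{"int64Max": "9223372036854775807"}');
const exact = BigInt(wrapped.int64Max);
console.log(exact === 9223372036854775807n);
```

Since the wrapped value is an ordinary JSON String, it passes through canonicalization unchanged.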
Due to the limited set of data types featured in JSON, the JSON String type is commonly used for holding subtypes. Depending on the JSON parsing method, this can lead to interoperability problems which MUST be dealt with by JCS compliant applications targeting a wider audience.
Assume you want to parse a JSON object where the schema designer assigned the property "big" for holding a "BigInteger" subtype and "time" for holding a "DateTime" subtype, while "val" is supposed to be a JSON Number compliant with JCS. The following example shows such an object:
   {
     "time": "2019-01-28T07:45:10Z",
     "big": "055",
     "val": 3.5
   }
Parsing of this object can be accomplished by the following ES6 statement:
var object = JSON.parse(JSON-data-featured-as-a-string);
After parsing, the actual data can be extracted, which for subtypes also involves a conversion step using the result of the parsing process (an ECMAScript object) as input:
   ... = new Date(object.time); // Date object
   ... = BigInt(object.big);    // Big integer
   ... = object.val;            // JSON/JS number
Canonicalization of "object" using the sample code in Appendix A would return the following string:
   {"big":"055","time":"2019-01-28T07:45:10Z","val":3.5}
Although this is (with respect to JCS) technically correct, there is another way of parsing JSON data which can also be used with ES6, as shown below:
   // Currently required to make BigInt JSON serializable
   BigInt.prototype.toJSON = function() {
       return this.toString();
   };

   // JSON parsing using a "stream" based method
   var object = JSON.parse(JSON-data-featured-as-a-string,
       (k,v) => k == 'time' ? new Date(v) : k == 'big' ? BigInt(v) : v
   );
If you now apply the canonicalizer in Appendix A to "object", the following string would be generated:
{"big":"55","time":"2019-01-28T07:45:10.000Z","val":3.5}
In this case the string arguments for "big" and "time" have changed with respect to the original, presumably making an application depending on JCS fail.
The reason for the deviation is that in stream and schema based JSON parsers, the original "string" argument is typically replaced on-the-fly by the native subtype which when serialized, may exhibit a different and platform dependent pattern.
That is, stream and schema based parsing MUST treat subtypes as "pure" (immutable) JSON String types, and perform the actual conversion to the designated native type in a subsequent step. In modern programming platforms like Go, Java and C# this can be achieved with moderate effort by combining annotations, getters and setters. Below is an example in C#/Json.NET showing a part of a class that is serializable as a JSON Object:
   // The "pure" string solution uses a local
   // string variable for JSON serialization while
   // exposing another type to the application
   [JsonProperty("amount")]
   private string _amount;

   [JsonIgnore]
   public decimal Amount {
       get { return decimal.Parse(_amount); }
       set { _amount = value.ToString(); }
   }
In an application "Amount" can be accessed as any other property while it is actually represented by a quoted string in JSON contexts.
Note: the example above also addresses the constraints on numeric data implied by I-JSON (the C# "decimal" data type has quite different characteristics compared to IEEE-754 double precision).
Since the JSON Array construct permits mixing arbitrary JSON elements, custom parsing and serialization code must normally be used to cope with subtypes anyway.
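The deviation discussed in this appendix, and the "pure" string approach that avoids it, can be reproduced with a few lines of ES6. This is a minimal sketch assuming a runtime with BigInt support; the variable names are hypothetical.

```javascript
const text = '{"time": "2019-01-28T07:45:10Z", "big": "055", "val": 3.5}';

// Parse without a reviver so the object keeps the original strings...
const object = JSON.parse(text);

// ...and convert to native types in a separate step, leaving the
// parsed object untouched.
const time = new Date(object.time);
const big = BigInt(object.big);

// The native values serialize differently from the strings they
// were created from, which is what breaks canonicalization when
// they replace the originals inside the object.
console.log(object.time + " versus " + time.toJSON());
console.log(object.big + " versus " + big.toString());
```

Because "object" still holds the strings as received, canonicalizing it yields the same result as canonicalizing the sender's data.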
The optimal solution is integrating support for JCS directly in JSON serializers (parsers need no changes). That is, canonicalization would just be an additional "mode" for a JSON serializer. However, this is currently not the case. Fortunately, JCS support can be provided through externally supplied canonicalizer software, enabling signature creation schemes like the following:
A compatible signature verification scheme would then be as follows:
A canonicalizer like the one above is effectively only a "filter", potentially usable with a multitude of quite different cryptographic schemes.
Using a JSON serializer with integrated JCS support, the serialization performed before the canonicalization step could be eliminated for both processes.
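How an external canonicalizer slots into signature creation and verification can be sketched as follows. The sketch assumes Node.js and its "crypto" module. Several details are assumptions made purely for illustration and are not defined by JCS: the function names, the use of HMAC-SHA256, and the storage of the result as a hexadecimal string in a top-level "signature" property. The compact canonicalize function follows the approach of Appendix A without error handling.

```javascript
const crypto = require("crypto");

// Compact canonicalizer in the spirit of Appendix A (no error handling).
function canonicalize(value) {
  if (value === null || typeof value !== "object" || value.toJSON != null) {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((element) => canonicalize(element)).join(",") + "]";
  }
  return "{" + Object.keys(value).sort().map((name) =>
    JSON.stringify(name) + ":" + canonicalize(value[name])).join(",") + "}";
}

// Hypothetical scheme: HMAC-SHA256 over the UTF-8 encoded canonical form.
function computeMac(data, key) {
  return crypto.createHmac("sha256", key)
    .update(canonicalize(data), "utf8").digest();
}

// Creation: sign the canonical form, then add the signature property
// and serialize the completed object with the ordinary serializer.
function sign(data, key) {
  const signature = computeMac(data, key).toString("hex");
  return JSON.stringify({ ...data, signature });
}

// Verification: parse, split off the signature property, canonicalize
// the remaining data and compare the two values.
function verify(signedJson, key) {
  const { signature, ...data } = JSON.parse(signedJson);
  if (typeof signature !== "string") return false;
  const received = Buffer.from(signature, "hex");
  const expected = computeMac(data, key);
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

const signed = sign({ b: 1, a: "x" }, "secret");
console.log(signed, verify(signed, "secret"));
```

The signed object stays ordinary JSON, and a receiver may reorder or reformat it freely before verification, since only the canonical form is ever fed to the cryptographic function.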
The following Open Source implementations have been verified to be compatible with JCS:
There are (and have been) other efforts to create "Canonical JSON". Below is a list of URLs to some of them:
In contrast to JCS which is a serialization scheme, the listed efforts build on text level JSON to JSON transformations.
The JCS specification is currently developed at: https://github.com/cyberphone/ietf-json-canon.
The most recent "editors' copy" can be found at: https://cyberphone.github.io/ietf-json-canon.
JCS source code and test data is available at: https://github.com/cyberphone/json-canonicalization