json                                                         N. Williams
Internet-Draft                                              Cryptonector
Intended status: Standards Track                         August 19, 2014
Expires: February 20, 2015
JavaScript Object Notation (JSON) Text Sequences
draft-ietf-json-text-sequence-05
This document describes the JSON text sequence format and associated media type.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 20, 2015.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The JavaScript Object Notation (JSON) [RFC7159] is a very handy serialization format. However, when serializing a large sequence of values as an array, or a possibly indeterminate-length or never-ending sequence of values, JSON becomes difficult to work with.
Consider a sequence of one million values, each possibly 1 kilobyte when encoded -- roughly one gigabyte. It is often desirable to process such a dataset incrementally, that is, without first having to read all of it before beginning to produce results. Traditionally the way to do this with JSON is to use a “streaming” parser (see Section 1.1), but these are not widely available, not widely used, and not easy to use.
This document describes the concept and format of “JSON text sequences”, which are specifically not JSON texts themselves but are composed of JSON texts. JSON text sequences can be parsed (and produced) incrementally without requiring a streaming parser (or encoder).
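As an illustration of incremental parsing, the following is a minimal sketch in Python, not part of the format itself. It assumes a text-mode stream and uses only the standard json module, which is an ordinary (non-streaming) parser. The names iter_json_texts and chunk_size are hypothetical and chosen only for this example.

```python
import io
import json

RS = "\x1e"  # ASCII Record Separator

def iter_json_texts(stream, chunk_size=4096):
    """Yield parsed JSON texts from a text stream as soon as each is complete.

    Hypothetical helper: only one JSON text (plus one read chunk) needs to be
    held in memory at a time, and each text is parsed with an ordinary parser.
    """
    buffer = ""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        buffer += data
        # Everything before the last RS is complete; the remainder may not be.
        *complete, buffer = buffer.split(RS)
        for chunk in complete:
            if chunk.strip(" \t\r\n"):  # runs of RS produce empty chunks
                yield json.loads(chunk)
    # The final text is delimited by end-of-stream rather than by RS.
    if buffer.strip(" \t\r\n"):
        yield json.loads(buffer)
```

A caller might iterate over `iter_json_texts(open(path, encoding="utf-8"))` and handle each value as it arrives, without waiting for the end of the input.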
For the purposes of this document we shall classify JSON parsers as follows:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The ABNF [RFC5234] for the JSON text sequence format is as given in Figure 1.
   JSON-sequence = *(1*RS JSON-text)
   RS = <given by RFC5234>
   JSON-text = <given by RFC7159>
Figure 1: JSON text sequence ABNF
In prose: any number of JSON texts, each preceded by one or more ASCII RS characters. Since ASCII RS is a control character it may only appear in JSON strings in escaped form, and since RS may not appear in JSON texts in any other form, RS unambiguously delimits every JSON text (except the final text in the sequence, which may be delimited by an external end-of-stream marker). Two or more RS characters in sequence denote neither “empty” nor missing JSON texts. JSON text sequence encoders MAY emit an RS after emitting a JSON text.
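The rules above can be sketched with a short, non-normative Python example. It assumes the whole sequence fits in memory as a string; the function names encode_sequence and decode_sequence are hypothetical. Python's json.dumps always escapes control characters inside strings, so a raw RS never appears within an emitted JSON text.

```python
import json

RS = "\x1e"  # ASCII Record Separator (0x1E)

def encode_sequence(values):
    """Emit each value as a JSON text preceded by exactly one RS."""
    return "".join(RS + json.dumps(value) for value in values)

def decode_sequence(text):
    """Parse every JSON text in an RS-delimited sequence.

    Consecutive RS characters (and a trailing RS) yield empty chunks,
    which are skipped rather than treated as empty or missing texts.
    """
    for chunk in text.split(RS):
        if chunk.strip(" \t\r\n"):
            yield json.loads(chunk)
```

For example, a string value containing an RS character is written with the escape `\u001e`, so splitting on the raw RS byte is safe.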
JSON text sequence parsers SHOULD NOT abort when RS terminates an incomplete JSON text. Such a situation may arise in contexts where append-writes to log files are truncated by the filesystem (e.g., due to a crash, or administrative process termination).
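One possible way to follow this recommendation is sketched below in Python. This is an assumption about how a parser might recover, not a required algorithm: each RS-delimited chunk is parsed independently, and chunks that fail to parse are set aside so the caller can log or ignore them. The name parse_tolerant is hypothetical.

```python
import json

RS = "\x1e"  # ASCII Record Separator

def parse_tolerant(text):
    """Return (values, skipped) for an RS-delimited sequence.

    A chunk that is not a complete JSON text (for example, a truncated
    log entry) is collected in `skipped` instead of aborting the parse.
    """
    values, skipped = [], []
    for chunk in text.split(RS):
        if not chunk.strip(" \t\r\n"):
            continue  # empty chunk from consecutive or trailing RS
        try:
            values.append(json.loads(chunk))
        except json.JSONDecodeError:
            skipped.append(chunk)
    return values, skipped
```

Because RS cannot appear unescaped inside a JSON text, the parser resynchronizes at the next RS regardless of where the truncation occurred.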
There exist applications which use a format not unlike this one, but using LF instead of RS as the separator, or even using no whitespace unless it is necessary for disambiguating JSON texts (numbers, booleans, null). JSON text sequence parsers MAY permit this, but JSON text sequence encoders SHOULD only use RS as the separator (as described above).
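A parser that chooses to permit such input could be sketched as follows in Python. This is only an illustration under the assumption that the input is a single in-memory string; the name parse_lenient is hypothetical. It relies on json.JSONDecoder.raw_decode from the standard library, which parses one JSON text starting at a given index and returns the value together with the index where parsing stopped.

```python
import json

def parse_lenient(text):
    """Parse JSON texts separated by RS, LF, other JSON whitespace, or nothing."""
    decoder = json.JSONDecoder()
    separators = " \t\r\n\x1e"  # JSON whitespace plus ASCII RS
    values = []
    pos, end = 0, len(text)
    while True:
        # Skip any run of separators before the next JSON text.
        while pos < end and text[pos] in separators:
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values
```

Note that this sketch raises an error on the first malformed text: without RS there is no reliable point at which to resynchronize, which is one reason encoders are asked to use RS.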
All the security considerations of JSON [RFC7159] apply.
There is no end of sequence indicator. This means that “end of file”, “end of transmission”, and so on, can be indistinguishable from a logical end of sequence. Applications where this matters should denote end of sequence by convention (e.g., Content-Length in HTTP).
The MIME media type for JSON text sequences is application/json-seq.
Type name: application
Subtype name: json-seq
Required parameters: n/a
Optional parameters: n/a
Encoding considerations: binary
Security considerations: See <this document, once published>, Section 3.
Interoperability considerations: Described herein.
Published specification: <this document, once published>.
Applications that use this media type: JSON text sequences have been used in applications written with the jq programming language.
Phillip Hallam-Baker proposed the use of JSON text sequences for logfiles and pointed out the need for resynchronization. James Manger contributed the ABNF for resynchronization. Stephen Dolan created jq, which uses something like JSON text sequences (with LF as the separator between texts on output, and requiring only such whitespace as needed to disambiguate on input).
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.
[RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data Interchange Format", RFC 7159, March 2014.