Network Working Group | P. Hallam-Baker |
Internet-Draft | Comodo Group Inc. |
Intended status: Informational | April 10, 2018 |
Expires: October 12, 2018 |
JBCD Container
draft-hallambaker-jbcd-container-01
This document is also available online at http://prismproof.org/Documents/draft-hallambaker-jbcd-container.html .
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 12, 2018.
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document describes JBCD Container, a message and file syntax that represents a sequence of data frames, with cryptographic integrity, signature, and encryption enhancements, in an append-only format. The format supports data integrity checks using digest chains and Merkle trees. In its simplest form, the format supports efficient append-only write operations and efficient read operations in either the forward or reverse direction. Support for efficient random-access reads may be provided through the use of binary trees or index records appended to the end of the file.
JBCD Container is a message and file syntax that represents a sequence of data frames, with cryptographic integrity, signature, and encryption enhancements, in an append-only format. JBCD Container was developed in response to needs that arose out of the design of the Mathematical Mesh [draft-hallambaker-jsonbcd] . It is built on the binary encodings of JSON data objects, JSON-B and JSON-C [draft-hallambaker-jsonbcd] . These requirements include:
The features supported by JBCD Container include:
Many proprietary file formats in use support some or all of these capabilities, but only a handful have public, let alone open, standards. JBCD Container is designed to provide a superset of the capabilities of existing message and file syntaxes, including:
Attempting to make use of these specifications in a layered fashion would require at least three separate encoders and introduce unnecessary complexity.
Every data format represents a compromise between different concerns, in particular:
While the cost of storage of all types has declined rapidly over the past decades, the amount of data to be stored has grown over the same period. JBCD Container represents a pragmatic balance of these considerations for current technology. In particular, since payload volumes are likely to be very large, memory and operational efficiency are considered higher priorities than data volume.
JBCD Container makes use of the following related standards and specifications.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] .
A JBCD Container consists of a series of JBCD Frames. Each Frame consists of a non-empty sequence of JBCD records.
A JBCD frame consists of a forward length indicator, the framed data and a reverse length indicator. The reverse length indicator is written out backwards to allow the frame to be read in the reverse direction:
[[This figure is not viewable in this format. The figure is available at http://prismproof.org/Documents/draft-hallambaker-jbcd-container.html.]]
JBCD Bidirectional Frame
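The bidirectional framing described above can be sketched in Python as follows. This is a minimal illustration rather than a normative encoder. The tag values 0xF4 (8-bit length) and 0xF5 (16-bit length) and the big-endian length order are assumptions inferred from the example dump later in this document, and the function names are hypothetical. Longer length fields are not covered.

```python
# Tag bytes as they appear in the example dump in this document (assumed
# meaning): 0xF4 introduces a frame with an 8-bit length, 0xF5 a frame
# with a 16-bit length.
TAG_FRAME_8 = 0xF4
TAG_FRAME_16 = 0xF5
LENGTH_WIDTH = {TAG_FRAME_8: 1, TAG_FRAME_16: 2}


def encode_frame(data: bytes) -> bytes:
    """Wrap data in a forward length indicator and a mirrored reverse one."""
    if len(data) < 0x100:
        head = bytes([TAG_FRAME_8, len(data)])
    elif len(data) < 0x10000:
        head = bytes([TAG_FRAME_16]) + len(data).to_bytes(2, "big")
    else:
        raise ValueError("longer length fields are not covered by this sketch")
    # The trailer is the header written backwards, so the tag byte comes last.
    return head + data + head[::-1]


def read_first_frame(buf: bytes) -> bytes:
    """Return the framed data of the first frame, reading forwards."""
    width = LENGTH_WIDTH[buf[0]]
    length = int.from_bytes(buf[1:1 + width], "big")
    return buf[1 + width:1 + width + length]


def read_last_frame(buf: bytes) -> bytes:
    """Return the framed data of the last frame, reading backwards from EOF."""
    width = LENGTH_WIDTH[buf[-1]]
    # The length bytes sit just before the final tag, in reversed order.
    length = int.from_bytes(buf[-1 - width:-1][::-1], "big")
    end = len(buf) - 1 - width
    return buf[end - length:end]
```

Under these assumptions, a 44-byte frame body is wrapped as f4 2c ... 2c f4 and a 320-byte body as f5 01 40 ... 40 01 f5, which matches the byte pattern of the two frames in the simple container example.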
When first reading an existing file, an application will typically read the first frame and the last frame (if the container has more than one frame). This allows the reader to quickly determine the format(s) used by the container, the number of frames in the container and the location of any index frames (if present).
The container format is designed to support creation of write-once and append-only file formats. Each frame SHOULD be written as an atomic operation.
The first frame in a container and the first record in a frame have special roles that are described in this document.
A key objective of the JBCD Container format is that the simplest possible reader be capable of reading any container file albeit with possibly reduced performance.
A Container MAY conform to one or more profiles. Conforming to a profile typically requires a writer to provide additional information when writing a file but does not require a reader to interpret it unless use of a feature (e.g. authentication) that depends on the additional information is required.
The following profiles are currently defined:
The use of Chain and Merkle Trees for integrity checks is described below.
The use of Tree and Index frames is described below.
The following profiles are currently defined:
Header Encoding format
Archive Index
Frame headers MAY contain content metadata parameters.
Payload data MAY be signed using JSON Web Signature [RFC7515] .
Signatures are specified by the Signatures parameter in the content header. The data that the signature is calculated over is defined by the typ parameter of the Signature as follows.
If the typ parameter is absent, the value Payload is implied.
A frame MAY contain multiple signatures created with the same signing key and different typ values.
The use of signatures over chain and tree digest values permits multiple frames to be validated using a single signature verification operation.
Payload data MAY be encrypted using JSON Web Encryption [RFC7516] .
The payload data is encrypted under a session key whose encrypted value is specified by the EncryptedKey entry. The encryption key for the EncryptedKey is in turn specified by key exchange information provided in a JWE Recipients object that is placed in the frame header of either the frame that contains the encrypted payload data or an earlier frame whose file position is specified by an ExchangePosition entry.
Use of EncryptedKey entries allows a container to contain multiple data segments encrypted using the same key agreement parameters.
Complete
Incremental
An index may be appended to an existing file at any time. Since the use of bidirectional frames makes reading the last record as efficient as reading the first, the last record in an indexed file is usually either the index itself or a pointer to the last index.
An index frame consists of a frame header
Use of index frames provides read access to any record in the file in O(1) operations, but attempting to compile a complete index with every write incurs an O(n) penalty on each write for both operations and storage. Accordingly, random read access to a file while it is being written is better supported using an index tree.
Binary search is supported by means of the TreePosition parameter specified in the FrameHeader. This parameter specifies the position in the file of the frame at the immediately preceding apex.
Calculation of the immediately preceding apex is most easily described by numbering the frames from 1 (rather than 0) and writing the frame number in binary. A frame number that is a power of 2 (2, 4, 8, 16, etc.) is the apex of a complete tree. Every other frame number is the sum of a set of distinct powers of 2, and its immediately preceding apex is found by removing the smallest power of 2 from that sum.
For example, to find the immediately preceding apex for frame 5, we add 1 to get 6. 6 = 4 + 2, so we ignore the 2 and the preceding apex is 4 in 1-based numbering, that is, frame 3.
The values of Tree Position are shown for the first 8 frames in the figure below:
[[This figure is not viewable in this format. The figure is available at http://prismproof.org/Documents/draft-hallambaker-jbcd-container.html.]]
Merkle Tree Integrity check
An algorithm for efficiently calculating the immediately preceding apex is provided in Appendix C.
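As an illustration of the calculation described above, the following Python sketch computes the index of the immediately preceding apex with bit operations. It is intended to return the same values as the Appendix C routine. The function name is hypothetical, and the use of -1 for "no preceding apex" follows what the Appendix C code returns for frame 0.

```python
def preceding_apex(frame: int) -> int:
    """Return the 0-based index of the immediately preceding apex, or -1."""
    n = frame + 1                 # switch to 1-based numbering
    if (n & (n - 1)) == 0:        # n is a power of 2: apex of a complete tree
        return n // 2 - 1         # apex of the previous complete tree
    lowest = n & -n               # smallest power of 2 in the binary sum
    return (n - lowest) - 1       # remove it, then convert back to 0-based


# Frames 0 through 7.
print([preceding_apex(i) for i in range(8)])
# prints [-1, 0, 1, 1, 3, 3, 5, 3]
```

Frames 2 and 3 share a preceding apex, as do frames 4 and 5, which agrees with the repeated TreePosition values in the tree examples later in this document.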
Contains a table of index, position pairs pointing to prior locations in the file.
Contains a list of IndexMeta entries. Each entry contains a metadata description and a list of frame indexes (not positions) of frames that match the description.
Frame sequences in a JBCD container MAY be protected against a frame insertion attack by means of a digest chain, a binary Merkle tree or both.
A digest chain is simple to implement but can only be verified if the full chain of values is known. Appending a frame to the chain has O(1) complexity but verification has O(n) complexity:
[[This figure is not viewable in this format. The figure is available at http://prismproof.org/Documents/draft-hallambaker-jbcd-container.html.]]
Hash chain integrity check
The value of the chain digest for the first frame (frame 0) is H(IV + H(Payload0)), where IV is an initialization vector consisting of a string of zero bytes, Payloadn is the sequence of payload data bytes for frame n, and A + B stands for concatenation of the byte sequences A and B.
The value of the chain digest for frame n (n > 0) is H(ChainDigestn-1 + H(Payloadn)), where ChainDigestn-1 is the value of the chain digest for frame n-1.
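A digest chain of this kind can be sketched in Python as follows. Several points are assumptions rather than statements from this document: the digest is taken to be SHA-512 (the PayloadDigest shown for frame 0 in the examples equals the base64url encoding of SHA-512 over empty input), the IV is taken to be as long as one digest, and each chain value is taken to be computed over the previous chain value followed by the digest of the current payload. The function names are hypothetical.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, as the digests appear in the examples."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def chain_digests(payloads):
    """Yield (payload digest, chain digest) for each frame in order."""
    # IV: a string of zero bytes; its length (one digest) is an assumption.
    previous = bytes(hashlib.sha512().digest_size)
    for payload in payloads:
        payload_digest = hashlib.sha512(payload).digest()
        # Chain over the previous chain value and the current payload digest.
        previous = hashlib.sha512(previous + payload_digest).digest()
        yield payload_digest, previous


# Appending is O(1): only the last chain value needs to be retained.
payload = bytes(i % 256 for i in range(300))
for index, (pd, cd) in enumerate(chain_digests([b"", payload, payload])):
    print(index, b64url(pd)[:8], b64url(cd)[:8])
```

Because each chain value depends on its predecessor, two frames with identical payloads still receive different chain digests, which is the behaviour visible in the chained examples later in this document.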
The tree index mechanism described earlier may be used to implement a binary Merkle tree. The value TreeDigest specifies the apex value of the tree for that node.
Appending a frame to the tree has O(log2 n) complexity provided that the container format supports at least the binary tree index. Verifying a frame has O(log2 n) complexity, provided that the set of necessary digest inputs is known.
To calculate the value of the tree digest for a node, we first calculate the values of all the sub trees that have their apex at that node and then calculate the digest of that value and the immediately preceding local apex.
This format is as for the singleton container except that Frame 0 may be followed by any number of content frames.
The container format is intended to be the basis of future work to support:
The container format is designed to be capable of supporting efficient random access to frames in containers considerably larger than the processing memory of the host computer without the need to pre-load indexes.
A combination of the following strategies is being considered:
While storage devices capable of storing tens of TB of data with RAID redundancy are commonplace, it is generally desirable that there be at least as many CPU cores as disks. Thus, partitioning of data sets across multiple hosts becomes desirable for throughput even if a single host could handle the storage requirement.
In the types of applications envisaged in the Mesh, almost every data set may be reduced to collections that are bound to a single account. While it is obviously desirable that a user's mail messages (for example) be replicated across multiple machines to provide fault tolerance, fragmenting the copies of this data set across multiple machines should be avoided unless the data volumes are so large as to require it.
The encoding scheme is 64-bit clean throughout and thus supports containers and frames as large as 18 exabytes. Larger data volumes could be supported through use of 128-bit integer pointers but even if the technology to support such data volumes were developed, it is highly unlikely anyone would want to represent data sets anywhere near this size in a serial format.
Due to limitations in the design of the encryption schemes that may be used (e.g. AES-GCM), the maximum encrypted frame size is 64 GB. While this is not currently a major concern for encryption of individual data files, it is easy to see situations in which an archive of encrypted files could exceed that amount. One possibility would be to define a modification to AES-GCM that causes the encryption key to be incremented by a fixed amount after encrypting a certain amount of data, though this might well present implementation challenges unless the maximum data block size was chosen to be deliberately small so as to force code paths to be exercised. Another possibility would be to limit the size of encrypted data frames by requiring the frame pointer to be no larger than 32 bits and require larger data items to be represented as a sequence of frames.
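The size limits discussed above can be checked with a little arithmetic, sketched here in Python. The AES-GCM figure is the per-invocation plaintext limit of 2^39 - 256 bits from NIST SP 800-38D. The helper frame_spans is hypothetical and only illustrates the second option, in which a large item is represented as a sequence of frames whose lengths fit in 32 bits.

```python
MAX_POSITION = 2 ** 64                 # bytes addressable with 64-bit pointers
GCM_MAX_BYTES = (2 ** 39 - 256) // 8   # AES-GCM plaintext limit per invocation
FRAME_LIMIT_32 = 2 ** 32 - 1           # largest length a 32-bit field can hold


def frame_spans(total: int, limit: int = FRAME_LIMIT_32):
    """Split `total` payload bytes into (offset, size) spans of at most `limit`."""
    return [(off, min(limit, total - off)) for off in range(0, total, limit)]


print(MAX_POSITION // 10 ** 18)        # whole exabytes in 2**64 bytes
print(GCM_MAX_BYTES == 2 ** 36 - 32)   # just under 64 GiB
print(frame_spans(10, 4))              # small illustrative split
```

With a 32-bit limit, every frame stays well below the AES-GCM bound, at the cost of a reader having to reassemble large items from several frames.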
The container format deliberately avoids support for concurrent write operations. Should this be desirable, some mechanism must be provided to cache write fragments to an intermediate file and then consolidate them for writing to the master log.
The data payloads in all the following examples are identical, only the authentication and/or encryption is different.
For conciseness, the wire format is omitted for examples after the first, except where the data payload has been transformed (i.e. encrypted).
Here is the simple container:
f4 2c f0 2a 7b 0a 20 20 22 49 6e 64 65 78 22 3a 20 30 2c 0a 20 20 22 43 6f 6e 74 61 69 6e 65 72 54 79 70 65 22 3a 20 22 4c 69 73 74 22 7d 2c f4 f5 01 40 f0 0f 7b 0a 20 20 22 49 6e 64 65 78 22 3a 20 31 7d f1 01 2c 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b ... 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 40 01 f5 [EOF]
Figure 1
The header values are:
Frame 0
{ "ContainerHeader": { "Index": 0, "ContainerType": "List"}}
Figure 2
Frame 1
{ "ContainerHeader": { "Index": 1}}
Figure 3
Frame 0
{ "ContainerHeader": { "Index": 0, "PayloadDigest": "z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXcg_SpIdNs6c5H0NE8XYXysP-DGNKHfuwvY7kxvUdBeoGlODJ6-SfaPg", "ChainDigest": "FEHy24Y6cLModDXWH31kVc2a3TdhjXPooKHpLAb2JbsO1YQnJolmowXAYHhkOGY0kg3jrKNTjds0myf4Dw1sdg"}}
Figure 4
Frame 1
{ "ContainerHeader": { "Index": 1, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "ChainDigest": "7JaijhBvQUOjBiO1_Zt6NtJil8iB0rW9HeM_4iYooc_AaAfutlF0LLVY6PO7INB-eztypyEqVzgMil9JkjtRGQ"}}
Figure 5
Frame 2
{ "ContainerHeader": { "Index": 2, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "ChainDigest": "wJZFYd61nntCJ0Bv80l6-Cn-sR2u3iD0zCRjOLxje8dsKIuUnP4X1mgeNenNDBdXysrFs3vVAqkC-hfSAPF0Aw"}}
Figure 6
Frame 3
{ "ContainerHeader": { "Index": 3, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "ChainDigest": "RORNZxIcM23cZtXPh9vuHhkgiGa_O4a0ZiU0ku2OK4dB974clvh5F0VZsX7IwVBayAG2nDTdqhyZ-qOnTRiumA"}}
Figure 7
Frame 0
{ "ContainerHeader": { "Index": 0, "TreePosition": 0, "PayloadDigest": "z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXcg_SpIdNs6c5H0NE8XYXysP-DGNKHfuwvY7kxvUdBeoGlODJ6-SfaPg", "TreeDigest": "FEHy24Y6cLModDXWH31kVc2a3TdhjXPooKHpLAb2JbsO1YQnJolmowXAYHhkOGY0kg3jrKNTjds0myf4Dw1sdg"}}
Figure 8
Frame 1
{ "ContainerHeader": { "Index": 1, "TreePosition": 0, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "TreeDigest": "fPTYagAvSDP_755jpFUs-Wq6cgvtr5vrFwW-E12vsrbq1ReNsGzp-V2XqzFPiWaUckACPjegD7ioe1bGzxoWQQ"}}
Figure 9
Frame 2
{ "ContainerHeader": { "Index": 2, "TreePosition": 263, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "TreeDigest": "7fyKKQNLGEeHX1oCsV8NtOdPm615SkDnM1vkcexx2tOuVd5kkZIdLdsWRCLic9luTSsUN6D6_-c-8ftbhL9dJg"}}
Figure 10
Frame 3
{ "ContainerHeader": { "Index": 3, "TreePosition": 263, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "TreeDigest": "b9ca9Pv-6fxUg-V3ulOhhRngxebkZCxyDmWhQUYeADmSvvPbjMcNTUJxdDpKlMPrDBInSWMChinsc5s9Tv4byw"}}
Figure 11
Frame 4
{ "ContainerHeader": { "Index": 4, "TreePosition": 1398, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "TreeDigest": "g1hQeWJgDlNoTSGfMb6NhQk5-p6iaAI2_GiAhBM-F2Cp3UvJ7AR_bC2Drp5YElGXAzC2K5qZ30l7j2D-jqykFw"}}
Figure 12
Frame 5
{ "ContainerHeader": { "Index": 5, "TreePosition": 1398, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "TreeDigest": "p89BhjJAgMMoSrOmot6oaBGa6Dgz-zogZjZ9mm1Iz4yLHxm97nWAIBaZFiC1XkuCoP-tr3tag_rHoZhgQV8_PQ"}}
Figure 13
Frame 6
{ "ContainerHeader": { "Index": 6, "TreePosition": 2537, "PayloadDigest": "8dyi62d7MDJlsLm6_w4GEgKBjzXBRwppu6qbtmAl6UjZDlZeaWQlBsYhOu88-ekpNXpZ2iY96zTRI229zaJ5sw", "TreeDigest": "HEA7EeUGfSjZqjmN3PDp0FVbnixBBXfSQAYm_rNPHVWJVMDu3SfmxKvN_yBTtMXk-Jad9cyXDKsecLNHLyoQWg"}}
Figure 14
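    // Returns the index of the frame at the immediately preceding apex
    // for the frame with the given 0-based index (-1 if there is none).
    public long PreviousFrame (long Frame) {
        long x2 = Frame + 1;   // 1-based frame number
        long d = 1;            // value of the bit currently being examined

        while (x2 > 0) {
            if ((x2 & 1) == 1) {
                // Lowest set bit found. If no higher bits remain, the frame
                // number is a power of 2; otherwise remove the lowest bit.
                return x2 == 1 ? (d / 2) - 1 : Frame - d;
                }
            d = d * 2;
            x2 = x2 / 2;
            }
        return 0;
        }
Figure 15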
[draft-hallambaker-jsonbcd] | Hallam-Baker, P., "Binary Encodings for JavaScript Object Notation: JSON-B, JSON-C, JSON-D", Internet-Draft draft-hallambaker-jsonbcd-09, September 2017. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC7159] | Bray, T., "The JavaScript Object Notation (JSON) Data Interchange Format", RFC 7159, DOI 10.17487/RFC7159, March 2014. |
[RFC7515] | Jones, M., Bradley, J. and N. Sakimura, "JSON Web Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, May 2015. |
[RFC7516] | Jones, M. and J. Hildebrand, "JSON Web Encryption (JWE)", RFC 7516, DOI 10.17487/RFC7516, May 2015. |
[BLOCKCHAIN] | Chain.com, "Blockchain Specification" |
[RFC5652] | Housley, R., "Cryptographic Message Syntax (CMS)", STD 70, RFC 5652, DOI 10.17487/RFC5652, September 2009. |
[ZIPFILE] | PKWARE Inc, "APPNOTE.TXT - .ZIP File Format Specification", October 2014. |