Network Working Group P. Thierry
Internet-Draft Thierry Technologies
Intended status: Experimental August 7, 2013
Expires: February 8, 2014
BULK ARchive Format
draft-thierry-bulk-barf-00
Abstract
This specification describes a BULK format to pack together
independent pieces of data and metadata about them.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 8, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Thierry Expires February 8, 2014 [Page 1]
Internet-Draft BULK-BARF August 2013
Table of Contents
1. Introduction
1.1. Rationale
1.2. Format overview
1.3. Conventions and Terminology
2. Guaranteed Backward Compatibility
3. BULK archive namespace
3.1. Packing
3.2. Stacking
3.3. Payload with metadata
3.4. BULK stream embedding
3.5. Compressed data
3.6. Encrypted data
4. Security Considerations
5. References
5.1. Normative References
5.2. Informative references
1. Introduction
1.1. Rationale
There are plenty of archive formats currently in use, from widely
used and repurposed formats like ZIP (used for generic file archives
as well as Java deployment, ebooks and office documents), through
moderately used formats enjoying a stable niche, like tar, RAR or
StuffIt, to legacy formats like ARC or Z.
Some archive formats reuse existing ones. Many archive formats
developed nowadays reuse ZIP without modification and merely
dictate the tree structure inside the ZIP file.
The Unix world has long had a tradition of separation of concerns,
thus using different formats for archiving (ar or tar) and
compression (gzip, bzip2, lzma or now xz), with compressed archives
named after the combination (foo.ar.gz, bar.tar.bz2, etc.). Debian
packages are actually ar files containing little uncompressed
metadata and a couple of compressed tar files.
But the problem remains that these binary formats all define
completely ad hoc syntaxes, sometimes highly optimized but
narrowly tailored to their specific requirements. Many leave little
room for future extension, or allow it only in a contrived way (many
formats are actually extended by abusing an unused metadata field and
cramming a new ad hoc format into it).
Some of these formats have a few fixed- or limited-length fields that
became or will become obsolete in time. The ar format, for example,
suffers from the Year 2038 problem and cannot store long file names.
Various implementations have used different incompatible extensions
to store long file names.
So we propose yet another archive format, one that uses an efficient
but extensible syntax, so that the format can always be extended or
modified for new use cases or constraints.
1.2. Format overview
A BARF file is basically a set of metadata fields followed by data
entries. Each entry consists of a set of metadata fields followed by
its content. The interesting property of using BULK is that any
portion of that structure is dynamic: there are no fixed metadata
fields, and an entry without metadata is serialized as its content
alone, because with BULK the entry and its content cannot be confused
with each other. Anything can also be enclosed in a BULK structure to
add features.
Metadata fields are just a BULK expression, which means that any ad
hoc or standard BULK vocabulary can be used in an efficient way as
metadata. Mutually incompatible metadata vocabularies could even be
stored alongside each other for legacy support, if need be.
The archive file can be compressed or encrypted by an outside tool
(producing a foo.barf.gz or bar.pgp file, for example), but so can
any individual BULK expression. The entire archive, internally to
the file, can be a BULK compression or encryption form, as well as
any metadata set, metadata field or entry. Almost any extension or
optimization can be retrofitted into this structure in a backward-
compatible way, such as checksums, digital signatures or access
offsets for random access.
This extends the use of BARF beyond archives of multiple files. An
extensible image format could be based on a BARF structure, allowing
a seamless transition from a simple format to a full-featured one,
whereas existing formats usually add complex extensions (to support
layers, transparency, different compression or metadata) that fail to
be widely adopted. Although BARF would probably be ill-suited for
playable audio and video, it could still be a good fit for the
storage of raw audio and video for editing programs.
1.3. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Literal numerical values are provided in decimal or hexadecimal as
appropriate. Hexadecimal literals are prefixed with "0x" to
distinguish them from decimal literals.
BULK byte sequences and expressions are described with the same
conventions as those used in the BULK 1.0 specification [BULK1].
2. Guaranteed Backward Compatibility
This specification defines the notion of Guaranteed Backward
Compatibility (GBC). It applies to forms that carry a main payload
with additional metadata. A form that obeys the rules of GBC has the
type GBCForm.
A GBCForm has the shape "( Ref {arguments} {next}:Expr )". If the
payload of a GBCForm is readable without knowledge of that form, then
{next} MUST be that payload. Otherwise, {next} MUST be nil.
For example, a GBC-compliant checksum form could have the shape "(
crc32c {crc}:Word32 {payload} )", where {crc} is the checksum of the
byte sequence {payload}. On the other hand, a GBC-compliant
encryption form, where obviously the payload is unreadable without
proper knowledge of the form, could have the shape "( encrypt
{payload} nil )".
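As a non-normative illustration, the following Python sketch shows how
a reader could exploit GBC to reach a payload wrapped in forms it does
not understand. It assumes a hypothetical in-memory model that is not
part of BULK or of this specification: a form is a Python tuple whose
first item is the mnemonic of its reference, and nil is None. It also
assumes the reader already knows that the forms it meets are GBCForms.
The names "gbc_payload" and "unwrap" are invented for this sketch.

```python
# Hypothetical in-memory model (an assumption of this sketch, not the
# BULK encoding): a form is a tuple ( mnemonic, arguments..., next ),
# and nil is represented by None.
NIL = None


def gbc_payload(form):
    """Return {next}, the last expression of a GBCForm.

    The result is NIL when the form declares that its payload cannot
    be read without knowledge of the form.
    """
    if not isinstance(form, tuple) or len(form) < 2:
        raise ValueError("not a form of shape ( Ref {arguments} {next} )")
    return form[-1]


def unwrap(expr, known=()):
    """Peel off GBCForms whose mnemonic is not in `known`.

    Stops at the first non-form expression, at a form the reader
    knows, or at an opaque form (one whose {next} is nil), which is
    returned unchanged.
    """
    while isinstance(expr, tuple) and expr and expr[0] not in known:
        nxt = gbc_payload(expr)
        if nxt is NIL:
            return expr  # opaque: the payload needs knowledge of the form
        expr = nxt
    return expr
```

With this model, unwrapping the checksum example yields the payload
itself, while the encryption example is returned as is, since its
{next} is nil.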
3. BULK archive namespace
The archive namespace (mnemonic: "barf") is an official namespace
identified by the UUID
<urn:uuid:8beba7c6-c65d-5256-a2da-3763513953f3> (BULK, "Stack 'em.
Pack 'em. And rack 'em."). It provides a standard way to pack one
or more data elements together with metadata.
3.1. Packing
name "0x1" (mnemonic: "pack" )
shape "( pack {metadata}:Expr {entries} )"
This packs archive entries together as a form. {metadata} holds
metadata about the whole pack. In the context of {metadata},
"rdf:this-resource" designates the whole pack.
3.2. Stacking
name "0x2" (mnemonic: "stack" )
shape "( stack {metadata}:Expr {entries-metadata} ) {entries}"
This stacks archive entries together as a sequence, for the cases
where it is not appropriate for entries to belong to a single
expression. {metadata} holds metadata about the whole stack. In the
context of {metadata}, "rdf:this-resource" designates the whole
stack. {entries-metadata} MUST be a sequence of expressions whose
length is less than or equal to the number of expressions in
{entries}. Each
expression in {entries-metadata} holds metadata about a single entry
of the stack. In the context of such a metadata expression,
"rdf:this-resource" designates the described stack entry. By
default, the expression number N in {entries-metadata} describes the
expression number N in {entries}.
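The default pairing rule can be sketched as follows (non-normative).
The sketch assumes that {entries-metadata} and {entries} have already
been parsed into Python lists, and it uses None to mean "no metadata
for this entry"; both choices, and the function name, are assumptions
of this illustration.

```python
def pair_stack_entries(entries_metadata, entries):
    """Pair metadata expression N with entry N (the default rule).

    Entries beyond the end of {entries-metadata} are paired with
    None, meaning that the stack carries no metadata for them.
    """
    if len(entries_metadata) > len(entries):
        raise ValueError("{entries-metadata} is longer than {entries}")
    missing = len(entries) - len(entries_metadata)
    padded = list(entries_metadata) + [None] * missing
    return list(zip(padded, entries))
```

For example, one metadata expression and two entries yield the pairs
(metadata, first entry) and (None, second entry).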
When the stack form is in the abstract yield, this has the property
that if the last entry is an Array, the actual payload constitutes
the end of the BULK stream. This can make it possible for BULK-
unaware programs to read and/or write that payload easily.
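A BULK-unaware reader could rely on that property roughly as in the
following non-normative sketch. It assumes that the length of the
payload is known by some means outside the stream, and it treats
everything before the payload as opaque bytes. The function name is
invented for this illustration.

```python
import io


def read_trailing_payload(stream, payload_length):
    """Read the last `payload_length` bytes of a seekable binary stream.

    Everything before those bytes (the stack form and the marker of
    the final Array) is skipped without being interpreted.
    """
    size = stream.seek(0, io.SEEK_END)
    if not 0 <= payload_length <= size:
        raise ValueError("payload length does not fit in the stream")
    stream.seek(size - payload_length)
    return stream.read(payload_length)
```

The same offset computation would let such a program overwrite the
payload in place, as long as its length does not change.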
Stacking also makes the addition of a metadata-carrying entry or a
metadata-less entry an append-only operation.
3.3. Payload with metadata
name "0x3" (mnemonic: "describe" )
shape "( describe {metadata} {payload}:Expr )"
This form associates arbitrary metadata with an arbitrary payload.
It is intended to constitute most entries in BARF archives. In the
context of {metadata}, "rdf:this-resource" designates the payload.
Type: "GBCForm"
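The following non-normative sketch models how "describe" entries and
bare entries could coexist in a pack. It assumes a hypothetical
in-memory representation where a form is a Python tuple starting with
its mnemonic; the metadata values in the usage note are placeholders
and do not belong to any defined vocabulary.

```python
# Hypothetical in-memory model (an assumption of this sketch, not the
# BULK encoding): a form is a tuple starting with its mnemonic.


def describe(metadata, payload):
    """Build ( describe {metadata} {payload} )."""
    return ("describe", metadata, payload)


def pack(metadata, *entries):
    """Build ( pack {metadata} {entries} )."""
    return ("pack", metadata, *entries)


def entry_payload(entry):
    """Return the content of an entry.

    An entry is either a describe form, whose payload is its last
    expression, or the bare content itself when it has no metadata.
    """
    if isinstance(entry, tuple) and len(entry) == 3 and entry[0] == "describe":
        return entry[2]
    return entry


def pack_payloads(archive):
    """List the contents of every entry of a pack form."""
    is_pack = (isinstance(archive, tuple) and len(archive) >= 2
               and archive[0] == "pack")
    if not is_pack:
        raise ValueError("not a pack form")
    return [entry_payload(entry) for entry in archive[2:]]
```

For instance, pack(("title", "demo"), describe(("name", "a.txt"),
b"A"), b"B") holds one described entry and one bare entry, and
pack_payloads returns the two contents in order.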
3.4. BULK stream embedding
name "0x4" (mnemonic: "bulk-stream" )
shape "( bulk-stream {payload} )"
This form makes it possible to include a complete BULK stream without
modification, as {payload}.
Type: "GBCForm"
3.5. Compressed data
name "0x5" (mnemonic: "compressed" )
shape "( compressed {method}:Expr {payload}:Array nil )"
This form encapsulates a compressed payload. This specification
doesn't define names to express a compression method.
Type: "GBCForm"
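As a non-normative illustration of the shape above, the following
sketch wraps and unwraps a payload using zlib. Since this
specification defines no name for compression methods, the method
name "zlib" used here is purely hypothetical, as is the in-memory
model (a form is a Python tuple starting with its mnemonic, and nil
is None).

```python
import zlib

# Hypothetical method name: this specification defines none.
ZLIB_METHOD = "zlib"


def compress_form(data):
    """Build ( compressed {method} {payload} nil ) for some bytes.

    The trailing None (nil) signals that the payload cannot be read
    without knowledge of the form.
    """
    return ("compressed", ZLIB_METHOD, zlib.compress(data), None)


def expand_form(form):
    """Return the original bytes held by a compressed form."""
    well_formed = (isinstance(form, tuple) and len(form) == 4
                   and form[0] == "compressed" and form[3] is None)
    if not well_formed:
        raise ValueError("not a compressed form")
    _, method, payload, _ = form
    if method != ZLIB_METHOD:
        raise ValueError("unknown compression method")
    return zlib.decompress(payload)
```

A reader that does not know the method can still tell, from the
trailing nil, that the form carries an opaque payload.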
3.6. Encrypted data
name "0x6" (mnemonic: "encrypted" )
shape "( encrypted {method}:Expr {payload}:Array nil )"
This form encapsulates an encrypted payload. This specification
doesn't define names to express an encryption method.
Type: "GBCForm"
4. Security Considerations
5. References
5.1. Normative References
[BULK1] Thierry, P., "Binary Uniform Language Kit 1.0",
draft-thierry-bulk-02 (work in progress), August 2013.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
5.2. Informative references
[ISO8601] "ISO 8601:2004 Data elements and interchange formats --
Information interchange -- Representation of dates and
times", 2004.
Author's Address
Pierre Thierry
Thierry Technologies
EMail: pierre@nothos.net