Network Working Group S. Leonard
Internet-Draft Penango, Inc.
Intended Status: Standards Track M. Kerwin
Expires: April 30, 2015 October 27, 2014
The Archive Primary Media Type for File Archives
draft-arcmedia-type-00
Abstract
This document defines a new primary content-type to be known as
"archive", which defines a fundamental type of content with unique
presentational, hardware, and processing aspects.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 30, 2015.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Leonard & Kerwin Expires April 30, 2015 [Page 1]
Internet-Draft The archive Media Type for File Archives October 2014
Table of Contents
1. Introduction
1.1. Overview
1.2. Notational Conventions
2. Definition of an archive
3. Consultation Mechanisms
4. Encoding and Transport
5. Common Required and Optional Parameters
6. Split Archives
7. Fragment Identifier Syntax
8. Piped-Composite Type Suffix Syntax
9. Security Considerations
10. Normative References
Appendix A. Expected Subtypes
1. Introduction
The purpose of this memo is to propose an update to [RFC2045] to
include a new primary content-type to be known as "archive".
[RFC2045] describes mechanisms for specifying and describing the
format of Internet Message Bodies via content-type/subtype pairs.
"archive" defines a fundamental type of content with unique
presentational, hardware, and processing aspects. Various subtypes
of this primary type are immediately anticipated, and will be covered
under separate documents.
1.1. Overview
This document will outline what an archive is, show examples of
archives, and discuss the benefits of grouping archives together.
This document is a discussion document for an agreed definition,
intended eventually to form a standard accepted extension to
[RFC2045].
1.2. Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Definition of an archive
An archive primary media type identifies data that represents one or
more files [FILE] along with metadata. Archives are used to collect
multiple data files together into a single file for easier
portability and storage. Archive formats can provide many optional
services, including:
1. compression
2. encryption
3. authentication
4. backup
5. filesystem imaging
6. software packaging and distribution
7. volume-splitting (archive split into multiple contents)
8. block storage
Formats and techniques that perform one or more of these services
already exist under separate registrations. For example, the Content-
Encoding header can be used to compress Internet message content. The
distinguishing feature of the archive primary type is that these
services are integrated into the format itself, along with the
inclusion of file-specific metadata. Virtually all formats
contemplated under this primary type are designed to concatenate
multiple files into a single data stream, along with filenames and
other metadata. When an Internet-facing application handles content
labeled with this type, it SHOULD provide handling consistent with
the archive as a discrete data item. For example, an Internet mail
user agent would display archive-labeled content with an archive
icon, possibly with a preview of the files contained therein (as
opposed to automatically traversing its contents, as it would for
multipart-labeled content).
Common operations include creating an archive, identifying files in
an archive, adding to an archive, backing up to an archive,
extracting an archive, restoring from an archive, deleting from an
archive, mounting and unmounting an archive, [[TODO: executing an
archive?]], and installing and uninstalling an archive.
* Creating: taking files from a filesystem and representing those
files in an archive.
* Identifying files: parsing an archive's format, extracting
information about files represented in the archive.
* Adding: parsing an archive's format, adding files or non-file data
to the archive. In virtually all cases, at least some part of
the archive's content will be modified (though perhaps only at
the end). Unlike, for instance, text media types, concatenating
two separate archive contents *never* yields a valid composite
archive.
* Backing up: taking some or all of a filesystem and representing the
filesystem in an archive, with the express intention of
recording the files as they exist in a source filesystem at the
time of backing up. For example, the compression, encryption,
and access control list (permissions) properties of the files
would be preserved.
* Extracting: parsing an archive's format, copying file data (or file
metadata) out of the archive into one or more files on a
destination filesystem. This operation implies that at least
some file metadata will be preserved, while other file metadata
may be adjusted or added to adapt to the local environment.
* Restoring: parsing an archive's format, copying file data out of
the archive into the destination filesystem, with the express
intention of recreating the files as they existed in a source
filesystem at the time of backing up. For example, the
compression, encryption, and access control list (permissions)
properties of the files would be preserved.
* Deleting: parsing an archive's format, removing file data (or
metadata) from the archive, requiring changes to the archive's
contents. Some archive formats permit orphan data in the archive
content; other formats require re-serializing some or all of the
archive.
* Mounting and unmounting: Mapping an archive's semantics directly to
a filesystem, so that the files represented in the archive can
be accessed using the filesystem's namespace with typical
filesystem APIs. Rather than being backed by a physical block
storage device, that part of the filesystem is backed by the
archive.
* Executing [[NB: this may be controversial; it is worth
discussing]]: Identifying executable semantics of an archive,
and causing code to execute.
* Installing and uninstalling [[NB: this may be controversial; it is
worth discussing]]: Treating the archive as a software package,
extracting certain contents in the archive and executing other
contents in the archive, according to some software packaging
protocol.
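As a non-normative illustration, several of the operations above
(creating, identifying files, adding, and extracting) can be sketched
with Python's standard zipfile module. ZIP is used here only because
it is one of the expected subtypes; the helper names create_archive,
list_files, add_file, and extract_file are hypothetical and are not
defined by this document.

```python
import io
import zipfile


def create_archive(files: dict[str, bytes]) -> bytes:
    """Creating: represent a set of named files in a new archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def list_files(archive: bytes) -> list[str]:
    """Identifying files: parse the format and report member names."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.namelist()


def add_file(archive: bytes, name: str, data: bytes) -> bytes:
    """Adding: parse the archive, then write one more member into it."""
    buf = io.BytesIO(archive)
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def extract_file(archive: bytes, name: str) -> bytes:
    """Extracting: copy one member's file data out of the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)
```

In this sketch, add_file does not simply append octets: the ZIP
central directory at the end of the archive is rewritten, which is
consistent with the observation under "Adding" that at least some
part of the archive's content is modified.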
3. Consultation Mechanisms
Before proposing a subtype for the archive/* primary type, it is
suggested that the subtype author examine the definition (above) of
what an archive/* is and the listing (below) of what an archive/* is
not. Additional consultation with the authors of the existing
archive/* subtypes is also suggested.
4. Encoding and Transport
Unrecognized subtypes of archive SHOULD at a minimum be treated as
"archive/file". Like "application/octet-stream", the purpose of the
"archive/file" is to provide default handling; it does not represent
a particular archive format. Implementations SHOULD pass subtypes of
archive that they do not specifically recognize to a robust
general-purpose archive viewing application, if such an application
is available.
If default archive (archive/file) handling is not supported, it is
appropriate to treat the archive like "application/octet-stream".
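The fallback order described above can be sketched as follows. This
is a non-normative illustration; the function name
effective_media_type and its arguments (known_subtypes,
supports_default) are hypothetical and stand in for whatever handler
registry an implementation actually uses.

```python
def effective_media_type(
    media_type: str, known_subtypes: set[str], supports_default: bool
) -> str:
    """Return the media type whose handler should process the content."""
    # Drop any parameters and normalize case before comparing.
    essence = media_type.split(";", 1)[0].strip().lower()
    maintype, _, subtype = essence.partition("/")
    if maintype != "archive":
        return essence
    if subtype in known_subtypes:
        return f"archive/{subtype}"
    if supports_default:
        # Unrecognized subtype: fall back to default archive handling.
        return "archive/file"
    # No default archive handling available at all.
    return "application/octet-stream"
```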
Unless noted in the subtype registration, subtypes of archive SHALL
be assumed to contain binary data, implying a content transfer
encoding of base64 for email and binary transfer for FTP and HTTP.
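For email, the base64 case might look like the following
non-normative sketch, which uses Python's standard email package.
The "archive" primary type is only proposed by this document and is
not registered; the function name build_archive_message and the
subject text are hypothetical.

```python
from email.message import EmailMessage


def build_archive_message(data: bytes, subtype: str, filename: str) -> EmailMessage:
    """Wrap binary archive data in a message labeled archive/<subtype>."""
    msg = EmailMessage()
    msg["Subject"] = "Archive attached"
    # base64 is already the library default for bytes content; it is
    # spelled out here because the text above calls for it explicitly.
    msg.set_content(
        data,
        maintype="archive",
        subtype=subtype,
        cte="base64",
        filename=filename,
    )
    return msg
```

A receiver recovers the original octets with
msg.get_payload(decode=True).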
The formal syntax for the subtypes of the archive primary type SHOULD
look like this:
Type name:
archive
Subtype name:
xxxxxxxx
Required parameters:
none
Optional parameters:
TBD
Encoding considerations:
base64 encoding is recommended when transmitting archive/*
documents through MIME electronic mail.
Security considerations:
see Section 9 below
Interoperability considerations:
TBD
Published specification:
TBD
Applications that use this media type:
TBD
Fragment identifier considerations:
The considerations of this document, plus any extra syntaxes
not inconsistent with this document.
Additional information:
Deprecated alias names for this type:
(Include non-archive alias names,
such as those in application.)
Magic number(s): TBD
File extension(s): TBD
Macintosh file type code(s): TBD
See Appendix A for references to some of the expected subtypes.
Person and email address to contact for further information:
TBD
Intended usage: TBD (COMMON will be the most common)
Restrictions on usage: TBD
Author: TBD
Change controller: TBD
Provisional registration? (standards tree only): (Yes/No)
(Any other information that the author deems interesting may be
added below this line.)
The optional parameters consist of starting conditions and variable
values used as part of the subtypes.
5. Common Required and Optional Parameters
Unlike the text primary media type (for instance), virtually all
archive formats have been designed with almost all of the information
required for interpretation contained within the format. Therefore,
parameters are NOT RECOMMENDED; registrants are not expected to
register additional parameters.
Regrettably, not all archive formats are as "universal" or "complete"
as one might assume at first glance. This is because some archive
formats are very old or are based on older formats where backwards-
compatibility was a design goal; thus they were not designed with
transport across the Internet in mind. The ZIP file is an example:
although the modern ZIP supports Unicode [CITE], the default encoding
of ZIP filenames has always been Code Page 437. Since "archive"
contents are literally archives of computing history, sometimes
communicating the archive as-is, rather than updating the archive to
a more universal format, is necessary.
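The ZIP filename ambiguity can be illustrated with a short,
non-normative sketch. It assumes the ZIP convention that general
purpose flag bit 11 marks a UTF-8 filename and that Code Page 437
applies otherwise. The function name decode_zip_filename and its
code_page argument are hypothetical; code_page stands in for
out-of-band knowledge about how a legacy archive was written.

```python
UTF8_FLAG = 0x0800  # general purpose flag bit 11: filename is UTF-8


def decode_zip_filename(raw: bytes, flag_bits: int, code_page: str = "cp437") -> str:
    """Decode a raw ZIP filename field into text."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    # Legacy entry: the octets alone do not say which code page was
    # used, so the caller's assumption decides the result.
    return raw.decode(code_page)
```

The same octet 0x9B decodes as a cent sign under Code Page 437 and as
a lowercase o with stroke under Code Page 850, which is why an
archive communicated as-is may need additional information to be
interpreted as its creator intended.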
Implementations that are archive-type aware MUST support the
following parameters for maximum compatibility. At the same time, new
archives SHOULD NOT rely on these parameters for disambiguation; new
archives SHOULD be created in such a way that "universal"
interoperability is achieved with the archive's self-contained
information. [[TODO: code page--it's like charset but only applies to
certain strings in the archive, when the archive format is ambiguous;
do NOT attempt to apply this parameter as one would apply charset to
text/*. Endian-ness? Time/Y2K representation issues? Anything else?]]
6. Split Archives
Several archive formats (notably RAR and ZIP) support split archives.
A "split archive" is an archive that is stored across multiple files
or, more generally, across multiple storage media.
The ZIP format, for example, actually has two types of splits: "split
archive" and "spanned archive". A "split archive" is a standard ZIP
archive split over multiple files with the file extensions .z01,
.z02, etc.; the .zip file is the last file. A "spanned archive" is
the original format designed for use with swapping floppy disks. All
archive files have the same filename; the format uses volume labels
(presumably on floppy disks) to store disk numbers. Neither sub-
format is merely a naive division of the octet stream: each ZIP file
is parseable in its own right, and contains its own offset values.
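The file naming of the ZIP "split archive" case described above can
be sketched as follows. This is a non-normative illustration of the
naming pattern only; the function name split_volume_names is
hypothetical, and the sketch does not read or write any ZIP data.

```python
def split_volume_names(stem: str, volumes: int) -> list[str]:
    """List the filenames of a ZIP split archive, in volume order."""
    if volumes < 1:
        raise ValueError("a split archive has at least one volume")
    # Every volume but the last is numbered .z01, .z02, and so on.
    names = [f"{stem}.z{i:02d}" for i in range(1, volumes)]
    # The final volume carries the ordinary .zip extension.
    names.append(f"{stem}.zip")
    return names
```

For example, split_volume_names("backup", 3) returns backup.z01,
backup.z02, and backup.zip, in that order.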
The TAR format (or family of formats, including cpio and ustar) was
originally designed for streaming to and from tape devices, so
splitting is accomplished differently.
[[TODO: Consider how to label this content. archive/zip^01?
archive/zip; split=01? Something else? How shall 01 be associated
with 02, 03, etc., when the Content-Disposition: ; filename=""
parameter is "presentation-information" and may be separated from the
Content-Type header information?]]
7. Fragment Identifier Syntax
Because all archives represent files, archives can serve as virtual
filesystems. Respondents have noted that an archive's files can be
addressed by a fragment syntax that resembles a filesystem path. At
the same time, archives may record files in different ways (along
with different types of metadata), suggesting that a common baseline
with flexible extension points is more appropriate than a fixed
universal syntax. [[TODO: This will be explored in future drafts.
Note the similarities with this and the file: URI...]]
[[TODO: consider how to provide a fragment for content in the
archive. NB: most archives do NOT provide Content-Type/media type
information! So /foo.html being an HTML file is just an *assumption*,
and possibly a very wrong one at that. There is no IETF registry for
file extensions.]]
8. Piped-Composite Type Suffix Syntax
[[TODO: discuss tar piped through bzip2, gzip, etc. as a distinct
file format, rather than an application of the Content-Encoding:
header. Suggest common suffix like archive/tar|bzip2, where | is some
useful character but not + since + is for structured syntaxes.]]
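The layering at issue can be seen in a short, non-normative sketch
using Python's standard tarfile and bz2 modules: the outer octet
stream is bzip2 data, and a TAR archive appears only after that layer
is removed. The function names make_tar_bz2 and inner_tar_names are
hypothetical.

```python
import bz2
import io
import tarfile


def make_tar_bz2(files: dict[str, bytes]) -> bytes:
    """Build a TAR archive and pipe it through bzip2, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def inner_tar_names(composite: bytes) -> list[str]:
    """Peel off the bzip2 layer, then list the inner TAR's members."""
    inner = bz2.decompress(composite)  # outer layer: compression only
    with tarfile.open(fileobj=io.BytesIO(inner), mode="r:") as tf:
        return tf.getnames()  # inner layer: the archive proper
```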
9. Security Considerations
Archives represent files, file metadata, and filesystems; thus,
security issues loom large because archives can contain just about
anything. These concerns are magnified by the arbitrary transport of
such data across the Internet. [[TODO: complete.]]
10. Normative References
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, November 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13, RFC
6838, January 2013.
Appendix A. Expected Subtypes
The following archive formats will be explored for registration as
subtypes along with this effort:
Archiving Only
TAR
Multipurpose (archiving, compression, encryption)
ZIP, ACE, RAR, 7-Zip, StuffIt, FreeArc
Software Packaging
MSI, RPM, JAR, XPI, CAB, CRX, APK
Disk Imaging
ISO, NRG, BIN/CUE, VMDK, WIM, PartImage, IMG/IMA/IMZ, DMG
Authors' Addresses
Sean Leonard
Penango, Inc.
5900 Wilshire Boulevard
21st Floor
Los Angeles, CA 90036
USA
EMail: dev+ietf@seantek.com
URI: http://www.penango.com/
Matthew Kerwin
EMail: matthew@kerwin.net.au
URI: http://matthew.kerwin.net.au/