Network Working Group W. Leibzon
Internet-Draft Elan Networks
Expires: January 11, 2006 July 10, 2005
Content-Digest and EDigest Header Fields
draft-leibzon-content-digest-edigest-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 11, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This document defines the Content-Digest header field, which can be used
to include a hash of MIME content body and header field data and which
supports several hash algorithms and canonicalization methods. The
EDigest header field is also defined; it allows digest information to
be specified for an external content part or for a hash of several
content parts joined together.
Requirements Language
Leibzon Expires January 11, 2006 [Page 1]
Internet-Draft Content-Digest and EDigest July 2005
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Table of Contents
1. Introduction
2. Content-Digest Header Field
2.1 Algorithm ("a") parameter
2.2 Host Information ("i") parameter
2.3 Header Field List ("h") parameter
2.4 Canonicalization Method ("c") parameter
2.5 Canonicalized Data Size ("s") parameter
2.6 Time Stamp ID ("t") parameter
2.7 Hash Data ("d") parameter
3. Creation of Content-Digest Header Field
3.1 Header Fields Processing
3.2 Content Body Data Processing
3.3 Digest Hash Creation
4. Digest Hash Verification Procedure for Content-Digest
5. EDigest Header Field
5.1 Content URL ("u") parameter
5.2 Creation of EDigest Header Field
5.3 Verification of EDigest Header Field
6. Examples
6.1 Simple Content-Digest as Replacement for Content-MD5
6.2 Content-Digest used in Email Message
6.3 Content-Digest used in HTTP Transmission
6.4 EDigest used in Email
7. IANA Considerations
8. Security Considerations
9. References
9.1 Normative References
9.2 Informative References
Author's Address
A. Collected Grammar
Intellectual Property and Copyright Statements
1. Introduction
For data transmission it is often desirable to confirm the integrity
of the data and to make certain that all of the data has been
transmitted without modification; a common method for this is to
calculate and send a cryptographic hash digest of the data. For
transmissions involving MIME-encapsulated data, as used in SMTP, HTTP
and other protocols, this can be accomplished with the Content-MD5
header field defined in [RFC1544].
However, the Content-MD5 header field is tied to the MD5 hash
algorithm, and recent research summarized in
[draft-hoffman-hash-attacks-04] indicates that MD5 is vulnerable to
certain collision attacks; with fast computers a collision can be
found in as little as 4 hours. Additionally, Content-MD5 hash creation
involves only the message data, whereas important information about
the message or MIME part can also be contained in its header, and the
data integrity of those header fields needs to be protected as well.
This document defines the Content-Digest header field, which provides
a universal syntax for including a hash of MIME or message data, and
optionally of its header fields, with support for several hash
algorithms, several canonicalization methods and other optional
information about digest creation. Additionally, the EDigest header
field is defined, which allows digest information to be included for
an external MIME part or for a hash of several parts joined together.
2. Content-Digest Header Field
Content-Digest is similar to Content-MD5 in that it is a unique
per-MIME-part (or per-message-data) field specifying a hash digest of
the entire MIME content, optionally including MIME content header
data. There can be only one Content-Digest header field in any MIME or
message header, and it should be added by the originating user agent
(as far as email routing is concerned, this means it is added by the
MUA acting as MSA). As with Content-MD5, message relays and gateways
are expressly forbidden from adding Content-Digest header fields to
messages already in transit.
The Content-Digest header field has a syntax similar to MIME fields
and consists of multiple parameters with values, separated by ";"
(e.g. "param1=val1 ; param2 =val2"); refer to Appendix A for the
exact ABNF syntax.
The Content-Digest header field always starts with the version
information parameter, which looks like this:
Content-Digest: v=1.0 ; ...
This document describes use of the Content-Digest header field with
version "1.0". Software that understands version "1.0" of the
Content-Digest header field SHOULD also attempt to interpret header
fields that have the same major number "1" but a different minor
number (e.g. "1.1"). Interpretation of a Content-Digest header field
whose major number is anything other than "1" is not defined in this
document; software that does not otherwise know how to interpret a
header field with a different major number MUST NOT attempt to
evaluate the Content-Digest header field, and message processing
should be done as if no Content-Digest header field were present.
After the version information, the header field contains a number of
additional parameters. The hash data ("d") parameter is the only one
that is required; all other parameters are either optional for
processing or take a default value when absent. Each parameter is
specified in a separate subsection of this document:
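The version rule can be illustrated with a minimal Python sketch. The helper names parse_params and version_is_supported are hypothetical, and the parser assumes that a value is either a double-quoted string or everything up to the next ";"; Appendix A remains the authoritative grammar. A field with no "v" parameter is treated as unsupported, which is also an assumption.

```python
import re

# Assumption: a value is a double-quoted string or everything up to the
# next ";" (simplified; the exact ABNF is in Appendix A).
_PARAM = re.compile(r'\s*([A-Za-z]+)\s*=\s*("[^"]*"|[^;]*)\s*;?')


def parse_params(field_value: str) -> dict:
    """Split a Content-Digest field value into {name: value} (hypothetical)."""
    params = {}
    for m in _PARAM.finditer(field_value):
        name, raw = m.group(1), m.group(2).strip()
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            raw = raw[1:-1]  # drop surrounding quotes, e.g. d="...="
        params[name] = raw
    return params


def version_is_supported(params: dict) -> bool:
    """A 1.x implementation evaluates only fields whose major version is "1"."""
    major = params.get("v", "").split(".", 1)[0].strip()
    return major == "1"
```

A caller would skip the whole field (as if it were absent) whenever version_is_supported returns False.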
2.1 Algorithm ("a") parameter
The algorithm used for digest hash computation is specified by the
"a" parameter. The possible values of the "a" parameter are listed
below:
md5 - MD5 algorithm producing 128-bit hash as defined in [RFC1321]
sha1 - SHA1 algorithm producing 160-bit hash as defined in
[RFC3174]
sha224 - SHA algorithm producing 224-bit hash as defined in
[RFC3874]
sha256 - SHA algorithm producing 256-bit hash as defined in
[FIPS180-2]
sha384 - SHA algorithm producing 384-bit hash as defined in
[FIPS180-2]
sha512 - SHA algorithm producing 512-bit hash as defined in
[FIPS180-2]
If the "a" parameter is absent from the Content-Digest header field,
the default SHA1 algorithm is assumed.
An implementation conforming to this document MUST support the SHA1
algorithm, SHOULD support the MD5 and SHA224 algorithms, and MAY
support SHA256, SHA384, SHA512 and other algorithms. New algorithms
may be introduced by other documents and do not require a new version
number for Content-Digest.
If the algorithm specified in the "a" parameter of Content-Digest is
unknown to the software evaluating the header field, the software MUST
NOT attempt to evaluate the Content-Digest header field, and message
processing should be done as if no such header field were present.
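A minimal Python sketch of algorithm selection, under the assumption that the "a" value is matched exactly as listed (lowercase). The six values listed above happen to be valid names for Python's hashlib.new; the helper hash_for is hypothetical.

```python
import hashlib

# Values of "a" listed in section 2.1; each is also a hashlib name.
KNOWN_ALGORITHMS = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def hash_for(params: dict):
    """Return a fresh hash object for "a", or None if the value is unknown.

    None means the caller must behave as if no Content-Digest were present.
    """
    name = params.get("a", "sha1")  # SHA1 is the default when "a" is absent
    if name not in KNOWN_ALGORITHMS:
        return None
    return hashlib.new(name)
```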
2.2 Host Information ("i") parameter
An optional "i" parameter may be used to specify the hostname of the
system adding the Content-Digest header field. This is a purely
informational parameter and is not used in the processing and
evaluation of the digest header field.
2.3 Header Field List ("h") parameter
MIME header fields that are included with content data for digest
hash computation are listed in the "h" parameter of Content-Digest.
The field names are listed after "h=", separated by ",". A group of
fields that share a common starting name can also be specified with a
"*" ending; for example, "h=Content-*" means all header fields whose
names start with "Content-", such as Content-Type,
Content-Transfer-Encoding, Content-ID and others. If "*" is not used,
the listed name MUST match the full field name in the header up to the
":", so "h=Content-Type" would match a "Content-Type:" header line but
would not match a "Content-Type-Extra:" header line.
If all header fields are to be included, this is specified as "h=*",
but careful consideration must be given to whether that is desirable,
as in some cases new header fields are added to the message or to
specific MIME parts while it is in transit.
The field names are not case-sensitive: "h=Content-Type" matches even
if the actual header field name in the MIME part is "CONTENT-TYPE",
and similarly for "content-type", "cOnTeNt-TyPe" or any other
variation of the same letters in different case.
Note that MIME fields matching the list in the "h" parameter are not
restricted by their position relative to the Content-Digest header
field; they may appear either above or below it in the MIME or message
header. The Content-Digest header field itself is never included in
its own digest, even if the name Content-Digest matches the list of
header fields in "h=" (such as when "h=Content-*" or "h=*" is used).
If there is no "h" parameter in the Content-Digest header field, no
header field data is included in the digest, and the digest is a hash
of the content body only, as with the Content-MD5 header field.
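The "h" matching rules can be sketched in Python as follows. The helpers field_matches and covered_fields are hypothetical, whitespace around list entries is assumed to be ignored, and the result keeps header order only; ordering by the "h" list, as used for canonicalization in section 3.1, is not modeled here.

```python
def field_matches(field_name: str, pattern: str) -> bool:
    """Case-insensitive match of one field name against one "h" entry."""
    name, pat = field_name.strip().lower(), pattern.strip().lower()
    if pat.endswith("*"):
        return name.startswith(pat[:-1])  # prefix match, "*" alone matches all
    return name == pat  # otherwise the full name must match


def covered_fields(field_names, h_param):
    """Field names covered by "h"; Content-Digest is always excluded.

    A missing "h" (None) covers no header fields at all.
    """
    if h_param is None:
        return []
    patterns = [p for p in h_param.split(",") if p.strip()]
    return [n for n in field_names
            if n.strip().lower() != "content-digest"
            and any(field_matches(n, p) for p in patterns)]
```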
2.4 Canonicalization Method ("c") parameter
Canonicalization is a process of data transformation that makes the
format of the data acceptable under constraints imposed by additional
data processing functionality. In the case of digest computation, it
is the process of transforming data into a canonical form; the actual
hash computation is then done on the data in this canonical form.
Canonicalization is most useful as a way to ensure that a data hash
can be verified even if some minor data conversion is done while the
message is being transmitted. For example, some intermediate message
processing software interprets and corrects what it considers to be
header field problems, such as case variations or too many white-space
characters between the header field name and the field value; in other
cases, message processing software may remove trailing white-space
characters on any line, or the first or last empty lines in the
message. All such processing would normally make it impossible to
verify the hash of the original message content, but some
canonicalization methods can take this behavior into account and
provide a consistent data format for digest verification.
Note that doing canonicalization for digest computation does not mean
that such canonicalized data is actually transmitted. Conversion and
data transformation rules for data transmission are in fact covered by
the content-transfer-encoding as specified in section 6 of [RFC2045].
As it relates to canonicalization and digest computation, the
content-transfer-encoding conversion should be done on the original
non-canonicalized data after the digest hash has been computed and the
appropriate Content-Digest header field added. When the digest is
being verified, the canonicalization and digest computation are done
after undoing any content-transfer-encoding.
Similar to the support provided for multiple cryptographic algorithms,
Content-Digest supports multiple canonicalization methods, with a
small set of methods required to be supported by all implementations.
The canonicalization methods used for header field processing and for
the content body are different, so the "c" parameter value is
composed of two comma-separated parts, "<header>,<body>", where
<header> specifies the method used for header field data
canonicalization and <body> specifies the method used for body
canonicalization. The canonicalization methods that MUST be supported
are "bare", "simple" and "nofws" for header field processing, and
"bare", "text", "nofws", plus the special values "mimeform" and
"none", for content body processing. If a canonicalization method
specified in the "c" parameter of Content-Digest is unknown to the
software evaluating the header field, the software MUST NOT attempt to
evaluate the Content-Digest header field, and message processing
should be done as if no Content-Digest header field were present.
If there is no "c" parameter in the Content-Digest header field, the
default value "simple,mimeform" is assumed. If the value of the "c"
parameter is a single keyword, such as "c=nofws", then the default
"simple" method is used for header field canonicalization, and the
method given as the value of "c" is used for body data.
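A Python sketch of resolving the "c" parameter, including both defaults. The helper parse_c is hypothetical and recognizes only the MUST-support methods; a real implementation may know more.

```python
# MUST-support methods from section 2.4 (an implementation may add more).
HEADER_METHODS = {"bare", "simple", "nofws"}
BODY_METHODS = {"bare", "text", "nofws", "mimeform", "none"}


def parse_c(c_param):
    """Return (header_method, body_method), or None if a method is unknown."""
    if c_param is None:
        header, body = "simple", "mimeform"      # no "c": full default
    elif "," in c_param:
        header, body = (s.strip() for s in c_param.split(",", 1))
    else:
        header, body = "simple", c_param.strip()  # one keyword names the body
    if header not in HEADER_METHODS or body not in BODY_METHODS:
        return None  # unknown method: ignore the Content-Digest field
    return header, body
```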
More information about canonicalization methods and canonicalization
process can be found in section 3 of this document.
2.5 Canonicalized Data Size ("s") parameter
The number of bytes (octet count) in the canonicalized data (as used
for computing the hash digest) can optionally be included in the "s"
parameter. This is primarily an informational field and can be used
during digest header verification as a way to determine whether the
content has been modified. If the number in "s" does not match the
number of bytes of the canonicalized data being verified, the
verifying system SHOULD abort the processing and can choose to report
an extended error indicating that the content has been changed and
the size does not match.
There may also be some situations where being able to verify the
majority of the data is sufficient. In such a case an application MAY
use the size parameter: after doing canonicalization, if the result is
larger than the size given in "s", the application cuts the result to
exactly that number of bytes and then attempts to verify the digest.
If that succeeds, the application should in some way report that only
part of the content was successfully verified, and it may also choose
to discard the unverified part of the message content data.
More information about how size parameter is used is found in section
4.
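The size check and the optional truncation can be sketched in Python as below. check_size is a hypothetical helper, its allow_partial flag stands for the application's choice to use the MAY behavior, and it assumes "s" holds a plain decimal number.

```python
def check_size(canon: bytes, s_param, allow_partial: bool = False):
    """Return (data_to_hash, is_partial), or None when verification should abort."""
    if s_param is None:
        return canon, False            # no "s": nothing to compare
    size = int(s_param)
    if len(canon) == size:
        return canon, False
    if allow_partial and len(canon) > size:
        return canon[:size], True      # verify only the first `size` octets
    return None                        # size mismatch: SHOULD abort
```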
2.6 Time Stamp ID ("t") parameter
The optional "t" parameter provides time-stamp information about when
the digest hash was created. In EDigest it is also used as a unique
identifier (unlike Content-Digest, multiple EDigest header fields can
exist in the same header).
The value of the "t" parameter is a string of digits based on the
ISO 8601 time format, of the form YYYYMMDDhhmmssxxxx, where YYYY is the
4-digit year, MM the 2-digit month, DD the 2-digit day, hh the 2-digit
hour, mm the 2-digit minute, ss the 2-digit seconds, and xxxx are
additional digits that may be milliseconds or some other unique number
identifying the specific header field. The number may well have fewer
than 18 digits (14 is much more common) and may, for example, contain
only YYYYMMDD. The time and date used should be UTC, with no locale
information. Some examples of the "t" parameter follow:
t=20050704192754 - corresponds to RFC2822 date
"Mon, 4 Jul 2005 14:27:54 -0500" (19:27:54 UTC)
t=20050503 - corresponds to May 3, 2005
Note that the number specified in the "t" parameter is informational
only and should not be assumed to always be a time stamp or be
automatically interpreted as such by the application; automatic use of
this number should be limited to providing a unique reference.
However, the fact that this number usually contains a time stamp may
be useful for email debugging and forensics.
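A Python sketch of producing a 14-digit "t" value. make_t is a hypothetical helper; it expects a timezone-aware datetime and converts it to UTC, so an RFC 2822 date at 14:27:54 -0500 yields 19:27:54 in the value. The optional extra digits are simply appended by the caller.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def make_t(when: datetime, extra: str = "") -> str:
    """Format a "t" value: YYYYMMDDhhmmss in UTC plus optional extra digits."""
    utc = when.astimezone(timezone.utc)  # the draft asks for UTC time
    return utc.strftime("%Y%m%d%H%M%S") + extra


if __name__ == "__main__":
    # Converting the RFC 2822 date from the example above.
    print(make_t(parsedate_to_datetime("Mon, 4 Jul 2005 14:27:54 -0500")))
```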
2.7 Hash Data ("d") parameter
The data parameter contains the actual digest hash data. The hash is
calculated using the algorithm specified in the "a" parameter over the
data from the content header fields (those that match the list in the
"h" parameter) and the content body, after applying the appropriate
canonicalization as specified in the "c" parameter. The resulting hash
data is converted into BASE64 encoding as specified in section 3 of
[RFC3548], with the '=' pad symbol, and placed after "d=". If the
BASE64 hash data ends with '=', the data MUST also be enclosed in
double quotes, i.e. d="...=".
The hash data can be broken into multiple lines as specified in
[RFC2822] section 2.2.3, but it is preferable that the entire data
parameter (starting with "d=") stay on one line in the header. It is
also preferable that the data parameter be the last parameter of the
Content-Digest header field. Use and placement of the data parameter
are illustrated in more detail in the examples contained in section 6
of this document.
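A Python sketch of encoding and decoding the "d" value. format_d and parse_d are hypothetical helpers; parse_d assumes that any folding whitespace inside the value may simply be removed before decoding. SHA1 digests (20 octets) always end with one '=' and are therefore quoted, while SHA384 digests (48 octets) need no padding.

```python
import base64
import hashlib


def format_d(digest: bytes) -> str:
    """Padded BASE64 of the digest, quoted when it ends with '='."""
    b64 = base64.b64encode(digest).decode("ascii")
    return f'd="{b64}"' if b64.endswith("=") else f"d={b64}"


def parse_d(value: str) -> bytes:
    """Decode a "d" value: drop folding whitespace and quotes, then BASE64-decode."""
    return base64.b64decode("".join(value.split()).strip('"'), validate=True)
```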
3. Creation of Content-Digest Header Field
The Content-Digest field is created by the originating user agent,
which starts transmission of the content, and not by an intermediate
content retransmission system. For email, the originating user agent
is an MUA program or any other program acting as MSA, which is thus
the originating agent in SMTP transmission. For HTTP, the originating
agent is an HTTP server that serves the content from its data storage,
where it has been placed by the user, or that generates it on the fly
(CGI or similar), but not any kind of caching HTTP system that does
not actually generate the content but only retransmits content
received from another web server. Other MIME transmission protocols
can also use Content-Digest, applying similar criteria in deciding
which system involved in the transmission should add the
Content-Digest header field.
The content transmission origination system (hereafter CTOS) that
wants to add a Content-Digest header field should proceed as follows:
3.1 Header Fields Processing
First, a decision should be made on what data is to be used for the
digest hash, based on local preferences and on how the digest hash is
going to be used. Generally it is a good idea to include only
content-specific header fields, such as Content-Type, and not
transmission header fields, such as Connection in HTTP. This is
because content-specific fields should not change during transmission,
whereas other header fields may change if the content is retransmitted
(for example by a forwarding or other redirection system in email, or
by a caching proxy server in HTTP). The Content-Transfer-Encoding
header field (which provides information on transmission encoding)
should thus be included in the list of header fields only if
intermediate systems are not allowed to change the transfer encoding
(which is not always the case).
Once the list of header fields that are to be part of the digest hash
data is ready, the entire "h" parameter can be created. Consideration
should be given to whether use of "*" to combine several fields is
appropriate, because if new fields with the same prefix are added by
intermediate retransmission systems, digest verification will fail.
As such, "h=*" should generally not be used unless message
transmission is point-to-point and no retransmission systems are
expected or allowed, and "h=Content-*" should be used only if
Content-Transfer-Encoding is not specified or is not expected to
change.
Next, canonicalization should be applied to the header field data.
This document defines three header canonicalization methods: 'bare',
'simple' and 'nofws'. To show how they differ, an example is helpful;
assume that the content data header initially was:
Content-Type:  text/plain;
        charset="us-ascii"
MIME-Version: 1.0
Content-ID: <218F64C460.u314@example.com>
Content-Transfer-Encoding: 7bit
Content-Description: Collection  Footer
For this example, assume that the header fields to be included are
all of the above except Content-Transfer-Encoding, which is described
by the parameter
"h=Content-Type,Content-ID,Content-Description,MIME-Version".
The canonical data form is calculated as follows, depending on which
canonicalization method is used (each method starts with an empty
canonical header buffer):
BARE - In this canonicalization method header data is largely used as
is. The algorithm is: for each header field name listed in "h", in
the order the names are listed, find every instance of a matching
field (the full name, ignoring case, is the same as listed up to the
":", or, if "*" is used, the field name up to the "*" is the same);
then the entire header field, as is, starting with the field name
itself and continuing up to (but not including) the first character
of the next header field in the header, including end-of-line
characters, is added to the canonical data buffer. For the example
above, applying this method produces the following canonical buffer
data:
Content-Type:  text/plain;
        charset="us-ascii"
Content-ID: <218F64C460.u314@example.com>
Content-Description: Collection  Footer
MIME-Version: 1.0
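The 'bare' selection can be sketched in Python as below, using the example header with its folded Content-Type line. split_fields and canon_header_bare are hypothetical helpers; the header block is assumed to exclude the empty separator line, and a field matched by more than one "h" entry is assumed to be copied only once.

```python
def split_fields(header: bytes) -> list:
    """Split a raw header block into whole fields, keeping folded
    continuation lines and line endings exactly as they appear."""
    fields = []
    for line in header.splitlines(keepends=True):
        if line[:1] in (b" ", b"\t") and fields:
            fields[-1] += line  # continuation line of the previous field
        else:
            fields.append(line)
    return fields


def canon_header_bare(header: bytes, h_param: str) -> bytes:
    """'bare': copy matching fields verbatim, in the order of the "h" list."""
    fields = split_fields(header)
    chosen = []  # indexes of fields already copied (each copied at most once)
    for pattern in h_param.split(","):
        pat = pattern.strip().lower().encode("ascii")
        for i, field in enumerate(fields):
            name = field.split(b":", 1)[0].strip().lower()
            if b":" not in field or name == b"content-digest" or i in chosen:
                continue
            hit = name.startswith(pat[:-1]) if pat.endswith(b"*") else name == pat
            if hit:
                chosen.append(i)
    return b"".join(fields[i] for i in chosen)
```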
SIMPLE - In this canonicalization method, common problems encountered
with transformation of header fields are accounted for, and the data
is made consistent with the ABNF header field syntax defined in
[RFC2822], except that 8-bit data is not touched (per RFC2822 there
should not be any 8-bit data in the message and MIME header, but
unfortunately it does happen). Header fields are chosen and ordered
the same way as with 'bare', but the header field data is not copied
as-is into the canonical data buffer; instead, the following is done
for each header field:
1. If the header field consists of multiple lines, the lines are
unfolded (a procedure described in section 2.2.3 of [RFC2822]
that involves removal of CRLF pairs) to become one long field
line. Any remaining single CR or LF characters are also removed,
as are any NULL (ASCII code 0) characters. In the above example
the only header field affected is "Content-Type", which consists
of data on two lines.
2. Each run of two or more consecutive white-space characters
(white-space is WSP as defined in [RFC2822] section 2.2.2 and
includes SP and HTAB) is replaced with a single SP. In the above
example this affects the double white space after
"Content-Type:" and the double white space between "Collection"
and "Footer".
3. The header field name itself is made entirely lowercase. That
means that in the header field name (from the start of the header
field line to the first ":"), each octet character with ASCII
code 'a' between 65 and 90 is replaced with the character with
ASCII code a+32.
4. If there is a sequence of one or more WSP at the end, it is
removed.
5. A new CRLF is added to the end of the newly converted header
field line.
The result of applying this method to the example above is the
following canonical data block:
content-type: text/plain; charset="us-ascii"
content-id: <218F64C460.u314@example.com>
content-description: Collection Footer
mime-version: 1.0
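A Python sketch of the five 'simple' steps applied to one raw header field (selection and ordering work as for 'bare'). canon_field_simple is a hypothetical helper, and it follows the reading that runs of two or more WSP become one SP while a single WSP is left alone. Python's bytes.lower() changes only ASCII A-Z, matching step 3.

```python
import re


def canon_field_simple(field: bytes) -> bytes:
    """'simple' canonicalization of one raw header field (steps 1-5)."""
    # 1. Unfold: remove CRLF pairs, then any lone CR, lone LF and NUL.
    f = field.replace(b"\r\n", b"")
    f = f.replace(b"\r", b"").replace(b"\n", b"").replace(b"\x00", b"")
    # 2. Replace each run of two or more SP/HTAB with a single SP.
    f = re.sub(rb"[ \t]{2,}", b" ", f)
    # 3. Lowercase the field name (up to the first ":"), ASCII A-Z only.
    name, colon, rest = f.partition(b":")
    f = name.lower() + colon + rest
    # 4. Remove trailing WSP; 5. append a new CRLF.
    return f.rstrip(b" \t") + b"\r\n"
```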
NOFWS - In this canonicalization method (whose name is an abbreviation
for "No Free White Space"), only the printable, non-white-space
characters of the data are used for the digest. This means that the
core of the content text is preserved and verified, but there may be
some problems with this method, as all spaces between words are lost.
This canonicalization method uses an algorithm similar to 'simple',
with the following steps for data transformation:
1. If the header field consists of multiple lines, the lines are
unfolded (a procedure described in section 2.2.3 of [RFC2822]
that involves removal of CRLF pairs) to become one long field
line. Any remaining single CR or LF characters are also removed,
as are any NULL (ASCII code 0) characters. In the above example
the only header field affected is "Content-Type", which consists
of data on two lines.
2. All octet characters with ASCII value less than 33 or greater
than 126 are removed from the header data.
3. The header field name itself is made entirely lowercase. That
means that in the header field name (from the start of the header
field line to the first ":"), each octet character with ASCII
code 'a' between 65 and 90 is replaced with the character with
ASCII code a+32.
The result of applying this method to the example above is the
following canonical data block (note that " \" is used to indicate a
line break for the purposes of this document only, since the result of
this canonicalization method is one long line without breaks):
content-type:text/plain;charset="us-ascii"content-id: \
<218F64C460.u314@example.com>content-description:Coll \
ectionFootermime-version:1.0
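A Python sketch of header 'nofws' for one raw field. canon_field_nofws is a hypothetical helper; joining its output for the four example fields in "h" order reproduces the single line shown above.

```python
def canon_field_nofws(field: bytes) -> bytes:
    """'nofws' canonicalization of one raw header field (steps 1-3)."""
    # 1. Unfold and drop lone CR/LF and NUL (as in 'simple').
    f = field.replace(b"\r\n", b"").replace(b"\r", b"")
    f = f.replace(b"\n", b"").replace(b"\x00", b"")
    # 2. Keep only octets in the printable range 33..126.
    f = bytes(o for o in f if 33 <= o <= 126)
    # 3. Lowercase the field name (up to the first ":"); no CRLF is added.
    name, colon, rest = f.partition(b":")
    return name.lower() + colon + rest
```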
It is RECOMMENDED that the default 'simple' canonicalization method be
used when content data is being transmitted to an unknown recipient
across the Internet. This canonicalization method can deal with common
header data transformations by intermediate systems and does not cause
loss of content data. If it is important to make certain that data is
received exactly as it was transmitted, with no modifications or
reformatting of any kind, then 'bare' canonicalization can be used,
but this should normally be reserved for known and pre-arranged data
path transmission where it is known to be safe. When data transmission
goes through a series of relays and it has been noticed that the
digest hash does not verify as a result, using 'nofws' can be
considered, but it should be noted that it only provides verification
of the text symbols and is not secure enough for full data integrity
protection.
3.2 Content Body Data Processing
Content body processing for digest hash creation may also involve
transforming data to a canonical format, depending on the chosen
canonicalization method. This document defines five body data
canonicalization methods: 'bare', 'text', 'mimeform', 'nofws' and
'none'. As with header field canonicalization, a simple example of
data before canonicalization is used to show how they differ (note
that \cr and \lf in the example represent CR and LF characters; the
line breaks in the display itself are not part of the data):
\cr\lf
Happy 4th of July,\cr
\cr
Fireworks at pier 39 at 9:30pm, be there. \cr
\cr
Will
The canonical body data is calculated as follows, depending on which
canonicalization method is used (each method starts with an empty
canonical body buffer):
BARE - In this canonicalization method body data is unchanged and
used 100% as-is.
TEXT - In this canonicalization method, some problems encountered with
transmission of text data are dealt with, and the data is made
consistent with the canonical form of a MIME text/plain content-type
object as described in section 4 of [RFC2049] and with the text
message data format described in section 2 of [RFC2822]. This is done
as follows:
1. All NULL (ASCII code 0) characters are removed, and any single
CR or single LF character is replaced with a CRLF pair (if CR is
already followed by LF, neither is changed).
2. The data is examined to make certain that no line is longer
than 998 octet characters (a line is defined as a continuous
stream of characters terminated by CRLF and starting either at
the beginning of the data or at the first character after the
previous CRLF). If any line is longer than 998 characters, a
CRLF pair is inserted after the 998th character, and the
procedure described in this step is repeated.
3. Any sequence of one or more white-space characters (white-space
is WSP as defined in [RFC2822] section 2.2.2 and includes SP and
HTAB) that is immediately followed by CRLF is removed.
4. If there is a sequence of one or more CRLF pairs at the start
of the data content (as left after step 3), it is removed.
The result of applying this method to the example above is the
following canonical data block:
Happy 4th of July,\cr\lf
\cr\lf
Fireworks at pier 39 at 9:30pm, be there.\cr\lf
\cr\lf
Will
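A Python sketch of the four TEXT steps. canon_body_text is a hypothetical helper, and the usage assumes, as the example above suggests, that the display line breaks carry no data, so the sample input is CRLF, then lines ending in lone CRs, then "Will" with no terminator.

```python
import re


def canon_body_text(data: bytes) -> bytes:
    """'text' body canonicalization (steps 1-4)."""
    # Step 1: drop NULs; turn a lone CR or lone LF into CRLF (CRLF stays).
    data = data.replace(b"\x00", b"")
    data = re.sub(rb"\r\n|\r|\n", b"\r\n", data)
    # Step 2: break any line longer than 998 octets after each 998th octet.
    lines = []
    for line in data.split(b"\r\n"):
        while len(line) > 998:
            lines.append(line[:998])
            line = line[998:]
        lines.append(line)
    data = b"\r\n".join(lines)
    # Step 3: remove SP/HTAB runs that immediately precede a CRLF.
    data = re.sub(rb"[ \t]+(?=\r\n)", b"", data)
    # Step 4: remove any CRLF pairs at the very start of the data.
    while data.startswith(b"\r\n"):
        data = data[2:]
    return data
```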
NOFWS - This is another canonicalization method primarily for text
data; in this method only the non-white-space characters of the data
are kept (this is less secure as far as data integrity is concerned,
but the core information of the content text is still protected). The
canonicalization method is fairly simple and consists of one step, as
follows:
* All NULL (ASCII code 0), CR (ASCII code 13), LF (ASCII code
10), HTAB (ASCII code 9), VTAB (ASCII code 11), FF (ASCII code
12) and SP (ASCII code 32) characters are removed.
The result of applying this method to example above is canonical
data block consisting of one line as follows:
Happy4thofJuly,Fireworksatpier39at9:30pm,bethere.Will
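In Python the single body 'nofws' step reduces to deleting a fixed set of octets; canon_body_nofws is a hypothetical helper. All other octets, including punctuation and 8-bit data, are kept.

```python
# Octets removed by body 'nofws': NUL, HTAB, LF, VTAB, FF, CR and SP.
_NOFWS_BODY_DELETE = b"\x00\t\n\x0b\x0c\r "


def canon_body_nofws(data: bytes) -> bytes:
    """'nofws' body canonicalization: delete the listed octets, keep the rest."""
    return data.translate(None, _NOFWS_BODY_DELETE)
```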
MIMEFORM - This is a special canonicalization method meant to convert
data into MIME canonical form. As described in section 4 of
[RFC2049], MIME canonical form depends on the type of data, which is
based on Content-Type. For this canonicalization method, if the data
is of the text media type (Content-Type: text/*), the TEXT
canonicalization method is used; for all other media types, the BARE
canonicalization method is used. This is the default
canonicalization method for content data.
NONE - The special canonicalization value "none" specifies that body
data is not part of the digest hash (i.e. the canonicalization
process uses none of the data). This is used with EDigest (an
extended form of the Content-Digest header field, discussed further
in section 5 of this document) when it is desirable to create a
digest hash only for a group of specific header fields.
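A Python sketch of how 'mimeform' could resolve to a concrete body method. effective_body_method is hypothetical; the media type comparison is assumed to be case-insensitive, and a missing Content-Type is assumed to mean text/plain, the MIME default from RFC 2045. For 'none' the canonical body is simply empty.

```python
def effective_body_method(method: str, content_type) -> str:
    """Resolve 'mimeform' to 'text' or 'bare'; other methods pass through."""
    if method != "mimeform":
        return method  # 'none' contributes zero octets to the digest
    # Assumption: absent Content-Type means the RFC 2045 default text/plain.
    media_type = (content_type or "text/plain").split(";", 1)[0].strip().lower()
    return "text" if media_type.startswith("text/") else "bare"
```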
For text content data, the 'text' canonicalization is flexible enough
to handle common alterations with no security risks. If it has been
noticed that data transmission is likely to involve relays that make
modifications such that the digest hash would no longer verify, then
using 'nofws' can also be considered. In cases where it is very
important to make certain that data is received exactly as it was
transmitted, with no modifications or reformatting of any kind, 'bare'
canonicalization can be used, but this should normally be reserved for
known and pre-arranged data path transmission where it is known to be
safe. If the content data is not text and is not going to be
transmitted as text (i.e. with 7bit or quoted-printable
content-transfer-encoding), then it is very unlikely to be touched by
any intermediate system, and using the 'bare' canonicalization method
is appropriate.
Based on the above, the CTOS should make certain to use appropriate
canonicalization. It is important to understand that the default
'mimeform' depends on the Content-Type header field value: it selects
'text' for any MIME content of a text media type and 'bare'
otherwise, which works well in most cases. There may also be a number
of other content types that are not identified as a text media type
but that carry text data; in those cases the CTOS should explicitly
select the 'text' canonicalization method in the "c" parameter. Note
that the complex Multipart and Message MIME types are also very often
composed only of text components; in such cases 'text'
canonicalization may also be appropriate and will need to be
specified in the "c" parameter.
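How the default 'mimeform' resolves can be sketched in Python as follows. This is only an illustration: the function name mimeform_method is hypothetical, and treating an absent Content-Type as text/plain is an assumption based on the MIME default in [RFC2045], not something this section states.

```python
from typing import Optional


def mimeform_method(content_type: Optional[str]) -> str:
    """Return the body canonicalization that 'mimeform' resolves to."""
    if content_type is None:
        # Assumption: an absent Content-Type is treated as the MIME
        # default (text/plain) from RFC 2045.
        return "text"
    # The media type is the part before any ";" parameters and is
    # compared case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "text" if media_type.startswith("text/") else "bare"
```

A CTOS that knows a non-text media type actually carries text (or that a Multipart part is all text) would bypass this default and put 'text' in "c" explicitly, as described above.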
3.3 Digest Hash Creation
After the content header and body are processed as described above,
the result is two data buffers: one containing the canonicalized
header fields and one containing the canonicalized body. To form the
data used for digest hash creation, these are joined together, with
the canonical header fields data first, followed by the canonicalized
body data.
Note that even if 'bare' canonicalization is used for both the header
fields data and the content body, the joined canonical form would not
be the same as the original MIME content part, because it would be
missing the empty line separating the content header from the body.
For example, if the original MIME content was:
Content-Type: text/plain;
charset="us-ascii"
MIME-Version: 1.0
Content-ID: <218F64C460.u314@example.com>
Content-Transfer-Encoding: 7bit
Happy 4th of July,
Fireworks at pier 39 at 9:30pm, be there.
Will
The result of the canonicalization and header processing described by
"c=bare,bare; h=content-type,content-id,mime-version" would be the
following data (octet count 183), ready for hash creation:
Content-Type: text/plain;
charset="us-ascii"
Content-ID: <218F64C460.u314@example.com>
MIME-Version: 1.0
Happy 4th of July,
Fireworks at pier 39 at 9:30pm, be there.
Will
Whereas the result of using the default "simple,mimeform"
canonicalization with the same list of header fields (described by
"h=content-type,content-id,mime-version; c=simple,mimeform" or just
"h=content-type,content-id,mime-version") would be:
content-type: text/plain; charset="us-ascii"
content-id: <218F64C460.u314@example.com>
mime-version: 1.0
Happy 4th of July,
Fireworks at pier 39 at 9:30pm, be there.
Will
The octet count of the above data is 177 (the size calculation
includes the CRLF line terminators, which add 2 octets for every
line, including empty ones). Calculating the octet size of the
canonicalized data is optional; it is done only to supply the value
of the "s" parameter of Content-Digest.
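As a check on that count, the Python sketch below rebuilds the "simple,mimeform" buffer from the canonical lines shown above and counts its octets. It does not implement the 'simple' header canonicalization of section 3.1 (the canonical lines are copied verbatim), and it assumes every line, including the last body line, ends in CRLF with no blank separator line; the names CANON_HEADERS, CANON_BODY and join_canonical are hypothetical.

```python
CRLF = b"\r\n"

# Canonicalized header and body lines of the "simple,mimeform" example,
# copied verbatim from the text above.
CANON_HEADERS = [
    b'content-type: text/plain; charset="us-ascii"',
    b"content-id: <218F64C460.u314@example.com>",
    b"mime-version: 1.0",
]
CANON_BODY = [
    b"Happy 4th of July,",
    b"Fireworks at pier 39 at 9:30pm, be there.",
    b"Will",
]


def join_canonical(header_lines, body_lines) -> bytes:
    """Header block first, then body; every line ends in CRLF and no
    empty line is inserted between header and body."""
    header = b"".join(line + CRLF for line in header_lines)
    body = b"".join(line + CRLF for line in body_lines)
    return header + body


data = join_canonical(CANON_HEADERS, CANON_BODY)
print(len(data))  # 177, the value that would go into "s"
```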
Once processing of the content is complete and the data is ready, the
cryptographic hash algorithm is applied to that data. The hash
algorithm should be chosen based on system policies and on security
considerations regarding the transmission. The default SHA1
algorithm is a good choice and offers sufficient security for most
cases, and it is NOT RECOMMENDED to use any less secure algorithm
that produces a hash shorter than 160 bits. Hash creation itself is
described in other documents: please see [RFC1321] for how to create
an MD5 hash, [RFC3174] for a SHA1 hash, and [FIPS180-2] for the other
SHA algorithms that produce hashes longer than 160 bits.
The result of the entire process described in this section would be
the following Content-Digest header field:
Content-Digest: v=1.0; h=content-type,content-id,mime-version;
c=simple,mimeform; a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
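A minimal Python sketch of the hashing step follows. It assumes the "d" value is the Base64 encoding ([RFC3548]) of the raw digest, which is consistent with the 28-character SHA1 values in the examples, and it uses Python's hashlib in place of the algorithm specifications cited above. The names ALGORITHMS, digest_value and content_digest_field are hypothetical, and the sketch does not reproduce the example "d" value, since that depends on the exact canonical octets.

```python
import base64
import hashlib
from typing import List

# Hypothetical mapping from "a" parameter names to hashlib constructors.
ALGORITHMS = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha224": hashlib.sha224,
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def digest_value(canonical: bytes, algorithm: str = "sha1") -> str:
    """Hash the joined canonical data and Base64-encode it for "d"."""
    try:
        hash_ctor = ALGORITHMS[algorithm.lower()]
    except KeyError:
        raise ValueError(f"unsupported algorithm: {algorithm}") from None
    return base64.b64encode(hash_ctor(canonical).digest()).decode("ascii")


def content_digest_field(canonical: bytes, header_names: List[str],
                         algorithm: str = "sha1") -> str:
    """Build a Content-Digest field with explicit h, a and d."""
    h = ",".join(header_names)
    d = digest_value(canonical, algorithm)
    return f'Content-Digest: v=1.0; h={h}; a={algorithm}; d="{d}"'
```

The "c" parameter is left out of the generated field because "simple,mimeform" is the default.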
Note that since "a=sha1" and "c=simple,mimeform" are the defaults, the
above can be shortened to:
Content-Digest: v=1.0; h=content-type,content-id,mime-version;
d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
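The way defaults fill in omitted parameters can be illustrated with a small Python parser for the field value (the text after "Content-Digest:"). This is a sketch under stated assumptions, not the grammar of Appendix A: it splits on ";" outside double quotes, ignores backslash escapes and comments, and interprets a single "c" value as the body method with 'simple' for the header. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Defaults described in this document: a=sha1 and c=simple,mimeform.
DEFAULT_ALGORITHM = "sha1"
DEFAULT_HEADER_CANON = "simple"
DEFAULT_BODY_CANON = "mimeform"


def split_params(value: str) -> List[str]:
    """Split a field value on ';' characters outside quoted-strings."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_digest_params(value: str) -> Dict[str, str]:
    """Map lower-cased parameter names to their (unquoted) values."""
    params: Dict[str, str] = {}
    for item in split_params(value):
        name, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]
        params[name.strip().lower()] = val
    if "d" not in params:
        raise ValueError('required "d" parameter is missing')
    params.setdefault("a", DEFAULT_ALGORITHM)
    return params


def canonicalization(params: Dict[str, str]) -> Tuple[str, str]:
    """Return (header, body) methods; a single value names the body."""
    c = params.get("c")
    if c is None:
        return DEFAULT_HEADER_CANON, DEFAULT_BODY_CANON
    if "," in c:
        header, body = (x.strip().lower() for x in c.split(",", 1))
        return header, body
    return DEFAULT_HEADER_CANON, c.strip().lower()


def header_list(params: Dict[str, str]) -> List[str]:
    """Header field names from "h"; empty when "h" is absent."""
    h = params.get("h", "")
    return [n.strip().lower() for n in h.split(",") if n.strip()]
```

Applied to the shortened field above, this yields a=sha1 and ("simple", "mimeform"), the same values as the explicit form.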
The Content-Digest header field is now ready to be added to the
content part. It is best to add Content-Digest below the other MIME
or message header fields (some of which will have been part of the
data that went into the digest hash), but Content-Digest may be
placed elsewhere in the header as well. After Content-Digest is
added, the above example becomes:
Content-Type: text/plain;
charset="us-ascii"
MIME-Version: 1.0
Content-ID: <218F64C460.u314@example.com>
Content-Transfer-Encoding: 7bit
Content-Digest: v=1.0; h=content-type,content-id,mime-version;
c=simple,mimeform; a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
Happy 4th of July,
Fireworks at pier 39 at 9:30pm, be there.
Will
HTTP also has the concept of a "trailer" (data that follows the
content body and consists of the same kinds of fields as the header).
Content-Digest can be placed in the trailer if desired, but note that
the list of fields in "h" refers only to fields in the header and NOT
to those in the trailer.
4. Digest Hash Verification Procedure for Content-Digest
The verification procedure for the Content-Digest header field
largely follows the same steps as the creation of the header field.
It is done as follows:
1. First, for the content part being verified, a canonical version
of its header is produced following the procedure outlined in
section 3.1, based on the header field list found in the "h"
parameter and the header canonicalization method given in the "c"
parameter (note that the canonical version must not include the
Content-Digest header field itself, even if it would match the list
in "h"). If there is no "h" parameter in the Content-Digest header
field, then the result of this step is an empty string.
2. Next, a canonical version of the content body data is produced as
described in section 3.2, based on the "c" parameter. This is
appended to the data produced in step 1.
3. If the "s" parameter is present in the Content-Digest header
field, then the octet size of the data from step 2 is calculated. If
this size does not match the value of the "s" parameter, then the
verifying system has the following options:
1. Abort further processing and return an error indicating that
the Content-Digest cannot be verified and the content has been
changed.
2. Cut bytes from the end of the data from step 2 so that its size
matches the number of bytes in "s" before proceeding to step 4.
4. A cryptographic hash is produced from the data from step 2 (or
step 3.2) using the algorithm listed in the "a" parameter (sha1 if
"a" is not present). This cryptographic hash data is compared
against the data in the "d" parameter of Content-Digest. If they
match, the verification succeeds; otherwise it fails.
Removing data from the canonicalized content to match the size
parameter, as in step 3.2, should be done only in a specific context
where an intermediate system is believed to have added extra data to
the end of the content during transmission. This is the case, for
example, for an email message that has passed through a mailing list
(mailing lists often add their own footer to messages); dropping the
end of the message allows the original version to be verified.
However, one must be aware of the dangers of doing so: only part of
the message data is verified, and this is a serious security issue
that can be exploited. It is therefore best that, if the verifying
system chooses to verify only part of the content, it consider
changing the entire message to include only the part that has been
verified (optionally, rather than removing the unverified content, it
may move it into a separate attachment). Since mailing lists add
their footers to text messages, this method should not be attempted
if the data content is of a type other than text; for binary data,
the Content-Digest verification should simply be considered to have
failed (as in step 3.1 above) if the size of the canonicalized
content does not match the value of the "s" parameter of
Content-Digest.
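Steps 1 to 4, including the optional truncation of step 3.2, can be sketched in Python as follows. The sketch assumes the caller has already produced the canonical header and body octets (sections 3.1 and 3.2 are not implemented here) and that "d" is the Base64 encoding of the raw digest; compute_d, verify_content_digest and the allow_truncation flag are hypothetical names. Per the paragraph above, a caller would enable truncation only for text content.

```python
import base64
import hashlib
from typing import Optional


def compute_d(data: bytes, algorithm: str) -> str:
    """Base64 of the raw digest, as carried in the "d" parameter."""
    return base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")


def verify_content_digest(header_canon: bytes, body_canon: bytes, d: str,
                          a: Optional[str] = None, s: Optional[int] = None,
                          allow_truncation: bool = False) -> bool:
    """Steps 1-4 of the Content-Digest verification procedure.

    header_canon is b"" when the field has no "h" parameter (step 1).
    allow_truncation selects option 3.2 and should only be enabled for
    text content, never for binary data.
    """
    data = header_canon + body_canon             # steps 1 and 2
    if s is not None and len(data) != s:         # step 3
        if not allow_truncation or len(data) < s:
            return False                         # option 3.1: fail
        data = data[:s]                          # option 3.2: cut the tail
    algorithm = (a or "sha1").lower()            # sha1 when "a" is absent
    return compute_d(data, algorithm) == d       # step 4
```

For example, a text body that gained a mailing list footer fails verification as-is but verifies when allow_truncation is set and "s" matches the original canonical length.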
5. EDigest Header Field
The EDigest header field is very similar to Content-Digest (it can be
considered an extended form of Content-Digest) and also carries hash
digest data for the content, but unlike Content-Digest, it does not
have to be a unique field for a particular MIME part, nor does it
have to be attached to that part. As such, the EDigest header field
provides the following additional functionality over Content-Digest:
1. It can be added by intermediate transport agents (including
message relays and gateways) and not only at the transmission
origin.
2. It can be used in parts other than the content header itself and
as such allows a digest to reference a MIME subpart of the message or
an externally located MIME part.
3. It can provide a digest hash that can be used to verify the data
of several MIME parts together.
The syntax of the EDigest header field (for the full syntax, please
refer to Appendix A) is identical to that of Content-Digest and
consists of all the same parameters plus one additional optional
parameter, "u".
5.1 Content URL ("u") parameter
The value of the EDigest "u" parameter is URL data pointing to the
content whose hash the digest header field carries. This URL data is
a list of one or more URLs, each enclosed in "<" and ">" and separated
by FWS; this is very similar to how message identifiers are listed in
the References header field of an email header.
A common use of the URL parameter is when the EDigest header field
specifies the hash of a MIME entity that is enclosed within another
MIME entity or message, and it is desirable to provide the hash of
the content directly in this parent entity. In such a case a "cid"
URL (Content-ID, as specified in [RFC2392]) is used, referencing the
unique id of the content as found in its Content-ID header field. An
example of such use is as follows:
Edigest: v=1.0; u="<cid:218C460.u314@example.com>";
h=content-type,content-id,mime-version;
a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
From: will@example.com
To: mary@example.net
Subject: Fireworks
Date: Mon, 4 Jul 2005 12:34:26 -0400
Message-ID: <will.123456789@example.com>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
This message is in MIME format. The first part should be
readable text, while the remaining parts are likely
unreadable without MIME-aware tools.
--NextPart
Content-Type: text/plain;
charset="US-ASCII"
MIME-Version: 1.0
Content-ID: <218C460.u314@example.com>
Content-Transfer-Encoding: 7bit
Happy 4th of July,
Fireworks at pier 39 at 9:30pm, be there.
Will
The "cid" URL scheme should be considered the default URL scheme, so
the "cid:" prefix is optional, and the parameter
'u=<cid:218C460.u314@example.com>' can also be expressed simply as
'u=<218C460.u314@example.com>'.
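Parsing a "u" value, including the default "cid" scheme, might look like the Python sketch below. It is an illustration only: parse_u_parameter is a hypothetical name, and the test for an existing scheme (a letter followed by letters, digits, "+", "." or "-", then ":") is an assumption that would misread a Content-ID whose local part happens to contain such a prefix.

```python
import re
from typing import List

# Each URL in "u" is enclosed in "<" and ">" and separated by FWS.
_BRACKETED = re.compile(r"<([^<>]+)>")
# Assumed scheme test: letter, then letters/digits/"+"/"."/"-", then ":".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def parse_u_parameter(value: str) -> List[str]:
    """Return the URLs listed in a "u" value, adding "cid:" when absent."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]                 # quoted-url form
    urls = []
    for raw in _BRACKETED.findall(value):
        url = raw.strip()
        if not _SCHEME.match(url):
            url = "cid:" + url              # "cid" is the default scheme
        urls.append(url)
    if not urls:
        raise ValueError('no "<URL>" entries found in "u" parameter')
    return urls
```

With this sketch, 'u=<cid:218C460.u314@example.com>' and 'u=<218C460.u314@example.com>' both yield ["cid:218C460.u314@example.com"].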
The value of the "u" parameter must reference a unique content part:
for a digest to be used with content parts, no two content parts in
the message may have the same Content-ID, even if they are subparts
of "Multipart/Alternative" (for which [RFC2392] specifies that content
parts may share a common content-id for reference). Where an
application needs a common content-id to refer to one of the multiple
parts within a "Multipart/Alternative", that common reference should
be the Content-ID header field of the Multipart/Alternative MIME part
itself rather than the Content-ID of one of its subparts.
With the "u" parameter it is also possible to specify more than one
content part, for example:
Edigest: v=1.0; h=content-type,content-id,mime-version;
u="<218C460.u314@example.com> <218C460.u315@example.com>";
a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
specifies that the hash data in "d" is based on the content data of
both <218C460.u314@example.com> and <218C460.u315@example.com>.
This can be used in place of multiple digest header fields for
individual content parts when all of these content parts are related
and are not expected to change individually during message transport.
In such situations a hash of the entire message could be an option,
but such a hash would not verify if content parts were rearranged or
a new content part were added to the message during transport,
whereas the hash data in an EDigest header field listing multiple
content parts in "u" is not affected if new content is added to the
message or if existing parts are rearranged in any way. Other uses
could involve multiple external data components (such as content
parts available on a web server) that are referenced from content
parts in the message and that the client is expected to download as
part of message verification and presentation to the user.
5.2 Creation of EDigest Header Field
The EDigest header field is created in a similar way to
Content-Digest; the differences arise only when the "u" parameter is
used. When EDigest is created referencing a single content part in
the message, the same procedures as described in section 3 are
followed, except that the EDigest header field is not placed in the
same content part. A Content-ID header field must be present in the
header of any content part in the message that is referenced in the
"u" parameter with a content-id URL.
It is also possible to reference static remote content located on an
HTTP, FTP or other server. If such content is MIME, then the "h"
parameter MUST be present and include at least one header field (such
as Content-Type). If the remote content is not MIME, then it is
considered binary (even if it is only text): both header and body
canonicalization are to be set to 'bare' (i.e. "c=bare,bare") and the
"h" parameter MUST NOT be present.
When multiple URLs are listed as the 'u' parameter value, the
procedure to produce the hash is as follows:
1. Follow the procedure described in section 3.1 for the header of
the 1st content referenced in the 'u' parameter. This results in a
buffer with canonicalized header fields data.
2. Follow the procedure described in section 3.2 for the content body
data of the 1st content referenced in the 'u' parameter, and append
the result (canonicalized body content data) to the data from step 1.
3. Follow the procedure described in section 3.1 for the header of
the 2nd content referenced in the 'u' parameter. Append it to the
result from step 2.
4. Follow the procedure described in section 3.2 for the content body
data of the 2nd content referenced in the 'u' parameter. Append it to
the result from step 3.
5. ...
6. Follow the procedure described in section 3.2 for the content body
data of the last content referenced in the 'u' parameter. Append it
to the result from the previous steps.
The hash algorithm is then applied to the data from the last step
(the concatenated canonicalized data of all content parts), and the
result goes into the "d" parameter. Similarly, the "s" parameter may
optionally be added, containing the octet count of all the
canonicalized data.
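The concatenation order above can be illustrated with the following Python sketch. It assumes each referenced part has already been canonicalized into a (header, body) pair of octets, listed in the same order as the URLs in "u", and that "d" is the Base64 encoding of the raw digest; edigest_data and edigest_values are hypothetical names.

```python
import base64
import hashlib
from typing import List, Tuple

# One (canonical header block, canonical body) pair per URL in "u",
# in the order the URLs are listed.
Part = Tuple[bytes, bytes]


def edigest_data(parts: List[Part]) -> bytes:
    """Concatenate header1, body1, header2, body2, ... as in the steps."""
    buf = bytearray()
    for header_canon, body_canon in parts:
        buf += header_canon
        buf += body_canon
    return bytes(buf)


def edigest_values(parts: List[Part], algorithm: str = "sha1") -> Tuple[str, int]:
    """Return the "d" value and the optional "s" octet count."""
    data = edigest_data(parts)
    d = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
    return d, len(data)
```

Because the order comes from the "u" list rather than from the message layout, rearranging parts in the message leaves the hash unchanged, while reordering the URLs in "u" would change it.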
Note that with multiple content parts in "u", the same list of header
fields from the "h" parameter is used for every part, so this list
may have to include names of header fields that are present in one
content part but not in another in order to produce a hash that
includes all the necessary data.
5.3 Verification of EDigest Header Field
Verification of the EDigest header field is done only if the data for
all content parts referenced in "u" is available to the verifying
agent. If it is not, verification should be aborted with an error
message indicating that some of the referenced data is not available.
If the data is unavailable due to a temporary DNS error resolving the
domain name of one of the URLs in the "u" parameter, then the
verifying agent may choose to delay verification and attempt it again
at a later time.
The procedure for verification of the EDigest header field is to
produce the hash as described in section 5.2, which is then compared
to the hash in the "d" parameter. If they match, then EDigest is
successfully verified; if they do not, then verification has failed.
6. Examples
6.1 Simple Content-Digest as Replacement for Content-MD5
In its simple form, without the "h" parameter, the Content-Digest
header field can easily be used as a replacement for Content-MD5,
and, as the majority of Content-Digest field parameters are optional
or have default values, this does not require much more space:
For the following small text content with a Content-MD5 header field:
Content-Type: text/plain; format=flowed
Content-MD5: vP5T2agfLQOCooDQF3lghA==
Test Message
Replacing Content-MD5 with Content-Digest using the md5 algorithm
gives the following (notice that the hash data is the same):
Content-Type: text/plain; format=flowed
Content-Digest: v=1.0; a=md5; d="vP5T2agfLQOCooDQF3lghA=="
Test Message
When the MD5 algorithm is replaced with the more secure SHA1 (the
default when "a" is not present), the data looks as follows:
Content-Type: text/plain; format=flowed
Content-Digest: v=1.0; d="yH0loJWEwEDzv8U7VwGZWR3rELo="
Test Message
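The reason the md5 "d" value can be identical to the Content-MD5 value is that both carry the Base64 encoding of the raw digest ([RFC1544]). The Python sketch below shows the two forms side by side; the body octets are a hypothetical stand-in, since the exact bytes hashed in the example depend on the canonicalization in use, and b64_digest is a hypothetical name.

```python
import base64
import hashlib


def b64_digest(data: bytes, algorithm: str) -> str:
    """Base64-encoded raw digest, the form used by both fields."""
    return base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")


# Hypothetical body octets; not necessarily the exact bytes hashed above.
body = b"Test Message\r\n"

content_md5 = b64_digest(body, "md5")   # Content-MD5 value, or d with a=md5
digest_sha1 = b64_digest(body, "sha1")  # d with the default a=sha1
print(len(content_md5), len(digest_sha1))  # 24 28
```

The 16-octet MD5 digest always encodes to 24 characters and the 20-octet SHA1 digest to 28, matching the lengths of the values in the examples above.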
6.2 Content-Digest used in Email Message
The use of Content-Digest for an email message composed entirely of
one content part is shown in section 3.3. Here this is expanded to
show an example of a MIME multipart email message that uses the
Content-Digest header field both for particular email content parts
and for the entire email message. In all cases the default "sha1"
algorithm is used, so the "a" parameter is not required (it appears
explicitly in only one of the fields below).
From: will@example.com
To: mary@example.net
Subject: Fireworks
Date: Mon, 4 Jul 2005 12:34:26 -0400
Message-ID: <will.123456789@example.com>
Mime-Version: 1.0
Content-Type: MULTIPART/signed; Boundary="NextPart";
protocol="application/pkcs7-signature"; micalg=sha1
Content-Transfer-Encoding: 7bit
Content-Digest: v=1.0; i=mail.example.com; t=2005070412342601;
h=content-type,mime-version,message-id,date;
c=nofws; d="rNqZDKbZ4eFzs/6Z67ivfIA2JPs="
This message is in MIME format. The first part should be
readable text, while the remaining parts are likely
unreadable without MIME-aware tools.
--NextPart
Content-Type: text/plain; charset="US-ASCII"
MIME-Version: 1.0
Content-ID: <218C460.u314@example.com>
Content-Digest: v=1.0; h=content-type,content-id,mime-version;
a=sha1; d="MSU3X80gRiNX1r2sjRzV4thQ5cs="
Happy 4th of July,
Fireworks at pier 39 at 9:30pm, be there.
Will
--NextPart
Content-Type: APPLICATION/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: BASE64
Content-Description: S/MIME Cryptographic Signature
Content-Disposition: attachment; filename="smime.p7s"
Content-ID: <218C460.u315@example.com>
Content-Digest: v=1.0; h=content-*; t=2005070412341200;
d="HlT99tyN/wczesmLuavpsr5qXbc="
MIIEWgYJKoZIhvcNAQcCoIIESzCCBEcCAQExCzAJBgUrDgMCGgUAMAsGCSqG
SIb3DQEHAaCCAl8wggJbMIIBxKADAgECAgMMcrUwDQYJKoZIhvcNAQEEBQAw
....
Note that, because "h=content-*" matches it, the
Content-Transfer-Encoding header field is included in the digest hash
data for the last content part. While this may not be desirable for
text data, BASE64 is well known as the only transfer-encoding used
for S/MIME signatures and is not likely ever to be changed by
intermediate transmission systems. The actual canonicalized data that
goes into the hash digest computation ('mimeform' applies by default
since "c" is not specified, and it selects 'bare' for this non-text
media type) IS NOT BASE64, but binary data, since digest data is
computed from the original data before content-transfer-encoding
rules are applied. However, the data used for the hash computation
of the Content-Digest in the mail message header itself (identified
by t=2005070412342601) is based on the encapsulated MIME parts within
it with their content-transfer-encoding applied, so in that case the
BASE64 encoded data is used (and the mail message content hash also
includes data from the header fields of all message parts, including
the Content-Digest field with t=2005070412341200).
6.3 Content-Digest used in HTTP Transmission
Content-Digest can be used as a replacement for Content-MD5 in HTTP;
it is used in the same way, and only when the entire content part
data is transmitted. Here is an example:
Date: Sun, 10 Jul 2005 15:02:03 GMT
Accept-Ranges: bytes
ETag: "8088c-13bfe-42d137fd-windows-1251"
Server: Apache/1.3.22 (Unix) mod_deflate/1.0.21 mod_accel/1.0.31
Vary: accept-charset, user-agent
Content-Length: 80894
Content-Type: text/html; charset=windows-1251
Content-Digest: v=1.0; i=www.example.com;
h=Content-Type,Last-Modified,ETag; c=bare;
d="MpUuKLUmoKUapc4q2kMyw3XzEUo="
Last-Modified: Sun, 10 Jul 2005 15:00:13 GMT
<html><head><title>Hello World</title></head>
<body><h2>Hello World</h2></body>
In cases when partial content data is transmitted (transmission in
chunks), an HTTP instance digest may be used for data integrity;
please see [RFC3230] regarding this complementary concept of a digest
header field. To be able to verify the entire data (rather than a
specific chunk), an EDigest with a "u" parameter pointing to the
permanent location of the data can be included in the header of each
chunk, with a Content-Location header field also present in the same
header.
6.4 EDigest used in Email
Below is shown an example from 3.2, but with EDigest (with
t=2005070510302601) being used in email header to provide hash of
particular mime parts rather then entire message as a whole (as it
was with Content-Digest in example 3.2). The message after being
delivered is then manually resent to listserver which adds additional
mime part (mail list footer) and then mail list server ads new
Leibzon Expires January 11, 2006 [Page 27]
Internet-Draft Content-Digest and EDigest July 2005
EDigest field (with t=2005070413063001). Note that in email EDigest
header fields are typically prepended to the message as trace data,
which is different then Content-Digest fields that are added together
with other Content fields by message originator and usually appear
below them in content header.
EDigest: v=1.0; i=lserv.example.org; t=2005070413063001;
u="<218C460.u314@example.com> <218C460.u315@example.com>
<fl0332.k1@example.org>";
h="content-type,mime-version,content-id,content-digest,
content-originator"; d="MJkDZynIX7LCZ8LBO/KB2UGQmU0="
Received: from box.example.net (box.example.net [10.0.2.10])
by lserv.example.org (8.12.1/8.12.1)
with ESMTP id 4d343d31 for <family-list@example.org> ;
Mon, 04 Jul 2005 13:06:20
Resent-From: mary@example.net
Resent-To: family-list@example.org
Resent-Date: Mon, 4 Jul 2005 13:04:10 -0400
Received: from mail.example.com (mail.example.com [10.0.0.1])
by box.example.net (8.12.1/8.12.1)
with ESMTP id nmonpqrst1 for <mary@example.net> ;
Mon, 04 Jul 2005 10:33:04 +0100
EDigest: v=1.0; i=mail.example.com; t=2005070510302601;
u="<218C460.u314@example.com> <218C460.u315@example.com>";
h=content-type,mime-version,content-id,content-digest;
d="COb/tgPpFD4JNS2vYelZAkk4aHU="
From: will@example.com
To: mary@example.net
Subject: Fireworks
Date: Mon, 4 Jul 2005 10:29:15 -0400
Message-ID: <will.123456789@example.com>
Mime-Version: 1.0
Content-Type: MULTIPART/mixed; Boundary="NextPart"
Content-Transfer-Encoding: 7bit
This message is in MIME format. The first part should be
readable text, while the remaining parts are likely
unreadable without MIME-aware tools.
--NextPart
Content-Type: text/plain; charset="US-ASCII"
MIME-Version: 1.0
Content-ID: <218C460.u314@example.com>
Content-Digest: v=1.0; h=content-type,content-id,mime-version;
a=sha1; d="MSU3X80gRiNX1r2sjRzV4thQ5cs="
Happy 4th of July,
Fireworks at pier 39 at 9:30pm, be there.
Will
--NextPart
Content-Type: APPLICATION/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: BASE64
Content-Description: S/MIME Cryptographic Signature
Content-Disposition: attachment; filename="smime.p7s"
Content-ID: <218C460.u315@example.com>
Content-Digest: v=1.0; h=content-*; t=2005070412341200;
d="HlT99tyN/wczesmLuavpsr5qXbc="
MIIEWgYJKoZIhvcNAQcCoIIESzCCBEcCAQExCzAJBgUrDgMCGgUAMAsGCSqG
SIb3DQEHAaCCAl8wggJbMIIBxKADAgECAgMMcrUwDQYJKoZIhvcNAQEEBQAw
....
--NextPart
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Originator: "Family List" <family-list@example.org>
Content-ID: <fl0332.k1@example.org>
Content-Digest: v=1.0; h=content-*; d="c4ZKJPGIqDAfn/SrjbF8jI5448k="
_______________________________________________
private family mailing list - family-list@example.org
7. IANA Considerations
Two header fields are to be registered as follows:
---------------------------------------------------------------------
Header field name:
Content-Digest
Applicable protocol:
MIME
Status:
provisional
Author/Change controller:
William Leibzon <william@elan.net>
Specification document(s):
This document
Related information:
none
---------------------------------------------------------------------
---------------------------------------------------------------------
Header field name:
EDigest
Applicable protocol:
MIME, mail
Status:
provisional
Author/Change controller:
William Leibzon <william@elan.net>
Specification document(s):
This document
Related information:
none
---------------------------------------------------------------------
Note to RFC Editor: this section may be removed on publication as an
RFC
8. Security Considerations
This document specifies a data integrity mechanism to protect MIME
data (including the MIME header) from accidental modification while
in transit from origin to destination. Data integrity with
Content-Digest and EDigest is not a replacement for an end-to-end
messaging security architecture such as S/MIME [RFC3851] or PGP
[RFC3156], but may supplement them. The automated addition of EDigest
by message transport agents may be used as a basis for building an
automated email signing system.
9. References
9.1 Normative References
[RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
April 1992.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, November 1996.
[RFC2049] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Five: Conformance Criteria and
Examples", RFC 2049, November 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2392] Levinson, E., "Content-ID and Message-ID Uniform Resource
Locators", RFC 2392, August 1998.
[RFC2822] Resnick, P., "Internet Message Format", RFC 2822,
April 2001.
[RFC3174] Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
(SHA1)", RFC 3174, September 2001.
[RFC3548] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 3548, July 2003.
[RFC3874] Housley, R., "A 224-bit One-way Hash Function: SHA-224",
RFC 3874, September 2004.
9.2 Informative References
[FIPS180-2]
"US Federal Information Processing Standards Publication
180-2", August 2002, <http://csrc.nist.gov/publications/
fips/fips180-2/fips180-2.pdf>.
[RFC1421] Linn, J., "Privacy Enhancement for Internet Electronic
Mail: Part I: Message Encryption and Authentication
Procedures", RFC 1421, February 1993.
[RFC1544] Rose, M., "The Content-MD5 Header Field", RFC 1544,
November 1993.
[RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
Resource Locators (URL)", RFC 1738, December 1994.
[RFC3156] Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
"MIME Security with OpenPGP", RFC 3156, August 2001.
[RFC3230] Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
RFC 3230, January 2002.
[RFC3851] Ramsdell, B., "Secure/Multipurpose Internet Mail
Extensions (S/MIME) Version 3.1 Message Specification",
RFC 3851, July 2004.
[draft-hoffman-hash-attacks-04]
Hoffman, P., "Attacks on Cryptographic Hashes in Internet
Protocols", June 2005, <http://www.ietf.org/
internet-drafts/draft-hoffman-hash-attacks-04.txt>.
Author's Address
William Leibzon
Elan Networks
500 Laurelwood Rd, Suite 12
Santa Clara, California 95054
USA
Email: william@elan.net
Appendix A. Collected Grammar
This appendix contains the complete ABNF grammar for the Content-
Digest and EDigest header fields. For any grammar terms that are not
specifically defined below (such as CFWS and FWS), please refer to
the [RFC2822] document and its ABNF grammar definitions.
The ABNF grammar of Content-Digest header field is as follows:
Content-Digest = "Content-Digest" ":" FWS version parameters
version = "v=" version-number CFWS ";"
version-number = "1.0" / unknown-version
unknown-version = number-major "." number-minor
number-major = 1*(digit)
number-minor = 1*(digit)
parameters = *(CFWS ";" FWS parameter) CFWS data-parameter
*(CFWS ";" FWS parameter)
data-parameter = ";" FWS "d=" value
parameter = algorithm / headerfieldlist / canonicalization /
size / timestamp / hostinfo / undefined-parameter
; Matching of parameter names is case-insensitive
undefined-parameter = undefined-name "=" undefined-value
undefined-name = token
undefined-value = value
algorithm = "a=" algorithm-name
algorithm-name = "md5" / "sha1" / "sha224" / "sha256" /
"sha384" / "sha512" / undefined-value
; Matching of algorithm names is case-insensitive
canonicalization = "c=" [header-canonicalization ","]
body-canonicalization
; Matching of header and body canonicalization is case-insensitive
header-canonicalization = "bare" / "simple" /
"nofws" / undefined-value
body-canonicalization = "bare" / "text" /
"mimeform" / "nofws" / "none" / undefined-value
size = "s=" 1*(digit)
timestamp = "t=" timestamp-value
timestamp-value = 1*(digit) ["." 1*(digit)]
hostinfo = "i=" value
headerfieldlist = "h=" headerfield *("," headerfield)
headerfield = field-name
; Matching of header field names is case-insensitive
field-name = 1*ftext [ "*" ]
ftext = %d33-57 / ; Any character except
%d59-126 ; controls, SP, and ":"
digit = %d48-57 ; Numeric Digit
value = token / quoted-string
token = 1*<any ASCII CHAR except SPACE, CTLs, or tspecials>
tspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" /
"\" / <"> /"/" / "[" / "]" / "?" / "="
; Must be in quoted-string to use within parameter values
The ABNF grammar of EDigest header field is as follows (anything not
defined here, please refer to Content-Digest grammar):
EDigest = "EDigest" ":" FWS version edigest-parameters
edigest-parameters = *(CFWS ";" FWS ed-parameter) CFWS data-parameter
*(CFWS ";" FWS ed-parameter)
ed-parameter = algorithm / headerfieldlist / canonicalization /
size / timestamp / hostinfo / urlinfo / undefined-parameter
; Matching of parameter names is case-insensitive
urlinfo = "u=" (quoted-url / content-id)
; content-id is as defined in RFC2392
quoted-url = %d34 urldata %d34
; quoted-url must be used if urldata contains any tspecials
urldata = oneurl 0*(FWS oneurl)
oneurl = "<" value ">"
; value above is expected to be a genericurl as defined in the RFC 1738
; syntax, but may also be a content-id as defined in RFC 2392
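A receiver of an EDigest field has to separate the u= value into individual references. The Python sketch below illustrates this under stated assumptions and is not a normative parser. It takes the text after "u=", returns each <...> item of a quoted-url, and treats an unquoted value as a single content-id reference returned verbatim, without checking it against RFC 2392. The function name parse_urlinfo is hypothetical.

```python
import re

# One "<...>" item of urldata; the inner text is not validated as a URL.
ONEURL_RE = re.compile(r"<([^<>]*)>")


def parse_urlinfo(u_value):
    """Return the list of references carried by an EDigest u= value."""
    u_value = u_value.strip()
    if u_value.startswith('"'):
        if len(u_value) < 2 or not u_value.endswith('"'):
            raise ValueError("unterminated quoted-url")
        inner = u_value[1:-1].strip()
        urls = ONEURL_RE.findall(inner)
        # Only whitespace (unfolded FWS) may remain between the <...> items.
        leftover = ONEURL_RE.sub("", inner).strip()
        if not urls or leftover:
            raise ValueError(f"malformed urldata: {inner!r}")
        return urls
    if not u_value:
        raise ValueError("empty u= value")
    # Assumption: an unquoted value is a single content-id reference.
    return [u_value]
```

Keeping each reference as an opaque string leaves URL and content-id validation to a dedicated parser, which this sketch deliberately does not attempt.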
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.