Network Working Group Jacob Palme Internet Draft Stockholm University/KTH draft-ietf-mhtml-rev-04.txt Alexander Hopmann IETF status to be: Proposed standard Microsoft Corporation Revises: RFC 2110 Expires: May 1998 November 1997 MIME Encapsulation of Aggregate Documents, such as HTML (MHTML) Status of this Document This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.'' To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Abstract HTML [RFC 1866] defines a powerful means of specifying multimedia documents. These multimedia documents consist of a text/html root resource (object)and other subsidiary resources (image, video clip, applet, etc. objects) referenced by Uniform Resource Identifiers (URIs) within the text/html root resource. When an HTML multimedia document is retrieved by a browser, each of these component resources is individually retrieved in real time from a location, and using a protocol, specified by each URI. In order to transfer a complete HTML multimedia document in a single e- mail message, it is necessary to:- a) aggregate a text/html root resource and all of the subsidiary resources it references into a single composite message structure, and b) define a means by which URIs in the text/html root can reference subsidiary resources within that composite message structure. 
This document does both. It a) defines the use of a MIME multipart/related structure to aggregate a text/html root resource and the subsidiary resources it references, and b) specifies two MIME content-headers (Content-Base and Content-Location) that allow URIs in a multipart/related text/html root body part to reference subsidiary resources in other body parts of the same multipart/related structure. While initially designed to support e-mail transfer of complete multi- resource HTML multimedia documents, these conventions can also be employed by other transfer protocols such as HTTP and FTP to retrieve a complete multi-resource HTML multimedia document in a single transfer or for storage and archiving of complete HTML-documents. Differences between this and a previous version of this standard, which was published as RFC 2110, are summarized in chapter 13. Table of Contents 1. Introduction 2. Terminology 2.1 Conformance requirement terminology 2.2 Other terminology 3. Overview 4. The Content-Location and Content-Base MIME Content Headers 4.1 MIME content headers 4.2 The Content-Location Header 4.3 The Content-Base header 4.4 Encoding of URIs in MIME headers 5. Base URIs for resolution of relative URIs 6. Sending documents without linked objects 7. Use of the Content-Type "multipart/related" 8. Usage of Links to Other Body Parts 8.1 General principle 8.2 Resolution of URIs in text/html body parts 8.3 Use of the Content-ID header and CID URLs 8.4 Conformance requirement on receipt 9. Examples 9.1 Example of a HTML body without included linked objects 9.2 Example with an absolute URI to an embedded GIF picture 9.3 Example with a relative URI to an embedded GIF picture 9.4 Example with a relative URI and no BASE available 9.5 Example using a BASE on the Multipart 9.6 Example using CID URL and Content-ID header to an embedded GIF picture 10. Content-Disposition header 11. Character encoding issues and end-of-line issues 12. Security Considerations 13. 
Differences as compared to the previous version of this proposed standard in RFC 2110 14. Copyright 15. Acknowledgments 16. References 17. Author's Addresses Mailing List Information To write contributions Further discussion on this document should be done through the mailing list MHTML@SEGATE.SUNET.SE. Comments on less important details may also be sent to the editor, Jacob Palme . To subscribe To subscribe to this list, send a message to LISTSERV@SEGATE.SUNET.SE which contains the text SUB MHTML To unsubscribe To unsubscribe to this list, send a message to LISTSERV@SEGATE.SUNET.SE which contains the text UNS MHTML To access mailing list archives Archives of this list are available for bulk downloading by anonymous ftp from FTP://SEGATE.SUNET.SE/lists/mhtml/ The archives are available for browsing from HTTP://segate.sunet.se/archives/mhtml.html and in searchable format from http://www.reference.com/cgi-bin/pn/ listarch?list=MHTML@segate.sunet.se Finally, the archives are available by email. Send a message to LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list of the archive files, and then a new message "GET " to retrieve the archive files. More information Information about the IETF work in developing this standard may also be available at URL: http://www.dsv.su.se/~jpalme/ietf/mhtml.html A collection of test messages is available at http://www.dsv.su.se/~jpalme/mimetest/MHTML-test-messages.html 1. Introduction There are a number of document formats (Hypertext Markup Language [HTML2], Portable Document format [PDF] and Virtual Reality Markup Language [VRML]) that specify documents consisting of a root resource and a number of distinct subsidiary resources referenced by URIs within that root resource. There is an obvious need to be able to send such multi-resource documents in e-mail [SMTP], [RFC822] messages. 
The standard defined in this document specifies how to aggregate such multi-resource documents in MIME-formatted [MIME1 to MIME5] messages for precisely this purpose. While this specification was developed to satisfy the specific aggregation requirements of multi-resource HTML documents, it may also be applicable to other multi-resource document representations linked by URIs. While this is the case, there is no requirement that implementations claiming conformance to this standard be able to handle any URI linked document representations other than those whose root is HTML. This aggregation into a single message of a root resource and the subsidiary resources it references may also be applicable to other protocols such as HTTP or FTP, or to the archiving of complete web pages as they appeared at a particular point in time. An informational RFC will be published as a supplement to this standard. The informational RFC will discuss implementation methods and some implementation problems. Implementors are strongly recommended to read this informational RFC when developing implementations of this standard. You can find it through URL http://www.dsv.su.se/~jpalme/ietf/mhtml.html. 2. Terminology 2.1 Conformance requirement terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [IETF-TERMS]. An implementation is not compliant if it fails to satisfy one or more of the MUST requirements for the protocols it implements. An implementation that satisfies all the MUST and all the SHOULD requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST requirements but not all the SHOULD requirements for its protocols is said to be "conditionally compliant." 2.2 Other terminology Most of the terms used in this document are defined in other RFCs. 
Absolute URI,         See Relative Uniform Resource Locators [RELURL].
AbsoluteURI

CID                   See Message/External Body Content-ID [MIDCID].

Content-Base          See section 4.3 below.

Content-ID            See Message/External Body Content-ID [MIDCID].

Content-Location      MIME message or content part header with the URI
                      of the MIME message or content part body, defined
                      in section 4.2 below.

Content-Transfer-     Conversion of a text into 7-bit octets as specified
Encoding              in [MIME1] chapter 6.

CR                    See [RFC822].

CRLF                  See [RFC822].

Displayed text        The text shown to the user reading a document with
                      a web browser. This may be different from the HTML
                      markup, see the definition of HTML markup below.

Header                Field in a message or content heading specifying
                      the value of one attribute.

Heading               Part of a message or content before the first
                      CRLFCRLF, containing formatted fields with
                      attributes of the message or content.

HTML                  See HTML 2 specification [HTML2].

HTML Aggregate        HTML objects together with some or all objects, to
objects               which the HTML object contains hyperlinks.

HTML markup           A file containing HTML encodings as specified in
                      [HTML] which may be different from the displayed
                      text which a person using a web browser sees. For
                      example, the HTML markup may contain "&lt;" where
                      the displayed text contains the character "<".

LF                    See [RFC822].

MIC                   Message Integrity Codes, codes used to verify that
                      a message has not been modified.

MIME                  See the MIME specifications [MIME1 to MIME5].

MUA                   Messaging User Agent.

PDF                   Portable Document Format, see [PDF].

Relative URI,         See HTML 2 [HTML2] and RFC 1808 [RELURL].
RelativeURI

URI, absolute and     See RFC 1866 [HTML2].
relative

URL                   See RFC 1738 [URL].

URL, relative         See Relative Uniform Resource Locators [RELURL].

VRML                  See Virtual Reality Markup Language [VRML].

3. Overview

An aggregate document is a MIME-encoded message that contains a root resource (object) as well as other resources that are required to represent that document (inline pictures, style sheets, applets, etc.).
It is important to keep in mind that aggregate documents need to satisfy the differing needs of several audiences. This standard can also be used to send sets of linked documents which are not shown simultaneously, and where the user can use links to move between them. Mail sending agents might send aggregate documents as an encoding of normal day-to-day electronic mail. Mail sending agents might also send aggregate documents when a user wishes to mail a particular document from the web to someone else. Finally mail sending agents might send aggregate documents as automatic responders, providing access to WWW resources for non-IP connected clients. Also with other protocols such as HTTP or FTP, there may sometimes be a need to retrieve aggregate documents. Receiving agents also have several differing needs. Some receiving agents might be able to receive an aggregate document and display it just as any other text content type would be displayed. Others might have to pass this aggregate document to a browsing program, and provisions need to be made to make this possible. Finally several other constraints on the problem arise. It is important that it be possible for a document to be signed and for it to be transmitted and displayed without breaking the message integrity (MIC) check that is part of the signature. 4. The Content-Location and Content-Base MIME Content Headers 4.1 MIME content headers In order to resolve URI references to resources in other body parts, two MIME content headers are defined, Content-Location and Content-Base. Both of these headers can occur in any message or content heading, and will then be valid within this heading and over its immediate content. If they occur in multipart or message headings, they apply to its body parts only in that they can be used to derive a base for relative URIs within those body parts, and only if no such base is provided in the body part itself or in multipart or message headings closer in scope to the body part. 
These two headers may occur on any message or content heading, but their usage for handling hyperlinks between body parts in a message SHOULD only occur between body parts within the same multipart/related structure.

At present only those URIs which are URLs are affected by these headers, but it is anticipated that in future other forms of URIs may be affected.

The syntax for these headers is, using the syntax definition tools from [RFC822]:

   content-location = "Content-Location:" ( absoluteURI | relativeURI )

   content-base = "Content-Base:" absoluteURI

where URI is restricted to the syntax for URLs as defined in Uniform Resource Locators [URL] until IETF specifies other kinds of URIs.

4.2 The Content-Location Header

A Content-Location header specifies a URI that labels the content of a body part in whose heading it is placed. Its value can be an absolute or a relative URI. Any URI or URL scheme may be used, but use of non-standardized URI or URL schemes might entail some risk that recipients cannot handle them correctly.

The Content-Location header can be used to indicate that the data sent under this heading is also retrievable, in identical format, through normal use of this URI. If used for this purpose, it must contain an absolute URI or be resolvable, through a Content-Base header, into an absolute URI. In this case, the information sent in the message can be seen as a cached version of the original data.

A URI in a Content-Location header need not refer to a resource which is globally available for retrieval using this URI (after resolution of relative URIs). However, URIs in Content-Location headers (if absolute, or resolvable to absolute URIs) SHOULD still be globally unique.

A Content-Location header can also be used to label a resource which is not retrievable by some or all recipients of a message.
For example a Content-Location header may label an object which is only retrievable using this URI in a restricted domain, such as within a company-internal web space. A Content-Location header can even contain a fictitious URI. Such an URI need not be globally unique. There MUST only be a single Content-Location header in each message or content heading, whose value is a single URI. Note, however, that both one Content-Location and one Message-ID or Content-ID header are allowed in a message or content heading. In such a case, these will indicate two different, equally valid references to a body part, and either of them may be used to refer to this body part. Example of a multipart/related structure containing body parts with both Content-Location and Content-ID labels: Content-Type: "multipart/related"; boundary="boundary-example"; type="text/html" --boundary-example Content-Type: text/html; charset=US-ASCII ... ... ... ... ... ... ... ... --boundary-example Content-Type: image/gif Content-ID: <97116092511xyz*foo.bar.net> Content-Location: fiction1/fiction2 --boundary-example Content-Type: image/gif Content-ID: <97116092811xyz*foo.bar.net> Content-Location: fiction1/fiction3 --boundary-example-- 4.3 The Content-Base header A Content-Base header provides a base for resolving relative URIs occurring in other header fields in the same content heading, relative URIs occurring in other header fields nested within its content that lack their own base, or relative URIs occurring in body parts nested within its content that do not contain an embedded base specification - for example, an HTML BASE element. The value of a Content-Base header MUST be an absolute URI. Example showing which Content-Base is valid where: Content-Type: "multipart/related"; boundary="boundary-example"; type="text/html"; start= ; A Content-Base header is allowed here, and can be used ; for resolution of relative URL-s in Part 1 and Part 2, ; if these did not have any absolute base of their own. 
; However, both part 1 and part 2 below have an absolute ; base, in part 1 through an absolute Content-Location header, ; in part 2 through a Content-Base header, and thus a Content- ; base up here would not be used for resolution of relative ; URLs within the body parts 1 and 2. --boundary-example Part 1: Content-Type: text/html; charset=US-ASCII Content-ID: Content-Location: http://www.ietf.cnri.reston.va.us/foo1.bar1 ; Since this Content-Location contains an absolute URL, it ; does not need to be resolved using any Content-Base header. ; A combination of a Content-Location with a relative URL ; and a Content-Base with an absolute URL would also be valid, ; as well as only a Content-Location with a relative URL ; and resolved through the Content-Base in the surrounding ; multipart heading. --boundary-example Part 2: Content-Type: text/html; charset=US-ASCII Content-ID: Content-Location: foo2.bar2 ; The Content-Base below applies to ; this relative URI Content-Base: http://www.ietf.cnri.reston.va.us/frames/ To top window --boundary-example-- 4.4 Encoding of URIs in MIME headers 4.4.1 Handling of URIs containing inappropriate characters Some documents may contain URIs with characters that are inappropriate for an RFC 822 header, either because the URI itself has an incorrect syntax according to [URL] or the URI syntax standard has been changed to allow characters not previously allowed in MIME headers. These URIs cannot be sent directly in a message header. There are two approaches that can be taken when encountering such a URI as the text to be placed in a Content-Location or Content-Base header: (a) In some situations, an implementation might be able to replace the URI with one that can be sent directly. This might be accomplished, for example, by using the encoding method of [URL] to replace inappropriate characters within the URI with ones encoded using the "%nn" encoding. 
This replacement MUST in that case be done both in the header and in the text/html body part that contains the URI referencing that header. Since the change is done in both places, a receiving agent need not decode it, and MUST NOT decode the [URL]-encoding before matching URIs to body parts.

(b) The URI might be encoded using the method described in [MIME3]. This replacement MUST only be done in the header, not in the HTML text. Receiving clients must decode the [MIME3] encoding in the heading before comparing URIs in body text to URIs in Content-Location headers.

With method (b), the charset parameter value "US-ASCII" SHOULD be used if the URI contains no octets outside of the 7-bit range. If such octets are present, the correct charset parameter value (derived e.g. from information about the HTML document the URI was found in) SHOULD be used. If this cannot be safely established, the value "UNKNOWN-8BIT" [RFC 1428] MUST be used.

Note that for the matching of URIs in text/html body parts to URIs in Content-Location headers, the value of the charset parameter is irrelevant, but that it may be relevant for other purposes, and that incorrect labeling MUST, therefore, be avoided. Warning: Irrelevance of the charset parameter may not be true in the future, if different character encodings of the same non-English filename are used in HTML.

Caution should be taken in using method (a), since, in general, this encoding cannot be applied safely to characters that are used for reserved purposes within the URI scheme. In addition, changing the HTML body which contains the URI might invalidate a message integrity check. For these reasons, this method SHOULD only be used if it is performed in cooperation with the author/owner of the documents involved.
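The two approaches of section 4.4.1 can be illustrated with a minimal Python sketch. It is not part of this standard. The function names are hypothetical, the choice of UTF-8 for the "%nn" escapes and of ISO-8859-1 as the default charset are assumptions made only for the example, and the standard library calls stand in for the [URL] and [MIME3] encodings.

```python
from email.header import Header, decode_header, make_header
from urllib.parse import quote


def percent_encode_uri(uri: str) -> str:
    """Method (a): replace inappropriate characters with %nn escapes.

    Reserved characters and already existing escapes are left alone.
    Assumption: non-ASCII characters are escaped as UTF-8 octets.
    The same replacement would also have to be made in the HTML body.
    """
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%")


def mime_encode_uri(uri: str, charset: str = "iso-8859-1") -> str:
    """Method (b): encode the header value only, as a MIME encoded-word."""
    return Header(uri, charset=charset).encode()


def decode_header_uri(value: str) -> str:
    """Receiving side: undo the MIME header encoding before matching.

    %nn escapes are deliberately NOT decoded.
    """
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    print(percent_encode_uri("http://example.com/a b.gif"))
    encoded = mime_encode_uri("bl\u00e5.gif")
    print(encoded)
    print(decode_header_uri(encoded))
```

Note that the receiving function reverses only the header encoding of method (b); a value such as "a%2eb/c%20d" passes through unchanged, as required for the octet-by-octet comparison in section 8.2.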
4.4.2 Folding of long URIs

Since MIME header fields have a limited length and long URIs can result in Content-Location and Content-Base headers that exceed this length, Content-Location and Content-Base headers may have to be folded.

Encoding as discussed in clause 4.4.1 MUST be done before such folding. This MUST include encoding of space characters, if any. After that, the folding can be done, using the algorithm defined in [URLBODY] section 3.1.

5. Base URIs for resolution of relative URIs

Relative URIs inside the contents of MIME body parts are resolved relative to a base URI using the methods for resolving relative URIs described in [RELURL]. In order to determine this base URI, the first-applicable method in the following list applies.

(a) There is a base specification inside the MIME body part containing the relative URI which resolves relative URIs into absolute URIs. For example, HTML provides the BASE element for this purpose.

(b) There is a Content-Base header (as defined in section 4.3), in the immediately surrounding content heading, specifying the base to be used.

(c) There is a Content-Location header in the immediately surrounding heading of the body part which contains an absolute URI. This URI can serve as a base in the same way as a requested URI can serve as a base for relative URIs within a file retrieved via HTTP [HTTP].

(d) Steps (b) and (c) can be repeated recursively to find a suitable Content-Base or Content-Location header in a surrounding multipart and message heading. Note that a base from an absolute Content-Location in an inner heading takes precedence over a base from a Content-Base or a Content-Location in a surrounding heading.

(e) When the methods above do not yield an absolute URI, a base URL of "this_message:/" MUST be employed. This base URL has been defined for the sole purpose of resolving relative references within a multipart/related structure when no other base URI exists.
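The order of steps (a) to (e) can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm. The representation of headings as a list of dictionaries (innermost heading first) and all function names are hypothetical. The library function urljoin is used in place of the [RELURL] procedure, and the "this_message:/" base is handled separately because a general-purpose URL library cannot be assumed to know this scheme.

```python
import re
from urllib.parse import urljoin

FALLBACK_BASE = "this_message:/"

# Loose test for "starts with a scheme"; the underscore is accepted on
# purpose so that "this_message:" is recognised (an assumption of this sketch).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-_]*:")


def is_absolute(uri):
    return bool(_SCHEME.match(uri))


def find_base(embedded_base, headings):
    """Return the base URI for a body part.

    embedded_base: base found inside the body part (e.g. HTML BASE), or None.
    headings: header dictionaries, innermost heading first.
    """
    if embedded_base:                       # step (a)
        return embedded_base
    for heading in headings:                # steps (b) and (c), repeated
        base = heading.get("Content-Base")  # outwards as described in (d)
        if base:
            return base
        location = heading.get("Content-Location")
        if location and is_absolute(location):
            return location
    return FALLBACK_BASE                    # step (e)


def resolve(reference, base):
    """Resolve a possibly relative reference against the chosen base."""
    if is_absolute(reference):
        return reference
    if base == FALLBACK_BASE:
        # Simplification: plain concatenation, no "../" handling.
        return FALLBACK_BASE + reference.lstrip("/")
    return urljoin(base, reference)


if __name__ == "__main__":
    headings = [{}, {"Content-Base": "http://www.ietf.cnri.reston.va.us/"}]
    base = find_base(None, headings)
    print(resolve("images/ietflogo.gif", base))
    print(resolve("ietflogo.gif", find_base(None, [{}, {}])))
```

Because the headings are scanned innermost first, an absolute Content-Location in an inner heading is found before any Content-Base in a surrounding heading, which matches the precedence note in step (d).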
This is also described in other words in section 8.2 below.

6. Sending documents without linked objects

If a text/html resource (object) is sent without the subsidiary resources to which it is linked, it MAY be sent by itself. In this case, embedding it in a multipart/related structure is not necessary.

Such a text/html resource may contain no URIs, or URIs which the recipient is expected to retrieve (if possible) via a URI specified protocol. Although not normal, a text/html resource may be sent with unresolvable links, for example when two authors exchange drafts of unfinished resources.

Inclusion of URIs referencing resources which the recipient has to retrieve via a URI specified protocol may not work for some recipients. This is because not all e-mail recipients have full internet connectivity, or because URIs which work for a sender will not work for a recipient. This occurs, for example, when a URI refers to a resource within a company-internal network that is not accessible from outside the company.

Note that text/html resources containing URIs that reference resources that a recipient cannot retrieve MAY be sent, although this is discouraged. For example, two persons developing a new Web page may exchange incomplete versions of that page.

7. Use of the Content-Type "multipart/related"

If a message contains one or more MIME body parts containing URIs and also contains, as separate body parts, resources to which these URIs (as defined, for example, in HTML 2.0 [HTML2]) refer, then this whole set of body parts (referring body parts and referred-to body parts) SHOULD be sent within a multipart/related structure as defined in [REL].

Even though Content-Location and Content-Base headers can occur in a message that lacks an associated multipart/related structure, this standard only covers their use for resolution of URIs between body parts inside a single multipart/related structure.
This standard does not cover URIs from one multipart/related structure to another multipart/related structure in a message containing multiple multipart/related objects either in parallel or nested one within the other. When the start body part of a multipart/related structure is an atomic object, such as a text/html resource, it SHOULD be employed as the root resource of that multipart/related structure. When the start body part of a multipart/related structure is a multipart/alternative structure, and that structure contains at least one alternative body part which is a suitable atomic object, such as a text/html resource, then that body part SHOULD be employed as the root resource of the aggregate document. Implementors are warned, however, that some receiving agents treat multipart/alternative as if it had been multipart/mixed (even though MIME [MIME1] requires support for multipart/alternative). [REL] specifies that a type parameter is mandatory in a "Content-Type: multipart/related" header, and requires that it be employed to specify the type of the multipart/related start object. Thus, the type parameter value shall be "multipart/alternative", when the start part is of "Content-type multipart/alternative", even if the actual root resource is of type "text/html". In addition, if the multipart/related start object is not the first body part in a multipart/related structure, [REL] further requires that its Content-ID MUST be specified as the value of a start parameter in the "Content-Type: multipart/related" header. When rendering a resource in a multipart/related structure, URI references within that resource can be satisfied by body parts within the same multipart/related structure. This is useful: (a) For those recipients who only have email but not full Internet access. (b) For those recipients who for other reasons, such as firewalls or the use of company-internal links, cannot retrieve URI referenced resources via URI specified protocols. 
Note that this means that you can, via e-mail, send text/html objects which include URIs which the recipient cannot resolve via HTTP or other connectivity-requiring URIs.

(c) To send a document whose content is preserved even if the resources to which embedded URIs refer are later changed or deleted.

(d) For resources which are not available for protocol based retrieval.

(e) To speed up access.

When a sending MUA sends objects which were retrieved from the WWW, it SHOULD maintain their WWW URIs. It SHOULD NOT transform these URIs into some other URI form prior to transmitting them. This will allow the receiving MUA to both verify MICs included with the message, as well as verify the documents against their WWW counterparts, if this is appropriate.

In certain cases this will not work - for example, if a resource contains URIs as parameters to objects and applets. In such a case, it might be better to rewrite the document before sending it. This problem is discussed in more detail in the informational RFC which will be published as a supplement to this standard.

This standard does not cover the case where a resource in a multipart/related structure contains URIs that reference MIME body parts outside of the current multipart/related structure or in other MIME messages, even if methods similar to those described in this standard are used. Implementors who employ such URIs are warned that receiving agents implementing this standard may not be able to process them.

Within a multipart/related structure, each body part MUST have, if assigned, a different Content-ID header value and a Content-Location header value which resolves to a different URI. Two body parts in the same multipart/related structure can have the same relative Content-Location header value only if, when resolved to absolute URIs in combination with Content-Base header values, they are then different.

8.
Usage of Links to Other Body Parts

8.1 General principle

A body part, such as a text/html body part, may contain URIs that reference resources which are included as body parts in the same message -- in detail, as body parts within the same multipart/related structure. Often such URI linked resources are meant to be displayed inline to the viewer of the referencing body part; for example, objects referenced with the SRC attribute of the IMG element in HTML 2.0 [HTML2]. New elements and attributes with this property are proposed in the ongoing development of HTML (examples: applet, frame, profile, OBJECT, classid, codebase, data, SCRIPT). A sender might also want to send a set of HTML documents which the reader can traverse, and which are related with the attribute href of the A element.

In order to send such messages, there is a need to specify how a URI in one body part can reference a resource in another body part.

8.2 Resolution of URIs in text/html body parts

The resolution of URIs in text/html body parts is performed in the following way:

(a) Unfold multiple line header values according to [URLBODY]. Do NOT however translate character encodings of the kind described in [URL]. Example: Do not transform "a%2eb/c%20d" into "a.b/c d".

(b) Remove all MIME encodings, such as content-transfer encoding and header encodings as defined in MIME part 3 [MIME3]. Do NOT however translate character encodings of the kind described in [URL]. Example: Do not transform "a%2eb/c%20d" into "a.b/c d".

(c) Try to resolve all relative URIs in the HTML content and in Content-Location headers using the procedure described in chapter 5 above. The result of this resolution can be an absolute URI, or a fictitious absolute URI with the base "this_message:/" as specified in chapter 5.
(d) For each referencing URI in a text/html body part, compare the value of the referencing URI after resolution as described in (a) and (b), with the URI derived from Content-ID and Content-Location headers for other body parts within the same multipart/related structure. If the strings are identical, octet by octet, then the referencing URI references that body part. This comparison will only succeed if the two URIs are identical. This means that if one of the two URIs to be compared was a fictitious absolute URI with the base "this_message:/", the other must also be such a fictitious absolute URI, and not resolvable to a real absolute URI.

(e) If (d) fails, try to retrieve the resource referenced by the URI through ordinary Internet lookup. Resolution of URIs of the URL-types "mid" or "cid" to other content-parts, outside the same multipart/related structure, or in other separately sent messages, is not covered by this standard, and is thus neither encouraged nor forbidden.

8.3 Use of the Content-ID header and CID URLs

When CID (Content-ID) URLs as defined in [URL] and [MIDCID] are used to reference other body parts, they MUST only be matched against Content-ID header values, and not against Content-Location headers with CID: values. Thus, even though the following two headers are identical in meaning, only the Content-ID value will be matched, and the Content-Location value will be ignored.

   Content-ID: <foo@bar.net>
   Content-Location: CID: foo@bar.net

Note: Content-IDs MUST be globally unique [MIME1]. It is thus not permitted to make them unique only within a message or within a single multipart/related structure.

8.4 Conformance requirement on receipt

An e-mail system which claims conformance to this standard MUST support receipt of multipart/related structures (as defined in section 7) with URIs referencing body parts using both the Content-Location (as defined in section 8.2) and the Content-ID method (as defined in section 8.3).

9.
Examples

Warning: If there is a contradiction between the explanatory text and the examples in this standard, then the explanatory text, not the examples, is normative.

9.1 Example of a HTML body without included linked objects

The first example is the simplest form of an HTML email message. This message does not contain an aggregate HTML object, but simply a message with a single HTML body part. This body part contains a URI but the message does not contain the resource referenced by that URI. To retrieve the resource referenced by the URI the receiving client would need either IP access to the Internet, or an electronic mail web gateway.

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: text/html; charset=US-ASCII

Hi there!

An example of an HTML message.

Try clicking here.
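
The examples that follow embed the referenced resource in the same multipart/related structure. How a receiving agent might locate such a body part, following step (d) of section 8.2 and the rule of section 8.3, is sketched below in Python. This is only an illustration. The function name and the dictionary keys are hypothetical, the URIs are assumed to have been unfolded, MIME-decoded and resolved already, and the conversion of a CID URL into a Content-ID value is simplified to adding angle brackets.

```python
def find_part(reference, parts):
    """Return the body part referenced by a URI, or None.

    reference: URI taken from the text/html body part, already resolved.
    parts: dictionaries for the other body parts of the same
           multipart/related structure, with optional keys
           "content_id" (including angle brackets) and
           "content_location" (already resolved).
    """
    if reference[:4].lower() == "cid:":
        # Section 8.3: CID URLs are matched against Content-ID only.
        wanted = "<" + reference[4:] + ">"
        for part in parts:
            if part.get("content_id") == wanted:
                return part
        return None

    for part in parts:
        location = part.get("content_location")
        if location is None or location[:4].lower() == "cid:":
            # A Content-Location carrying a CID value is disregarded.
            continue
        # Exact string comparison; %nn escapes are never decoded.
        if location == reference:
            return part
    return None


if __name__ == "__main__":
    parts = [
        {"content_id": "<foo@bar.net>",
         "content_location": "CID:something@else"},
        {"content_location":
         "http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"},
    ]
    print(find_part("cid:foo@bar.net", parts))
    print(find_part("CID:something@else", parts))
```

If no part is found, a receiving agent would fall back to ordinary retrieval as described in step (e) of section 8.2.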

9.2 Example with an absolute URI to an embedded GIF picture From: foo1@bar.net To: foo2@bar.net Subject: A simple example Mime-Version: 1.0 Content-Type: multipart/related; boundary="boundary-example"; type="text/html"; start= --boundary-example Content-Type: text/html;charset=US-ASCII Content-ID: ... text of the HTML document, which might contain a URI referencing a resource in another body part, for example through a statement such as: IETF logo --boundary-example Content-Location: http://www.ietf.cnri.reston.va.us/images/ietflogo.gif Content-Type: IMAGE/GIF Content-Transfer-Encoding: BASE64 R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5 NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A etc... --boundary-example-- 9.3 Example with a relative URI to an embedded GIF picture From: foo1@bar.net To: foo2@bar.net Subject: A simple example Mime-Version: 1.0 Content-Type: multipart/related; boundary="boundary-example"; type="text/html" --boundary-example Content-Base: http://www.ietf.cnri.reston.va.us/ Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE ... text of the HTML document, which might contain a URI referencing a resource in another body part, for example through a statement such as: IETF logo Example of a copyright sign encoded with Quoted-Printable: =A9 Example of a copyright sign mapped onto HTML markup: ¨ --boundary-example Content-Location: http://www.ietf.cnri.reston.va.us/images/ietflogo.gif Content-Type: IMAGE/GIF Content-Transfer-Encoding: BASE64 R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5 NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A etc... 
--boundary-example--

9.4 Example with a relative URI and no BASE available

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example";
        type="text/html"

--boundary-example
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE

... text of the HTML document, which might contain a URI referencing a resource in another body part, for example through a statement such as:
IETF logo
Example of a copyright sign encoded with Quoted-Printable: =A9
Example of a copyright sign mapped onto HTML markup: &#169;

--boundary-example
Content-Location: ietflogo.gif
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64

R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc...

--boundary-example--

9.5 Example using a BASE on the Multipart

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example";
        type="text/html"
Content-Base: http://www.ietf.cnri.reston.va.us/

--boundary-example
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE

... text of the HTML document, which might contain a URI referencing a resource in another body part, for example through a statement such as:
IETF logo
Example of a copyright sign encoded with Quoted-Printable: =A9
Example of a copyright sign mapped onto HTML markup: &#169;

--boundary-example
Content-Location: ietflogo.gif
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64

R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc...
--boundary-example--

9.6 Example using CID URL and Content-ID header to an embedded GIF picture

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example";
        type="text/html"

--boundary-example
Content-Type: text/html; charset=US-ASCII

... text of the HTML document, which might contain a URI referencing a resource in another body part, for example through a statement such as:
IETF logo

--boundary-example
Content-Location: CID:something@else ; this header is disregarded
Content-ID:
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64

R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc...

--boundary-example--

10. Content-Disposition header

Note the specification in [REL] on the relations between Content-Disposition and multipart/related.

11. Character encoding issues and end-of-line issues

For the encoding of characters in HTML documents and other text documents into a MIME-compatible octet stream, the following mechanisms are relevant:

- HTML [HTML2], [HTML-I18N] as an application of SGML [SGML] allows characters to be denoted by character entities as well as by numeric character references (e.g. "Latin small letter a with acute accent" may be represented by "&aacute;" or "&#225;") in the HTML markup.

- HTML documents, in common with other documents of the MIME Content-Type "text", can be represented in MIME using one of several character encodings. The MIME Content-Type "charset" parameter value indicates the particular encoding used. For the exact meaning and use of the "charset" parameter, please see [MIME2] chapter 4. Note that the "charset" parameter refers only to the MIME character encoding. For example, the string "&aacute;" can be sent in MIME with "charset=US-ASCII", while the raw character "Latin small letter a with acute accent" cannot.
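The difference between character references in the HTML markup and the MIME "charset" can be checked with a short Python sketch. It uses only the standard library; the helper name fits_in_charset is hypothetical and nothing here is prescribed by this document.

```python
import html

ENTITY_FORM = "&aacute;"   # named character entity
NUMERIC_FORM = "&#225;"    # numeric character reference
RAW_FORM = "\u00e1"        # the raw character itself


def fits_in_charset(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Both markup forms denote the same character once the HTML is interpreted.
same_character = (
    html.unescape(ENTITY_FORM) == RAW_FORM
    and html.unescape(NUMERIC_FORM) == RAW_FORM
)

print(same_character)
print(fits_in_charset(ENTITY_FORM, "us-ascii"))   # markup form is plain ASCII
print(fits_in_charset(RAW_FORM, "us-ascii"))      # raw character is not
print(fits_in_charset(RAW_FORM, "iso-8859-1"))    # but fits in ISO-8859-1
```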
The above mechanisms are well defined and documented, and therefore not further explained here. In sending a message, all the above mentioned mechanisms MAY be used, and any mixture of them MAY occur when sending the document in MIME format. Receiving user agents (together with any Web browser they may use to display the document) MUST be capable of handling any combinations of these mechanisms.

Also note that:

- Any documents, including HTML documents, that contain octet values outside the 7-bit range need a content-transfer-encoding applied before transmission over certain transport protocols [MIME1, chapter 5].

- The MIME standard [MIME2] requires that e-mailed documents of "Content-Type: Text/" MUST be in canonical form before a Content-Transfer-Encoding is applied, i.e. that line breaks are encoded as CRLFs, not as bare CRs or bare LFs or something else. This is in contrast to [HTTP] where section 3.6.1 allows other representations of line breaks.

Note that this might cause problems with integrity checks based on checksums, which might not be preserved when moving a document from the HTTP to the MIME environment. If a document has to be converted in such a way that a checksum-based message integrity check becomes invalid, then this integrity check header SHOULD be removed from the document.

Other sources of problems are Content-Encoding used in HTTP but not allowed in MIME, and charsets that are not able to represent line breaks as CRLF. A good overview of the differences between HTTP and MIME with regard to Content-Type: "text" can be found in [HTTP], appendix C.

If the original document has line breaks in the canonical form (CRLF), then the document SHOULD remain unconverted so that integrity checksums are not invalidated.

A provider of HTML documents who wants his documents to be transferable via both HTTP and SMTP without invalidating checksum integrity checks should always provide original documents in the canonical form with CRLF for line breaks.
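The effect of line break canonicalization on checksums can be illustrated with a small Python sketch. The sample document and the helper names are hypothetical; MD5 is used only because it is the digest referenced in this document.

```python
import hashlib
import re


def to_canonical_crlf(data: bytes) -> bytes:
    """Rewrite CRLF, bare CR and bare LF line breaks as CRLF."""
    # CRLF is listed first so an existing CRLF is matched as one unit.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical document stored with bare LF line breaks.
lf_document = b"<html>\n<body>Hello</body>\n</html>\n"
crlf_document = to_canonical_crlf(lf_document)

# Converting to canonical form changes the octets, hence the checksum.
checksum_changed = md5_hex(lf_document) != md5_hex(crlf_document)

# A document already in canonical form is left untouched.
already_canonical_unchanged = to_canonical_crlf(crlf_document) == crlf_document

print(checksum_changed, already_canonical_unchanged)
```

A document that is stored with CRLF line breaks from the start passes through unchanged, which is why its checksum survives the move from HTTP to MIME.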
Some transport mechanisms may specify a default "charset" parameter if none is supplied [HTTP, MIME1]. Because the default differs for different mechanisms, when HTML is transferred through e-mail, the charset parameter SHOULD be included, rather than relying on the default.

12. Security Considerations

Security considerations include the potential to send someone an object and claim that it is represented by a particular URI (by giving it a Content-Location header). There can be no assurance that a WWW request (like HTTP or FTP) for that same URI would normally result in that same object. It might be unsuitable to cache the data in such a way that the cached data can be used for retrieval of this URI from sources other than body parts included in the same multipart/related structure as the Content-Location header. Because of this problem, receiving User Agents SHOULD NOT cache this data in the same way that data that was retrieved through an HTTP or FTP request might be cached.

URIs, especially File URIs, may in their name contain company-internal information, which may then inadvertently be revealed to recipients of documents containing such URIs.

One way of implementing messages with URI linked body parts is to handle the linked body parts in a combined mail and WWW proxy server. The mail client is only given the start body part, which it passes to a web browser. This web browser requests the linked parts from the proxy server. If this method is used, and if the combined server is used by more than one user, then methods must be employed to ensure that body parts of a message to one person are not retrievable by another person. Use of passwords (also known as tickets or magic cookies) is one way of achieving this. Note that some caching WWW proxy servers may not distinguish between cached objects from email and HTTP, which may be a security risk.
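One possible way to keep body parts received by mail apart from objects fetched over HTTP or FTP is to key the cache on the message as well as on the URI. The following Python sketch shows the idea only; the class name, the use of a message identifier as the key, and the sample values are all assumptions and not part of this specification.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sample values used for illustration.
LOGO_URI = "http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
LOGO_BODY = b"GIF89a..."


class MessageScopedCache:
    """Stores body parts keyed by (message id, URI), never by URI alone."""

    def __init__(self) -> None:
        self._parts: Dict[Tuple[str, str], bytes] = {}

    def store(self, message_id: str, uri: str, body: bytes) -> None:
        """Remember a body part as belonging to one particular message."""
        self._parts[(message_id, uri)] = body

    def lookup(self, message_id: str, uri: str) -> Optional[bytes]:
        """Return the body part only when asked on behalf of that message."""
        return self._parts.get((message_id, uri))


cache = MessageScopedCache()
cache.store("<msg1@bar.net>", LOGO_URI, LOGO_BODY)

same_message = cache.lookup("<msg1@bar.net>", LOGO_URI)
other_message = cache.lookup("<msg2@bar.net>", LOGO_URI)
print(same_message is not None, other_message is None)
```

With this keying, a part claimed to live at some URI is never served for a request made outside the message that carried it.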
In addition, by allowing people to mail aggregate objects, we are opening the door to other potential security problems that until now were only problems for WWW users. For example, some HTML documents now either themselves contain executable content (JavaScript) or contain links to executable content (The "INSERT" specification, Java). It would be exceedingly dangerous for a receiving User Agent to execute content received through a mail message without careful attention to restrictions on the capabilities of that executable content.

Some WWW applications hide passwords and tickets (access tokens to information which may not be available to anyone) and other sensitive information in hidden fields in the web documents or in on-the-fly constructed URIs. If a person gets such a document, and forwards it via email, the person may inadvertently disclose sensitive information.

13. Differences as compared to the previous version of this proposed standard in RFC 2110

The specification has been changed to show that the formats described do not only apply to multipart MIME in email, but also to multipart MIME transferred through other protocols such as HTTP or FTP.

In order to agree with [RELURL], Content-Base headers in multipart Content-Headings can now be used to resolve relative URIs in their component parts, but only if no base URI can be derived from the component part itself. Base URIs in inner headings, both in Content-Base and Content-Location headers, have precedence over base URIs in outer multipart headings.

Specification has been added that a Content-Heading cannot contain more than one Content-Location header.

A section 4.4.1 has been added, specifying how to handle the case of sending a body part whose URI does not agree with the correct URI syntax.
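The precedence of base URIs described earlier in this section (inner headings before outer multipart headings) might be sketched in Python as follows. This is an illustration under stated assumptions, not a complete resolver: the helper names are hypothetical, the sample message is a cut-down version of example 9.5, and trying Content-Base before an absolute Content-Location within one heading is an assumption of this sketch.

```python
from email import message_from_string
from email.message import Message
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Cut-down version of example 9.5: the base URI is on the multipart heading.
RAW_MESSAGE = (
    "Mime-Version: 1.0\n"
    'Content-Type: multipart/related; boundary="boundary-example";\n'
    '        type="text/html"\n'
    "Content-Base: http://www.ietf.cnri.reston.va.us/\n"
    "\n"
    "--boundary-example\n"
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "\n"
    "...\n"
    "--boundary-example\n"
    "Content-Location: ietflogo.gif\n"
    "Content-Type: IMAGE/GIF\n"
    "\n"
    "...\n"
    "--boundary-example--\n"
)


def base_from_heading(heading: Message) -> Optional[str]:
    """Return a base URI given by one heading, or None if it has none."""
    base = heading.get("Content-Base")
    if base:
        return base.strip()
    location = heading.get("Content-Location")
    if location and urlsplit(location.strip()).scheme:
        return location.strip()  # an absolute location can act as a base
    return None


def absolute_location(part: Message, enclosing: Message) -> Optional[str]:
    """Resolve a part's Content-Location, inner heading before outer."""
    location = part.get("Content-Location")
    if location is None:
        return None
    location = location.strip()
    if urlsplit(location).scheme:
        return location  # already absolute
    base = base_from_heading(part) or base_from_heading(enclosing)
    return urljoin(base, location) if base else location


multipart = message_from_string(RAW_MESSAGE)
image_part = multipart.get_payload()[1]
print(absolute_location(image_part, multipart))
```

If neither heading supplies a base, the sketch returns the relative URI unchanged and leaves any further handling to the caller.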
The handling of relative and absolute URIs for matching between body parts has been merged into a single description, by specifying that relative URIs which cannot be resolved otherwise should be handled as if they had been given the imaginary URL "this_message:/".

14. Copyright

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

15. Acknowledgments

Harald T. Alvestrand, Richard Baker, Isaac Chan, Dave Crocker, Martin J. Duerst, Lewis Geer, Roy Fielding, Ned Freed, Al Gilman, Paul Hoffman, Andy Jacobs, Richard W. Jesmajian, Mark K.
Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin Nicol, Martyn W. Peck, Pete Resnick, Nick Shelness, Jon Smirl, Einar Stefferud, Jamie Zawinski, Steve Zilles and several other people have helped us with preparing this document. I alone take responsibility for any errors which may still be in the document.

16. References

Ref.          Author, title
---------     --------------------------------------------------------

[CONDISP]     R. Troost, S. Dorner: "Communicating Presentation
              Information in Internet Messages: The
              Content-Disposition Header", RFC 1806, June 1995.

[HOSTS]       R. Braden (editor): "Requirements for Internet Hosts --
              Application and Support", STD-3, RFC 1123, October 1989.

[HTML-I18N]   F. Yergeau, G. Nicol, G. Adams, & M. Duerst:
              "Internationalization of the Hypertext Markup Language",
              RFC 2070, January 1997.

[HTML2]       T. Berners-Lee, D. Connolly: "Hypertext Markup Language -
              2.0", RFC 1866, November 1995.

[HTTP]        T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext
              Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

[MD5]         R. Rivest: "The MD5 Message-Digest Algorithm", RFC 1321,
              April 1992.

[MIDCID]      E. Levinson: "Content-ID and Message-ID Uniform Resource
              Locators", draft-ietf-mhtml-cid-v2-00.txt, July 1997.

[MIME1]       N. Freed, N. Borenstein: "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, December 1996.

[MIME-IMB]    N. Freed & N. Borenstein: "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

[MIME2]       N. Freed, N. Borenstein: "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              December 1996.

[MIME3]       K. Moore: "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII
              Text", RFC 2047, December 1996.

[MIME1]       N. Borenstein & N. Freed: "MIME (Multipurpose Internet
              Mail Extensions) Part One: Mechanisms for Specifying and
              Describing the Format of Internet Message Bodies",
              RFC 1521, Sept 1993.

[MIME4]       N. Freed, J. Klensin, J. Postel: "Multipurpose Internet
              Mail Extensions (MIME) Part Four: Registration
              Procedures", RFC 2048, January 1997.

[MIME5]       "Multipurpose Internet Mail Extensions (MIME) Part Five:
              Conformance Criteria and Examples", RFC 2049,
              December 1996.

[NEWS]        M.R. Horton, R. Adams: "Standard for interchange of
              USENET messages", RFC 1036, December 1987.

[PDF]         Tim Bienz and Richard Cohn: "Portable Document Format
              Reference Manual", Addison-Wesley, Reading, MA, USA,
              1993, ISBN 0-201-62628-4.

[REL]         Edward Levinson: "The MIME Multipart/Related
              Content-Type", draft-ietf-mhtml-re-v2-00.txt,
              September 1997.

[RELURL]      R. Fielding: "Relative Uniform Resource Locators",
              RFC 1808, June 1995.

[RFC822]      D. Crocker: "Standard for the format of ARPA Internet
              text messages", STD 11, RFC 822, August 1982.

[SGML]        ISO 8879. Information Processing -- Text and Office -
              Standard Generalized Markup Language (SGML), 1986.

[SMTP]        J. Postel: "Simple Mail Transfer Protocol", STD 10,
              RFC 821, August 1982.

[URL]         T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
              Resource Locators (URL)", RFC 1738, December 1994.

[URLBODY]     N. Freed and Keith Moore: "Definition of the URL MIME
              External-Body Access-Type", RFC 2017, October 1996.

[VRML]        Gavin Bell, Anthony Parisi, Mark Pesce: "Virtual Reality
              Modeling Language (VRML) Version 1.0 Language
              Specification", May 1995,
              http://www.vrml.org/Specifications/.

[IETF-TERMS]  S. Bradner: "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119, March 1997.

17. Author's Addresses

For contacting the editors, preferably write to Jacob Palme rather than Alex Hopmann.
Jacob Palme                          Phone: +46-8-16 16 67
Stockholm University and KTH         Fax: +46-8-783 08 29
Electrum 230                         Email: jpalme@dsv.su.se
S-164 40 Kista, Sweden

Alex Hopmann                         Email: alexhop@microsoft.com
Microsoft Corporation
3590 North First Street
Suite 300
San Jose
CA 95134

Working group chairman:

Einar Stefferud