Internet-Draft                                                  A. Barth
Expires: August 09, 2011                                      I. Hickson
                                                            Google, Inc.
                                                       February 05, 2011
Media Type Sniffing
draft-ietf-websec-mime-sniff-02
Many web servers supply incorrect Content-Type header fields with their HTTP responses. In order to be compatible with these servers, user agents consider the content of HTTP responses as well as the Content-Type header fields when determining the effective media type of the response. This document describes an algorithm for determining the effective media type of HTTP responses that balances security and compatibility considerations.
Please send feedback on this draft to websec@ietf.org.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 09, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The HTTP Content-Type header field indicates the media type of an HTTP response. However, many HTTP servers supply a Content-Type that does not match the actual contents of the response. Historically, web browsers have tolerated these servers by examining the content of HTTP responses in addition to the Content-Type header field to determine the effective media type of the response.
Without a clear specification of how to "sniff" the media type, each user agent implementor was forced to reverse-engineer the behavior of the other user agents and to develop their own algorithm. These divergent algorithms have led to a lack of interoperability between user agents. They have also led to security issues when the server intends an HTTP response to be interpreted as one media type but some user agents interpret the response as another media type.
These security issues are most severe when an "honest" server lets potentially malicious users upload files and then serves the contents of those files with a low-privilege media type (such as text/plain or image/jpeg). (Malicious servers, of course, can specify an arbitrary media type in the Content-Type header field.) In the absence of media type sniffing, this user-generated content would not be interpreted as a high-privilege media type, such as text/html. However, if a user agent does interpret a low-privilege media type, such as image/gif, as a high-privilege media type, such as text/html, the user agent has created a privilege escalation vulnerability in the server. For example, a malicious user might be able to leverage content sniffing to mount a cross-site scripting attack by including JavaScript code in the uploaded file that a user agent treats as text/html.
This document describes a content sniffing algorithm that carefully balances the compatibility needs of user agent implementors with the security constraints. The algorithm has been constructed with reference to content sniffing algorithms present in popular user agents, an extensive database of existing web content, and metrics collected from implementations deployed to a sizable number of users [BarthCaballeroSong2009].
WARNING! Whenever possible, user agents SHOULD NOT employ a content sniffing algorithm. However, if a user agent does employ a content sniffing algorithm, the user agent SHOULD use the algorithm in this document because using a different content sniffing algorithm than servers expect causes security problems. For example, if a server believes that the client will treat a contributed file as an image (and thus treat it as benign), but a user agent believes the content to be HTML (and thus privileged to execute any scripts contained therein), an attacker might be able to steal the user's authentication credentials and mount other cross-site scripting attacks.
Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
The explicit media type metadata associated with a sequence of octets depends on the protocol that was used to fetch the octets.
For octets received via HTTP, the Content-Type HTTP header field, if present, indicates the media type. Let the official-type be the media type indicated by the HTTP Content-Type header field, if present. If the Content-Type header field is absent, or if its value cannot be interpreted as a media type (e.g., because its value doesn't contain a U+002F SOLIDUS ('/') character), then there is no official-type.
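The following Python code is a minimal sketch of the official-type rule for HTTP, and it rests on several assumptions. The helper name `official_type` is hypothetical. Parameters after the first ";" are discarded. The result is lowercased with an ASCII-only mapping so that later comparisons can be case-insensitive. The sketch does not handle repeated Content-Type header fields or quoted parameters.

```python
import string
from typing import Optional

# ASCII-only lowercasing; str.lower() would also fold non-ASCII letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def official_type(content_type: Optional[str]) -> Optional[str]:
    """Return the official-type for an HTTP response, or None if there is none.

    Hypothetical helper: strips parameters (everything from the first ';')
    and surrounding whitespace, then lowercases ASCII letters only.
    """
    if content_type is None:
        return None  # no Content-Type header field
    media_type = content_type.split(";", 1)[0].strip()
    if "/" not in media_type:
        return None  # value cannot be interpreted as a media type
    return media_type.translate(_ASCII_LOWER)
```

For example, `official_type("Text/HTML; charset=UTF-8")` yields `"text/html"`, and `official_type("html")` yields `None` because the value has no "/".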
For octets fetched from the file system, user agents should use platform-specific conventions (e.g., operating system file extension/type mappings) to determine the official-type.
For octets fetched over some other protocols, e.g. FTP, there is no type information.
Note: Comparisons between media types, as defined by MIME specifications, are done in an ASCII case-insensitive manner. [RFC2046]
The user agent MUST use the following algorithm to determine the sniffed-type of a sequence of octets:

+-------------------------------+--------------------------------+
| Bytes in Hexadecimal          | Textual Representation         |
+-------------------------------+--------------------------------+
| 74 65 78 74 2f 70 6c 61 69 6e | text/plain                     |
+-------------------------------+--------------------------------+
| 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=ISO-8859-1 |
| 3b 20 63 68 61 72 73 65 74 3d |                                |
| 49 53 4f 2d 38 38 35 39 2d 31 |                                |
+-------------------------------+--------------------------------+
| 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=iso-8859-1 |
| 3b 20 63 68 61 72 73 65 74 3d |                                |
| 69 73 6f 2d 38 38 35 39 2d 31 |                                |
+-------------------------------+--------------------------------+
| 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=UTF-8      |
| 3b 20 63 68 61 72 73 65 74 3d |                                |
| 55 54 46 2d 38                |                                |
+-------------------------------+--------------------------------+

...then jump to the "text or binary" section below.
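The byte-sequence table above can be applied as an exact, case-sensitive comparison. This Python sketch assumes that the caller passes the raw bytes of the Content-Type header field value. The function name is hypothetical, and the surrounding steps of the algorithm are not modeled.

```python
# Byte sequences from the table; comparison is exact and case-sensitive,
# which is why both "ISO-8859-1" and "iso-8859-1" appear as separate entries.
TEXT_PLAIN_VALUES = frozenset({
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
})


def is_listed_text_plain_value(raw_value: bytes) -> bool:
    """Hypothetical helper: True if raw_value exactly equals a table entry."""
    return raw_value in TEXT_PLAIN_VALUES
```

Because the comparison is exact rather than parsed, `b"text/plain; charset=utf-8"` does not match, and neither does a value with trailing whitespace.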
This section defines the *rules for distinguishing if a resource is text or binary*.
+----------------------+--------------+
| Bytes in Hexadecimal | Description  |
+----------------------+--------------+
| FE FF                | UTF-16BE BOM |
| FF FE                | UTF-16LE BOM |
| EF BB BF             | UTF-8 BOM    |
+----------------------+--------------+
...then let the sniffed-type be "text/plain" and abort these steps.
+-------------------------+
| Binary Data Byte Ranges |
+-------------------------+
| 0x00 -- 0x08            |
| 0x0B                    |
| 0x0E -- 0x1A            |
| 0x1C -- 0x1F            |
+-------------------------+
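A Python sketch of the BOM check and the binary-data-byte ranges shown above follows. It assumes that the caller passes the first octets of the resource; how many octets to examine is not specified in this section. It also assumes that content containing none of the binary data bytes is treated as text. The name `text_or_binary` is hypothetical. The function returns "text/plain" for text and `None` for binary content, leaving the remaining handling of binary data, which is not shown here, to the caller.

```python
from typing import Optional

# Byte order marks from the BOM table.
BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")


def is_binary_data_byte(octet: int) -> bool:
    """True if octet falls in one of the binary data byte ranges."""
    return (0x00 <= octet <= 0x08 or octet == 0x0B
            or 0x0E <= octet <= 0x1A or 0x1C <= octet <= 0x1F)


def text_or_binary(octets: bytes) -> Optional[str]:
    """Hypothetical helper: "text/plain" for text, None for binary content."""
    if octets.startswith(BOMS):  # bytes.startswith accepts a tuple
        return "text/plain"
    if not any(is_binary_data_byte(o) for o in octets):
        return "text/plain"
    return None  # binary: later steps (not shown here) decide the type
```

TAB (0x09), LF (0x0A), FF (0x0C), CR (0x0D), and ESC (0x1B) fall outside the ranges, so terminal escape sequences and ordinary line endings do not make content binary. A UTF-16 BOM short-circuits the scan, which matters because UTF-16 text contains many 0x00 octets.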
The table used by the above algorithm is:

+-------------------+-------------------+-----------------+------------+
| Mask in Hex       | Pattern in Hex    | Sniffed Type    | Security   |
+-------------------+-------------------+-----------------+------------+
| FF FF FF DF DF DF | WS 3C 21 44 4F 43 | text/html       | Scriptable |
| DF DF DF DF FF DF | 54 59 50 45 20 48 |                 |            |
| DF DF DF FF       | 54 4D 4C _>       |                 |            |
| Comment: <!DOCTYPE HTML                                              |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 48 54 4D 4C | text/html       | Scriptable |
| FF                | _>                |                 |            |
| Comment: <HTML                                                       |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 48 45 41 44 | text/html       | Scriptable |
| FF                | _>                |                 |            |
| Comment: <HEAD                                                       |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 53 43 52 49 | text/html       | Scriptable |
| DF DF FF          | 50 54 _>          |                 |            |
| Comment: <SCRIPT                                                     |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 49 46 52 41 | text/html       | Scriptable |
| DF DF FF          | 4D 45 _>          |                 |            |
| Comment: <IFRAME                                                     |
+-------------------+-------------------+-----------------+------------+
| FF FF DF FF FF    | WS 3C 48 31 _>    | text/html       | Scriptable |
| Comment: <H1                                                         |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF FF | WS 3C 44 49 56 _> | text/html       | Scriptable |
| Comment: <DIV                                                        |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 46 4F 4E 54 | text/html       | Scriptable |
| FF                | _>                |                 |            |
| Comment: <FONT                                                       |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 54 41 42 4C | text/html       | Scriptable |
| DF FF             | 45 _>             |                 |            |
| Comment: <TABLE                                                      |
+-------------------+-------------------+-----------------+------------+
| FF FF DF FF       | WS 3C 41 _>       | text/html       | Scriptable |
| Comment: <A                                                          |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 53 54 59 4C | text/html       | Scriptable |
| DF FF             | 45 _>             |                 |            |
| Comment: <STYLE                                                      |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 54 49 54 4C | text/html       | Scriptable |
| DF FF             | 45 _>             |                 |            |
| Comment: <TITLE                                                      |
+-------------------+-------------------+-----------------+------------+
| FF FF DF FF       | WS 3C 42 _>       | text/html       | Scriptable |
| Comment: <B                                                          |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF DF DF | WS 3C 42 4F 44 59 | text/html       | Scriptable |
| FF                | _>                |                 |            |
| Comment: <BODY                                                       |
+-------------------+-------------------+-----------------+------------+
| FF FF DF DF FF    | WS 3C 42 52 _>    | text/html       | Scriptable |
| Comment: <BR                                                         |
+-------------------+-------------------+-----------------+------------+
| FF FF DF FF       | WS 3C 50 _>       | text/html       | Scriptable |
| Comment: <P                                                          |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF FF | WS 3C 21 2D 2D _> | text/html       | Scriptable |
| Comment: <!--                                                        |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF FF | WS 3C 3F 78 6D 6C | text/xml        | Scriptable |
| Comment: <?xml (Note the case sensitivity and lack of trailing _>)   |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF    | 25 50 44 46 2D    | application/pdf | Scriptable |
| Comment: The string "%PDF-", the PDF signature.                      |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF FF | 25 21 50 53 2D 41 | application/    | Safe       |
| FF FF FF FF FF    | 64 6F 62 65 2D    | postscript      |            |
| Comment: The string "%!PS-Adobe-", the PostScript signature.         |
+-------------------+-------------------+-----------------+------------+
| FF FF 00 00       | FE FF 00 00       | text/plain      | n/a        |
| Comment: UTF-16BE BOM                                                |
+-------------------+-------------------+-----------------+------------+
| FF FF 00 00       | FF FE 00 00       | text/plain      | n/a        |
| Comment: UTF-16LE BOM                                                |
+-------------------+-------------------+-----------------+------------+
| FF FF FF 00       | EF BB BF 00       | text/plain      | n/a        |
| Comment: UTF-8 BOM                                                   |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF FF | 47 49 46 38 37 61 | image/gif       | Safe       |
| Comment: The string "GIF87a", a GIF signature.                       |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF FF | 47 49 46 38 39 61 | image/gif       | Safe       |
| Comment: The string "GIF89a", a GIF signature.                       |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF FF | 89 50 4E 47 0D 0A | image/png       | Safe       |
| FF FF             | 1A 0A             |                 |            |
| Comment: The PNG signature.                                          |
+-------------------+-------------------+-----------------+------------+
| FF FF FF          | FF D8 FF          | image/jpeg      | Safe       |
| Comment: A JPEG SOI marker followed by an octet of another marker.   |
+-------------------+-------------------+-----------------+------------+
| FF FF             | 42 4D             | image/bmp       | Safe       |
| Comment: The string "BM", a BMP signature.                           |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF 00 00 | 52 49 46 46 00 00 | image/webp      | Safe       |
| 00 00 FF FF FF FF | 00 00 57 45 42 50 |                 |            |
| FF FF             | 56 50             |                 |            |
| Comment: "RIFF" followed by four bytes, followed by "WEBPVP".        |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF       | 00 00 01 00       | image/vnd.      | Safe       |
|                   |                   | microsoft.icon  |            |
| Comment: A Windows Icon signature.                                   |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF    | 4F 67 67 53 00    | application/ogg | Safe       |
| Comment: An Ogg audio or video signature.                            |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF 00 00 | 52 49 46 46 00 00 | audio/wave      | Safe       |
| 00 00 FF FF FF FF | 00 00 57 41 56 45 |                 |            |
| Comment: "RIFF" followed by four bytes, followed by "WAVE".          |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF       | 1A 45 DF A3       | video/webm      | Safe       |
| Comment: The WebM signature [TODO: Use more octets?]                 |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF FF FF | 52 61 72 20 1A 07 | application/    | Safe       |
| FF                | 00                | x-rar-compressed|            |
| Comment: A RAR archive.                                              |
+-------------------+-------------------+-----------------+------------+
| FF FF FF FF       | 50 4B 03 04       | application/zip | Safe       |
| Comment: A ZIP archive.                                              |
+-------------------+-------------------+-----------------+------------+
| FF FF FF          | 1F 8B 08          | application/    | Safe       |
|                   |                   | x-gzip          |            |
| Comment: A GZIP archive.                                             |
+-------------------+-------------------+-----------------+------------+

[TODO: MP3 audio.]
User agents MAY support additional types if necessary, by implicitly adding to the above table. However, user agents SHOULD NOT use any other patterns for types already mentioned in the table above, because this could then be used for privilege escalation (where, e.g., a server uses the above table to determine that content is not HTML and thus safe from cross-site scripting attacks, but then a user agent detects it as HTML anyway and allows script to execute). In extending this table, user agents SHOULD NOT introduce any privilege escalation vulnerabilities.
Note: The column marked "security" is used by the algorithm in the "text or binary" section, to avoid sniffing text/plain content as a type that can be used for a privilege escalation attack.
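Each row of the signature table can be read as a (mask, pattern) pair. An octet matches when `octet & mask == pattern`. A mask of DF therefore folds ASCII lowercase letters onto uppercase, and a mask of 00 ignores the octet entirely. The Python sketch below rests on several assumptions:

* WS denotes any run of leading whitespace octets (0x09, 0x0A, 0x0C, 0x0D, 0x20). This section does not define WS.
* `_>` denotes a single tag-terminating octet (0x20 or 0x3E). This section does not define `_>` either.
* The first matching row, in table order, wins.

The sketch includes only a few rows. The names `Signature`, `sniff_signature`, and `allow_scriptable` are hypothetical. The flag illustrates how the Security column could restrict results to Safe types.

```python
from typing import NamedTuple, Optional

WHITESPACE = b"\x09\x0a\x0c\x0d\x20"  # assumed meaning of WS
TAG_TERMINATORS = b"\x20\x3e"        # assumed meaning of _> (space or '>')


class Signature(NamedTuple):
    mask: bytes
    pattern: bytes     # excludes the WS and _> markers
    skip_ws: bool      # row starts with WS
    tag_end: bool      # row ends with _>
    sniffed_type: str
    scriptable: bool   # Security column: Scriptable (True) or Safe (False)


# A small subset of the table, kept in table order.
SIGNATURES = [
    Signature(b"\xff\xdf\xdf\xdf\xdf", b"<HTML", True, True, "text/html", True),
    Signature(b"\xff" * 5, b"%PDF-", False, False, "application/pdf", True),
    Signature(b"\xff" * 6, b"GIF89a", False, False, "image/gif", False),
    Signature(b"\xff" * 8, b"\x89PNG\r\n\x1a\n", False, False, "image/png", False),
    Signature(b"\xff" * 4 + b"\x00" * 4 + b"\xff" * 6,
              b"RIFF\x00\x00\x00\x00WEBPVP", False, False, "image/webp", False),
]


def matches(octets: bytes, sig: Signature) -> bool:
    i = 0
    if sig.skip_ws:
        while i < len(octets) and octets[i] in WHITESPACE:
            i += 1
    needed = len(sig.pattern) + (1 if sig.tag_end else 0)
    if len(octets) - i < needed:
        return False
    for m, p in zip(sig.mask, sig.pattern):
        if octets[i] & m != p:  # masked comparison, octet by octet
            return False
        i += 1
    return not sig.tag_end or octets[i] in TAG_TERMINATORS


def sniff_signature(octets: bytes, allow_scriptable: bool = True) -> Optional[str]:
    for sig in SIGNATURES:
        if sig.scriptable and not allow_scriptable:
            continue  # only Safe types may be reported
        if matches(octets, sig):
            return sig.sniffed_type
    return None
```

For example, `sniff_signature(b"  <html><body>")` yields `"text/html"`. `sniff_signature(b"%PDF-1.4", allow_scriptable=False)` yields `None`, because the PDF row is marked Scriptable.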
This section defines whether a sequence of n octets *matches the signature for H.264*.
If n is less than 4, then the sequence does not match the signature for H.264; abort these steps.
Let box-size be the value of the first four octets, interpreted as a 32-bit unsigned, big-endian integer.
If n is less than box-size, or if box-size is not evenly divisible by 4, then the sequence does not match the signature for H.264; abort these steps.
If octets 5 through 8 (inclusive) of the sequence are not 0x66 0x74 0x79 0x70 (the ASCII string "ftyp"), then the sequence does not match the signature for H.264; abort these steps.
For each i from 2 to box-size/4 - 1 (inclusive):
The sequence does not match the signature for H.264.
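The following Python code sketches the steps above under two stated assumptions. First, box-size is read as a big-endian integer, which is how ISO base media file format (MP4) boxes store their sizes. Second, the body of the loop is not reproduced in this section, so the check applied to each four-octet word is supplied by the caller as the hypothetical `brand_matches` predicate. Octet positions in the prose are 1-based; the slices below are 0-based, so "octets 5 through 8" is `octets[4:8]`.

```python
from typing import Callable


def matches_h264_signature(octets: bytes,
                           brand_matches: Callable[[bytes], bool]) -> bool:
    """Sketch of the H.264 signature check; brand_matches is hypothetical."""
    n = len(octets)
    if n < 4:
        return False
    box_size = int.from_bytes(octets[0:4], "big")  # assumed big-endian
    if n < box_size or box_size % 4 != 0:
        return False
    if octets[4:8] != b"ftyp":
        return False
    # i runs from 2 to box_size/4 - 1 inclusive; each word lies within the box.
    for i in range(2, box_size // 4):
        if brand_matches(octets[4 * i:4 * i + 4]):
            return True
    return False
```

A caller testing for an "avc" brand might pass `lambda word: word.startswith(b"avc")`. That predicate is only an example; it is not taken from this section.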
This section defines the *rules for sniffing images specifically*.
If the official-type is "image/svg+xml", then let the sniffed-type be the official-type (an XML type) and abort these steps.
If the first octets match one of the signatures in Section 5 for one of the following media types, then let the sniffed-type be the corresponding media type and abort these steps:
Otherwise, let the sniffed-type be the official-type and abort these steps.
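The list of media types for the image-sniffing step is not reproduced here. The Python sketch below assumes that the list contains image types from the signature table, and it includes only GIF, PNG, JPEG, and BMP. All four of those rows use an all-FF mask, so a plain prefix test is equivalent to the masked comparison. The sketch also assumes that `official_type` has already been lowercased. The function name is hypothetical.

```python
from typing import Optional

# Assumed subset of image rows from the signature table: (pattern, sniffed type).
IMAGE_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
]


def sniff_image(official_type: Optional[str], octets: bytes) -> Optional[str]:
    if official_type == "image/svg+xml":
        return official_type  # SVG is an XML type: keep the official-type
    for pattern, sniffed_type in IMAGE_SIGNATURES:
        if octets.startswith(pattern):
            return sniffed_type
    return official_type  # no signature matched
```

For example, a GIF served as "image/png" is sniffed as "image/gif". Unrecognized content keeps its official-type.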
This section defines the *rules for sniffing videos specifically*.
If the first octets match one of the signatures in Section 5 for one of the following media types, then let the sniffed-type be the corresponding media type and abort these steps:
Otherwise, let the sniffed-type be the official-type and abort these steps.
This section defines the *rules for sniffing fonts specifically*.
TODO
Otherwise, let the sniffed-type be the official-type and abort these steps.
+----------------------+------------------------------------+---------+
| Bytes in Hexadecimal | Requirement                        | Comment |
+----------------------+------------------------------------+---------+
| 72 73 73             | Let the sniffed-type be            | rss     |
|                      | "application/rss+xml" and abort    |         |
|                      | these steps.                       |         |
+----------------------+------------------------------------+---------+
| 66 65 65 64          | Let the sniffed-type be            | feed    |
|                      | "application/atom+xml" and abort   |         |
|                      | these steps.                       |         |
+----------------------+------------------------------------+---------+
| 72 64 66 3A 52 44 46 | Continue to the next step in this  | rdf:RDF |
|                      | algorithm.                         |         |
+----------------------+------------------------------------+---------+
If none of the octet sequences above match the octets in s starting at pos, then let the sniffed-type be "text/html" and abort these steps.
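The following Python code is a sketch of the table-driven step above. It assumes that `s` and `pos` have already been computed by the earlier steps of this algorithm, which are not shown in this section. The function name is hypothetical, and `None` stands for "continue to the next step in this algorithm".

```python
from typing import Optional


def feed_or_html_step(s: bytes, pos: int) -> Optional[str]:
    """Apply the rss / feed / rdf:RDF table at offset pos of s."""
    if s.startswith(b"rss", pos):
        return "application/rss+xml"
    if s.startswith(b"feed", pos):
        return "application/atom+xml"
    if s.startswith(b"rdf:RDF", pos):
        return None  # continue to the next step in the algorithm
    return "text/html"  # none of the octet sequences matched
```

For example, with `s = b"<rss version='2.0'>"` and `pos = 1`, the step yields "application/rss+xml".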
For efficiency reasons, implementations might wish to implement this algorithm and the algorithm for detecting the character encoding of HTML documents in parallel.
[BarthCaballeroSong2009]  Barth, A., Caballero, J., and D. Song, "Secure Content Sniffing for Web Browsers, or How to Stop Papers from Reviewing Themselves", 2009.