HTTPbis | K. Morgan |
Internet-Draft | C. Brunhuber |
Intended status: Standards Track | IAEA |
Expires: December 4, 2014 | June 2, 2014 |
H2EZ: HTTP/2 Header Compression
draft-morgan-http2-header-compression-00
This specification defines the format and compression of HTTP header fields in HTTP/2. The compression is based on EZFLATE, which is a token-based DEFLATE algorithm for secure compression within encrypted communication channels.
Discussion of this draft takes place on the HTTPBIS working group mailing list (ietf-http-wg@w3.org), which is archived at http://lists.w3.org/Archives/Public/ietf-http-wg/.
Working Group information can be found at http://tools.ietf.org/wg/httpbis/.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 4, 2014.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
[TODO: A significant portion of the text in the 'introduction' and 'security considerations' sections is lifted directly from the HPACK Internet-Draft; rewrite or give credit somehow.]
In HTTP/1.1 (see [HTTP-p1]), header fields are encoded without any form of compression. As web pages have grown to include dozens to hundreds of requests, the redundant header fields in these requests now measurably increase latency and unnecessarily consume bandwidth (see [PERF1] and [PERF2]).
SPDY [SPDY] initially addressed this redundancy by compressing header fields using the DEFLATE format [DEFLATE], which proved very effective at efficiently representing the redundant header fields. However, that approach exposed a security risk as demonstrated by the CRIME attack (see [CRIME]).
A key observation of the CRIME and BREACH attacks is that they exploited the DEFLATE algorithm as described in RFC 1951 [DEFLATE], not the DEFLATE format (also described in RFC 1951).
EZFLATE [EZFLATE] is a token-based DEFLATE compression algorithm for secure compression within encrypted communication channels. The key feature of the EZFLATE algorithm (and its primary difference from the algorithm described in RFC 1951) is that LZ77 <length, backward distance> tuples may only reference a duplicated token (octet string) occurring in a previous block, up to 32K input bytes before, and only if the current token exactly matches it in length and octet values. As shown in Section 3.1, "Take a Bite out of CRIME", of [EZFLATE], this forces potential CRIME-like attackers to guess n-character secrets n characters at a time rather than one character at a time, making the search space no worse than a full brute-force attack.
This document defines 1) the formatting of HTTP/2 header fields, and 2) how EZFLATE is used to compress HTTP/2 header fields. Finally, this document also includes implementation recommendations.
In HTTP/1.1 [HTTP-p1], headers are defined as field-name, field-value pairs (see Section 3.2, "Header Fields", of [HTTP-p1]). The field-name and field-value are separated from each other by a single ':' character followed by optional whitespace. Each pair is terminated by the two-octet sequence CRLF (carriage return, line feed). The end of the header block is marked by an empty line, that is, a line consisting only of CRLF.
In HTTP/2, the headers are still field-name, field-value pairs, except that each is a length-prefixed octet string to simplify parsing. In other words, the field-name and field-value are separately length-prefixed. In this way, no separator is necessary between the field-name and field-value, and no separator is necessary between consecutive pairs. The end of the header block is naturally defined as the end of an HTTP/2 HEADERS or PUSH_PROMISE frame with the END_HEADERS flag set [HTTP2].
Each field-name and each field-value is encoded as an ASN.1 octet string [X.208-88][X.209-88]. This simply means that the octets which compose a field-name or field-value are length-prefixed with a variable-length integer which indicates the exact number of octets composing the original field-name or field-value.
For ASN.1 encoded octet strings, the length prefix is composed of at least one octet, optionally followed by N additional octets, where N is specified by the first octet as follows. If the length of the octets to be length-prefixed is less than 0x80 (128), only a single octet is required for the length prefix, and that octet encodes exactly the length of the octets to be length-prefixed. If the length is greater than or equal to 0x80 (128), the most significant bit of the first length-prefix octet is set (i.e. 0x80), and the remaining 7 bits specify the number N of additional octets that follow. Those N octets encode the length of the octets to be length-prefixed as an unsigned integer (big endian).
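The length-prefix rule above can be sketched in a few lines of Python. This is a minimal illustration, not a normative encoder; the function names encode_length_prefix and encode_octet_string are hypothetical and do not appear in this document or in [EZFLATE].

```python
def encode_length_prefix(length: int) -> bytes:
    """Return the length prefix for an octet string of `length` octets."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length < 0x80:
        # Short form: a single octet that holds the length itself.
        return bytes([length])
    # Long form: first octet is 0x80 | N, followed by the length encoded
    # as an unsigned big-endian integer in N octets.
    n = (length.bit_length() + 7) // 8
    return bytes([0x80 | n]) + length.to_bytes(n, "big")


def encode_octet_string(octets: bytes) -> bytes:
    """Length-prefix `octets` as described in the paragraph above."""
    return encode_length_prefix(len(octets)) + octets
```

For instance, encode_length_prefix(15) yields the single octet 0x0f, while encode_length_prefix(258) yields the three octets 0x82 0x01 0x02.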
The following are examples of computing the length prefix for converting various octet strings to ASN.1 octet strings.
The ASCII string "accept-encoding" is to be converted to an ASN.1 octet string. The length of the string is 15 octets. Since 15 is less than 128, the length prefix is simply a single octet with the value 15 (0x0f).
Hex dump of the resulting ASN.1 octet string:
0000000 0f61 6363 6570 742d | 656e 636f 6469 6e67 | .accept-encoding
The ASCII string "gzip;q=1.0, identity;q=0.5, *;q=0" is to be converted to an ASN.1 octet string. The length of the string is 33 octets. Since 33 is less than 128, the length prefix is simply a single octet with the value 33 (0x21).
Hex dump of the resulting ASN.1 octet string:
0000000 2167 7a69 703b 713d | 312e 302c 2069 6465 | !gzip;q=1.0, ide
0000010 6e74 6974 793b 713d | 302e 352c 202a 3b71 | ntity;q=0.5, *;q
0000020 3d30                |                     | =0
The string of octets 0x00 0x01 ... 0x7e 0x7f is to be converted to an ASN.1 octet string. The number of octets is 128. Since 128 is not less than 128, the first octet of the length prefix has the most significant bit set to indicate the length prefix is longer than one octet. The remaining seven bits encode the number of additional octets necessary to represent the value 128 as a big endian unsigned integer. Since encoding 128 requires one octet, the seven remaining bits encode the value 1 (0x01). The second octet encodes the value 128 (0x80).
Hex dump of the resulting ASN.1 octet string:
0000000 8180 0001 0203 0405 | 0607 0809 0a0b 0c0d | ................
0000010 0e0f 1011 1213 1415 | 1617 1819 1a1b 1c1d | ................
0000020 1e1f 2021 2223 2425 | 2627 2829 2a2b 2c2d | .. !"#$%&'()*+,-
0000030 2e2f 3031 3233 3435 | 3637 3839 3a3b 3c3d | ./0123456789:;<=
0000040 3e3f 4041 4243 4445 | 4647 4849 4a4b 4c4d | >?@ABCDEFGHIJKLM
0000050 4e4f 5051 5253 5455 | 5657 5859 5a5b 5c5d | NOPQRSTUVWXYZ[\]
0000060 5e5f 6061 6263 6465 | 6667 6869 6a6b 6c6d | ^_`abcdefghijklm
0000070 6e6f 7071 7273 7475 | 7677 7879 7a7b 7c7d | nopqrstuvwxyz{|}
0000080 7e7f                |                     | ~.
The string of octets 0x00 0x01 ... 0xfe 0xff 0x00 0x01 is to be converted to an ASN.1 octet string. The number of octets is 258. Since 258 is greater than 128, the first octet of the length prefix has the most significant bit set to indicate the length prefix is longer than one octet. The remaining seven bits encode the number of additional octets necessary to represent the value 258 as a big endian unsigned integer. Since encoding 258 requires two octets, the seven remaining bits encode the value 2 (0x02). The second and third octets encode the value 258 (0x01 0x02).
Hex dump of the resulting ASN.1 octet string:
000000 8201 0200 0102 0304 | 0506 0708 090a 0b0c | ................
000010 0d0e 0f10 1112 1314 | 1516 1718 191a 1b1c | ................
000020 1d1e 1f20 2122 2324 | 2526 2728 292a 2b2c | ... !"#$%&'()*+,
000030 2d2e 2f30 3132 3334 | 3536 3738 393a 3b3c | -./0123456789:;<
000040 3d3e 3f40 4142 4344 | 4546 4748 494a 4b4c | =>?@ABCDEFGHIJKL
000050 4d4e 4f50 5152 5354 | 5556 5758 595a 5b5c | MNOPQRSTUVWXYZ[\
000060 5d5e 5f60 6162 6364 | 6566 6768 696a 6b6c | ]^_`abcdefghijkl
000070 6d6e 6f70 7172 7374 | 7576 7778 797a 7b7c | mnopqrstuvwxyz{|
000080 7d7e 7f80 8182 8384 | 8586 8788 898a 8b8c | ................
000090 8d8e 8f90 9192 9394 | 9596 9798 999a 9b9c | ................
0000a0 9d9e 9fa0 a1a2 a3a4 | a5a6 a7a8 a9aa abac | ................
0000b0 adae afb0 b1b2 b3b4 | b5b6 b7b8 b9ba bbbc | ................
0000c0 bdbe bfc0 c1c2 c3c4 | c5c6 c7c8 c9ca cbcc | ................
0000d0 cdce cfd0 d1d2 d3d4 | d5d6 d7d8 d9da dbdc | ................
0000e0 ddde dfe0 e1e2 e3e4 | e5e6 e7e8 e9ea ebec | ................
0000f0 edee eff0 f1f2 f3f4 | f5f6 f7f8 f9fa fbfc | ................
000100 fdfe ff00 01        |                     | .....
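Going the other way, a receiver has to read the length prefix before it can locate the end of each field-name or field-value. The following Python sketch shows one way to do this for a header block made of consecutive length-prefixed pairs. The names decode_octet_string and decode_header_block are hypothetical, and rejecting a first octet of exactly 0x80 (N = 0) is an assumption of this sketch, since this document does not define a meaning for it.

```python
def decode_octet_string(buf: bytes, pos: int = 0):
    """Decode one length-prefixed octet string starting at buf[pos].

    Returns a tuple (octets, next_pos).
    """
    if pos >= len(buf):
        raise ValueError("truncated length prefix")
    first = buf[pos]
    pos += 1
    if first < 0x80:
        # Short form: the first octet is the length.
        length = first
    else:
        # Long form: the low 7 bits give the number of length octets.
        n = first & 0x7F
        if n == 0 or pos + n > len(buf):
            raise ValueError("unsupported or truncated length prefix")
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated octet string")
    return buf[pos:end], end


def decode_header_block(block: bytes):
    """Split a header block into (field-name, field-value) pairs."""
    pairs = []
    pos = 0
    while pos < len(block):
        name, pos = decode_octet_string(block, pos)
        value, pos = decode_octet_string(block, pos)
        pairs.append((name, value))
    return pairs
```

Because every string carries its own length, the loop needs no separators between names, values, or consecutive pairs.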
EZFLATE is itself agnostic to tokenization methods and delimiter sets. Appropriate tokenization rules and delimiter sets depend on the type of data to be compressed. For HTTP/2 header data, field-names and field-values are tokenized separately.
Since header field names are typically static, it is not beneficial to tokenize field names. Therefore field names are fed to the EZFLATE compressor as a single token, including the ASN.1 length prefix.
Field values, on the other hand, have varying degrees of staticity, ranging from very static (e.g. :scheme, user-agent) to semi-static (e.g. accept, accept-encoding) to very dynamic (e.g. :path, date). Field values MAY be tokenized with the following set of delimiters:
{ ' ', '\t', ';', ',' }
Figure 1
Special token delimiters for header values containing a path:
{ '/', '?', '&', '=' }
Figure 2
The ASN.1 length prefix MAY be treated as a delimiter and therefore treated as a separate token.
Finally, note that per Section 3.2 of [EZFLATE], a sequence of consecutive delimiters is to be treated as a single unit (i.e. as one token). For example, see Figure 3 and Figure 4, where the consecutive delimiter characters ',' and ' ' are treated as a single token ", ".
The following is an example of converting an HTTP header name-value pair into ASN.1 length-prefixed octet strings (see Section 2) and then tokenizing the octet strings based on the tokenization rules.
The following header is to be encoded:
accept-encoding: gzip;q=1.0, identity;q=0.5, *;q=0
Hex dump of the header converted to ASN.1 octet strings (see Section 2.1.1):
0000000 0f61 6363 6570 742d | 656e 636f 6469 6e67 | .accept-encoding
0000010 2167 7a69 703b 713d | 312e 302c 2069 6465 | !gzip;q=1.0, ide
0000020 6e74 6974 793b 713d | 302e 352c 202a 3b71 | ntity;q=0.5, *;q
0000030 3d30                |                     | =0
Hex dump of the header name token:
0000000 0f61 6363 6570 742d | 656e 636f 6469 6e67 | .accept-encoding
Token 1:
0000010 2167 7a69 70 | | !gzip
Token 2:
0000010 3b | | ;
Token 3:
0000010 713d | 312e 30 | q=1.0
Token 4:
0000010 | 2c 20 | ,
Figure 3
Token 5:
0000010 | 69 6465 | ide
0000020 6e74 6974 79 | | ntity
Token 6:
0000020 3b | | ;
Token 7:
0000020 713d | 302e 35 | q=0.5
Token 8:
0000020 | 2c 20 | ,
Figure 4
Token 9:
0000020 | 2a | *
Token 10:
0000020 | 3b | ;
Token 11:
0000020 | 71 | q
0000030 3d30 | | =0
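The tokenization walked through above can be approximated with the Python sketch below. It is an illustration under stated assumptions, not the normative procedure: the names length_prefix, split_tokens and tokenize_field_value are hypothetical, the length prefix is attached to the first token as in Token 1 above (this document also permits treating it as a separate token), and the caller chooses which delimiter set applies.

```python
DEFAULT_DELIMITERS = frozenset(b" \t;,")  # delimiter set of Figure 1
PATH_DELIMITERS = frozenset(b"/?&=")      # delimiter set of Figure 2


def length_prefix(length: int) -> bytes:
    """Length prefix as described in Section 2 (short or long form)."""
    if length < 0x80:
        return bytes([length])
    n = (length.bit_length() + 7) // 8
    return bytes([0x80 | n]) + length.to_bytes(n, "big")


def split_tokens(value: bytes, delimiters=DEFAULT_DELIMITERS):
    """Split value into alternating runs of delimiter and other octets.

    A run of consecutive delimiters forms a single token.
    """
    tokens = []
    start = 0
    for i in range(1, len(value) + 1):
        # Close the current run at the end of the input, or when the next
        # octet switches between delimiter and non-delimiter.
        if i == len(value) or (value[i] in delimiters) != (value[start] in delimiters):
            tokens.append(value[start:i])
            start = i
    return tokens


def tokenize_field_value(value: bytes, delimiters=DEFAULT_DELIMITERS):
    """Tokenize a field value, attaching its length prefix to the first token."""
    tokens = split_tokens(value, delimiters)
    prefix = length_prefix(len(value))
    if not tokens:
        return [prefix]
    tokens[0] = prefix + tokens[0]
    return tokens
```

Applied to the accept-encoding value above, tokenize_field_value returns eleven tokens, the first being the octet 0x21 followed by "gzip" and the fourth being the two-octet delimiter run ", ".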
The DEFLATE [DEFLATE] format has a fixed maximum length of 258 octets for matches (see Section 3.2.5 of [DEFLATE]). As such, tokens with length L greater than 258 octets SHOULD be split into sub-tokens as follows. The number of sub-tokens N MUST be N = (L + 257) / 258 (integer division). If N > 2, the first N - 2 sub-tokens MUST be 258 octets in length. In all cases, the final two sub-tokens MUST have length greater than or equal to 129 (258 / 2). Let R be the length remaining for the final two sub-tokens, that is, R = L - 258 * (N - 2). The length of the first remaining sub-token is L[N - 1] = R - R / 2 and the length of the second remaining sub-token is L[N] = R / 2 (where the index is 1..N and the division is integer division). Alternatively, tokens with length L greater than 258 octets may simply be emitted as literals.
Consider the following examples:
A token T, 1292 octets in length, is to be sub-tokenized.
N = (1292 + 257) / 258 = 1549 / 258 = 6
L1 = 258
L2 = 258
L3 = 258
L4 = 258
R = 1292 - 4 * 258 = 260
L5 = 260 - 260 / 2 = 130
L6 = 260 / 2 = 130
A token T, 516 octets in length, is to be sub-tokenized.
N = (516 + 257) / 258 = 773 / 258 = 2
R = 516
L1 = R - R / 2 = 258
L2 = R / 2 = 258
A token T, 259 octets in length, is to be sub-tokenized.
N = (259 + 257) / 258 = 516 / 258 = 2
R = 259
L1 = R - R / 2 = 130
L2 = R / 2 = 129
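The sub-tokenization arithmetic used in these examples can be written as a short Python sketch. The names MAX_MATCH, sub_token_lengths and sub_tokenize are hypothetical. Returning a token of 258 octets or fewer unchanged is an assumption of this sketch, since the rule is only stated for longer tokens.

```python
MAX_MATCH = 258  # maximum DEFLATE match length


def sub_token_lengths(length: int):
    """Return the list of sub-token lengths for a token of `length` octets."""
    if length <= MAX_MATCH:
        # Short tokens need no sub-tokenization (assumption of this sketch).
        return [length]
    n = (length + MAX_MATCH - 1) // MAX_MATCH  # N = (L + 257) / 258
    lengths = [MAX_MATCH] * (n - 2)            # first N - 2 sub-tokens
    r = length - MAX_MATCH * (n - 2)           # remainder for the last two
    lengths.append(r - r // 2)                 # L[N - 1]
    lengths.append(r // 2)                     # L[N]
    return lengths


def sub_tokenize(token: bytes):
    """Split `token` into sub-tokens with the lengths computed above."""
    out = []
    pos = 0
    for n in sub_token_lengths(len(token)):
        out.append(token[pos:pos + n])
        pos += n
    return out
```

Splitting the remainder evenly keeps both final sub-tokens at 129 octets or more, so neither ends up as a very short fragment.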
An EZFLATE compressor can act as an oracle to an attacker probing the compression context. However, the attacker can only find out if the guess of the entire token is correct or not.
Still, an attacker could take advantage of this limited information to break low-entropy secrets using a brute-force attack. A server usually has some protection against such brute-force attacks. Here, the attack would target the client, where it would be harder to detect. The attack would be even more dangerous if the attacker is able to prevent the traffic generated by its brute-force attack from reaching the server.
To offer protection against this type of attack, an endpoint MUST use non-compressed DEFLATE blocks (see Section 3.2.4 of [DEFLATE]) to prevent the compression of any header field whose value contains a secret which could be put at risk by a brute-force attack.
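For reference, a non-compressed DEFLATE block is simple enough to build by hand. The Python sketch below follows the layout in Section 3.2.4 of [DEFLATE]: a header with BTYPE = 00, then LEN and its one's complement NLEN (each two octets, least significant octet first), then the literal data. The function name stored_block is hypothetical, and the sketch assumes the block starts on an octet boundary, which holds at the start of a stream or directly after another non-compressed block.

```python
import struct
import zlib


def stored_block(data: bytes, final: bool = True) -> bytes:
    """Wrap `data` in one non-compressed (stored) DEFLATE block."""
    if len(data) > 0xFFFF:
        raise ValueError("a stored block holds at most 65535 octets")
    # Header octet: bit 0 is BFINAL, bits 1-2 are BTYPE = 00 (stored),
    # and the remaining bits are padding up to the octet boundary.
    header = bytes([0x01 if final else 0x00])
    # LEN followed by NLEN (one's complement of LEN), both little endian.
    return header + struct.pack("<HH", len(data), len(data) ^ 0xFFFF) + data


# Round trip through zlib's raw DEFLATE decoder (negative wbits = no wrapper).
secret = b"cookie: session=0123456789abcdef"
raw = stored_block(b"\x06cookie", final=False) + stored_block(secret)
assert zlib.decompress(raw, -15) == b"\x06cookie" + secret
```

Since the data is copied verbatim and no backward references are emitted for it, the size of such a block does not depend on whether the secret matches earlier input.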
Roberto Peon and Hervé Ruellan.
[EZFLATE]  Morgan, K. and C. Brunhuber, "EZFLATE: Token-based DEFLATE Compression", Internet-Draft draft-morgan-ezflate, June 2014.
[X.208-88]  CCITT, "Recommendation X.208: Specification of Abstract Syntax Notation One (ASN.1)", January 1998.
[X.209-88]  CCITT, "Recommendation X.209: Specification of Basic Encoding Rules for Abstract Syntax Notation One", January 1998.
[CRIME]  Rizzo, J. and T. Duong, "The CRIME Attack", September 2012.
[DEFLATE]  Deutsch, P., "DEFLATE Compressed Data Format Specification version 1.3", RFC 1951, May 1996.
[HTTP-p1]  Fielding, R. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing", Internet-Draft draft-ietf-httpbis-p1-messaging-26, February 2014.
[HTTP2]  Belshe, M., Peon, R. and M. Thomson, "Hypertext Transfer Protocol version 2", Internet-Draft draft-ietf-httpbis-http2-10, February 2014.
[SPDY]  Belshe, M. and R. Peon, "SPDY Protocol", Internet-Draft draft-mbelshe-httpbis-spdy-00, February 2012.