Internet-Draft | Retrofit Structured Fields for HTTP | January 2022 |
Nottingham | Expires 10 July 2022 | [Page] |
This specification defines how a selection of existing HTTP fields can be handled as Structured Fields.¶
This note is to be removed before publishing as an RFC.¶
Status information for this document may be found at https://datatracker.ietf.org/doc/draft-nottingham-http-structure-retrofit/.¶
More information can be found at https://mnot.github.io/I-D/.¶
Source for this draft and an issue tracker can be found at https://github.com/mnot/I-D/labels/http-structure-retrofit.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 10 July 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Structured Field Values for HTTP [STRUCTURED-FIELDS] introduced a data model with associated parsing and serialisation algorithms for use by new HTTP field values. Header fields that are defined as Structured Fields can realise a number of benefits, including:¶
However, a field needs to be defined as a Structured Field for these benefits to be realised. Many existing fields are not, making up the bulk of header and trailer fields seen in HTTP traffic on the Internet.¶
This specification defines how a selection of existing HTTP fields can be handled as Structured Fields, so that these benefits can be realised -- thereby making them Retrofit Structured Fields.¶
It does so using two techniques. Section 2 lists compatible fields -- those that can be handled as if they were Structured Fields due to the similarity of their defined syntax to that in Structured Fields. Section 3 lists mapped fields -- those whose syntax needs to be transformed into an underlying data model which is then mapped into that defined by Structured Fields.¶
While implementations can parse and serialise compatible fields as Structured Fields subject to the caveats in Section 2, a sender cannot generate mapped fields from Section 3 and expect them to be understood and acted upon by the recipient without prior negotiation. This specification does not define such a mechanism.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
HTTP fields with the following names can usually have their values handled as Structured Fields according to the listed parsing and serialisation algorithms in [STRUCTURED-FIELDS], subject to the listed caveats.¶
The listed types are chosen for compatibility with the defined syntax of the field as well as with actual Internet traffic (see Appendix A). However, not all instances of these fields will successfully parse. This might be because the field value is clearly invalid, or it might be because it is valid but not parseable as a Structured Field.¶
An application using this specification will need to consider how to handle such field values. Depending on its requirements, it might be advisable to reject such values, treat them as opaque strings, or attempt to recover a structured value from them in an ad hoc fashion.¶
Note the following caveats:¶
HTTP parameter names are case-insensitive (as per Section 5.6.6 of [HTTP]), but Structured Fields require them to be all-lowercase. Although the vast majority of parameters seen in typical traffic are all-lowercase, compatibility can be improved by force-lowercasing parameters when encountered.¶
Empty and whitespace-only field values are considered errors in Structured Fields. For compatible fields, an empty field indicates that the field should be silently ignored.¶
Some ALPN tokens (e.g., h3-Q43) do not conform to key's syntax. Since the final version of HTTP/3 uses the h3 token, this shouldn't be a long-term issue, although future tokens may again violate this assumption.¶
These Dictionary-based fields consider the key to be case-insensitive, but Structured Fields requires keys to be all-lowercase. Although the vast majority of values seen in typical traffic are all-lowercase, compatibility can be improved by force-lowercasing these Dictionary keys when encountered.¶
Content-Length is defined as a List because it is not uncommon for implementations to mistakenly send multiple values. See Section 8.6 of [HTTP] for handling requirements.¶
Only the delta-seconds form of Retry-After is supported; a Retry-After value containing a http-date will need to be either converted into delta-seconds or represented as a raw value.¶
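Several of the caveats above can be illustrated with a short, non-normative Python sketch. All function names are hypothetical. The sketch assumes that parameters have already been split into a name-to-value mapping and that Content-Length has already been parsed into a list of integers. The Content-Length handling reflects one reading of Section 8.6 of [HTTP] (identical repeated values collapse to one; anything else is rejected). The Retry-After conversion uses the standard library's `email.utils.parsedate_to_datetime` as a stand-in for a full HTTP-date parser.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def lowercase_param_names(params):
    """Force-lowercase parameter names; values are left untouched.

    If two names differ only by case, the later one wins (an assumption).
    """
    return {name.lower(): value for name, value in params.items()}


def is_ignorable(raw_value):
    """Empty or whitespace-only values are ignored rather than parsed."""
    return raw_value.strip(" \t") == ""


def single_content_length(values):
    """Collapse a parsed Content-Length list to one value, or raise ValueError."""
    if not values or any(v < 0 for v in values) or len(set(values)) != 1:
        raise ValueError("invalid Content-Length")
    return values[0]


def retry_after_delta(raw_value, now):
    """Return Retry-After as delta-seconds, converting an HTTP-date if needed.

    `now` must be a timezone-aware datetime.
    """
    text = raw_value.strip(" \t")
    if text.isascii() and text.isdigit():
        return int(text)
    when = parsedate_to_datetime(text)
    # A date in the past is clamped to zero seconds.
    return max(0, int((when - now).total_seconds()))
```

An application that cannot convert a date-based Retry-After value this way would fall back to treating it as a raw value, as described above.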
Some HTTP fields can have their values represented in Structured Fields by mapping them into its data types and then serialising the result using an alternative field name.¶
For example, the Date HTTP header field carries a string representing a date:¶
Its value is more efficiently represented as an integer number of delta seconds from the Unix epoch (00:00:00 UTC on 1 January 1970, minus leap seconds). Thus, the example above would be mapped as:¶
As in Section 2, these fields are unable to represent values that are not parseable, and so an application using this specification will need to decide how to support such values. Typically, handling them using the original field name is sufficient.¶
Each field name listed below indicates a replacement field name and a means of mapping its original value into a Structured Field.¶
The following field names (paired with their replacement field names) have values that can be represented as Structured Fields by considering the original field's value as a string.¶
For example, a Location field could be represented as:¶
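As a non-normative sketch of the value conversion only (the replacement field name is deliberately left out), the string mapping can be approximated in Python by following the String serialisation rules in [STRUCTURED-FIELDS]: reject anything outside printable ASCII, escape backslashes and double quotes, then wrap the result in double quotes. The function name `serialise_sf_string` and the URL used in the usage note are illustrative assumptions.

```python
def serialise_sf_string(value):
    """Serialise text as a Structured Fields String.

    Raises ValueError for characters outside printable ASCII (0x20-0x7E),
    which a Structured Fields String cannot carry.
    """
    if any(ord(ch) < 0x20 or ord(ch) > 0x7E for ch in value):
        raise ValueError("not representable as a Structured Fields String")
    # Escape backslashes first so the escapes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

Calling `serialise_sf_string("https://example.com/foo")` yields the URL wrapped in double quotes. A value that raises `ValueError` would be left under the original field name.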
The following field names (paired with their replacement field names) have values that can be represented as Structured Fields by parsing their payload according to Section 7.1.1.1 of [RFC7231] and representing the result as an integer number of delta seconds from the Unix epoch (00:00:00 UTC on 1 January 1970, minus leap seconds).¶
For example, an Expires field could be represented as:¶
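The date conversion itself can be sketched in Python as follows. This is a non-normative illustration covering only the value, not the replacement field name. It uses the standard library's `email.utils.parsedate_to_datetime` as an approximation of the Section 7.1.1.1 parser, so it may not accept every obsolete date format. The function name and the sample dates are assumptions chosen for illustration.

```python
from email.utils import parsedate_to_datetime


def http_date_to_epoch_seconds(value):
    """Convert an HTTP-date string to integer seconds since the Unix epoch."""
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # No usable zone information; refuse rather than guess.
        raise ValueError("date has no usable time zone")
    return int(when.timestamp())


print(http_date_to_epoch_seconds("Sun, 06 Nov 1994 08:49:37 GMT"))  # 784111777
```

The resulting integer is what would be serialised as a Structured Fields Integer. A value that fails to parse would remain under the original field name.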
Please add the following note to the HTTP Field Name Registry:¶
The "Structured Type" column indicates the type of the field as per RFC8941, if any, and may be "Dictionary", "List" or "Item". A prefix of "*" indicates that it is a retrofit type (i.e., not natively Structured); see [this specification].¶
Then, add a new column, "Structured Type", with the values from Section 2 assigned to the nominated registrations, prefixing each with "*" to indicate that it is a retrofit type.¶
Then, add the following field names into the HTTP Field Name Registry, with the corresponding Structured Type as indicated, a status of "permanent" and referring to this document:¶
Section 2 identifies existing HTTP fields that can be parsed and serialised with the algorithms defined in [STRUCTURED-FIELDS]. Variances from other implementations might be exploitable, particularly if they allow an attacker to target one implementation in a chain (e.g., an intermediary). However, given the considerable variance in parsers already deployed, convergence towards a single parsing algorithm is likely to have a net security benefit in the longer term.¶
Section 3 defines alternative representations of existing fields. Because downstream consumers might interpret the message differently based upon whether they recognise the alternative representation, implementations are prohibited from generating such fields unless they have negotiated support for them with their peer. This specification does not define such a mechanism, but any such definition needs to consider the implications of doing so carefully.¶
To help guide decisions about compatible fields, the HTTP response headers captured by the HTTP Archive (https://httparchive.org) in September 2021 (representing more than 528,000,000 HTTP exchanges) were parsed as Structured Fields using the types listed in Section 2, with the indicated number of successful header instances, failures, and the resulting failure rate:¶
accept 9,099 / 34 = 0.372%*
accept-encoding 116,708 / 58 = 0.050%*
accept-language 127,710 / 95 = 0.074%*
accept-patch 281 / 0 = 0.000%
accept-ranges 289,341,375 / 7,776 = 0.003%
access-control-allow-credentials 36,159,371 / 2,671 = 0.007%
access-control-allow-headers 25,980,519 / 23,181 = 0.089%
access-control-allow-methods 32,071,437 / 17,424 = 0.054%
access-control-allow-origin 165,719,859 / 130,247 = 0.079%
access-control-expose-headers 20,787,683 / 1,973 = 0.009%
access-control-max-age 9,549,494 / 9,846 = 0.103%
access-control-request-headers 165,882 / 503 = 0.302%*
access-control-request-method 346,135 / 30,680 = 8.142%*
age 107,395,872 / 36,649 = 0.034%
allow 579,822 / 281 = 0.048%
alt-svc 56,773,977 / 4,914,119 = 7.966%
cache-control 395,402,834 / 1,146,080 = 0.289%
connection 112,017,641 / 3,491 = 0.003%
content-encoding 225,568,224 / 237 = 0.000%
content-language 3,339,291 / 1,744 = 0.052%
content-length 422,415,406 / 126 = 0.000%
content-type 503,950,894 / 507,133 = 0.101%
cross-origin-resource-policy 102,483,430 / 799 = 0.001%
expect 0 / 53 = 100.000%*
expect-ct 54,129,244 / 80,333 = 0.148%
host 57,134 / 1,486 = 2.535%*
keep-alive 50,606,877 / 1,509 = 0.003%
origin 32,438 / 1,396 = 4.126%*
pragma 66,321,848 / 97,328 = 0.147%
preference-applied 189 / 0 = 0.000%
referrer-policy 14,274,787 / 8,091 = 0.057%
retry-after 523,533 / 7,585 = 1.428%
surrogate-control 282,846 / 976 = 0.344%
te 1 / 0 = 0.000%
timing-allow-origin 91,979,983 / 8 = 0.000%
trailer 1,171 / 0 = 0.000%
transfer-encoding 15,098,518 / 0 = 0.000%
vary 246,483,644 / 69,607 = 0.028%
x-content-type-options 166,063,072 / 237,255 = 0.143%
x-frame-options 56,863,322 / 1,014,464 = 1.753%
x-xss-protection 132,739,109 / 347,133 = 0.261%¶
Note that this data set only includes response headers, although some request headers are present, indicated with an asterisk (because, the Web). Also, Dictionary and Parameter keys have not been force-lowercased, with the result that any values containing uppercase keys are considered to fail.¶
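The listed failure rates are consistent with dividing the number of failures by the total number of instances (successes plus failures), not by the successes alone. The Python sketch below recomputes three of the rows on that assumption. The function name is illustrative.

```python
def failure_rate(successes, failures):
    """Return failures as a percentage of all parsed instances."""
    total = successes + failures
    return 100.0 * failures / total if total else 0.0


# Rows taken from the table above: (successes, failures).
rows = {
    "accept": (9_099, 34),
    "alt-svc": (56_773_977, 4_914_119),
    "expect": (0, 53),
}
for name, (ok, bad) in rows.items():
    print(f"{name} {failure_rate(ok, bad):.3f}%")
```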
The top thirty header fields in that data set that were not considered compatible are (* indicates that the field is mapped in Section 3):¶