Network Working Group | K. Oku |
Internet-Draft | DeNA Co, Ltd. |
Intended status: Standards Track | M. Nottingham |
Expires: January 9, 2017 | July 8, 2016 |
Cache Digests for HTTP/2
draft-ietf-httpbis-cache-digest-00
This specification defines a HTTP/2 frame type to allow clients to inform the server of their cache’s contents. Servers can then use this to inform their choices of what to push to clients.
Discussion of this draft takes place on the HTTP working group mailing list (ietf-http-wg@w3.org), which is archived at https://lists.w3.org/Archives/Public/ietf-http-wg/.
Working Group information can be found at http://httpwg.github.io/; source code and issues list for this draft can be found at https://github.com/httpwg/http-extensions/labels/cache-digest.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 9, 2017.
Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
HTTP/2 [RFC7540] allows a server to “push” synthetic request/response pairs into a client’s cache optimistically. While there is strong interest in using this facility to improve perceived Web browsing performance, it is sometimes counterproductive because the client might already have cached the “pushed” response.
When this is the case, the bandwidth used to “push” the response is effectively wasted, and represents opportunity cost, because it could be used by other, more relevant responses. HTTP/2 allows a stream to be cancelled by a client using a RST_STREAM frame in this situation, but there is still at least one round trip of potentially wasted capacity even then.
This specification defines a HTTP/2 frame type to allow clients to inform the server of their cache’s contents using a Golumb-Rice Coded Set. Servers can then use this to inform their choices of what to push to clients.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].
The CACHE_DIGEST frame type is 0xf1. NOTE: This is an experimental value; if standardised, a permanent value will be assigned.
+-----------------------------------------------+ | Digest-Value? (\*) ... +-----------------------------------------------+
The CACHE_DIGEST frame payload has the following fields:
The CACHE_DIGEST frame defines the following flags:
A CACHE_DIGEST frame can be sent from a client to a server on any stream in the “open” state, and conveys a digest of the contents of the client’s cache for associated stream.
In typical use, a client will send one or more CACHE_DIGESTs immediately after the first request on a connection for a given origin, on the same stream, because there is usually a short period of inactivity then, and servers can benefit most when they understand the state of the cache before they begin pushing associated assets (e.g., CSS, JavaScript and images). Clients MAY send CACHE_DIGEST at other times.
If the cache’s state is cleared, lost, or the client otherwise wishes the server to stop using previously sent CACHE_DIGESTs, it can send a CACHE_DIGEST with the RESET flag set.
When generating CACHE_DIGEST, a client MUST NOT include cached responses whose URLs do not share origins [RFC6454] with the request of the stream that the frame is sent upon.
CACHE_DIGEST allows the client to indicate whether the set of URLs used to compute the digest represent fresh or stale stored responses, using the STALE flag. Clients MAY decide whether to only sent CACHE_DIGEST frames representing their fresh stored responses, their stale stored responses, or both.
Clients can choose to only send a subset of the suitable stored responses of each type (fresh or stale). However, when the CACHE_DIGEST frames sent represent the complete set of stored responses of a given type, the last such frame SHOULD have a COMPLETE flag set, to indicate to the server that it has all relevant state of that type. Note that for the purposes of COMPLETE, responses cached since the beginning of the connection or the last RESET flag on a CACHE_DIGEST frame need not be included.
CACHE_DIGEST can be computed to include cached responses’ ETags, as indicated by the VALIDATORS flag. This information can be used by servers to decide what kinds of responses to push to clients; for example, a stale response that hasn’t changed could be refreshed with a 304 (Not Modified) response; one that has changed can be replaced with a 200 (OK) response, whether the cached response was fresh or stale.
CACHE_DIGEST has no defined meaning when sent from servers, and SHOULD be ignored by clients.
Given the following inputs:
digest-value can be computed using the following algorithm:
Given:
hash-value can be computed using the following algorithm:
In typical use, a server will query (as per Section 2.2.1) the CACHE_DIGESTs received on a given connection to inform what it pushes to that client;
Servers MAY use all CACHE_DIGESTs received for a given origin as current, as long as they do not have the RESET flag set; a CACHE_DIGEST frame with the RESET flag set MUST clear any previously stored CACHE_DIGESTs for its origin. Servers MUST treat an empty Digest-Value with a RESET flag set as effectively clearing all stored digests for that origin.
Clients are not likely to send updates to CACHE_DIGEST over the lifetime of a connection; it is expected that servers will separately track what cacheable responses have been sent previously on the same connection, using that knowledge in conjunction with that provided by CACHE_DIGEST.
Given:
we can determine whether there is a match in the digest using the following algorithm:
This draft currently has no requirements for IANA. If the CACHE_DIGEST frame is standardised, it will need to be assigned a frame type.
The contents of a User Agent’s cache can be used to re-identify or “fingerprint” the user over time, even when other identifiers (e.g., Cookies [RFC6265]) are cleared.
CACHE_DIGEST allows such cache-based fingerprinting to become passive, since it allows the server to discover the state of the client’s cache without any visible change in server behaviour.
As a result, clients MUST mitigate for this threat when the user attempts to remove identifiers (e.g., “clearing cookies”). This could be achieved in a number of ways; for example: by clearing the cache, by changing one or both of N and P, or by adding new, synthetic entries to the digest to change its contents.
TODO: discuss how effective the suggested mitigations actually would be.
Additionally, User Agents SHOULD NOT send CACHE_DIGEST when in “privacy mode.”
[RFC6265] | Barth, A., "HTTP State Management Mechanism", RFC 6265, DOI 10.17487/RFC6265, April 2011. |
Thanks to Adam Langley and Giovanni Bajo for their explorations of Golumb-coded sets. In particular, see http://giovanni.bajo.it/post/47119962313/golomb-coded-sets-smaller-than-bloom-filters, which refers to sample code.
Thanks to Stefan Eissing for his suggestions.