Internet DRAFT - draft-drechsler-httpbis-improved-caching
Network Working Group C. Drechsler, Ed.
Internet-Draft Technische Universitaet Chemnitz
Intended status: Standards Track May 16, 2016
Expires: November 17, 2016
Hypertext Transfer Protocol: Improved HTTP Caching
draft-drechsler-httpbis-improved-caching-05
Abstract
This document describes an improved HTTP caching method that can be
applied in addition to the standard caching behavior for HTTP. It
defines the associated header field that controls this improved
caching mechanism and a modified caching operation that differs
slightly from the standard caching operation for HTTP.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 17, 2016.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Drechsler Expires November 17, 2016 [Page 1]
Internet-Draft Improved HTTP Caching May 2016
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents
1. Introduction
1.1. Requirements Language
2. Specification
2.1. HTTP header field extension
2.2. Modified cache operation
2.2.1. Incoming Request Messages
2.2.2. Incoming Response Messages
2.3. Suggestions
3. IANA Considerations
3.1. Header Field Registration
3.2. Cache Directive Registration
4. Security Considerations
5. References
5.1. Normative References
5.2. Informative References
Author's Address
1. Introduction
HTTP caching has significant potential for reducing interdomain
traffic, especially when shared caches are used within operator
networks. Recent studies have shown very promising results regarding
the cacheability of HTTP traffic (see [Ager], [Erman]).
Unfortunately this potential cannot be fully exploited by the
standard caching behavior described in [RFC7234]. Two reasons mainly
limit the benefit of caching today:
1. Different URLs for one specific resource:
For cache systems which follow the instructions in [RFC7234]
the URL mainly serves as an identifier for the cached content.
Unfortunately due to mechanisms like load balancing and/or the
use of CDNs the URL for one specific resource can vary. From
the point of view of the cache system, two different URLs mean two
different cache items, even though the cache items can be identical
in their bit representation. Therefore caching systems usually
store one specific content several times and use storage capacity
that could otherwise be used for caching other content.
2. Personalization of HTTP messages in the header:
When HTTP messages carry personal information like cookies,
session IDs in the query string (this also affects point 1) or
other header attributes for the purpose of personalization (or
managing state), shared caches cannot reuse these responses for
subsequent requests. In this context content producers allow
caching only in the browser of the user (e.g. via
Cache-Control: private) or deny caching altogether. If a
specific representation is requested several times by different
clients, this results in HTTP messages which differ in the
headers while the bodies are equal. According to [Ager],
personalization is also one of the main reasons for the unused
potential of caching.
The goal of this proposal is to address these challenges and come up
with a solution that reconciles caching with varying URLs and
personalization.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Specification
The approach to improved HTTP caching in this proposal is twofold.
Section 2.1 introduces a new header field with a hash value. This is
used for precisely identifying the transferred content in the body of
HTTP messages and for signaling permission to cache and reuse the
body in intermediate cache systems.
The modified caching operation described in Section 2.2 uses the
above-mentioned header field and ensures that all headers (of HTTP
request and response messages) are exchanged between client and
server even if the body of a response message is coming from an
intermediate cache system.
2.1. HTTP header field extension
To identify the transferred content precisely, independent of the URL
used and independent of additional header fields in the context of
content negotiation, the following header field is used:
Cache-NT: sha-256 "=" <base64 encoded sha256 output>
The new header field carries an SHA-256 value (algorithm as in [SHS])
which is computed and encoded the following way:
1. When a client wants to retrieve a specific content it uses an HTTP
GET request with a URL to address the resource. Additionally the
client can use further header fields to negotiate the
representation of the resource which fits best for the client
(this mechanism is called content negotiation in [RFC7231]).
The SHA-256 value MUST be computed over the representation of
the resource which would be sent by the server to the client in
case of a successful response with status code 200 OK.
2. The SHA-256 hash value MUST be computed before the
transformations indicated by the header fields Content-Encoding,
Content-Range and Transfer-Encoding, if present, are applied.
3. The SHA-256 hash value MUST always be computed over the full
representation even if only parts of it are transferred to the
client (e.g. partial content, delta encoding). The hash value
serves as a unique identifier that allows intermediate cache
systems to identify parts of the full representation as well.
4. The SHA-256 hash value MUST be computed by the origin server. It
SHOULD be computed only once (when the resource is made available
on the server or when the resource has changed). It SHOULD NOT
be computed at the moment the server receives the request, so
that the response is not delayed.
5. After computing the SHA-256 hash value, its output MUST be
base64 encoded without line wrapping.
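The computation in steps 1 to 5 can be sketched in Python as follows.
This is a minimal sketch, not a normative implementation: the function
name cache_nt_value is hypothetical, and the sketch assumes that "the
output" in step 5 means the raw 32-byte digest. The example field
value given further down in this section appears to encode the
hexadecimal text printed by sha256sum instead, so the two readings
yield different strings and implementations would have to agree on
one of them.

```python
import base64
import hashlib


def cache_nt_value(representation: bytes) -> str:
    """Build a Cache-NT field value for a full representation.

    `representation` is the complete selected representation, before any
    Content-Encoding, Content-Range or Transfer-Encoding is applied.
    """
    # Assumption: the raw 32-byte digest is what gets base64 encoded.
    digest = hashlib.sha256(representation).digest()
    # b64encode never inserts line breaks ("without line wrapping").
    return "sha-256=" + base64.b64encode(digest).decode("ascii")


# Computed once when the resource is published, not per request (step 4).
full_body = b"x" * 1000
stored_value = cache_nt_value(full_body)
print("Cache-NT: " + stored_value)
```

The same stored_value would be sent with a 200 response and with any
206 response for a range of full_body, since the hash always covers
the full representation.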
The Cache-NT header field is sent by the server in successful
responses with status codes 200 or 206. If the header field is
present, the server signals that the body of the response can be
used for caching by intermediate cache systems for subsequent
requests in compliance with the cache operation described in
Section 2.2.
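A recipient has to split the field value into the algorithm token and
the encoded hash. The sketch below shows one way to do this in
Python. The helper parse_cache_nt is hypothetical, and rejecting
unknown algorithms and malformed base64 is an assumption of this
sketch rather than a requirement stated in this document.

```python
import base64
import binascii
from typing import Optional


def parse_cache_nt(field_value: str) -> Optional[str]:
    """Return the base64 hash from a Cache-NT value, or None if unusable."""
    # Split on the first "=" only: base64 padding may also contain "=".
    algorithm, sep, encoded = field_value.strip().partition("=")
    if not sep or algorithm.strip().lower() != "sha-256":
        return None
    encoded = encoded.strip()
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return None
    return encoded or None
```

The returned string can serve directly as the lookup key for cache
entries, independent of the request URL.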
In the following some examples are given:
Example header field:
Cache-NT: sha-256=ZDJhODRmNGI4YjY1M ... DgyMjlkYTgwNGEyNiAgLQo=
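Example for computation of the hash value under UNIX (the file is
read from standard input so that the file name does not become part
of the encoded value):
sha256sum < PopularVideo.mp4 | base64 -w0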
Several examples of request-response pairs:
a)
+---------------------------------------------+
| GET /videos/PopularVideo.webm HTTP/1.1      |
| Host: example.com                           |
+---------------------------------------------+
+---------------------------------------------+
| HTTP/1.1 200 OK                             |
| Content-Type: video/webm                    |
| Cache-NT: sha-256=AAAAAAAAAA...AAAAAAAAAA   |
| ...                                         |
+---------------------------------------------+
b)
+---------------------------------------------+
| GET /videos/PopularVideo.webm HTTP/1.1      |
| Host: example.com                           |
| Range: bytes=0-499                          |
+---------------------------------------------+
+---------------------------------------------+
| HTTP/1.1 206 Partial Content                |
| Content-Type: video/webm                    |
| Content-Range: bytes 0-499/1000             |
| Cache-NT: sha-256=AAAAAAAAAA...AAAAAAAAAA   |
| ...                                         |
+---------------------------------------------+
=> same hash value as in a) because only a part of the same
representation is requested
c)
+---------------------------------------------+
| GET /videos/PopularVideo HTTP/1.1           |
| Host: example.com                           |
| Accept: video/mp4                           |
+---------------------------------------------+
+---------------------------------------------+
| HTTP/1.1 200 OK                             |
| Content-Type: video/mp4                     |
| Cache-NT: sha-256=BBBBBBBBBB...BBBBBBBBBB   |
| ...                                         |
+---------------------------------------------+
=> a different representation than in a) and b) results in a
different hash value
2.2. Modified cache operation
The modified cache operation differs slightly from the one in
[RFC7234]. It uses the header field described in Section 2.1 and
ensures that all headers (of HTTP request and response messages) are
exchanged between client and server even if the body of a response
message is coming from an intermediate cache system. Unlike in
[RFC7234], client requests never terminate at intermediate cache
systems.
2.2.1. Incoming Request Messages
Incoming request messages MUST always be forwarded to the origin
server by the intermediate cache system.
For HTTP/1.0 or HTTP/1.1 requests the cache system SHOULD keep track
of the desired connection state by evaluating the Connection header
field.
For HTTP/1.1 requests the cache system MUST keep track of all
pipelined requests.
2.2.2. Incoming Response Messages
The cache system analyzes the header of incoming response messages.
If the status code IS NOT 200 or 206 then the response is forwarded
to the client without modifications. If the status code IS 200 or
206 then the cache system looks for the Cache-NT header field
(described in Section 2.1). Two situations can arise:
a. The Cache-NT header field IS NOT present:
Then the response message is forwarded to the client without
modifications.
b. The Cache-NT header field IS present:
Then the cache system analyzes the hash value in the Cache-NT
header field. Two situations can arise:
1. The cache system has NO cache entry which matches the hash
value in the Cache-NT header field (cache miss):
Then the response message is forwarded to the client
without modifications. To prevent cache poisoning the
cache system computes the hash value over the
transferred representation in the body (as described in
Section 2.1), and if it matches the hash value in the
Cache-NT header field of the response from the server,
a copy of the message body is stored in the cache
system. Figure 2 visualizes this cache operation in
case of a cache miss.
2. The cache system has a cache entry which matches the hash
value in the Cache-NT header field (cache hit):
After receiving the whole message header the cache
system aborts the transfer of the message body from the
server:
o HTTP/2: By sending RST_STREAM to the server. As
each HTTP request-response exchange is assigned to a
single stream, no side effects will arise.
o HTTP/1.0: By closing the TCP connection to the
server (and sending a TCP RST). If the TCP connection
was intended to stay open (signaled via the
Connection header field) then the cache system
SHOULD immediately open a new TCP connection (with a
new TCP port) to the server for subsequent requests
by the client.
o HTTP/1.1: By closing the TCP connection (and
sending a TCP RST). If the TCP connection was
intended to stay open (signaled via the Connection
header field) then the cache system SHOULD
immediately open a new TCP connection (with a new
TCP port) to the server for subsequent requests by
the client. If pipelining was used then the cache
system MUST issue all requests that followed the
current request once again.
After that the cache system uses the already received
message header from the server and concatenates it with
the locally stored body. In this process the cache
system MUST honor the following header fields, if
present,
o Content-Encoding
o Content-Range
o Transfer-Encoding
and MUST transform the body accordingly. This
means that the client will receive exactly the same
HTTP response message which was originally sent by
the server. Figure 1 visualizes this cache operation
in case of a cache hit.
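The decision procedure above can be summarized in code. The following
Python fragment is a minimal sketch under stated assumptions: the
Action names, the function handle_response_header and the use of a
plain set of known field values as cache index are all hypothetical,
and header field names are assumed to be lower-cased already.

```python
from enum import Enum
from typing import Mapping, Set


class Action(Enum):
    FORWARD_UNMODIFIED = "forward response as received"
    FORWARD_AND_STORE = "forward response, store verified copy of body"
    ABORT_AND_SERVE_LOCAL = "abort upstream body, send header + cached body"


def handle_response_header(status: int, headers: Mapping[str, str],
                           known_hashes: Set[str]) -> Action:
    """Choose the cache action once the full response header has arrived."""
    if status not in (200, 206):
        return Action.FORWARD_UNMODIFIED
    field = headers.get("cache-nt")
    if field is None:
        return Action.FORWARD_UNMODIFIED       # case a
    if field.strip() in known_hashes:
        return Action.ABORT_AND_SERVE_LOCAL    # case b.2, cache hit
    # Case b.1, cache miss: storing remains subject to the hash check.
    return Action.FORWARD_AND_STORE
```

Note that the request itself has already been forwarded to the origin
server by the time this function runs; only the handling of the
response body differs between the three outcomes.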
+-----------------+                             +-----------------+
| HEADER (Client) | <-------------------------- | HEADER (Client) |
|-----------------|    request is forwarded     |-----------------|
|  BODY (Client)  | <-------------------------- |  BODY (Client)  |
+-----------------+                             +-----------------+

############               ############                ############
#          # <------------ #          # <------------- #          #
#  Server  #               #  Cache   #                #  Client  #
#          # ------------> #          # -------------> #          #
############               ############                ############

+-----------------+                             +-----------------+
| HEADER (Server) | --------------------------> | HEADER (Server) |
|-----------------|   HEADER (Server) + BODY    |-----------------|
|  BODY (Server)  |   (Cache) is forwarded      |                 |
|                 |             --------------> |  BODY (Cache)   |
        ...                     |               |                 |
                                |               +-----------------+
                                |
                                | local stored copy of the body is
                                | used and concatenated with the
                                | header from the server
                                |
                                |
          ============          |          ============
          ||                    |                    ||
          ||                    |                    ||
          ||           +-----------------+           ||
          ||           |                 |           ||
          ||           |  BODY (Cache)   |           ||
          ||           |                 |           ||
          ||           +-----------------+           ||
          ||                                         ||
          ||                                         ||
          ||                                         ||
          ||              cache storage              ||
          ||                                         ||
          =============================================

Cache operation in case of cache hit.
Figure 1
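On a cache hit the cache has to cut the stored full representation
down to what the header from the server announces. The Python sketch
below handles only a single-range Content-Range of the form
"bytes first-last/length"; content codings and transfer codings are
deliberately left out. The function name body_for_response is
hypothetical, and header field names are assumed to be lower-cased.

```python
import re
from typing import Mapping

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def body_for_response(headers: Mapping[str, str], stored: bytes) -> bytes:
    """Select the bytes of the cached representation the header describes."""
    value = headers.get("content-range")
    if value is None:
        return stored                      # 200 OK: whole representation
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: " + value)
    first, last = int(match.group(1)), int(match.group(2))
    total = match.group(3)
    if total != "*" and int(total) != len(stored):
        raise ValueError("stored body does not match announced length")
    if first > last or last >= len(stored):
        raise ValueError("range lies outside the stored body")
    return stored[first:last + 1]          # byte positions are inclusive
```

For the response in example b) of Section 2.1, this would return the
first 500 bytes of the stored 1000-byte representation.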
+-----------------+                             +-----------------+
| HEADER (Client) | <-------------------------- | HEADER (Client) |
|-----------------|    request is forwarded     |-----------------|
|  BODY (Client)  | <-------------------------- |  BODY (Client)  |
+-----------------+                             +-----------------+

############               ############                ############
#          # <------------ #          # <------------- #          #
#  Server  #               #  Cache   #                #  Client  #
#          # ------------> #          # -------------> #          #
############               ############                ############

+-----------------+                             +-----------------+
| HEADER (Server) | --------------------------> | HEADER (Server) |
|-----------------|  response (HEADER + BODY)   |-----------------|
|                 |        is forwarded         |                 |
|  BODY (Server)  | --------------------------> |  BODY (Server)  |
|                 |             |               |                 |
+-----------------+             |               +-----------------+
                                |
                                | copy of body is stored in cache
                                |
                                |
          ============          |          ============
          ||                    |                    ||
          ||                    V                    ||
          ||           +-----------------+           ||
          ||           |                 |           ||
          ||           |  BODY (Server)  |           ||
          ||           |                 |           ||
          ||           +-----------------+           ||
          ||                                         ||
          ||                                         ||
          ||                                         ||
          ||              cache storage              ||
          ||                                         ||
          =============================================

Cache operation in case of cache miss.
Figure 2
2.3. Suggestions
In case of a cache hit the cache system aborts the transfer of the
response body from the server after the whole header has been
received (see Section 2.2). As the transfer of the body cannot be
aborted immediately, the server will still send some parts of the
body. How many kilobytes are transferred depends mainly on the
congestion window of the underlying TCP connection. If the
congestion window is small then only a few kilobytes of the response
will go over the wire.
Evaluations at Technische Universitaet Chemnitz have shown that at
least around 20 kilobytes are transferred between origin server and
cache system in case of a cache hit (this is for an HTTP/1.0 or
HTTP/1.1 request right after opening a TCP connection). Therefore
including the Cache-NT header field for small resources does not make
much sense from the point of view of caching, as the whole body is
transferred before the cache system can abort the transfer.
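The reasoning above can be turned into a rough estimate of the
upstream traffic saved by a cache hit. The Python sketch below is
illustrative only: it assumes that a fixed amount of about 20
kilobytes (taken here as 20 * 1024 bytes) leaves the server before the
abort takes effect, whereas the real amount depends on the congestion
window. The names LEAKED_BYTES, saved_fraction and
worth_sending_cache_nt are hypothetical.

```python
LEAKED_BYTES = 20 * 1024  # assumption: sent before the abort takes effect


def saved_fraction(body_size: int, leaked: int = LEAKED_BYTES) -> float:
    """Estimate the share of the body not transferred on a cache hit."""
    if body_size <= 0:
        return 0.0
    return max(0.0, 1.0 - leaked / body_size)


def worth_sending_cache_nt(body_size: int, leaked: int = LEAKED_BYTES) -> bool:
    """True if a cache hit could save at least some upstream traffic."""
    return body_size > leaked


for size in (10 * 1024, 200 * 1024, 50 * 1024 * 1024):
    print(size, worth_sending_cache_nt(size), round(saved_fraction(size), 3))
```

Under these assumptions a 10-kilobyte resource gains nothing, while
for a 200-kilobyte resource roughly nine tenths of the body would not
have to be transferred from the origin server.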
3. IANA Considerations
3.1. Header Field Registration
HTTP header fields are registered within the Message Header Field
Registry maintained at
<http://www.iana.org/assignments/message-headers/>.
This document defines the following HTTP header fields, so their
associated registry entries shall be updated according to the
permanent registrations below (see [BCP90]):
+-------------------+----------+-------------------+--------------+
| Header Field Name | Protocol | Status            | Reference    |
+-------------------+----------+-------------------+--------------+
| Cache-NT          | http     | proposed standard | Section 2.1  |
+-------------------+----------+-------------------+--------------+
The change controller is: "IETF (iesg@ietf.org) - Internet
Engineering Task Force".
3.2. Cache Directive Registration
This document defines the following HTTP header field directives:
+-----------------+--------------+
| Cache Directive | Reference    |
+-----------------+--------------+
| sha-256         | Section 2.1  |
+-----------------+--------------+
4. Security Considerations
This section is meant to inform developers, information providers,
and users of known security concerns specific to the caching
mechanism described in this proposal. In addition more general
security considerations of HTTP caching are discussed in Section 8 of
[RFC7234].
The cache operation in Section 2.2 uses the Cache-NT header field
(see Section 2.1) in incoming response messages. If the hash value
in the Cache-NT header field of the (server) response does not
correspond to the representation in the body of that response, then a
wrong body may be concatenated to the header from the server and sent
to the client (this occurs when the cache system has a cache entry
which matches the hash value in the response of the server). Origin
servers SHOULD always include the correct hash value in the Cache-NT
header field, matching the representation in the body.
Intermediaries MUST NOT change the hash value in the Cache-NT header
field. In addition the client can compute the hash value over the
full representation (in case of responses with 200 OK) itself and
compare it with the value in the Cache-NT header field.
If a cache system does not have a cache entry which matches the hash
value in the Cache-NT header field then it forwards the response to
the client and stores a local copy of the body (see Section 2.2). To
prevent cache poisoning the cache system SHOULD compute the hash
value over the full representation in the body (in case of responses
with 200 OK) itself and SHOULD compare it with the value in the
Cache-NT header field before storing the body.
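A cache-side check along these lines might look as follows in Python.
This is a sketch only: the function store_if_verified and the
dictionary used as cache storage are hypothetical, and the comparison
assumes that the field carries the base64 encoding of the raw SHA-256
digest, which is one possible reading of Section 2.1.

```python
import base64
import hashlib
import hmac
from typing import Dict


def store_if_verified(field_value: str, body: bytes,
                      storage: Dict[str, bytes]) -> bool:
    """Store `body` under its Cache-NT value only if the hash matches."""
    # Assumption: the value is "sha-256=" + base64(raw SHA-256 digest).
    digest = hashlib.sha256(body).digest()
    expected = "sha-256=" + base64.b64encode(digest).decode("ascii")
    received = field_value.strip().encode("utf-8")
    if not hmac.compare_digest(expected.encode("ascii"), received):
        return False            # mismatch: forward, but never cache
    storage[expected] = body
    return True
```

The body passed in has to be the full representation of a 200 OK
response with any content coding already removed, since the hash
value is defined over that form.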
Another security concern will arise if significant security flaws in
the hash algorithm used (currently SHA-256) are detected. The cache
could then easily be poisoned. In this case origin servers and
intermediate cache systems MUST switch to another hash algorithm
(e.g. SHA-512 or the SHA-3 family).
5. References
5.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
RFC 7234, June 2014.
5.2. Informative References
[Ager] Ager, B., Schneider, F., Juhoon, K., and A. Feldmann,
"Revisiting Cacheability in Times of User Generated
Content", IEEE Conference on Computer Communications,
Workshops pp. 1-6, March 2010,
<http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5466667>.
[BCP90] Klyne, G., Nottingham, M., and J. Mogul, "Registration
Procedures for Message Header Fields", BCP 90, RFC 3864,
September 2004.
[Erman] Erman, J., Gerber, A., Hajiaghayi, M., Pei, D., and O.
Spatscheck, "Network-aware forward caching", Proceedings
of the 18th international conference on World wide web pp.
291-300, 2009,
<http://dl.acm.org/citation.cfm?id=1526749>.
[RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
June 2014.
[SHS] National Institute of Standards and Technology, "Secure
Hash Standard (SHS)", FEDERAL INFORMATION PROCESSING
STANDARDS PUBLICATION 180-4, U.S. Department of Commerce,
March 2012,
<http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf>.
Author's Address
Chris Drechsler (editor)
Technische Universitaet Chemnitz
09107 Chemnitz
Germany
Email: chris.drechsler@etit.tu-chemnitz.de