Network Working Group R. Peon
Internet-Draft Facebook, Inc.
Intended status: Informational J. Pinner
Expires: July 21, 2018 Lyft, Inc.
January 17, 2018
Proposal for QUIC Abstractions
draft-peon-pinner-quic-abstractions-03
Abstract
This document proposes abstraction layers for QUIC and makes
recommendations for draft v1.
Note to Readers
Discussion of this draft takes place on the QUIC working group
mailing list (quic@ietf.org), which is archived at
https://mailarchive.ietf.org/arch/search/?email_list=quic [1].
Working Group information can be found at https://github.com/quicwg
[2]; source code and issues list for this draft can be found at
https://github.com/quicwg/base-drafts/labels/-http [3].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 21, 2018.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
Peon & Pinner Expires July 21, 2018 [Page 1]
Internet-Draft I-D January 2018
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
1. Introduction
This document proposes 5 layers of abstraction for QUIC: QUIC,
Connections, Streams, H3, and HTTP.
2. Abstractions
2.1. QUIC provides:
o Packets
o MTU discovery (packet sizing)
o Version negotiation
o Packet loss detection
o A cryptographic context enabling data encryption within a packet
o Zero-RTT connection establishment with limited data payload
o One-RTT connection establishment
2.2. QUIC Connections provide:
o Identification of the connection including a Connection ID in
addition to the 5-tuple
o Alternate connection IDs/connID 'renaming' without requiring
connection re-establishment
o Multiplexed streams that avoid head-of-line (HoL) blocking
o Congestion control on a per-path basis
o Data (not packet) retransmission
o Flow control on a per-connection basis
o Mechanisms to prove liveness and measure RTT
2.3. QUIC Streams provide:
o Flow control on a per-stream basis
o Ordered but not necessarily in-order bytestreams
o Grouping: a statement that streams should be delivered to the same
endpoint through proxies
o Data frames
o Support for non-data frames
2.4. 'H3' provides:
o Flow-controlled headers frames on streams
o Compression for headers data in a robust way which trades off HoL
blocking and compression efficiency
2.5. HTTP on QUIC:
o Maps requests to streams using H3
o Defines restrictions on header/data frame sequencing in line with
HTTP semantics
2.6. APIs above these layers:
APIs above these layers will then determine how and when data is
presented to the application, including decisions about whether to
present ordered data as in-order (i.e. socket-like), or to present it
as if it were a file (ordered but not necessarily in-order), and when to
request retransmissions or discards ('reliable' or partially
reliable).
Note that HTTP does not imply reliability. HTTP implies
request-response.
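The two presentation styles mentioned above can be illustrated with a
minimal sketch. The StreamBuffer class and its method names are
hypothetical and are not part of any QUIC API; the sketch also assumes
that received segments never overlap.

```python
class StreamBuffer:
    """Holds stream bytes keyed by offset (hypothetical helper)."""

    def __init__(self):
        self.segments = {}      # offset -> bytes, as received
        self.read_offset = 0    # next byte to hand out in in-order mode

    def on_data(self, offset, data):
        # Data is ordered (it carries an offset) but may arrive in any order.
        self.segments[offset] = data

    def read_in_order(self):
        """Socket-like: return only the contiguous bytes at the read offset."""
        out = bytearray()
        while self.read_offset in self.segments:
            chunk = self.segments.pop(self.read_offset)
            out += chunk
            self.read_offset += len(chunk)
        return bytes(out)

    def read_ranges(self):
        """File-like: return every buffered (offset, bytes) range, sorted."""
        return sorted(self.segments.items())
```

A socket-like API would expose only read_in_order(), while a file-like
API could expose read_ranges() and let the application decide what to
do about gaps.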
3. Deeper explanations
3.1. QUIC (Packets):
In order to establish connections, QUIC sends packets before QUIC
connections can be confirmed to be established. The QUIC-layer
abstraction thus includes all parts necessary to operate on a
per-packet basis without already being in the context of a QUIC
connection.
QUIC packets are UDP datagrams. These may or may not have a 1:1
correspondence to IP packets based on path MTU estimation and IP
fragmentation.
Payload data is AEAD encrypted. Minimal routing data is unencrypted:
o In particular this means that acks (and thus congestion control and
loss recovery) are end-to-end instead of hop-by-hop.
Packets are NOT reliably delivered or retransmitted. Some of the
application payload carried by a packet MAY be retransmitted but that
is not required.
Note that this does not preclude layer 2 from doing its own
retransmissions; duplicate packets may be received even when the
sender did not send duplicates.
All other intermediaries must "participate" in the QUIC connection -
they must be "terminating" intermediaries and have the encryption
keys necessary to terminate connections. Tunneling L5-over-L5 still
requires an initial connection to be terminated at the proxy.
All packets sent before the 1-RTT keys are established for a connection
must be versioned. The version number location in these packets must
be static across all versions of the protocol.
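A fixed version location lets a receiver read the version without
knowing anything else about the packet format. The sketch below
assumes, purely for illustration, a layout of one flags byte, an 8-byte
connection ID, and a 4-byte big-endian version; the actual offsets are
defined by the QUIC transport specification, not by this document. The
function names are hypothetical.

```python
import struct

# Assumed layout for illustration only: 1 flags byte, an 8-byte connection
# ID, then a 4-byte big-endian version field.
VERSION_OFFSET = 9


def peek_version(datagram):
    """Return the version field, or None if the datagram is too short."""
    if len(datagram) < VERSION_OFFSET + 4:
        return None
    (version,) = struct.unpack_from("!I", datagram, VERSION_OFFSET)
    return version


def needs_version_negotiation(datagram, supported):
    """True if a version is present but is not one the endpoint supports."""
    version = peek_version(datagram)
    return version is not None and version not in supported
```

Because the offset never changes, this check keeps working for packets
from versions the receiver has never seen.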
3.2. QUIC Connections
QUIC connections may be created between two endpoints communicating
over UDP. A QUIC connection consists of a shared cryptographic
context and set of multiplexed "streams". Connections are created
through a combined cryptographic and transport handshake that is
capable of providing 0-RTT connection establishment when
communicating with a known peer. Finally, in order to be resilient
to NAT re-bindings and changes in network topology, connections may
persist across changes of the client or server IP and port addresses.
QUIC connections are identified by a set of 64-bit unsigned numbers,
one chosen randomly by the client and one or more chosen by the
server, in addition to the "5-tuple" used to identify the underlying
UDP connection. The QUIC connection identifiers allow for the client
and server IP address or port number (or the connection identifier
itself) to change throughout the lifetime of the connection, while
still allowing datagrams to be correctly routed between the two
endpoints.
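As a rough sketch of this idea, an endpoint can look up connection
state by connection ID alone and treat the peer address as mutable.
The ConnectionTable class and the shape of the state dictionary are
hypothetical; a real implementation would also validate a new path
before trusting it.

```python
class ConnectionTable:
    """Maps connection IDs to connection state (hypothetical structure)."""

    def __init__(self):
        self.by_id = {}

    def register(self, conn_id, state):
        # Several IDs (client-chosen, server-chosen) may map to one connection.
        self.by_id[conn_id] = state

    def on_datagram(self, conn_id, peer_addr):
        """Find the connection by ID; the source address is not part of the key."""
        state = self.by_id.get(conn_id)
        if state is None:
            return None
        # Record the latest address. Path validation is omitted in this sketch.
        state["peer_addr"] = peer_addr
        return state
```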
3.2.1. 0-RTT Connection Establishment
TLS 1.3 enables 0-RTT, and QUIC endpoints should support it.
Since packets are not required to arrive in order (or arrive at all)
an endpoint may receive 0-RTT data for a connection that has yet to
be established. Implementations should make appropriate tradeoffs
when buffering this data so as not to render 0-RTT connection
establishment infeasible in practice.
An endpoint can always "pretend" it does not have decryption keys for
0-RTT content. Servers can always force a fallback to a 1-RTT
establishment handshake. The existence of this fallback is important
since it is the only mechanism for a server to do address validation
(and thus protect itself from some classes of denial-of-service
attacks).
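One possible shape for the buffering tradeoff is a small bounded
buffer: early packets are held up to a limit and dropped beyond it,
relying on the 1-RTT fallback described above. The class name and the
default limits are arbitrary assumptions, not recommendations.

```python
from collections import deque


class ZeroRttBuffer:
    """Bounded holding area for early packets (limits are arbitrary examples)."""

    def __init__(self, max_packets=8, max_bytes=16384):
        self.max_packets = max_packets
        self.max_bytes = max_bytes
        self.packets = deque()
        self.size = 0

    def offer(self, packet):
        """Buffer the packet if it fits; otherwise drop it and return False."""
        if (len(self.packets) >= self.max_packets
                or self.size + len(packet) > self.max_bytes):
            return False  # dropped: the peer recovers via the 1-RTT path
        self.packets.append(packet)
        self.size += len(packet)
        return True

    def drain(self):
        """Hand over buffered packets once the connection state exists."""
        drained = list(self.packets)
        self.packets.clear()
        self.size = 0
        return drained
```

Bounding both the packet count and the byte count keeps an attacker
from using unauthenticated early data to exhaust memory.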
3.2.2. L4 routing and Connection migration: Requires Working Group
decisions
While the protocol allows for both connection migration across
changes of the endpoint's underlying network address and for changes
of the connection identifiers, it is unclear (under the current
specification) whether connection migration can be implemented in a
scalable, interoperable manner.
For data within a QUIC connection to be of utility, packets intended
to be associated with that connection should flow to a specific
endpoint.
For large deployments, there are likely to be a number of L4 load
balancers deployed to ensure that this happens while utilizing L7
endpoints effectively. A set of TCP load balancers in a deployment,
for instance, would forward packets with the same source IP address
and port number to a sole host regardless of which load balancer
received the packet.
A QUIC connection is determined by both the network address and a set
of connection identifiers. As a result, L4 load balancing which uses
only IP address and port number is insufficient to ensure that
packets associated with a QUIC connection actually arrive at the
correct endpoint. A reasonable solution to this problem might be to
hash on the connection ID instead of hashing on the network address;
however, if multiple identifiers are used simultaneously throughout
the lifetime of the connection, this is insufficient because all
identifiers would have to hash to the same host.
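The limitation can be seen in a small sketch of stateless hashing. The
use of SHA-256 and the function names are assumptions chosen for
illustration; any hash has the same property.

```python
import hashlib


def pick_host(conn_id, hosts):
    """Stateless L4 choice: hash the 64-bit connection ID onto a host list."""
    digest = hashlib.sha256(conn_id.to_bytes(8, "big")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


def ids_agree(conn_ids, hosts):
    """True only if every ID of one connection lands on the same host."""
    return len({pick_host(cid, hosts) for cid in conn_ids}) == 1
```

A single ID always maps to the same host, but nothing makes two
unrelated IDs for the same connection agree, so ids_agree() fails for
most sets of independently chosen identifiers.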
There are several strategies that can be employed to solve the L4 LB
problem with alternate connection-IDs. The simplest and most
scalable approach requires shared knowledge between the L4 LB and the
endpoint of the connection, specifically an encryption key and/or
cryptographic algorithm. This allows the L7 endpoint to compute a
new connection ID which the L4 LB could successfully deliver to the
correct L7. Other means of making this work (global NAT tables in a
cluster, distributed NAT tables) require additional hops within
datacenters and make successful implementations more difficult while
also likely decreasing performance.
In order to associate multiple alternative connection IDs with the
same connection, we must expose some data to the L4 load balancer to
allow it to correctly map IDs to the expected L7 host. This data
could take the form of some structure embedded in the connection
identifier and agreed upon between all intermediaries on the path,
for example choosing some number of bits to be used for routing that
must be identical between all identifiers for a given connection.
This is most certainly a potential avenue for ossification.
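A minimal sketch of the shared-key approach follows. It assumes a
64-bit connection ID made of a 6-byte random nonce followed by a
16-bit server ID masked with an HMAC of the nonce. This layout, the
function names, and the choice of HMAC-SHA-256 are all assumptions for
illustration; it is not a vetted cryptographic design and is not taken
from any QUIC specification.

```python
import hashlib
import hmac
import os


def _mask(key, nonce):
    # Two pseudo-random bytes derived from the shared key and the nonce.
    return hmac.new(key, nonce, hashlib.sha256).digest()[:2]


def issue_conn_id(key, server_id):
    """L7 endpoint: build a 64-bit ID that hides a 16-bit server ID."""
    nonce = os.urandom(6)
    plain = server_id.to_bytes(2, "big")
    hidden = bytes(a ^ b for a, b in zip(plain, _mask(key, nonce)))
    return nonce + hidden


def route_conn_id(key, conn_id):
    """L4 load balancer: recover the server ID using the shared key."""
    nonce, hidden = conn_id[:6], conn_id[6:8]
    plain = bytes(a ^ b for a, b in zip(hidden, _mask(key, nonce)))
    return int.from_bytes(plain, "big")
```

Every ID issued by the same L7 host decodes to the same server ID at
the load balancer, while an observer without the key sees the IDs as
unrelated values.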
The use of multiple connection IDs to identify a connection is
provided as a mechanism to prevent a passive observer from
correlating activity for the same connection across multiple paths
during connection migration. It is worth noting that while a client
may want to use a new connection identifier, it requires the server
to issue new identifiers, and no mechanism is provided in the
specification for the client to request them or require the server to
issue them. In addition, multi-path support will arguably do a more
effective job of making packet inspection difficult than having
multiple connection IDs would, for those connections where multiple
paths are available. For connections where multiple paths are not
available, the client has the option to open multiple connections to
achieve the same effect.
W.I.P. (The other argument for multiple connection IDs is not packet
inspection but instead privacy, i.e. linkability between IP addresses.
If multi-path requires the ability to share connection state between
multiple paths, could we extend this to the application layer to
share state across multiple connections each with its own connection
ID? If so, then there is no privacy concern since the client can
instead open one connection per path.)
o Recommendation: defer alternative connection IDs to the v2
specification.
Even excluding the association of multiple server-selected connection
IDs with a single connection, the connection is still identified by
two identifiers: the one randomly selected by the client and the ID
chosen by the server. Without a mechanism for intermediaries to route
both identifiers to the same endpoint, load balancers must instead
perform some form of address translation in order to associate both
identifiers with the same host.
3.2.3. Multi-Path
Connection migration across network addresses requires the connection
to (briefly) exist simultaneously across multiple paths and as such
should instead be considered in the context of broader multi-path
support.
3.3. Streams
A stream is an ordered sequence of bytes. A QUIC connection contains
a multiplexed set of streams that are grouped into four different
namespaces based upon two properties: if the stream is client or
server initiated; and if the stream is unidirectional or
bidirectional. Streams are flow controlled, both individually and in
aggregate across the connection.
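The four namespaces can be sketched as a classification on the stream
ID. The bit assignment below is an assumption borrowed from the QUIC
transport drafts (lowest bit for the initiator, second-lowest bit for
directionality); this document itself does not define the encoding.

```python
def classify_stream(stream_id):
    """Return (initiator, direction) for a stream ID.

    Assumed encoding: bit 0 set means server-initiated, bit 1 set means
    unidirectional.
    """
    initiator = "server" if stream_id & 0x1 else "client"
    direction = "unidirectional" if stream_id & 0x2 else "bidirectional"
    return initiator, direction
```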
Questions/recommendations:
o Streams really have whatever reliability is used by the two
endpoints of the connection; intermediaries must assume
unreliability, and we should verify that congestion control and
flow control are not dependent upon any reliability assumption.
o Now that we have 4 stream types (unidirectional or bidirectional,
client- or server-initiated) we should not make any attempt to
provide a "mapping" to TCP or sockets.
o Streams now need to be their own concept independent of prior art,
so let's make this explicit.
o Stream closure is unreliable in QUIC: when either endpoint closes a
stream, data is not required to be flushed. This also leads to
connection-level flow control requirements (i.e. don't wait for the
data before increasing the window, or you are going to deadlock).
Streams are neither required (at the QUIC layer) to be retransmitted
nor to be transmitted in order. They provide no guarantee that data
will be transmitted in its entirety.
Flow control windows are increased when a receiver decides that it is
willing to accept (and possibly discard) bytes from a stream up to a
given offset. It is neither a signal that the receiver has received
all bytes below the flow control window nor is a receiver obligated
to treat its flow control window as a contiguous number of bytes
within the stream.
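A minimal sketch of this receiver behaviour is shown below. The
StreamReceiver class is hypothetical; it only illustrates that raising
the limit is independent of which bytes have arrived.

```python
class StreamReceiver:
    """Tracks a per-stream flow control limit (hypothetical helper)."""

    def __init__(self, window):
        self.max_offset = window   # highest offset the sender may write up to
        self.received = {}         # offset -> bytes that actually arrived

    def on_data(self, offset, data):
        if offset + len(data) > self.max_offset:
            raise ValueError("sender exceeded the flow control limit")
        self.received[offset] = data

    def extend_window(self, new_max_offset):
        # Purely the receiver's decision: says nothing about what has arrived.
        self.max_offset = max(self.max_offset, new_max_offset)
```

In this sketch a receiver may call extend_window() while bytes at lower
offsets are still missing, and may later discard them without ever
having received them.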
Because streams are flow controlled individually and in aggregate
across the connection, and because there is no QUIC-layer requirement
that stream data be transmitted in its entirety, connection deadlock
may occur if the application only increases the flow control window
based on receiving data encoded in streams. In particular:
o any application that deals with out-of-order data within a stream
must carefully do flow control at the QUIC layer
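The deadlock can be reproduced with a toy simulation. It assumes one
stream, fixed-size segments, a first segment that is lost and never
retransmitted, and two hypothetical credit policies: "on_delivery"
grants credit only as contiguous data reaches the application, while
"on_offset" grants credit based on the highest offset seen. None of
these names come from a QUIC specification.

```python
def bytes_sent(rounds, window, segment, policy):
    """Return how many bytes the sender manages to send in `rounds` rounds."""
    limit = window      # flow control limit currently granted to the sender
    sent = 0            # next offset the sender will use
    delivered = 0       # contiguous prefix handed to the application
    highest = 0         # highest offset the receiver has seen
    arrived = set()     # start offsets of segments that arrived
    for _ in range(rounds):
        if sent + segment > limit:
            continue    # sender is blocked, waiting for more credit
        if sent != 0:   # the segment at offset 0 is lost and never resent
            arrived.add(sent)
            highest = sent + segment
        sent += segment
        while delivered in arrived:
            delivered += segment
        if policy == "on_delivery":
            limit = delivered + window
        else:  # "on_offset"
            limit = highest + window
    return sent
```

With a window of 10 and segments of 5 bytes, the "on_delivery" policy
stalls after 10 bytes because the gap at offset 0 is never filled,
whereas "on_offset" keeps the sender moving.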
3.3.1. Grouping of Streams
As this hasn't been discussed within the working group, this likely
needs to be deferred to v2.
Streams may be placed within groups (by default there is only one
group), in which case a different frame-type is used for data and
headers within that stream. This is why grouping is at the stream
layer and not below.
Groups signal to the L7 routing fabric that the data on multiple
streams should be routed to the same (L7) endpoint.
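One way an L7 routing layer might honour this signal is to pin each
group to a backend the first time it is seen. The GroupRouter class,
the round-robin choice, and the notion of a numeric group ID are all
assumptions for illustration; the group mechanism itself has not been
specified.

```python
class GroupRouter:
    """Pins every stream group to one backend (hypothetical L7 router)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.group_to_backend = {}
        self.next_index = 0

    def route(self, group_id):
        """Return the backend for a group, choosing one on first use."""
        if group_id not in self.group_to_backend:
            backend = self.backends[self.next_index % len(self.backends)]
            self.next_index += 1
            self.group_to_backend[group_id] = backend
        return self.group_to_backend[group_id]
```

Streams that carry the same group ID then reach the same backend, no
matter how many other groups are routed in between.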
Video is a good example use case, though pub-sub and similar
applications end up with the same set of problems. With video, there
are various components of the video stream which can be interpreted
separately. An example would be I-frames and P-frames. I-frames are
essentially JPEG images and
encode an image. P-frames encode a difference from some prior state
(or to some other state, depending on one's perspective). If the
application presents these at the same priority within one stream, it
would be substantially suboptimal. However, without groups, if the
application presents these as different streams, they may not be
routed to the same L7 endpoint, which would be essential for correct
understanding of the data given the inherently stateful nature of
video codecs (and most any compression). Breaking up the video into
multiple items allows video to be transported and cached using HTTP
semantics reasonably.
Pub-sub, as mentioned before, works far better when groups exist: a
subscription is established, and any number of responses may flow
back to the subscriber. If the subscriber wishes to update the
subscription, it sends a new request with the same group, ensuring
the subscription state can be correctly managed.
4. References
4.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
4.2. URIs
[1] https://mailarchive.ietf.org/arch/search/?email_list=quic
[2] https://github.com/quicwg
[3] https://github.com/quicwg/base-drafts/labels/-http
Authors' Addresses
Roberto Peon
Facebook, Inc.
Email: fenix@fb.com
Jeff Pinner
Lyft, Inc.
Email: jpinner@lyft.com