Network Working Group R. Peon
Internet-Draft Facebook, Inc.
Intended status: Informational J. Pinner
Expires: July 19, 2018 Lyft, Inc.
January 15, 2018

Proposal for QUIC Abstractions
draft-peon-pinner-quic-abstractions-01

Abstract

This document proposes abstraction layers for QUIC and makes recommendations for draft v1.

Note to Readers

Discussion of this draft takes place on the QUIC working group mailing list (quic@ietf.org), which is archived at https://mailarchive.ietf.org/arch/search/?email_list=quic.

Working Group information can be found at https://github.com/quicwg; source code and issues list for this draft can be found at https://github.com/quicwg/base-drafts/labels/-http.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on July 19, 2018.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

This document proposes 5 layers of abstraction for QUIC: QUIC, Connections, Streams, H3, and HTTP.

2. Abstractions

2.1. QUIC provides:

2.2. QUIC Connections provide:

2.3. QUIC Streams provide:

2.4. ‘H3’ provides:

2.5. HTTP on QUIC:

2.6. APIs above these layers:

APIs above these layers will then determine how and when data is presented to the application. This includes deciding whether to present ordered data in order (i.e. socket-like) or as if it were a file (ordered but not necessarily in order), and when to request retransmissions or discards (‘reliable’ or partially reliable).

Note that HTTP does not imply reliability. HTTP implies request-response.
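The two presentation modes described above can be sketched as follows. This is a minimal illustration, not a proposed API: the class and method names are hypothetical, and segments are assumed to be non-overlapping and to arrive at exact offsets.

```python
class StreamView:
    """Holds received (offset, data) segments for one stream."""

    def __init__(self):
        self._segments = {}  # offset -> bytes, not yet delivered in order
        self._next = 0       # next in-order offset to deliver

    def on_data(self, offset, data):
        # Assumption: segments never overlap and are never duplicated.
        self._segments[offset] = data

    def read_in_order(self):
        """Socket-like: return only the contiguous bytes at the read cursor."""
        out = bytearray()
        while self._next in self._segments:
            chunk = self._segments.pop(self._next)
            out += chunk
            self._next += len(chunk)
        return bytes(out)

    def read_available(self):
        """File-like: return every held segment with its offset; gaps allowed."""
        return sorted(self._segments.items())
```

A socket-like consumer would call read_in_order() and see nothing until the gap at offset 0 is filled; a file-like consumer could call read_available() and act on later segments immediately.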

3. Deeper explanations

3.1. QUIC (Packets):

In order to establish connections QUIC sends packets before QUIC connections can be confirmed to be established. The QUIC-layer abstraction thus includes all parts necessary to operate on a per-packet basis without already being in the context of a QUIC connection.

QUIC packets are UDP datagrams. These may or may not have a 1:1 correspondence to IP packets based on path MTU estimation and IP fragmentation.

Payload data is AEAD encrypted. Minimal routing data is unencrypted. In particular, this means that acks (and thus congestion control and loss recovery) are end-to-end instead of hop-by-hop.

Packets are NOT reliably delivered or retransmitted. Some of the application payload carried by a packet MAY be retransmitted but that is not required.

Note that this does not preclude L2 from doing its own retransmissions; duplicate packets may be received even when the sender did not send duplicates.

All other intermediaries must “participate” in the QUIC connection: they must be “terminating” intermediaries and have the encryption keys necessary to terminate connections. Tunneling L5-over-L5 still requires an initial connection to be terminated at the proxy.

All packets sent before the 1-RTT keys are established for a connection must be versioned. The version number location in these packets must be static across all versions of the protocol.
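A static version location lets a receiver read the version without understanding anything else about the packet. The sketch below assumes a hypothetical layout (one flags byte followed by a 32-bit big-endian version); this document does not define the actual offsets, and the function names are illustrative only.

```python
import struct

# Hypothetical layout, for illustration only: one flags byte followed by
# a 32-bit big-endian version number at a fixed offset.
VERSION_OFFSET = 1
VERSION_LENGTH = 4


def peek_version(datagram: bytes) -> int:
    """Return the version field without interpreting the rest of the packet."""
    end = VERSION_OFFSET + VERSION_LENGTH
    if len(datagram) < end:
        raise ValueError("datagram too short to carry a version")
    (version,) = struct.unpack_from("!I", datagram, VERSION_OFFSET)
    return version


def is_supported(datagram: bytes, supported: set) -> bool:
    """Check the version before attempting any version-specific parsing."""
    return peek_version(datagram) in supported
```

Because the offset never changes, an endpoint that only speaks an older version can still recognize that a packet belongs to a version it does not support.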

3.2. QUIC Connections

QUIC connections may be created between two endpoints communicating over UDP. A QUIC connection consists of a shared cryptographic context and set of multiplexed “streams”. Connections are created through a combined cryptographic and transport handshake that is capable of providing 0-RTT connection establishment when communicating with a known peer. Finally, in order to be resilient to NAT re-bindings and changes in network topology, connections may persist across changes of the client or server IP and port addresses.

QUIC connections are identified by a set of 64-bit unsigned numbers, one chosen randomly by the client and one or more chosen by the server, in addition to the “5-tuple” used to identify the underlying UDP connection. The QUIC connection identifiers allow for the client and server IP address or port number (or the connection identifier itself) to change throughout the lifetime of the connection, while still allowing datagrams to be correctly routed between the two endpoints.
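One way to picture this is an endpoint that looks up connection state by connection identifier rather than by network address. The sketch below is illustrative only: the class names are hypothetical, and it updates the peer address unconditionally, whereas a real implementation would validate a new path first.

```python
class Connection:
    def __init__(self, name):
        self.name = name
        self.peer_address = None  # last observed (ip, port) of the peer


class Demultiplexer:
    """Maps connection IDs, not network addresses, to connection state."""

    def __init__(self):
        self._by_id = {}

    def register(self, connection_id, connection):
        # Several IDs may refer to the same connection.
        self._by_id[connection_id] = connection

    def on_datagram(self, connection_id, source_address):
        conn = self._by_id.get(connection_id)
        if conn is None:
            return None  # unknown ID: the caller decides how to respond
        # Simplification: accept the new address without path validation.
        conn.peer_address = source_address
        return conn
```

A datagram arriving from a new address after a NAT rebinding still finds the same Connection object, because the lookup key did not change.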

3.2.1. 0-RTT Connection Establishment

TLS 1.3 enables 0-RTT, and QUIC endpoints should support it.

Since packets are not required to arrive in order (or arrive at all), an endpoint may receive 0-RTT data for a connection that has yet to be established. Implementations should make appropriate tradeoffs when buffering this data, so as not to render 0-RTT connection establishment infeasible in practice.
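One possible tradeoff is a byte-bounded buffer for early packets: large enough that ordinary reordering does not defeat 0-RTT, small enough that it cannot be used to exhaust memory. The sketch below is an assumption-laden illustration; the class name and the drop-when-full policy are not taken from any specification.

```python
from collections import deque


class EarlyDataBuffer:
    """Holds 0-RTT packets that arrive before their connection exists."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._used = 0
        self._packets = deque()

    def offer(self, packet: bytes) -> bool:
        """Buffer the packet if it fits; otherwise drop it."""
        if self._used + len(packet) > self.max_bytes:
            return False  # dropped; the peer may retransmit the payload
        self._packets.append(packet)
        self._used += len(packet)
        return True

    def drain(self):
        """Hand over buffered packets once the handshake can process them."""
        packets = list(self._packets)
        self._packets.clear()
        self._used = 0
        return packets
```

Dropping is safe at this layer because packets are not reliably delivered in the first place; the cost of a buffer that is too small is a lost 0-RTT opportunity, not a broken connection.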

An endpoint can always “pretend” it does not have decryption keys for 0-RTT content. Servers can always force a fallback to a 1-RTT establishment handshake. The existence of this fallback is important since it is the only mechanism for a server to do address validation (and thus protect itself from some classes of denial-of-service attacks).

3.2.2. L4 routing and Connection migration: Requires Working Group decisions

While the protocol allows for both connection migration across changes of the endpoint’s underlying network address and for changes of the connection identifiers, it is unclear (under the current specification) that connection migration can be implemented in a scalable, interoperable manner.

For data within a QUIC connection to be of utility, packets intended to be associated with that connection should flow to a specific endpoint.

For large deployments, there are likely to be a number of L4 load balancers deployed to ensure that this happens while utilizing L7 endpoints effectively. A set of TCP load balancers in a deployment, for instance, would forward packets with the same source IP address and port number to a single host regardless of which load balancer received the packet.

A QUIC connection is determined by both the network address and a set of connection identifiers. As a result, L4 load balancing which uses only IP address and port number is insufficient to ensure that packets associated with a QUIC connection actually arrive at the correct endpoint. A reasonable solution to this problem might be to hash on the connection ID instead of hashing on the network address. However, if multiple identifiers are used simultaneously throughout the lifetime of the connection, this is insufficient, given that all identifiers would have to hash to the same host.
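The difference between the two hashing strategies can be sketched as below. The function names and the use of SHA-256 as the hash are assumptions made for illustration; real load balancers typically use cheaper hashes and consistent hashing.

```python
import hashlib


def pick_host(key: bytes, hosts: list) -> str:
    """Stateless choice of a host from a stable hash of the key."""
    digest = hashlib.sha256(key).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


def route_by_address(ip: str, port: int, hosts: list) -> str:
    """TCP-style routing: breaks when the client address changes."""
    return pick_host(f"{ip}:{port}".encode(), hosts)


def route_by_connection_id(connection_id: int, hosts: list) -> str:
    """Survives address changes, but only for one fixed connection ID."""
    return pick_host(connection_id.to_bytes(8, "big"), hosts)
```

Routing by connection ID is stable for a single identifier, but two unrelated identifiers for the same connection will in general hash to different hosts, which is exactly the insufficiency described above.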

There are several strategies that can be employed to solve the L4 LB problem with alternate connection IDs. The simplest and most scalable approach requires shared knowledge between the L4 LB and the endpoint of the connection, specifically an encryption key and/or cryptographic algorithm. This allows the L7 endpoint to compute a new connection ID which the L4 LB could successfully deliver to the correct L7 endpoint. Other means of making this work (global NAT tables in a cluster, distributed NAT tables) require additional hops within datacenters and make successful implementations more difficult while also likely decreasing performance.
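A minimal sketch of the shared-key approach follows. Everything here is an assumption made for illustration: the 16-bit host field, the 48-bit nonce, and the HMAC-derived mask are not defined by any specification, and this is not a vetted cryptographic construction.

```python
import hashlib
import hmac
import os

HOST_BITS = 16   # hypothetical: bits carrying the (masked) host ID
NONCE_BITS = 48  # hypothetical: random bits that make each ID distinct


def _mask(key: bytes, nonce: int) -> int:
    """Keyed 16-bit mask derived from the nonce."""
    tag = hmac.new(key, nonce.to_bytes(6, "big"), hashlib.sha256).digest()
    return int.from_bytes(tag[:2], "big")


def new_connection_id(key: bytes, host_id: int) -> int:
    """Called by the L7 endpoint to mint an ID that routes back to itself."""
    if not 0 <= host_id < (1 << HOST_BITS):
        raise ValueError("host_id out of range")
    nonce = int.from_bytes(os.urandom(6), "big")
    return ((host_id ^ _mask(key, nonce)) << NONCE_BITS) | nonce


def host_for(key: bytes, connection_id: int) -> int:
    """Called by the L4 load balancer; needs no per-connection state."""
    nonce = connection_id & ((1 << NONCE_BITS) - 1)
    return (connection_id >> NONCE_BITS) ^ _mask(key, nonce)
```

The load balancer keeps only the key, not a table of connections, so any balancer in the deployment can route any identifier minted by any endpoint.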

In order to associate multiple alternative connection IDs with the same connection, we must expose some data to the L4 load balancer that allows it to correctly map IDs to the expected L7 host. This data could take the form of some structure embedded in the connection identifier and agreed upon between all intermediaries on the path, for example choosing some number of bits to be used for routing that must be identical between all identifiers for a given connection. This is most certainly a potential avenue for ossification.

The use of multiple connection IDs to identify a connection is provided as a mechanism to prevent a passive observer from correlating activity for the same connection across multiple paths during connection migration. It is worth noting that while a client may want to use a new connection identifier, it requires the server to issue new identifiers, and no mechanism is provided in the specification for the client to request them or require the server to issue them. In addition, multi-path support will arguably do a more effective job of making packet inspection difficult than having multiple connection IDs would, for those connections where multiple paths are available. For connections where multiple paths are not available, the client has the option to open multiple connections to achieve the same effect.

W.I.P. (The other argument for multiple connection IDs is not packet inspection but instead privacy, i.e. linkability between IP addresses. If multi-path requires the ability to share connection state between multiple paths, could we extend this to the application layer to share state across multiple connections, each with its own connection ID? If so, then there is no privacy concern since the client can instead open one connection per path.)

Recommendation: defer alternative connection IDs to the v2 specification. Even excluding the association of multiple server-selected connection IDs with a single connection, the connection is still identified by two identifiers: the one randomly selected by the client and the ID chosen by the server. Without a mechanism for intermediaries to route both identifiers to the same endpoint, load balancers must instead perform some form of address translation in order to associate both identifiers with the same host.

3.2.3. Multi-Path

Connection migration across network addresses requires the connection to (briefly) exist simultaneously across multiple paths and as such should instead be considered in the context of broader multi-path support.

3.3. Streams

A stream is an ordered sequence of bytes. A QUIC connection contains a multiplexed set of streams that are grouped into four different namespaces based upon two properties: if the stream is client or server initiated; and if the stream is unidirectional or bidirectional. Streams are flow controlled, both individually and in aggregate across the connection.
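The four namespaces can be derived from two bits of state per stream. The sketch below assumes the encoding used in the QUIC transport drafts, where the two least significant bits of the stream ID carry the initiator and the directionality; this document itself does not specify an encoding, and the function names are illustrative.

```python
def classify(stream_id: int):
    """Return (initiator, direction) for a stream ID.

    Assumed encoding: bit 0x1 set means server-initiated,
    bit 0x2 set means unidirectional.
    """
    initiator = "server" if stream_id & 0x1 else "client"
    direction = "unidirectional" if stream_id & 0x2 else "bidirectional"
    return initiator, direction


def next_stream_id(previous_id: int) -> int:
    """Next ID in the same namespace: the low two bits are preserved."""
    return previous_id + 4
```

With this encoding each endpoint can allocate stream IDs in its own namespaces without coordinating with its peer.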

Streams are not required (at the QUIC layer) to be retransmitted, or ever transmitted in order or in their entirety. Flow control windows are increased when a receiver decides that it is willing to accept (and possibly discard) bytes from a stream up to a given offset. An increase is neither a signal that the receiver has received all bytes below the flow control window, nor is the receiver obligated to treat its flow control window as a contiguous number of bytes within the stream.
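The distinction between a flow control limit and an acknowledgement can be sketched as follows. This is an illustration under stated assumptions, not an implementation of any specified algorithm: the class name, the keep flag, and the error handling are all hypothetical.

```python
class ReceiveWindow:
    """Receiver-side flow control for one stream, expressed as a max offset."""

    def __init__(self, initial_limit):
        self.limit = initial_limit  # highest offset the sender may use
        self.received = []          # (offset, length) ranges kept, any order

    def on_data(self, offset, length, keep=True):
        """Accept data within the limit; the receiver may still discard it."""
        if offset + length > self.limit:
            raise ValueError("sender exceeded the flow control limit")
        if keep:
            self.received.append((offset, length))

    def raise_limit(self, new_limit):
        """Advertise willingness to accept bytes up to new_limit.

        This does not claim that every byte below the limit has arrived.
        """
        self.limit = max(self.limit, new_limit)
```

Note that raise_limit() can be called while earlier ranges are still missing; the limit only bounds what the sender may send, and says nothing about what the receiver holds.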

Because streams are flow controlled individually and in their entirety, and because there is no QUIC-layer requirement that stream data be transmitted in its entirety, connection deadlock may occur if the application only increases the flow control window based on receiving data encoded in streams. (rpeon to fix this statement based on compression)

Any application that deals with out-of-order data within a stream must carefully do flow control at the QUIC layer.

3.3.1. Grouping of Streams

As this hasn’t been discussed within the working group, this likely needs to be deferred to v2.

Streams may be placed within groups (by default there is only one group), in which case a different frame-type is used for data and headers within that stream. This is why the grouping is at the stream layer and not below.

Groups signal to the L7 routing fabric that the data on multiple streams should be routed to the same (L7) endpoint.
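The routing behaviour that groups ask for can be sketched as a router that pins each group to one endpoint. The class name, the integer group identifier, and the round-robin assignment policy are assumptions made for illustration; the draft does not define how a group is encoded or how an endpoint is chosen.

```python
class GroupRouter:
    """Pins every stream of a group to one L7 endpoint."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._assigned = {}  # group -> endpoint
        self._next = 0       # round-robin cursor for new groups

    def route(self, group: int) -> str:
        if group not in self._assigned:
            # First stream of the group: choose an endpoint round-robin.
            endpoint = self._endpoints[self._next % len(self._endpoints)]
            self._assigned[group] = endpoint
            self._next += 1
        return self._assigned[group]
```

Every later stream carrying the same group value reaches the endpoint that already holds the group's state, regardless of how many other groups were assigned in between.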

Video is a good example use case, though pub-sub and similar applications end up with the same set of problems. With video, there are various components of the video stream which can be interpreted separately. An example would be I-frames and P-frames. I-frames are essentially JPGs and encode an image. P-frames encode a difference from some prior state (or to some other state, depending on one’s perspective). If the application presents these at the same priority within one stream, the result would be substantially suboptimal. However, without groups, if the application presents these as different streams, they may not be routed to the same L7 endpoint, which would be essential for correct understanding of the data given the inherently stateful nature of video codecs (and most any compression). Breaking up the video into multiple items allows video to be transported and cached using HTTP semantics reasonably.

Pub-sub, as mentioned before, works far better when groups exist. A subscription is established, and any number of responses may flow back to the subscriber. If the subscriber wishes to update the subscription, it sends a new request with the same group, ensuring the subscription state can be correctly managed.

4. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

Roberto Peon Facebook, Inc. EMail: fenix@fb.com
Jeff Pinner Lyft, Inc. EMail: jpinner@lyft.com
