TCP Maintenance Working Group | T. Flach |
Internet-Draft | USC |
Intended status: Experimental | N. Dukkipati |
Expires: January 15, 2014 | Y. Cheng |
B. Raghavan | |
July 14, 2013 |
TCP Instant Recovery: Incorporating Forward Error Correction in TCP
draft-flach-tcpm-fec-00.txt
Ordinary TCP loss recovery takes at least one round-trip time and as such can increase application-perceived latency, especially for short flows such as Web transactions. TCP Instant Recovery (TCP-IR) is an experimental algorithm that allows a receiving end to recover lost packets without retransmissions, thus potentially saving at least one full round-trip time compared to standard TCP. TCP-IR achieves this by judiciously injecting encoded data segments within a TCP stream. This document describes the TCP-IR algorithm at the sending and receiving ends, along with the required protocol changes.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 15, 2014.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
TCP Instant Recovery (TCP-IR) enables a receiver to recover lost data segments instantly without the need for retransmissions from a sender. Standard TCP retransmission-based loss recovery takes at least one RTT for loss detection and recovery.
The main motivation for TCP-IR is to reduce the tail latency of Web transactions. Most Web transfers are short and could finish within a few round-trip times (RTTs), but losses can add multiple RTTs to transfer times and increase the variance in Web page download times. The goal of TCP-IR is to reduce loss detection and recovery to zero RTT while still employing TCP's congestion control principles.
Recovery mechanisms, such as fast recovery and retransmission timeout, are fundamentally RTT dependent. Regardless of how fast network bandwidth grows, the number of RTTs that it takes to recover lost packets does not change. TCP-IR employs forward error correction (FEC) to scale the recovery time inversely with bandwidth and make it independent of RTT. It explicitly trades some network bandwidth to reduce RTTs for short transfers. Most bandwidth in the Internet is used by large flows such as video, and thus short, latency-sensitive traffic can benefit using a small degree of FEC without hurting bulk flow throughput.
In this document, we specify the TCP-IR mechanism, which requires both sender and receiver changes, to achieve 1-RTT recovery for commonly observed loss scenarios. Instead of complete redundancy for every segment, we employ FEC within TCP. The sender transmits extra FEC segments, which encode previously transmitted segments, so that the receiver can repair a small number of losses. While the use of FEC for transport has been explored in the past, to our knowledge this is the first specification to place FEC within TCP in a way that is incrementally deployable across today's networks with middleboxes.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The key idea in TCP-IR is the judicious introduction of a small number of checksum or XOR segments into TCP's data stream such that a receiver can immediately recover lost segment(s) without the need for retransmissions. The core design challenge is in injecting out of band XOR segments within the regular data stream. We outline the main aspects of the protocol below.
Several of the design choices we made are rooted in our measurements and observations of Internet loss patterns as documented in [RECOVERY-SIGCOMM13]. We find that among the flows experiencing losses, most flows lose only one or two consecutive packets, commonly at the tail of a burst. As an example, for bursts of at most 10 packets, ~35% experienced exactly one loss, and an additional 10% experienced exactly two losses. Further, the latter packets for any given burst size are more likely to be lost. Given these findings, we chose a simple XOR-based encoding scheme that can perform instant recovery of a small amount (one or two segments) of packet loss.
A TCP-IR sender and receiver first negotiate the use of instant recovery in the initial handshake. If both hosts support the use of instant recovery, every packet in the connection includes a TCP-IR option.
TCP-IR sender:
1. Periodically in every round-trip time, a TCP-IR sender places the XOR of newly transmitted segments into a single MSS-length checksum packet. The XOR is only computed for new segments not previously included in checksums.
2. Regardless of the sizes of original segments, the sender computes the XORs along MSS byte boundaries. Because every packet carries a payload of at most MSS bytes, such an encoding guarantees that the receiver can instantly recover any single packet loss.
3. The encoded XOR packet uses the same sequence number as the first segment it encodes. The encoded packet carries a flag in the TCP-IR option signaling that the payload is encoded. A receiver uses the flag to disambiguate an encoded packet from a regular (re)transmission, since both segments carry the same sequence number. The option also includes the number of bytes that the payload encodes.
There is no reliability provided for the XOR segments.
TCP-IR receiver:
1. A receiver first establishes if the payload of the received segment is encoded, by checking a flag in the TCP-IR option.
2. Once the receiver establishes that the payload is encoded, it obtains the encoded range of bytes by using the sequence number of the TCP-IR packet and the the number of bytes encoded.
3. The receiver checks for holes in the encoded range. If it received the entire sequence range, the receiver drops the encoded packet. Otherwise, if it is missing at most MSS contiguous bytes, the receiver uses the encoded payload to recover the lost sequence range and forwards it to the regular reception routine, thus allowing 0-RTT recovery.
4. For the purpose of recovering lost segments, a receiver buffers the last fifteen in-order MSS blocks that it ACKed, even if the application layer has already consumed these blocks. Because an encoded packet is the XOR of at most sixteen MSS segments, the receiver can recover any single lost packet by computing the XOR of the encoded payload and the buffered data in the encoding range.
5. If too much data is missing for the encoded packet to recover, the receiver sends a duplicate ACK. This ACK informs the sender that a recovery failed and also denotes the byte ranges lost via the TCP-IR option. The sender marks the byte ranges as lost and triggers a fast retransmit and recovery.
6. TCP-IR does not circumvent congestion control. If the receiver were to simply ACK a recovered packet, it would mask the loss and prevent congestion control during a known loss episode. To perform congestion window reduction upon a successful recovery at the receiver, TCP-IR uses a mechanism similar to explicit congestion notification (ECN). Upon a successful recovery, the receiver enables an R_SUCC flag in the TCP-IR option in each outgoing ACK. The sender in turn triggers a congestion window reduction and sets an R_ACK flag in the TCP-IR option of the next packet sent to the receiver. Once the receiver observes R_ACK in an incoming packet, indicating that the sender reduced the congestion window, it disables R_SUCC for future packets.
TCP A (Client) TCP B (Server) ______________ _____________ | | | < SYN + TCP-IR_OPT > | | --------------------------------------> | | < SYN/ACK + TCP-IR_OPT > | | <-------------------------------------- | | | | < DATA > | | <-------------------------------------- | | < DATA > | | <-------------------------------------- | | < DATA SEGMENT LOST in TRANSIT > | | X <-------------------- | | | | < XOR SEGMENT > | | <-------------------------------------- | Recovery | | of lost | | segment | | | < ACK + SUCCESSFUL_RECOVERY_FLAG > | | --------------------------------------> | | | | | CWND reduction | | on successful | | recovery. | < ACK + WINDOW_REDUCED_FLAG > | | <-------------------------------------- |
Figure 1: Sample flow using TCP-IR
Figure 1 gives an example of TCP-IR in action.
In the following, we describe the TCP-IR option design, negotiation of instant recovery, the supported encoding schemes, and finally the sender and receiver side algorithms.
Both the server and the client use a new option to perform the following:
1 2 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 +---------------+---------------+---------------+ | Kind = TBD | Length = 1 | Encoding Type | +---------------+---------------+---------------+
Figure 2: TCP-IR Option format for packets with SYN flag set
During the initial handshake (for packets with the SYN flag set), the option has the format shown in Figure 2. It contains the following fields:
1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +---------------+---------------+---------------+--------------- | Kind = TBD | Length | Flags | +---------------+---------------+---------------+--------------- Range (optional) | --------------------------------+
Figure 3: TCP-IR Option format (except for packets with SYN flag set)
For all other packets, the option has the format shown in Figure 3. It contains the following fields:
Bit Flag Name Description 0 R_CWR Congestion Window Reduction Acknowledgement 1 R_SUCCESS Recovery successful 2 R_FAIL Recovery failed 3 ENCODED Packet is encoded 4-7 Unused
TCP-IR MUST be explicitly negotiated during the initial handshake. If the negotiation succeeds, both endpoints can send and receive TCP-IR packets. More specifically TCP-IR is enabled if:
If any endpoint receives a packet (after negotiation succeeds) that does not carry the TCP-IR option, the connection MUST be reset. This is necessary because a receiver can no longer distinguish between regular and TCP-IR packets. We recommend tracking these cases to avoid TCP-IR negotiation on future connections.
The client can select the encoding type to be used by the TCP-IR module but both endpoints have to support it and agree on it during the initial handshake (see Section 3.2).
Currently the following encoding types are supported and are therefore valid values in the "Encoding Type" field of the TCP-IR option during negotiation:
Value Type Description 0 Undefined 1 Basic XOR TCP-IR packets carry the XOR of every MSS length segment. One TCP-IR packet encodes up to 16 MSS length segments. 2 Interleaved XOR TCP-IR packets carry the XOR of every other MSS length segment. One TCP-IR packet encodes up to 8 MSS length segments. 3-255 Undefined
We selected these encoding types to demonstrate the flexibility of TCP-IR with respect to user preferences (like the acceptable amount of redundancy) and connection properties. Basic XOR can recover a single segment loss of up to MSS bytes in the encoding range. Interleaved XOR enables the recovery of two consecutive segments of up to MSS bytes. This makes Interleaved XOR suitable for connections observing bursty losses, but can double the number of generated TCP-IR packets.
The currently supported encoding types use the MSS value to determine the block size for the encoding process. If the MSS value changes after the initial handshake, TCP-IR MUST be disabled for the remainder of the connection.
All packets after the initial handshake carry the TCP-IR option to ensure that the receiver can always distinguish regular packets from encoded packets (packets carrying a payload encoded by the TCP-IR module), or detect the removal of the option by a middlebox. Encoded packets MUST set the ENCODED bit in the "Flags" field of the TCP-IR option; all other packets MUST NOT set the ENCODED bit.
The TCP-IR packet MUST use the same sequence number as the first byte it encodes. This prevents enforcing reliability for encoded packets as well as the overhead of specifying the index of the first encoded byte in a separate option field.
In addition to that, the option in encoded packets MUST carry the "Range" field. The value in the "Range" field specifies the index of the byte after the last encoded byte in the payload relative to the sequence number of the encoded packet.
TCP-IR adds a short delay in the transmission of encoded packets to the reduce the probability of losing both the original transmission and the encoded packet in the same loss burst.
The encoding and transmission routine works as follows:
Receivers distinguish TCP-IR packets from regular packets by checking the ENCODED bit in the "Flags" field of the TCP-IR option. Encoded packets are forwarded to the TCP-IR reception routine (described below). If the packet does not carry the TCP-IR option it is discarded and TCP-IR is disabled for the remainder of the connection. To inform the sender that TCP-IR can no longer be used, the receiver sends an acknowledgement without the TCP-IR option.
Additionally, the regular reception routine is modified as follows. The last 15 ACKed MSS length segments remain in the buffer, even if the application layer has already consumed these segments. Segments received out-of-order are already buffered by default and cannot be consumed by the application layer. Since a single TCP-IR packet encodes at most 8 (interleaved XOR) or 16 (basic XOR) MSS length segments, any single segment loss (up to MSS length) in the encoding range can be recovered by the decoder.
The reception routine for TCP-IR packets works as follows:
If a receiver instantly recovers losses we want to ensure the sender learns of it so as to not circumvent congestion control [RFC5681]. The R_SUCCESS bit in the "Flags" field of the TCP-IR option informs the sender that the receiver successfully recovered a lost packet. Once the sender observes the R_SUCCESS bit in a packet the following steps are executed:
A failed recovery on the receiver side triggers an explicit acknowledgment sent to the sender to inform it about the segments that are considered lost. This is indicated by the R_FAIL bit being set in the "Flags" field of the TCP-IR option. If the sender observes this bit being set, the following steps are executed:
Explicit notification of loss ranges has the benefit that lost segments are retransmitted faster, avoiding the extra wait time until the RTO fires.
An important design goal of TCP-IR is compatibility with middleboxes and support for graceful fallback to standard TCP behavior in situations where middlebox interference prevents proper use of TCP-IR.
Even if hosts negotiate TCP-IR during the initial handshake, it is possible for a middlebox to strip the option from a later packet. To be robust to this, if either host receives a packet without the option, it MUST discard the packet and reset the connection. This is necessary since receivers are no longer able to distinguish TCP-IR packets from regular packets.
TCP-IR uses relative sequence numbers to convey metadata (such as the encoding range) between endpoints. This prevents issues in the cases of middleboxes performing sequence number translations.
Some problems caused by middlebox interference (and their solutions in TCP-IR) are not discussed in the current version of this draft:
We implemented TCP-IR in Linux TCP in about 1600 lines of code. 20% of the modular implementation includes the parts common to both the sender and receiver, which are option encoding/decoding, and negotiation during connection setup. 50% of the implementation is the receiver components including detection of an encoded packet, decoding the TCP-IR payload, and generating the right acknowledgements upon a successful or failed recovery. The remainder 30% of the implementation is the sender component which consists mainly of payload encoding and transmission.
We conducted two kinds of experiments. The first was in an emulated setting using loss patterns similar to those observed in our measurement of real traffic. We used the netem module to emulate a 200 ms RTT and both random and correlated loss rates of 2%. TCP-IR reduced the latency for short transfers in lossy environments by 28% in the 90th percentile. The benefits diminish as the minimum number of RTTs necessary to complete the transaction increases (due to the message size) because the time to recover from losses no longer dominates the overall transmission time. TCP-IR is better suited for small transfers common in today's Web.
In the second set of experiments, we used the Web-page-replay tool and dummynet to replay HTTP resource transfers for actual Web page downloads through controlled, emulated network conditions. We tested a variety of popular Web sites, and ran separate tests for Web pages tailored for desktop and mobile clients. As an example, with TCP-IR, the New York Times website takes 15% less time in the 90th percentile until the first objects are rendered on the screen.
Details on performance experiments with TCP-IR are in [RECOVERY-SIGCOMM13].
Applying FEC to transport, at nearly every layer, is an old idea. [Coding-IEEE2011] suggested placing network coding in TCP, and [CodedTCP-2013] extended this work by implementing a variant over UDP mainly for high loss rate wireless environments. Among others, [AdaptiveFEC-2004] and Tickoo et al. [LT-TCP-2005] explored extending TCP to incorporate FEC. Finally, Maelstrom is an FEC variant for long-range communication between data centers leveraging the benefits of combining and encoding data from multiple sources into a single stream [Maelstrom-2011]. The focus of all of this work is on the performance aspects of using FEC over lossy links. None address the protocol level changes required in TCP to incorporate FEC.
The security considerations outlined in [RFC5681] apply to this document. At this time we did not find any additional security problems with TCP-IR.
The two Options for TCP-IR used during negotiation and subsequently in every packet of the connection require IANA allocate one value from the TCP option Kind namespace. Experimentation prior to the allocation SHOULD follow [EXPOPT] and use experimental option kind 254 and two magic bytes 0xDC60, and migrate to the new option after the allocation accordingly.
Note to RFC Editor: this section may be removed on publication as an RFC.