Internet-Draft | LRD | March 2021
Scheffenegger | Expires 23 September 2021
Lost retransmissions are a major source of latency for TCP transfers. This note specifies how selective acknowledgment (SACK) information can be used to recover from lost retransmissions in a timely manner. In addition, it codifies the congestion control reaction to lost retransmissions.
Discussion of this draft takes place on the TCPM working group mailing list, which is archived at https://mailarchive.ietf.org/arch/browse/tcpm/.
Working Group information can be found at https://datatracker.ietf.org/wg/tcpm/.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 23 September 2021.
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Selective Acknowledgement (SACK) is widely used to identify exactly which TCP segments were lost and to send only these missing segments during a recovery episode. This improves the effectiveness of loss recovery and aligns with the principle of packet conservation. In addition, SACK information can be used to infer lost retransmissions. When this information is not used, TCP senders fall back on the retransmission timeout (RTO) to recover from lost retransmissions.
Current SACK implementations, with one widely deployed exception, do not perform lost retransmission detection. In the one implementation that does, lost retransmission detection (LRD) was described as an emergent feature of the way the sender handles SACK information. Consequently, that stack handles LRD within its existing loss recovery regime, without any additional congestion control reaction.
This note specifies the use of SACK to detect and recover from lost retransmissions. With this scheme, an RTO is only required to recover from excessive loss of segments or ACKs. The intention of this note is to enhance SACK loss recovery so that most RTO events can be avoided. RTOs remain necessary to achieve forward progress only during episodes of pathological network impairment.
The mechanism described adheres strictly to the principle of packet conservation. It also requires the use of the forward acknowledgement (FACK) mechanism, described in more detail in [MM96a] and [TLP].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
TCP Selective Acknowledgement [RFC2018] was designed to provide the sender with detailed information about the segments already received. Based on this information, a sender can reduce the number of unnecessary retransmissions to nearly zero and can also recover from the loss of multiple segments within a single round-trip time (RTT), without reverting to a retransmission timeout (RTO).
To that end, [RFC6675] describes the necessary data structures a sender has to maintain to keep track of incoming SACK information. However, no explicit attempt was made to specify how to use the information gained during the recovery episode to detect lost retransmissions.
In addition, [RFC2018] specifically stipulates up to which point a SACK-enabled sender may promote segments to become eligible for retransmission. This heuristic works very well during bulk transfers, where the sender always has additional data to send. Close to the end of a stream, when there is no more data in the socket to send, current SACK implementations fail to promote segments that are still outstanding and were never acknowledged to become eligible for retransmission. When this happens, the performance of a TCP SACK implementation adhering to [RFC3517] degrades below that of TCP NewReno [RFC3782], which can recover from this particular event without an RTO.
The introduction of a rescue retransmission, as described in [RFC6675], addresses this particular issue.
This document is concerned with the behavior of a TCP SACK sender when, after the retransmission of all outstanding segments and the transmission of new data, the recovery state persists (that is, SND.UNA does not advance to the value SND.MAX had when loss recovery was initiated, also known as the Recovery Point).
This document uses the terms SND.UNA, SND.NXT, SND.MAX as defined in [RFC5681].
SND.FACK (forward acknowledgment) is used to describe the highest sequence number that has been SACKed by the receiver and subsequently seen by the sender. The full definition can be found in [MM96a] and [MM96b]. The FACK mechanism is further described in [TLP].
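As an illustration, the following Python sketch shows one way a sender might advance SND.FACK from the SACK blocks of each incoming ACK. It assumes that SACK blocks arrive as (left edge, right edge) pairs with an exclusive right edge, as in [RFC2018], so that SND.FACK points just past the highest SACKed octet. The names seq_gt and update_snd_fack are hypothetical.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit values that wrap around


def seq_gt(a: int, b: int) -> bool:
    """True if sequence number a lies after b (serial number arithmetic)."""
    return 0 < (a - b) % MOD < MOD // 2


def update_snd_fack(snd_fack: int, snd_una: int, sack_blocks) -> int:
    """Return SND.FACK after processing the SACK blocks of one ACK.

    Each block is a (left, right) pair; the right edge is exclusive, so
    SND.FACK ends up just past the highest octet reported as received.
    """
    # The cumulative acknowledgment point is a lower bound for SND.FACK.
    if seq_gt(snd_una, snd_fack):
        snd_fack = snd_una
    for _left, right in sack_blocks:
        # SND.FACK only moves forward; older (reordered) blocks are ignored.
        if seq_gt(right, snd_fack):
            snd_fack = right
    return snd_fack


# Two SACK blocks above a hole at 1000..2000: SND.FACK moves to 4000.
print(update_snd_fack(1000, 1000, [(3000, 4000), (2000, 2500)]))  # 4000
```

Using modular comparisons rather than plain integer comparisons keeps the sketch correct when the sequence space wraps.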
The algorithm described in this document has to adhere to the principle of packet conservation. Detection of and recovery from lost retransmissions suffer from the same set of problems that can arise during regular loss detection and loss recovery. Heavy reordering and recovery at the end of a stream, in particular, can make it hard to achieve good efficiency during loss recovery.
The algorithm outlined here does not address how the sending TCP enters loss recovery. It is assumed that the methods described in Congestion Control [RFC5681], Early Retransmit [RFC5827], and [SRE], now incorporated into [RFC6675], are used to enter loss recovery. This leaves only the case where all segments between SND.UNA and SND.MAX are lost, which has to be recovered by means of the retransmission timeout.
The intuition behind the scheme is that if a retransmission succeeds, the cumulative ACK should advance one round-trip time after the retransmission was sent. Otherwise, the retransmission must have been lost. The key is to have an unambiguous signal indicating that at least one RTT has passed since a retransmission was sent.
As long as the sending TCP still has unsent data available, such an unambiguous signal can be derived using the FACK mechanism. After the first round of retransmissions, the sender MAY send previously unsent data. Once SND.FACK advances to encompass this newly sent data, the sender can deduce with high probability that any still outstanding packets have been dropped by the network. The sender MAY then start retransmitting all still outstanding packets. If the sender chooses to do so, it MUST take an appropriate congestion control action. This action is prudent, as the loss of retransmitted packets can signal persistent congestion in the network that lasts even beyond the initial congestion control reaction taken at least one RTT earlier.
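The detection rule can be expressed as a small predicate. The Python sketch below is hypothetical: it assumes the sender stored the value of SND.MAX as Recover when it entered loss recovery, and it compares sequence numbers with 32-bit serial number arithmetic. The function name lost_retransmission_signal is illustrative only.

```python
MOD = 1 << 32  # 32-bit TCP sequence space


def seq_gt(a: int, b: int) -> bool:
    """True if sequence number a lies after b, modulo 2**32."""
    return 0 < (a - b) % MOD < MOD // 2


def lost_retransmission_signal(in_recovery: bool, snd_fack: int,
                               recover: int) -> bool:
    """Hypothetical LRD trigger.

    recover holds SND.MAX from the moment loss recovery started. Once
    SND.FACK moves beyond it, the receiver has SACKed data that was first
    sent after the first round of retransmissions, so at least one RTT has
    passed and any hole still below Recover most likely marks a lost
    retransmission.
    """
    return in_recovery and seq_gt(snd_fack, recover)


print(lost_retransmission_signal(True, 7000, 7000))  # False
print(lost_retransmission_signal(True, 8000, 7000))  # True
```

A SACK for a retransmitted segment alone never satisfies this predicate, because all retransmissions of the first round lie below Recover; only SACKs for new data do.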
Note that the one popular stack that already performs LRD does not reduce the congestion window before starting the next cycle of retransmissions. It is therefore more aggressive than the mechanism described herein. Nevertheless, no network instabilities have been reported since that stack started using LRD more than two decades ago.
Without making use of additional information not contained in the SACK entries, only reordered ACKs can be discriminated.
If a single data segment is delayed and later resent, it is not possible, using only the information available within SACK entries, to distinguish whether the original or the retransmitted segment was SACKed. Thus lost retransmission detection could fall victim to reordered data segments if it used SACKs of retransmitted segments as the signal to determine lost retransmissions.
Using a SACK that acknowledges data not yet sent at the initiation of the recovery episode avoids this issue.
On the return path, reordered ACKs may be recognized by comparing the SACK entries contained in the ACK. The ACK for the original, in-sequence transmission does not contain any SACK entries beyond SND.FACK, while the ACK for a retransmitted segment would likely contain SACK blocks for segments higher than the newly SACKed segment.
Also, if an ACK does not SACK any segments beyond those already recorded in the sender's scoreboard, ACK reordering is likely to have occurred. For example, the SACK entry may cover only part of an entry already in the scoreboard. However, such a simple heuristic is not sufficient to properly discriminate the ACK for a retransmitted data segment from the ACK for the original data segment.
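The scoreboard-coverage heuristic can be sketched as follows. This hypothetical Python illustration is not a complete discriminator: it assumes the scoreboard is a list of SACKed [left, right) ranges in absolute, non-wrapping sequence numbers, and it only flags ACKs that carry no new information. The names is_covered and likely_reordered_ack are illustrative.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # [left, right): right edge exclusive, as in RFC 2018


def is_covered(block: Range, scoreboard: List[Range]) -> bool:
    """True if every octet of block is already marked SACKed in the scoreboard."""
    left, right = block
    for s_left, s_right in sorted(scoreboard):
        if s_right <= left:
            continue          # range lies entirely before the uncovered part
        if s_left > left:
            return False      # gap: octet `left` has not been SACKed yet
        left = s_right        # the prefix up to s_right is already known
        if left >= right:
            return True
    return left >= right


def likely_reordered_ack(ack: int, snd_una: int,
                         sack_blocks: List[Range],
                         scoreboard: List[Range]) -> bool:
    """Flag an ACK that neither advances SND.UNA nor SACKs anything new."""
    if ack > snd_una:
        return False          # the cumulative ACK moved: new information
    return bool(sack_blocks) and all(is_covered(b, scoreboard)
                                     for b in sack_blocks)


scoreboard = [(2000, 3000), (4000, 6000)]
print(likely_reordered_ack(1000, 1000, [(4000, 5000)], scoreboard))  # True
print(likely_reordered_ack(1000, 1000, [(4000, 7000)], scoreboard))  # False
```

As the text cautions, an ACK flagged here is only likely to be reordered; the check cannot tell the ACK for a retransmission from the ACK for the original transmission.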
There are a number of choices regarding which packet to transmit, at what time, and in what order. With TCP SACK, the decision of what to send has been decoupled from the decision of when (and how much) to send.
In the context of lost retransmission detection, there are at least four broad approaches, each of which has a different figure of merit:
Furthermore, the sender's congestion window might not allow for many re-retransmissions before the transfer stalls. Therefore, additional steps would be necessary on the sender side to ensure continuous, paced transmission even after the ACK clock has stopped. This limits the usefulness of this approach, and addressing the related congestion control and timing issues is outside the scope of this note. However, this approach is effectively implemented when using RACK [RFC8985].
Section 5 of [RFC2018] seems to have been interpreted as an exclusive list of the segments that may become eligible for retransmission, but it can also be interpreted as an inclusive list:
After the SACKed bit is turned on (as the result of processing a received SACK option), the data sender will skip that segment during any later retransmission. Any segment that has the SACKed bit turned off and is less than the highest SACKed segment is available for retransmission.
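Read literally, the quoted rule can be sketched in Python as below. The per-segment Segment record and the function name are hypothetical; the example also shows why segments above the highest SACKed segment never become eligible at the end of a stream.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: int             # first sequence number in the segment
    end: int               # one past the last sequence number
    sacked: bool = False   # the "SACKed bit" of RFC 2018, Section 5


def eligible_for_retransmission(scoreboard: List[Segment]) -> List[Segment]:
    """Un-SACKed segments that lie below the highest SACKed segment."""
    sacked_ends = [s.end for s in scoreboard if s.sacked]
    if not sacked_ends:
        return []
    highest = max(sacked_ends)
    return [s for s in scoreboard if not s.sacked and s.end <= highest]


# Segments 1..6 of 1000 octets each; only segments 2 and 4 were SACKed.
board = [Segment(i * 1000, (i + 1) * 1000, sacked=i in (2, 4))
         for i in range(1, 7)]
print([s.start // 1000 for s in eligible_for_retransmission(board)])  # [1, 3]
```

Segments 5 and 6 stay ineligible, even if they were lost, because nothing above them was SACKed; this is the end-of-stream situation that the rescue retransmission of [RFC6675] addresses.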
In order to track whether a retransmitted segment might have been lost, the sender requires additional state while in loss recovery.
Once TCP has established that genuine loss exists in the network, it enters loss recovery. At this point, the current value of SND.MAX is stored ("Recover" in NewReno [RFC6582]). It is then sufficient to check whether SND.FACK advances beyond "Recover". Once that is the case, some previously unsent data has been acknowledged by the receiver, and by that time any outstanding retransmissions should have been received as well. Thus the sender MAY retransmit the outstanding data from the SACK scoreboard again, after taking appropriate congestion control action (i.e., reducing the congestion window).
The retransmission SHOULD proceed in order of ascending sequence numbers across the unfilled holes of the SACK scoreboard, to maximize the chance that a delayed segment closes still outstanding holes.
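The following Python sketch combines these rules for one recovery episode. It is hypothetical and simplified: it assumes absolute (non-wrapping) sequence numbers and a scoreboard of disjoint SACKed ranges, and, as a stand-in for the "appropriate congestion control action", it halves cwnd with a floor of two segments. The class and field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Range = Tuple[int, int]  # [start, end): end exclusive


@dataclass
class LrdSender:
    snd_una: int
    snd_max: int
    recover: int                 # SND.MAX when the current cycle started
    cwnd: int                    # congestion window in octets
    mss: int = 1000
    sacked: List[Range] = field(default_factory=list)  # disjoint SACKed ranges

    def snd_fack(self) -> int:
        """Forward-most octet known to be held by the receiver (exclusive)."""
        return max([self.snd_una] + [end for _, end in self.sacked])

    def holes(self) -> List[Range]:
        """Un-SACKed ranges between SND.UNA and SND.FACK, ascending."""
        holes, cursor = [], self.snd_una
        for start, end in sorted(self.sacked):
            if start > cursor:
                holes.append((cursor, start))
            cursor = max(cursor, end)
        return holes

    def check_lost_retransmissions(self) -> List[Range]:
        """Run after each ACK has updated the scoreboard.

        Returns the holes to retransmit again, lowest sequence number first,
        or an empty list while SND.FACK has not passed Recover.
        """
        if self.snd_fack() <= self.recover:
            return []
        # Assumed congestion response: halve cwnd, but keep at least 2 MSS.
        self.cwnd = max(self.cwnd // 2, 2 * self.mss)
        # Start a new cycle; the next trigger needs SACKs above today's SND.MAX.
        self.recover = self.snd_max
        return self.holes()


# Octets 1000..7000 were outstanding when recovery began (Recover = 7000);
# new data up to 9000 has since been sent and partly SACKed.
s = LrdSender(snd_una=1000, snd_max=9000, recover=7000, cwnd=10000,
              sacked=[(2000, 3000), (4000, 7000), (7000, 8000)])
print(s.check_lost_retransmissions())  # [(1000, 2000), (3000, 4000)]
print(s.cwnd, s.recover)               # 5000 9000
print(s.check_lost_retransmissions())  # []
```

Re-arming Recover with the current SND.MAX is what allows several LRD cycles within one recovery episode while tracking only a single sequence number.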
Note that implementations tracking sequence-number ranges in their scoreboard only need to track a single sequence number per recovery episode. Multiple cycles of SACK loss recovery, without leaving loss recovery in between, are possible by tracking the relevant "Recover" value in the scoreboard data structure.
Implicitly, this rule also ensures that all segments that had become eligible for retransmission will have been sent at least once before any additional round of retransmissions is initiated. If the entire flight of data except a small number of segments at the end were lost, it takes at least one RTT for the information about the successfully received segments to reach the sender. By that time, the first round of retransmissions is already complete (and additional data segments with sequence numbers above the value SND.MAX had at the start of the recovery episode may already have been sent).
In order to guarantee timely delivery at the end of a stream, a TCP sender implementing LRD SHOULD also make use of the "Rescue Retransmission" as defined in [RFC6675].
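For reference, the rescue retransmission corresponds to rule (4) of NextSeg() in [RFC6675]. The Python sketch below is a hypothetical reading of that rule, not text from that specification: it is consulted only when the earlier NextSeg() rules found nothing to send, and it permits one retransmission per recovery episode that covers the highest outstanding unSACKed octet. HighACK is treated here as the cumulative acknowledgment point (SND.UNA); the RescueState record and function name are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RescueState:
    high_ack: int                     # cumulative ACK point (HighACK)
    recovery_point: int               # RecoveryPoint of the current episode
    rescue_rxt: Optional[int] = None  # RescueRxt; None means "undefined"


def rescue_retransmission(st: RescueState,
                          highest_unsacked_end: Optional[int],
                          smss: int) -> Optional[Tuple[int, int]]:
    """Return one [start, end) range to resend, or None.

    highest_unsacked_end is one past the highest outstanding unSACKed octet,
    or None when no unSACKed data is outstanding. HighRxt is not modelled;
    RFC 6675 requires that it is not updated by this rule.
    """
    if highest_unsacked_end is None:
        return None
    if st.rescue_rxt is not None and st.high_ack <= st.rescue_rxt:
        return None                   # already used in this episode
    st.rescue_rxt = st.recovery_point
    start = max(highest_unsacked_end - smss, st.high_ack)
    return (start, highest_unsacked_end)


st = RescueState(high_ack=1000, recovery_point=7000)
print(rescue_retransmission(st, 7000, 1000))  # (6000, 7000)
print(rescue_retransmission(st, 7000, 1000))  # None
```

Because the resent segment always ends at the highest unSACKed octet, a tail loss produces a SACK or cumulative ACK that lets the scoreboard-based rules resume instead of waiting for an RTO.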
The algorithm presented in this document shares security considerations with [RFC2018] and [RFC6675].
This document does not require any IANA actions.
The author would like to thank Matt Mathis for the insightful discussions about SACK, its intended behavior, and the spirit driving its design.
Dragana Damjanovic was very helpful in reviewing an earlier version of this text and pointing out numerous clarifications.
Furthermore, valuable feedback was received from John Heffner, Jeff Prem and Anumita Biswas.
The following lengthy graph shows the intended behavior under pathological packet loss, where every third segment is lost. Note that SACK LRD will not be able to recover if the loss ratio during recovery is higher than about 50%, due to the congestion window reduction.
For clarity, each segment is denoted only by a single number. Note that ACKs are also labeled with the segment they acknowledge, not with the next expected sequence number.