Internet DRAFT - draft-hurtig-tcpm-rtorestart
draft-hurtig-tcpm-rtorestart
TCP Maintenance and Minor Extensions P. Hurtig
(tcpm) Karlstad University
Internet-Draft A. Petlund
Intended status: Experimental Simula Research Laboratory AS
Expires: April 25, 2013 M. Welzl
University of Oslo
October 22, 2012
TCP and SCTP RTO Restart
draft-hurtig-tcpm-rtorestart-03
Abstract
This document describes a modified algorithm for managing the TCP and
SCTP retransmission timers that provides faster loss recovery when a
connection's amount of outstanding data is small. The modification
allows the transport to restart its retransmission timer more
aggressively in situations where fast retransmit cannot be used.
This enables faster loss detection and recovery for connections that
are short-lived or application-limited.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 25, 2013.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Hurtig, et al. Expires April 25, 2013 [Page 1]
Internet-Draft TCP and SCTP RTO Restart October 2012
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
1. Introduction
TCP uses two mechanisms to detect segment loss. First, if a segment
is not acknowledged within a certain amount of time, a retransmission
timeout (RTO) occurs, and the segment is retransmitted [RFC6298].
While the RTO is based on measured round-trip times (RTTs) between
the sender and receiver, it also has a conservative lower bound of 1
second to ensure that delayed segments are not mistaken as lost.
Second, when a sender receives duplicate acknowledgments, the fast
retransmit algorithm infers segment loss and triggers a
retransmission. Duplicate acknowledgments are generated by a
receiver when out-of-order segments arrive. As both segment loss and
segment reordering cause out-of-order arrival, fast retransmit waits
for three duplicate acknowledgments before considering the segment as
lost. In some situations, however, the number of outstanding
segments is not enough to trigger three duplicate acknowledgments,
and the sender must rely on lengthy RTOs for loss recovery.
The amount of outstanding segments can be small for several reasons:
(1) The connection is limited by the congestion control when the
path has a low total capacity (bandwidth-delay product) or the
connection's share of the capacity is small. It is also limited
by the congestion control in the first RTTs of a connection or
after an RTO when the available capacity is probed using slow-
start.
(2) The connection is limited by the receiver's available buffer
space.
(3) The connection is limited by the application if the available
capacity of the path is not fully utilized (e.g. interactive
applications), or at the end of a transfer, which is frequent if
the total amount of data is small (e.g. web traffic).
The first two situations can occur for any flow, as external factors
at the network and/or host level cause them. The third situation
primarily affects flows that are short or have a low transmission
rate. Typical examples of applications that produce short flows are
web servers. [RJ10] shows that 70% of all web objects, found at the
top 500 sites, are too small for fast retransmit to work. [BPS98]
Hurtig, et al. Expires April 25, 2013 [Page 2]
Internet-Draft TCP and SCTP RTO Restart October 2012
shows that about 56% of all retransmissions sent by a busy web server
are sent after RTO expiry. While the experiments were not conducted
using SACK [RFC2018], only 4% of the RTO-based retransmissions could
have been avoided. Applications have a low transmission rate when
data is sent in response to actions, or as a reaction to real life
events. Typical examples of such applications are stock trading
systems, remote computer operations and online games. What is
special about this class of applications is that they are time-
dependant, and extra latency can reduce the application service level
[P09]. Although such applications may represent a small amount of
data sent on the network, a considerable number of flows have such
properties and the importance of low latency is high.
The RTO restart approach outlined in this document makes the RTO
slightly more aggressive when the number of outstanding segments is
small, in an attempt to enable faster loss recovery for all segments
while being robust to reordering. While it still conforms to the
requirement in [RFC6298] that segments must not be retransmitted
earlier than RTO seconds after their original transmission, it could
increase the chance for a spurious timeout, which could degrade
performance when the congestion window (cwnd) is large -- for
example, when an application sends enough data to reach a cwnd
covering 100 segments and then stops. The likelihood and potential
impact of this problem as well as possible mitigation strategies are
currently under investigation.
While this document focuses on TCP, the described changes are also
valid for the Stream Control Transmission Protocol (SCTP) [RFC4960]
which has similar loss recovery and congestion control algorithms.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. RTO Restart Overview
The RTO management algorithm described in [RFC6298] recommends that
the retransmission timer is restarted when an acknowledgment (ACK)
that acknowledges new data is received and there is still outstanding
data. The restart is conducted to guarantee that unacknowledged
segments will be retransmitted after approximately RTO seconds.
However, by restarting the timer on each incoming acknowledgment,
retransmissions are not typically triggered RTO seconds after their
previous transmission but rather RTO seconds after the last ACK
arrived. The duration of this extra delay depends on several factors
Hurtig, et al. Expires April 25, 2013 [Page 3]
Internet-Draft TCP and SCTP RTO Restart October 2012
but is in most cases approximately one RTT. Hence, in most
situations the time before a retransmission is triggered is equal to
"RTO + RTT".
The extra delay can be significant, especially for applications that
use a lower RTOmin than the standard of 1 second and/or in
environments with high RTTs, e.g. mobile networks. The restart
approach is illustrated in Figure 1 where a TCP sender transmits
three segments to a receiver. The arrival of the first and second
segment triggers a delayed ACK [RFC1122], which restarts the RTO
timer at the sender. The RTO restart is performed approximately one
RTT after the transmission of the third segment. Thus, if the third
segment is lost, as indicated in Figure 1, the effective loss
detection time is "RTO + RTT" seconds. In some situations, the
effective loss detection time becomes even longer. Consider a
scenario where only two segments are outstanding. If the second
segment is lost, the time to expire the delayed ACK timer will also
be included in the effective loss detection time.
Sender Receiver
...
DATA [SEG 1] ----------------------> (ack delayed)
DATA [SEG 2] ----------------------> (send ack)
DATA [SEG 3] ----X /-------- ACK
(restart RTO) <----------/
...
(RTO expiry)
DATA [SEG 3] ---------------------->
Figure 1: RTO restart example
During normal TCP bulk transfer the current RTO restart approach is
not a problem. Actually, as long as enough segments arrive at a
receiver to enable fast retransmit, RTO-based loss recovery should be
avoided. RTOs should only be used as a last resort, as they
drastically lower the congestion window compared to fast retransmit,
and the current approach can therefore be beneficial -- it is
described in [EL04] to act as a "safety margin" that compensates for
some of the problems that the authors have identified with the
standard RTO calculation. Notably, the authors of [EL04] also state
that "this safety margin does not exist for highly interactive
applications where often only a single packet is in flight."
There are only a few situations where timeouts are appropriate, or
the only choice. For example, if the network is severely congested
and no segments arrive, RTO-based recovery should be used. In this
Hurtig, et al. Expires April 25, 2013 [Page 4]
Internet-Draft TCP and SCTP RTO Restart October 2012
situation, the time to recover from the loss(es) will not be the
performance bottleneck. Furthermore, for connections that do not
utilize enough capacity to enable fast retransmit, RTO is the only
choice. The time needed for loss detection in such scenarios can
become a serious performance bottleneck.
3. RTO Restart Algorithm
To enable faster loss recovery for connections that are unable to use
fast retransmit, an alternative RTO restart can be used. By
resetting the timer to "RTO - T_earliest", where T_earliest is the
time elapsed since the earliest outstanding segment was transmitted,
retransmissions will always occur after exactly RTO seconds. This
approach makes the RTO more aggressive than the standardized approach
in [RFC6298] but still conforms to the requirement in [RFC6298] that
segments must not be retransmitted earlier than RTO seconds after
their original transmission.
This document specifies the following update of step 5.3 in Section 5
of [RFC6298] (and a similar update in Section 6.3.2 of [RFC4960] for
SCTP):
When an ACK is received that acknowledges new data:
(1) Set T_earliest = 0.
(2) If the following two conditions hold:
(a) The number of outstanding segments is less than four.
(b) There is no unsent data ready for transmission or the
receiver's advertised window does not permit
transmission.
set T_earliest to the time elapsed since the earliest
outstanding segment was sent.
(3) Restart the retransmission timer so that it will expire after
"RTO - T_earliest" seconds (for the current value of RTO).
The update requires TCP implementations to track the time elapsed
since the transmission of the earliest outstanding segment
(T_earliest). As the alternative restart is used only when the
number of outstanding segments is less than four only four segments
need to be tracked. Furthermore, some implementations of TCP (e.g.
Linux TCP) already track the transmission times of all segments.
Hurtig, et al. Expires April 25, 2013 [Page 5]
Internet-Draft TCP and SCTP RTO Restart October 2012
4. Discussion
The currently standardized algorithm has been shown to add at least
one RTT to the loss recovery process in TCP [LS00] and SCTP
[HB08][PBP09]. Applications that have strict timing requirements
(e.g. telephony signaling and gaming) rather than throughput
requirements may want to use a lower RTOmin than the standard of 1
second [RFC4166]. For such applications the modified restart
approach could be important as the RTT and also the delayed ACK timer
of receivers will be large components of the effective loss recovery
time. Measurements in [HB08] have shown that the total transfer time
of a lost segment (including the original transmission time and the
loss recovery time) can be reduced with up to 35% using the suggested
approach. These results match those presented in [PGH06][PBP09],
where the modified restart approach is shown to significantly reduce
retransmission latency.
There are several proposals that address the problem of not having
enough ACKs for loss recovery. In what follows, we explain why the
mechanism described here is complementary to these approaches:
The limited transmit mechanism [RFC3042] allows a TCP sender to
transmit a previously unsent segment for each of the first two
duplicate acknowledgments. By transmitting new segments, the sender
attempts to generate additional duplicate acknowledgments to enable
fast retransmit. However, limited transmit does not help if no
previously unsent data is ready for transmission or if the receiver
is out of buffer space. [RFC5827] specifies an early retransmit
algorithm to enable fast loss recovery in such situations. By
dynamically lowering the amount of duplicate acknowledgments needed
for fast retransmit (dupthresh), based on the number of outstanding
segments, a smaller number of duplicate acknowledgments are needed to
trigger a retransmission. In some situations, however, the algorithm
is of no use or might not work properly. First, if a single segment
is outstanding, and lost, it is impossible to use early retransmit.
Second, if ACKs are lost, the early retransmit cannot help. Third,
if the network path reorders segments, the algorithm might cause more
unnecessary retransmissions than fast retransmit.
TCP-NCR [RFC4653] sets the dupthresh to three or more, to better
disambiguate reordered and lost segments. In addition, early
retransmit lowers the dupthresh when the amount of outstanding data
is small, to enable faster loss recovery. The reasons why the RTO
restart procedure described in this document does not take dynamic
dupthresh considerations into account are twofold. First, if a
larger dupthresh is used, the RTO restart approach could be used when
the congestion window, and the amount of outstanding data, is larger.
However, in such situations the actual amount of outstanding data can
Hurtig, et al. Expires April 25, 2013 [Page 6]
Internet-Draft TCP and SCTP RTO Restart October 2012
significantly impact the RTT of the connection, making it potentially
dangerous to be more aggressive. Second, if a smaller dupthresh is
used, the amount of outstanding data needed for a restart is smaller.
However, as the congestion window is already small, it does not
matter if a retransmission is due to a fast retransmit or an RTO.
The resulting congestion window will still be very small, and the
only difference is how quickly TCP infers segment loss.
Tail Loss Probe [TLP] is a proposal to send up to two "probe
segments" when a timer fires which is set to a value smaller than the
RTO. A "probe segment" is a new segment if new data is available,
else a retransmission. The intention is to compensate for sluggish
RTO behavior in situations where the RTO greatly exceeds the RTT,
which, according to measurements reported in [TLP], is not uncommon.
The Probe timeout (PTO) is at least 2 RTTs, and only scheduled in
case the RTO is farther than the PTO. A spurious PTO is less risky
than a spurious RTO, as it would not have the same negative effects
(clearing the scoreboard and restarting with slow-start). In
contrast, RTO restart is trying to make the RTO more appropriate in
cases where there is no need to be overly cautious.
TLP could kick in in situations where RTO restart does not apply, and
it could overrule (yielding a similar general behavior, but with a
lower timeout) RTO restart in cases where the number of outstanding
segments is smaller than 4 and no new segments are available for
transmission. The shorter RTO from RTO restart also reduces the
probability that TLP is activated because PTO might be farther than
RTO. This could make RTO restart more aggressive than the algorithm
in [TLP] when:
(1) no data has been sent in an interval exceeding the RTO
(2) the number of outstanding segments is 3
(3) (defined in [RFC5681]) is at least 3
because, under these conditions, in accordance with [RFC5681], 3
packets can immediately be retransmitted, whereas TLP only allows up
to two consecutive PTOs.
5. IANA Considerations
This memo includes no request to IANA.
Hurtig, et al. Expires April 25, 2013 [Page 7]
Internet-Draft TCP and SCTP RTO Restart October 2012
6. Security Considerations
This document discusses a change in how to set the retransmission
timer's value when restarted. This change does not raise any new
security issues with TCP or SCTP.
7. References
7.1. Normative References
[RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
TCP's Loss Recovery Using Limited Transmit", RFC 3042,
January 2001.
[RFC4166] Coene, L. and J. Pastor-Balbas, "Telephony Signalling
Transport over Stream Control Transmission Protocol (SCTP)
Applicability Statement", RFC 4166, February 2006.
[RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton,
"Improving the Robustness of TCP to Non-Congestion
Events", RFC 4653, August 2006.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol",
RFC 4960, September 2007.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009.
[RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
P. Hurtig, "Early Retransmit for TCP and Stream Control
Transmission Protocol (SCTP)", RFC 5827, May 2010.
[RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent,
"Computing TCP's Retransmission Timer", RFC 6298,
June 2011.
Hurtig, et al. Expires April 25, 2013 [Page 8]
Internet-Draft TCP and SCTP RTO Restart October 2012
7.2. Informative References
[BPS98] Balakrishnan, H., Padmanabhan, V., Seshan, S., Stemm, M.,
and R. Katz, "TCP Behavior of a Busy Web Server: Analysis
and Improvements", Proc. IEEE INFOCOM Conf., March 1998.
[EL04] Ekstroem, H. and R. Ludwig, "The Peak-Hopper: A New End-
to-End Retransmission Timer for Reliable Unicast
Transport", IEEE INFOCOM 2004, March 2004.
[HB08] Hurtig, P. and A. Brunstrom, "SCTP: designed for timely
message delivery?", Springer Telecommunication Systems,
May 2010.
[LS00] Ludwig, R. and K. Sklower, "The Eifel retransmission
timer", ACM SIGCOMM Comput. Commun. Rev., 30(3),
July 2000.
[P09] Petlund, A., "Improving latency for interactive, thin-
stream applications over reliable transport", Unipub PhD
Thesis, Oct 2009.
[PBP09] Petlund, A., Beskow, P., Pedersen, J., Paaby, E., Griwodz,
C., and P. Halvorsen, "Improving SCTP Retransmission
Delays for Time-Dependent Thin Streams",
Springer Multimedia Tools and Applications, 45(1-3), 2009.
[PGH06] Pedersen, J., Griwodz, C., and P. Halvorsen,
"Considerations of SCTP Retransmission Delays for Thin
Streams", IEEE LCN 2006, November 2006.
[RJ10] Ramachandran, S., "Web metrics: Size and number of
resources", Google http://code.google.com/speed/articles/
web-metrics.html, May 2010.
[TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
"TCP Loss Probe (TLP): An Algorithm for Fast Recovery of
Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-00.txt
(work in progress), July 2012.
Hurtig, et al. Expires April 25, 2013 [Page 9]
Internet-Draft TCP and SCTP RTO Restart October 2012
Authors' Addresses
Per Hurtig
Karlstad University
Universitetsgatan 2
Karlstad, 651 88
Sweden
Phone: +46 54 700 23 35
Email: per.hurtig@kau.se
Andreas Petlund
Simula Research Laboratory AS
P.O. Box 134
Lysaker, 1325
Norway
Phone: +47 67 82 82 00
Email: apetlund@simula.no
Michael Welzl
University of Oslo
PO Box 1080 Blindern
Oslo, N-0316
Norway
Phone: +47 22 85 24 20
Email: michawe@ifi.uio.no
Hurtig, et al. Expires April 25, 2013 [Page 10]