TCP Maintenance and Minor Extensions (TCPM) WG | I. Rhee |
Internet-Draft | NCSU |
Intended status: Informational | L. Xu |
Expires: August 8, 2017 | UNL |
S. Ha | |
Colorado | |
A. Zimmermann | |
L. Eggert | |
NetApp | |
R. Scheffenegger | |
February 4, 2017 |
CUBIC for Fast Long-Distance Networks
draft-ietf-tcpm-cubic-04
CUBIC is an extension to the current TCP standards. The protocol differs from the current TCP standards only in the congestion window adjustment function in the sender side. In particular, it uses a cubic function instead of a linear window increase function of the current TCP standards to improve scalability and stability under fast and long distance networks. CUBIC and its predecessor algorithm have been adopted as default by Linux and have been used for many years. This document provides a specification of CUBIC to enable third party implementation and to solicit the community feedback through experimentation on the performance of CUBIC.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 8, 2017.
Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The low utilization problem of TCP in fast long-distance networks is well documented in [K03][RFC3649]. This problem arises from a slow increase of congestion window following a congestion event in a network with a large bandwidth delay product (BDP). Our experience [HKLRX06] indicates that this problem is frequently observed even in the range of congestion window sizes over several hundreds of packets (each packet is sized around 1000 bytes) especially under a network path with over 100ms round-trip times (RTTs). This problem is equally applicable to all Reno style TCP standards and their variants, including TCP-RENO [RFC5681], TCP-NewReno [RFC6582], TCP-SACK [RFC2018], SCTP [RFC4960], TFRC [RFC5348] that use the same linear increase function for window growth, which we refer to collectively as Standard TCP below.
CUBIC [HRX08] is a modification to the congestion control mechanism of Standard TCP, in particular, to the window increase function of Standard TCP senders, to remedy this problem. Specifically, it uses a cubic function instead of a linear window increase function of the Standrad TCP to improve scalability and stability under fast and long distance networks.
BIC-TCP, a predecessor of CUBIC, has been selected as default TCP congestion control algorithm by Linux in the year 2005 and been used for several years by the Internet community at large. CUBIC uses a similar window growth function as BIC-TCP and is designed to be less aggressive and fairer to TCP in bandwidth usage than BIC-TCP while maintaining the strengths of BIC-TCP such as stability, window scalability and RTT fairness. CUBIC has already been deployed globally by Linux. Through extensive testing in various Internet scenarios, we believe that CUBIC is safe for testing and deployment in the global Internet.
In the ensuing sections, we first brefly explain the design principle of CUBIC, then provide the exact specification of CUBIC, and finally discuss the safety features of CUBIC following the guidelines specified in [RFC5033].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
CUBIC [HRX08] uses a cubic window increase function in terms of the elapsed time from the last congestion event. While most alternative algorithms to Standard TCP uses a convex increase function where during congestion avoidance the window increment is always increasing, CUBIC uses both the concave and convex profiles of a cubic function for window increase. After a window reduction following a loss event detected by duplicate ACKs, it registers the window size where it got the loss event as W_max and performs a multiplicative decrease of congestion window and the regular fast recovery and retransmit of Standard TCP. After it enters into congestion avoidance from fast recovery, it starts to increase the window using the concave profile of the cubic function. The cubic function is set to have its plateau at W_max so the concave growth continues until the window size becomes W_max. After that, the cubic function turns into a convex profile and the convex window growth begins. This style of window adjustment (concave and then convex) improves protocol and network stability while maintaining high network utilization [CEHRX07]. This is because the window size remains almost constant, forming a plateau around W_max where network utilization is deemed highest and under steady state, most window size samples of CUBIC are close to W_max, thus promoting high network utilization and protocol stability. Note that protocols with convex increase functions have the maximum increments around W_max and introduces a large number of packet bursts around the saturation point of the network, likely causing frequent global loss synchronizations.
Another notable feature of CUBIC is that its window increase rate is mostly independent of RTT, and follows a (cubic) function of the elapsed time from the beginning of congestion avoidance. This feature promotes per-flow fairness to Standard TCP as well as RTT-fairness. Note that Standard TCP performs well under short RTT and small bandwidth (or small BDP) networks. Only in a large long RTT and large bandwidth (or large BDP) networks, it has the scalability problem. An alternative protocol to Standard TCP designed to be friendly to Standard TCP at a per-flow basis must operate to increase its window much less aggressively in small BDP networks than in large BDP networks. In CUBIC, its window growth rate is slowest around the inflection point of the cubic function and this function does not depend on RTT. In a smaller BDP network where Standard TCP flows are working well, the absolute amount of the window decrease at a loss event is always smaller because of the multiplicative decrease. Therefore, in CUBIC, the starting window size after a loss event from which the window starts to increase, is smaller in a smaller BDP network, thus falling nearer to the plateau of the cubic function where the growth rate is slowest. By setting appropriate values of the cubic function parameters, CUBIC sets its growth rate always no faster than Standard TCP around its inflection point. When the cubic function grows slower than the window of Standard TCP, CUBIC simply follows the window size of Standard TCP to ensure fairness to Standard TCP in a small BDP network. We call this region where CUBIC behaves like Standard TCP, the TCP-friendly region.
CUBIC maintains the same window growth rate independent of RTTs outside of the TCP-friendly region, and flows with different RTTs have the similar window sizes under steady state when they operate outside the TCP-friendly region. This ensures CUBIC flows with different RTTs to have their bandwidth shares (approximately, window/RTT) linearly proportional to the inverse of their RTT ratio (the longer RTT, the smaller the share). This behavior is the same as that of Standard TCP under high statistical multiplexing environments where packet losses are independent of individual flow rates. However, under low statistical multiplexing environments, the bandwidth share ratio of Standard TCP flows with different RTTs is squarely proportional to the inverse of their RTT ratio [XHR04]. CUBIC always ensures the linear ratio independent of the levels of statistical multiplexing. This is an improvement over Standard TCP. While there is no consensus on a particular bandwidth share ratios of different RTT flows, we believe that under wired Internet, use of the linear share notion seems more reasonable than equal share or a higher order shares. HTCP [LS08] currently uses the equal share.
CUBIC sets the multiplicative window decrease factor to 0.7 while Standard TCP uses 0.5. While this improves the scalability of the protocol, a side effect of this decision is slower convergence especially under low statistical multiplexing environments. This design choice is following the observation that the author of HSTCP [RFC3649] has made along with other researchers (e.g., [GV02]): the current Internet becomes more asynchronous with less frequent loss synchronizations with high statistical multiplexing. Under this environment, even strict Multiplicative-Increase Multiplicative-Decrease (MIMD) can converge. CUBIC flows with the same RTT always converge to the same share of bandwidth independent of statistical multiplexing, thus achieving intra-protocol fairness. We also find that under the environments with sufficient statistical multiplexing, the convergence speed of CUBIC flows is reasonable.
The unit of all window sizes in this document is segments of the maximum segment size (MSS), and the unit of all times is seconds.
CUBIC maintains the acknowledgment (ACK) clocking of Standard TCP by increasing congestion window only at the reception of ACK. The protocol does not make any change to the fast recovery and retransmit of TCP, such as TCP-NewReno [RFC6582] and TCP-SACK [RFC2018]. During congestion avoidance after fast recovery, CUBIC changes the window update algorithm of Standard TCP. Suppose that W_max is the window size before the window is reduced in the last fast retransmit and recovery.
The window growth function of CUBIC uses the following function:
W_cubic(t) = C*(t-K)^3 + W_max (Eq. 1)
where C is a constant fixed to determine the aggressiveness of window growth in high BDP networks, t is the elapsed time from the last window reduction that is measured right after the fast recovery in response to duplicate ACKs or after the congestion window reduction in response to ECN-Echo ACKs, and K is the time period that the above function takes to increase the current window size to W_max if there is no further loss event and is calculated by using the following equation:
K = cubic_root(W_max*(1-beta_cubic)/C) (Eq. 2)
where beta_cubic is the CUBIC multiplication decrease factor, that is, when a packet loss detected by duplicate ACKs or a network congestion detected by ECN-Echo ACKs occurs, CUBIC reduces its current window cwnd to W_cubic(0)=W_max*beta_cubic. We discuss how we set C in the next Section in more details.
Upon receiving an ACK during congestion avoidance, CUBIC computes the window growth rate during the next RTT period using Eq. 1. It sets W_cubic(t+RTT) as the candidate target value of congestion window, where RTT is the weithed average RTT calculated by the standard TCP.
Depending on the value of the current window size cwnd, CUBIC runs in three different modes. First, if cwnd is less than the window size that Standard TCP would reach at time t after the last loss event, then CUBIC is in the TCP friendly region (we describe below how to determine this window size of Standard TCP in term of time t). Otherwise, if cwnd is less than W_max, then CUBIC is the concave region, and if cwnd is larger than W_max, CUBIC is in the convex region. Below, we describe the exact actions taken by CUBIC in each region.
Standard TCP performs well in certain types of networks, for example, under short RTT and small bandwidth (or small BDP) networks. In these networks, we use the TCP-friendly region to ensure that CUBIC achieves at least the same throughput as the standard TCP.
When receiving an ACK in congestion avoidance, we first check whether the protocol is in the TCP region or not. This is done by estimating the average rate of the Standard TCP using a simple analysis described in [FHP00]. It considers the Standard TCP as a special case of an Additive Increase and Multiplicative Decrease algorithm (AIMD), which has an additive factor alpha_aimd and a multiplicative factor beta_aimd with the following function:
AVG_W_aimd = [ alpha_aimd * (1+beta_aimd) / (2*(1-beta_aimd)*p) ]^0.5 (Eq. 3)
By the same analysis, the average window size of Standard TCP is (1.5/p)^0.5, as the Standard TCP is a special case of AIMD with alpha_aimd=1 and beta_aimd=0.5. Thus, for Eq. 3 to be the same as that of Standard TCP, alpha_aimd must be equal to 3*(1-beta_aimd)/(1+beta_aimd). As AIMD increases its window by alpha_aimd per RTT, we can get the window size of AIMD in terms of the elapsed time t as follows:
W_aimd(t) = W_max*beta_aimd + [3*(1-beta_aimd)/(1+beta_aimd)] * (t/RTT) (Eq. 4)
If W_cubic(t) is less than W_aimd(t), then the protocol is in the TCP friendly region and cwnd SHOULD be set to W_aimd(t) at each reception of ACK.
When receiving an ACK in congestion avoidance, if the protocol is not in the TCP-friendly region and cwnd is less than W_max, then the protocol is in the concave region. In this region, cwnd MUST be incremented by (W_cubic(t+RTT) - cwnd)/cwnd for each received ACK, where W_cubic(t+RTT) is calculated using Eq. 1.
When the current window size of CUBIC is larger than W_max, it passes the plateau of the cubic function after which CUBIC follows the convex profile of the cubic function. Since cwnd is larger than the previous saturation point W_max, this indicates that the network conditions might have been perturbed since the last loss event, possibly implying more available bandwidth after some flow departures. Since the Internet is highly asynchronous, some amount of perturbation is always possible without causing a major change in available bandwidth. In this phase, CUBIC is being very careful by very slowly increasing its window size. The convex profile ensures that the window increases very slowly at the beginning and gradually increases its growth rate. We also call this phase as the maximum probing phase since CUBIC is searching for a new W_max. In this region, cwnd MUST be incremented by (W_cubic(t+RTT) - cwnd)/cwnd for each received ACK, where W_cubic(t+RTT) is calculated using Eq. 1.
When a packet loss detected by duplicate ACKs or a network congestion detected by ECN-Echo ACKs occurs, CUBIC updates its W_max, cwnd, and ssthresh (slow start threshold) as follows. Parameter beta_cubic SHOULD be set to 0.7.
W_max = cwnd; // save window size before reduction ssthresh = cwnd * beta_cubic; // new slow start threshold cwnd = cwnd * beta_cubic; // window reduction
A side effect of setting beta_cubic to a bigger value than 0.5 is slower convergence. We believe that while a more adaptive setting of beta_cubic could result in faster convergence, it will make the analysis of the protocol much harder. This adaptive adjustment of beta_cubic is an item for the next version of CUBIC.
To improve the convergence speed of CUBIC, we add a heuristic in the protocol. When a new flow joins the network, existing flows in the network need to give up their bandwidth shares to allow the flow some room for growth if the existing flows have been using all the bandwidth of the network. To increase this release of bandwidth by existing flows, the following mechanism called fast convergence SHOULD be implemented.
With fast convergence, when a loss event occurs, before a window reduction of congestion window, a flow remembers the last value of W_max before it updates W_max for the current loss event. Let us call the last value of W_max to be W_last_max.
if (W_max < W_last_max){ // check downward trend W_last_max = W_max; // remember the last W_max W_max = W_max*(1+beta_cubic)/2; // further reduce W_max } else { // check upward trend W_last_max = W_max // remember the last W_max }
This allows W_max to be slightly less than the original W_max. Since flows spend most of time around their W_max, flows with larger bandwidth shares tend to spend more time around the plateau allowing more time for flows with smaller shares to increase their windows.
In case of timeout, CUBIC follows the standard TCP to reduce cwnd, but sets ssthresh using beta_cubic (same as in Section 4.5).
In this section, we further discuss the safety features of CUBIC following the guidelines specified in [RFC5033].
With a deterministic loss model where the number of packets between two successive lost events is always 1/p, CUBIC always operates with the concave window profile which greatly simplifies the performance analysis of CUBIC. The average window size of CUBIC can be obtained by the following function:
AVG_W_cubic = [C*(3+beta_cubic)/(4*(1-beta_cubic))]^0.25 * (RTT^0.75) / (p^0.75) (Eq. 5)
With beta_cubic set to 0.7, the above formula is reduced to:
AVG_W_cubic = (C*3.7/1.2)^0.25 * (RTT^0.75) / (p^0.75) (Eq. 6)
We will determine the value of C in the following subsection using Eq. 6.
In environments where standard TCP is able to make reasonable use of the available bandwidth, CUBIC does not significantly change this state.
Standard TCP performs well in the following two types of networks:
CUBIC is designed to behave very similarly to standard TCP in the above two types of networks. The following two tables show the average window size of standard TCP, HSTCP, and CUBIC. The average window size of standard TCP and HSTCP is from [RFC3649]. The average window size of CUBIC is calculated by using Eq. 6 and CUBIC TCP friendly mode for three different values of C.
Loss Rate P | TCP | HSTCP | CUBIC (C=0.04) | CUBIC (C=0.4) | CUBIC (C=4) |
---|---|---|---|---|---|
10^-2 | 12 | 12 | 12 | 12 | 12 |
10^-3 | 38 | 38 | 38 | 38 | 59 |
10^-4 | 120 | 263 | 120 | 187 | 333 |
10^-5 | 379 | 1795 | 593 | 1054 | 1874 |
10^-6 | 1200 | 12279 | 3332 | 5926 | 10538 |
10^-7 | 3795 | 83981 | 18740 | 33325 | 59261 |
10^-8 | 12000 | 574356 | 105383 | 187400 | 333250 |
Response function of standard TCP, HSTCP, and CUBIC in networks with RTT = 0.1 seconds. The average window size is in MSS-sized segments.
Loss Rate P | Average TCP W | Average HSTCP W | CUBIC (C=0.04) | CUBIC (C=0.4) | CUBIC (C=4) |
---|---|---|---|---|---|
10^-2 | 12 | 12 | 12 | 12 | 12 |
10^-3 | 38 | 38 | 38 | 38 | 38 |
10^-4 | 120 | 263 | 120 | 120 | 120 |
10^-5 | 379 | 1795 | 379 | 379 | 379 |
10^-6 | 1200 | 12279 | 1200 | 1200 | 1874 |
10^-7 | 3795 | 83981 | 3795 | 5926 | 10538 |
10^-8 | 12000 | 574356 | 18740 | 33325 | 59261 |
Response function of standard TCP, HSTCP, and CUBIC in networks with RTT = 0.01 seconds. The average window size is in MSS-sized segments.
Both tables show that CUBIC with any of these three C values is more friendly to TCP than HSTCP, especially in networks with a short RTT where TCP performs reasonably well. For example, in a network with RTT = 0.01 seconds and p=10^-6, TCP has an average window of 1200 packets. If the packet size is 1500 bytes, then TCP can achieve an average rate of 1.44 Gbps. In this case, CUBIC with C=0.04 or C=0.4 achieves exactly the same rate as Standard TCP, whereas HSTCP is about ten times more aggressive than Standard TCP.
We can see that C determines the aggressiveness of CUBIC in competing with other protocols for the bandwidth. CUBIC is more friendly to the Standard TCP, if the value of C is lower. However, we do not recommend to set C to a very low value like 0.04, since CUBIC with a low C cannot efficiently use the bandwidth in long RTT and high bandwidth networks. Based on these observations, we find C=0.4 gives a good balance between TCP-friendliness and aggressiveness of window growth. Therefore, C SHOULD be set to 0.4. With C set to 0.4, Eq. 6 is reduced to:
Eq. 7 is then used in the next subsection to show the scalability of CUBIC.
CUBIC uses a more aggressive window growth function than Standard TCP under long RTT and high bandwidth networks.
The following table shows that to achieve 10Gbps rate, standard TCP requires a packet loss rate of 2.0e-10, while CUBIC requires a packet loss rate of 2.9e-8.
Throughput(Mbps) | Average W | TCP P | HSTCP P | CUBIC P |
---|---|---|---|---|
1 | 8.3 | 2.0e-2 | 2.0e-2 | 2.0e-2 |
10 | 83.3 | 2.0e-4 | 3.9e-4 | 2.9e-4 |
100 | 833.3 | 2.0e-6 | 2.5e-5 | 1.4e-5 |
1000 | 8333.3 | 2.0e-8 | 1.5e-6 | 6.3e-7 |
10000 | 83333.3 | 2.0e-10 | 1.0e-7 | 2.9e-8 |
Required packet loss rate for Standard TCP, HSTCP, and CUBIC to achieve a certain throughput. We use 1500-byte packets and an RTT of 0.1 seconds.
Our test results in [HKLRX06] indicate that CUBIC uses the spare bandwidth left unused by existing Standard TCP flows in the same bottleneck link without taking away much bandwidth from the existing flows.
CUBIC is designed to remedy the poor performance of TCP in fast long-distance networks.
CUBIC has been extensively studied by using both NS-2 simulation and test-bed experiments covering a wide range of network environments. More information can be found in [HKLRX06].
Same as Standard TCP, CUBIC is a loss-based congestion control algorithm. Therefore, CUBIC, which is designed to be more aggressive than Standard TCP in fast and long distance networks, can fill large drop-tail buffers more quickly than Standard TCP. In this case, proper queue sizing and management [RFC7567] could be used to reduce the packet queueing delay.
With regard to the potential of causing congestion collapse, CUBIC behaves like standard TCP since CUBIC modifies only the window adjustment algorithm of TCP. Thus, it does not modify the ACK clocking and Timeout behaviors of Standard TCP.
CUBIC ensures convergence of competing CUBIC flows with the same RTT in the same bottleneck links to an equal bandwidth share. When competing flows have different RTTs, their bandwidth shares are linearly proportional to the inverse of their RTT ratios. This is true independent of the level of statistical multiplexing in the link.
This is not considered in the current CUBIC.
CUBIC does not raise its congestion window size if the flow is currently limited by the application instead of the congestion window. In cases of idle periods, t in Eq. 1 MUST NOT include the idle time; otherwise, W_cubic(t) might be very high after restarting from a long idle time.
In case that there is a sudden congestion, a routing change, or a mobility event, CUBIC behaves the same as Standard TCP.
CUBIC requires only the change of TCP senders, and does not require any assistant of routers.
This proposal makes no changes to the underlying security of TCP.
There are no IANA considerations regarding this document.
Alexander Zimmermann and Lars Eggert have received funding from the European Union's Horizon 2020 research and innovation program 2014-2018 under grant agreement No. 644866 (SSICLOPS). This document reflects only the authors' views and the European Commission is not responsible for any use that may be made of the information it contains.
[RFC2018] | Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, DOI 10.17487/RFC2018, October 1996. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC3649] | Floyd, S., "HighSpeed TCP for Large Congestion Windows", RFC 3649, DOI 10.17487/RFC3649, December 2003. |
[RFC4960] | Stewart, R., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007. |
[RFC5033] | Floyd, S. and M. Allman, "Specifying New Congestion Control Algorithms", BCP 133, RFC 5033, DOI 10.17487/RFC5033, August 2007. |
[RFC5348] | Floyd, S., Handley, M., Padhye, J. and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, DOI 10.17487/RFC5348, September 2008. |
[RFC5681] | Allman, M., Paxson, V. and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009. |
[RFC6582] | Henderson, T., Floyd, S., Gurtov, A. and Y. Nishida, "The NewReno Modification to TCP's Fast Recovery Algorithm", RFC 6582, DOI 10.17487/RFC6582, April 2012. |
[RFC7567] | Baker, F. and G. Fairhurst, "IETF Recommendations Regarding Active Queue Management", BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015. |