TCP Maintenance and Minor Extensions | T. Li |
Internet-Draft | K. Zheng |
Intended status: Experimental | R. Jadhav |
Expires: September 7, 2020 | J. Kang |
Huawei Technologies | |
March 6, 2020 |
Advancing ACK Handling for Wireless Transports
draft-li-tcpm-advancing-ack-for-wireless-00
Acknowledgement (ACK) is a basic function and implemented in most of the ordered and reliable transport protocols [RFC0793]. Legacy TCP ACK is designed with high frequency to guarantee correct interaction between sender and receiver. However, the shared nature of the wireless medium over wireless local area network (WLAN) induces contention between data transport and backward signaling, such as acknowledgement. The current way of TCP acknowledgment induces control overhead which is counter-productive for TCP performance especially for WLAN scenarios.
This document conducts the ACK frequency breakdown, analyzes several ways of reducing ACK frequency, and discusses the compatibility issues with existing systems in detail.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 7, 2020.
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
It is well-studied that medium acquisition overhead in WLAN based on the IEEE 802.11 medium access control (MAC) protocol [WL] can severely hamper TCP throughput, and TCP's many small ACKs are one reason [Eugenio][Lynne]. Basically, TCP sends an ACK for every one or two incoming data packets. ACKs share the same medium route with data packets, causing similar medium access overhead despite the much smaller size of the ACKs [Eitan][RFC4341][Mario][Sara][Ruy]. Contentions and collisions, as well as the wasted wireless resources by ACKs, therefore lead to significant throughput decline on the data path.
The WLAN bandwidth can be expanded by hardware modifications, such as 802.11ac and 802.11ax, in which channel binding is extended, or more spatial streams and high-density modulation are used. However, a faster physical (PHY) rate makes the MAC overhead problem even worse. This is because delay associated with medium acquisition wastes time and a higher PHY rate also proportionally increases ACK frequency for legacy TCP. Consequently, rethinking the way of TCP acknowledgement that reduce medium acquisition overheads in WLAN, so as to improve transport throughput, would be a relevant contribution.
In order to improve transport performance over IEEE 802.11 wireless links, Salameh et al. [Lynne] proposed HACK by changing Wi-Fi MAC to carry TCP ACKs inside link-layer ACKs, which eliminates TCP ACK medium acquisitions and thus improves TCP goodput. On the other hand, the study of delaying more than two ACKs was first carried out by Altman and Jimenez [Eitan], followed by a line of ACK thinning technologies [Ammar][Farzaneh][Hong][Jiwei][RFC4341][RDe][Ruy][Rao] on the transport layer. Among them, some studies reduce ACK frequency by dropping selected ACKs on an intermediate node (e.g., a wireless AP or gateway). Due to asymmetry information, this intermediate management unavoidably makes endpoints in chaos when estimating transport-layer states, thus take untimely actions. Under these circumstances, some studies adopt the end-to-end solutions, which fall into two categories: (1) Byte-counting ACK that sends an ACK for every L (L >= 2) incoming full sized packets. (2) Periodic ACK that sends an ACK for each time interval (or send window). Both fail to match the number of ACKs to transport’s required frequency under different network conditions (e.g, time-varying data throughput). Tame ACK (TACK) is suggested to combines these two approaches, achieving a minimized ACK frequency under different network scenarios.
However, simply reducing ACK frequency not only disturbs the packet clocking algorithms (e.g., loss recovery, send pattern, window update) and round-trip timing [Jbson], but also impairs the feedback robustness (e.g., more sensitive to ACK loss). The hurdle here is that legacy TCP couples the high ACK frequency with transport controls such as rapid loss detection, accurate round trip timing, and effective send rate control.
ACK frequency can be denoted by f with unit of Hz, i.e., number of ACKs per second. Byte-counting ACK and periodic ACK are two fundamental ways to reduce ACK frequency on the transport layer.
1. Byte-counting ACK: ACK frequency is in control by sending an ACK for every L (L >= 2) incoming full-sized packets (packet size equals to the maximum segment size (MSS)) [Eitan][RFC4341][Sara][Rao]. The frequency of byte-counting ACK is proportional to data throughput (bw):
f(b) = bw/L*MSS (1)
In general, f(b) can be reduced by setting a large value of L. However, for a given L, f(b) increases with bw. This means when data throughput is extremely high, the ACK frequency still might be comparatively large. In other words, the frequency of byte-counting ACK is unbounded under bandwidth change.
2. Periodic ACK: Byte-counting ACK’s unbounded frequency can be attributed to the coupling between ACK sending and packet arrivals. Periodic ACK can decouple ACK frequency from packet arrivals, achieving a bounded ACK frequency when bw is high. The frequency of periodic ACK is:
f(pack) = 1/alpha (2)
Where alpha is the time interval between two ACKs. However, when bw is extremely low, the ACK frequency is always as high as that in the case of a high throughput. In other words, the frequency of periodic ACK is unadaptable to bandwidth change, which wastes resources.
3. Tame ACK (TACK): To summarize, both of the above ways fail to bound or minimize the number of ACKs that required by transport under different network conditions (e.g, time-varying data throughput). TACK is suggested to combine these two ways so that minimize ACK frequency required, i.e., ACK frequency can be set as f(tack) = min{f(b),f(pack)}. Through Equations (1) and (2), we have
f(tack) = min{bw/L*MSS, 1/alpha} (3)
In 2006, Floyd and Kohler [RFC4341] proposed a tunable transport control variant in which the minimum ACK frequency allowed is twice per send window (i.e., per RTT). In 2007, Sara Landstrom et al. [Sara] has also demonstrated that, in theory, acknowledging data twice per send window should be sufficient to ensure utilization with some modifications to the traditional TCP. Doubling the acknowledgment frequency to four times per send window can produce good performance and it is more robust in practice. Based on this rationale, we set alpha = RTTmin/beta, which means sending beta ACKs per RTTmin. RTTmin is the smallest RTT observed over a long period of time. As a consequence, the frequency of TACK can be given as follow:
f(tack) = min{bw/L*MSS, beta/RTTmin} (4)
where beta indicates the number of ACKs per RTT, and L indicates the number of full-sized data packets counted before sending an ACK. To minimize the ACK frequency, a smaller beta or a larger L is expected. Sara Landstrom et al. has given a lower bound of beta in [Sara], i.e., beta >= 2. An upper bound of L can also be derived according to the loss rate on the data path (plr_data) and the ack path (plr_ack), i.e., L <= feedback_info/(plr_data*plr_ack), where feedback_info denotes the amount of information carried by an ACK.
Qualitatively, TACK turns to periodic ACK when bandwidth-delay product (bdp) is large (i.e., bdp >= beta*L*MSS), and falls back to byte-counting ACK when bdp is small (i.e., bdp < beta*L*MSS).
In terms of a transport with a large bdp, beta = 2 should be sufficient to ensure utilization, but the large bottleneck buffer (i.e., one bdp) makes it necessary to acknowledge data more often. In general, the minimum send window (SWNDmin) can be roughly estimated as follow:
SWNDmin = beta*bdp/(beta-1) (5)
Ideally, the bottleneck buffer requirement is decided by the minimum send window, i.e., SWNDmin − bdp. Since doubling the ACK frequency reduces the bottleneck buffer requirement substantially from 1 bdp to 0.33 bdp, beta = 4 is RECOMMENDED to provide redundancy, being more robust in practice.
The parameter L comes into effect in terms of TACK frequency when the bdp is small. Latency-sensitive transport usually has a small bdp, in the case of application limitation, the latency of thin flows might be enlarged due to a large L. Since the high ACK frequency is not the main bottleneck in the case of a mall bdp, the delayed TCP-like provisioning of L = 2 is RECOMMENDED to be more robust in practice. It is also feasible to enable the TCP QUICKACK option, allowing real-time applications to set L = 1.
First, given an L, the frequency of TACK is always no more than that of the legacy TCP ACK. Assume all data packets are full-sized (MSS = 1500 B), L = 1 and beta = 4, the frequency is reduced to 10% when bw = 48 Mbps and RTTmin = 10 ms. Second, the higher bit rate over wireless links, the more number of ACKs are reduced. For example, the frequency has dropped two orders of magnitude when bw increases from 48 Mbps to 200 Mbps. Meanwhile, the larger latency between endpoints, the more number of ACKs are reduced. For example, the frequency has dropped three orders of magnitude when RTTmin increases from 10 ms to 20 ms (bw = 200 Mbps). In summary, TACK significantly reduces ACK frequency in most cases. This can be not only a win for throughput improvement but also win for energy and CPU efficiency.
To facilitate the analysis, we assume that every data packet is full-sized (i.e., MSS). When the TCP socket option TCP QUICKACK is enabled, the legacy TCP sends an ACK for every packet (i.e., per-packet ACK). The frequency of per-packet ACK is computed as:
f(tcp) = bw/MSS (6)
Transport protocols, such as TCP and QUIC, also alternatively adopt delayed ACK, in which a data receiver may delay sending an ACK response by a given time interval (gamma) or for every L full-sized incoming packets. As described in [RFC1122] and updated in [RFC5681], L is strictly limited up to 2, and gamma is tens to hundreds of milliseconds and varies in different Linux distributions.
When 0 <= bw < 2*MSS/gamma, less than 2 full-sized data packets are transported during the time period of gamma. The frequency of delayed ACK is computed as:
f(tcp_delayed) = bw/MSS, 0 <= bw < 2*MSS/gamma (7)
When bw >= 2*MSS/gamma, at least 2 full-sized data packets are transported during the time period of gamma. The frequency of delayed ACK is computed as:
f(tcp_delayed) = bw/(2*MSS), bw >= 2*MSS/gamma (8)
Delayed ACK falls into the category of byte-counting ACK except that an extra timer prevents ACK from being excessively delayed. For full-sized data packets, it turns to byte-counting ACK when bw is large and falls back to per-packet ACK when bw is small (see Equations (7) and (8)). TACK differs from delayed ACK by mandatorily sending ACKs periodically when bw is large (see Equation (4)). In particular, TACK applies periodic ACK when bdp is large and falls back to byte-counting ACK when bdp is small.
It is depicted in [RFC2525] that a TCP receiver which does not generate an ACK for every second full-sized segment exhibits a "Stretch ACK Violation". Several implications of generating fewer ACKs are also discussed in [RFC2525]. TACK aims to achieve a minimized or controlled ACK frequency under different network scenarios, but unfortunately exhibits the "Stretch ACK Violation". Since TACK excessively decreases the rate at which ACKs are transmitted, in order to apply TACK without decreasing transport performance, we list all detailed challenges that need to be overcome as below.
For ordered and byte-stream transport, when loss occurs and a packet has to be retransmitted, packets that have already arrived but that appear later in the bytestream must await delivery of the missing packet so the bytestream can be reassembled in order. Known as head-of-line blocking (HoLB), this incurs high delay of packet reassembling and thus can be detrimental to the transport performance. Applying TACK will further enlarge this delay incurred by HoLB.
We define the TACK delay as the delay incurred between when the packet is received and when the TACK is sent. According to Equation (4), with a large RTTmin, TACK might be excessively delayed. When loss occurs during the TACK interval, the excessive TACK delay might disturb loss detection, resulting in costly retransmission timeouts. TACK loss further aggravates this problem. For example, RTTmin = 200 ms, bw = 10 Mbps, and L = 1, then f(tack) = 20 Hz. Compared to per-packet ACK, TACK can cause the feedback delay of up to 50 ms upon loss event. If ACK is lost or the retransmission is lost again, then the delay doubles.
An RTT can be sampled at the sender upon receiving a TACK. For example, a packet is sent at time t_0 and arrives at time t_2. Assume that the TACK is constructed at time t_3, the receiver computes the TACK delay delta_t = t_3 − t_2. The sender therefore computes the RTT according to delta_t, t_0 and the TACK arrival time (t_1), i.e., RTT_sample = t_1 − t_0 − delta_t. Measuring delta_t at the receiver assures an explicit correction for a more accurate RTT estimate.
The issue here is that multiple data packets might be received during the TACK interval, generating only one RTT sample among multiple packets is likely to result in biases. For example, a larger minimum RTT estimate and a smaller maximum RTT estimate. In general, the higher the throughput, the larger the biases. One alternative way to reduce biases can be that, each TACK carries the per-packet delta_t (specific TACK delays for each data packet) for the sender to generate more RTT samples. However, (1) the overhead is high, which is unacceptable especially under high-bandwidth transport. Also, (2) the number of data packets might be far more than the maximum number of delta_t that a TACK is capable to carry.
A burst of packets can be sent in response to a single delayed ACK. Legacy TCP usually sends micro bursts of one to three packets, which is bounded by L <= 2 according to the definition of TCP delayed ACK. However, the fewer ACKs sent, the larger the bursts of packets released. Since TACK might be excessively delayed, the burst send pattern is non-negligible as it may have larger buffer requirement, higher loss rate and longer queueing delay if not carefully handled.
The TCP slow start algorithm increases the congestion window (cwnd) upon ACK arrials. Therefore, applying TACK increases the amount of time needed by the slow start algorithm to reach the maximum bandwidth, which diminishes performance in networks with large bdps.
Send window update requires ACKs to update the largest acknowledged packet and the announcement window (awnd). With a small frequency, TACK probably delays acknowledging packet receipts and reporting the awnd, resulting in feedback lags and bandwidth under-utilization. For example, f(tack) = 20 Hz, then TACK is sent every 50 ms. Assume a TACK notifies awnd = 0 due to receive buffer runs out at t = 0 ms, upon receiving this TACK, the sender stops sending data. In the case that the receive buffer is released at t = 5 ms due to loss recovery, the sender continues to be blocked for another 45 ms until a subsequent TACK is sent at t = 50 ms, and thus wastes opportunity of sending data. TACK loss further aggravates this issue.
This document neither strengthens nor weakens TCP's current security properties.
This document is the problem statement. This section will be completed when further solution is proposed.
[RFC0793] | Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981. |
[RFC1122] | Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC2525] | Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J., Heavens, I., Lahey, K., Semke, J. and B. Volz, "Known TCP Implementation Problems", RFC 2525, DOI 10.17487/RFC2525, March 1999. |
[RFC4341] | Floyd, S. and E. Kohler, "Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 2: TCP-like Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March 2006. |
[RFC5681] | Allman, M., Paxson, V. and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009. |