Internet DRAFT - draft-erblich-tcp-no-karn-alg
Network Working Group
Internet draft Mitchell Erblich August 2006
Category: Experimental
Alteration of Karn's Algorithm for
High Bandwidth / Delay Environments
<draft-erblich-tcp-no-karn-alg-00.txt>
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2006). All Rights Reserved.
ABSTRACT
Karn's algorithm specifies that acknowledgements of retransmitted
segments should be ignored for timing purposes: they are not
timed and do not contribute to the smoothed round-trip time
(SRTT), because they are considered "ambiguous". Karn's paper
also states that "If an acknowledgement arrives after the RTO
has expired, it is highly likely to come very shortly afterwards."
Since then, TCP has gained "fast retransmit" functionality, so
retransmissions no longer depend solely on the RTO. Common sense
suggests that acknowledgements that arrive "very shortly
afterwards" should not be considered "ambiguous". These
non-ambiguous acknowledgements should be included in the SRTT
calculation and should allow the sender to return to its previous
non-congestion behavior.
Table of Contents
1. Motivation
2. Introduction
3. Implementation
4. Conclusion
5. References
6. Security Considerations
7. Author's Address
1. Motivation
An ISP observed that at certain times of day the amount of data
transmitted through a number of TCP connections far exceeded the
amount of data expected to be generated by a specified set of
applications. It was theorized that a large number of segments were
being dropped, that some segments experienced a large RTT and were
being retransmitted needlessly, or both.
A less visible consequence was a drop in the usable throughput of a
TCP flow even when congestion was not present. This has not generally
been regarded as an issue, possibly because many TCP flows are
application limited.
Out-of-order (ooo) segments are more likely to appear in high
bandwidth / delay environments. In these environments, ooo segments
can consume a receiver's buffer to the point that the receiver
reneges, discarding data that it has already selectively
acknowledged (SACK).
However, we are more concerned with ooo segments arriving at the
receiver that later result in fast retransmits. These fast
retransmits force the sender into congestion avoidance (CA), and it
is the resulting recovery time and lost bandwidth that this document
addresses.
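To illustrate how reordering alone can cause a fast retransmit, the following Python sketch models a receiver that sends one cumulative ACK per arriving segment and a sender that fast-retransmits after three consecutive duplicate ACKs (the standard duplicate ACK threshold of 3). It is a simplified sketch: segments are numbered 0, 1, 2, ... rather than by byte sequence number, nothing is lost, and the function names are hypothetical. Any segment displaced by three or more positions triggers a fast retransmit that is, by construction, spurious.

```python
from typing import List


def cumulative_acks(arrival_order: List[int]) -> List[int]:
    """Return the cumulative ACK (next expected segment) sent for each arrival."""
    received = set()
    next_expected = 0
    acks = []
    for seg in arrival_order:
        received.add(seg)
        # Advance past every contiguous segment now held by the receiver.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmit_triggered(arrival_order: List[int], dupthresh: int = 3) -> bool:
    """True if the sender sees `dupthresh` duplicate ACKs in a row."""
    last_ack = None
    dups = 0
    for ack in cumulative_acks(arrival_order):
        if ack == last_ack:
            dups += 1
            if dups >= dupthresh:
                return True
        else:
            last_ack = ack
            dups = 0
    return False


# Segment 1 arrives three positions late: three duplicate ACKs for 1.
print(cumulative_acks([0, 2, 3, 4, 1, 5]))            # [1, 1, 1, 1, 5, 6]
print(fast_retransmit_triggered([0, 2, 3, 4, 1, 5]))  # True
print(fast_retransmit_triggered([0, 2, 3, 1, 4, 5]))  # False
```

Every segment in both examples is eventually delivered, so the first fast retransmit only wastes bandwidth and needlessly pushes the sender into CA.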
In addition, [RFC3522] requires the use of TCP timestamps. This
document argues that the same results can be obtained without TCP
timestamps in high bandwidth / delay environments, allowing the
sender to recover quickly from a false CA.
2. Introduction
[KARN] specifies that retransmitted segments should not be timed,
because the sender cannot determine whether the resulting ACK was
triggered by the original segment or by the later retransmission.
In Karn's original paper, ONLY coarse-grained RTO timeouts triggered
retransmissions. Thus the paper concentrated on determining a proper
SRTT, given a set of segment RTTs.
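As a reference point, the following Python sketch combines Karn's rule with the SRTT/RTTVAR estimator of RFC 2988 (since updated by RFC 6298), using alpha = 1/8, beta = 1/4 and K = 4. RTT samples from retransmitted segments are discarded, and a timeout doubles the RTO, which is kept until a valid sample arrives. The class name and the 1 second initial RTO are assumptions, and the clock-granularity term and the minimum and maximum RTO bounds are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # SRTT gain
BETA = 1 / 4   # RTTVAR gain
K = 4          # RTTVAR multiplier in the RTO


@dataclass
class RttEstimator:
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = 1.0  # seconds; initial RTO before any sample (assumed value)

    def on_ack(self, sample: float, was_retransmitted: bool) -> bool:
        """Feed one RTT sample in seconds; return True if it was used."""
        if was_retransmitted:
            # Karn: the ACK is ambiguous, so the sample is discarded and
            # any backed-off RTO is left unchanged.
            return False
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated with the SRTT value from before this sample.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample
        self.rto = self.srtt + K * self.rttvar
        return True

    def on_timeout(self) -> None:
        # Exponential backoff; the doubled RTO persists until a valid sample.
        self.rto *= 2


est = RttEstimator()
est.on_ack(0.100, was_retransmitted=False)  # first sample: RTO = 0.1 + 4 * 0.05
est.on_timeout()                            # RTO doubles (to about 0.6 s)
est.on_ack(5.000, was_retransmitted=True)   # ignored under Karn's rule
print(est.srtt, est.rto)
```

The final call shows the behavior this document questions: the ACK of a retransmitted segment never reaches the SRTT, however soon after the retransmission it arrives.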
[RFC2525], Section 2.7, notes that "when the initial RTO < RTT, it
can take a long time for the TCP to correct the problem by adapting
the RTT estimate, because of the use of the Karn's algorithm".
[PAX97] observes that a large number of network environments can
or do reorder more than a minimal number of segments.
Spurious retransmissions are segment retransmissions that are later
determined to have been unnecessary. These extra transmissions
consume link bandwidth and decrease the throughput actually
delivered to the application.
[RFC3522] describes several events that can lead to a false entry
into congestion avoidance, and specifies an algorithm to detect
them. That algorithm requires the use of TCP timestamps.
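For comparison, the core test of the Eifel detection algorithm can be sketched in Python as follows (names are hypothetical). The sender stores the TSval of the first retransmission; if the first acceptable ACK after it echoes an earlier timestamp (TSecr), that ACK must have been triggered by the original transmission, so the retransmission was spurious. The sketch ignores 32-bit timestamp wraparound and the RFC's other conditions. It also shows where clock granularity matters: if the original and the retransmission carry the same TSval, the comparison cannot tell them apart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EifelDetector:
    retransmit_ts: Optional[int] = None  # TSval sent with the first retransmission

    def on_first_retransmit(self, tsval: int) -> None:
        self.retransmit_ts = tsval

    def on_first_acceptable_ack(self, tsecr: int) -> bool:
        """Return True if the retransmission is judged spurious."""
        if self.retransmit_ts is None:
            return False
        # An echo older than the retransmission's timestamp can only
        # come from the original transmission.
        spurious = tsecr < self.retransmit_ts
        self.retransmit_ts = None  # one decision per recovery episode
        return spurious


det = EifelDetector()
det.on_first_retransmit(tsval=1000)
print(det.on_first_acceptable_ack(tsecr=990))   # True: echo of the original
det.on_first_retransmit(tsval=1000)
print(det.on_first_acceptable_ack(tsecr=1000))  # False: same tick, ambiguous
```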
This document takes additional steps to detect false congestion
without the Timestamps option, and suggests parameters that could be
used to restore the pre-congestion throughput as soon as possible
without creating a local congestion event.
2.1 Timestamp Issues
* Not all implementations enable the Timestamps option.
* A receiver may forge an echoed timestamp.
* The granularity of the timestamp clock for a high bandwidth link
  to a low delay receiver may be too fine grained compared with that
  for a high delay receiver over the same link.
Only the third item is addressed in this document. We attempt to
resolve the granularity issue by adjusting the timestamp granularity
on a per-connection basis, according to the number of in-flight
segments.
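The draft does not give the adjustment rule, so the following Python sketch uses a hypothetical one to illustrate the arithmetic. Assuming the path is full, the in-flight data approximates the bandwidth-delay product, so an RTT estimate is in-flight bits divided by the link rate; the sketch then picks a tick so that one RTT spans a fixed number of ticks (the target of 100 ticks per RTT is an arbitrary assumption, as are the function names). On a 10 Gb/s link with 1500-byte segments, one segment takes 1.2 microseconds to send, so a 1 ms timestamp clock covers roughly 833 segments per tick.

```python
def segments_per_tick(tick_s: float, seg_bytes: int, link_bps: float) -> float:
    """Segments a saturated link sends during one timestamp clock tick."""
    return tick_s * link_bps / (seg_bytes * 8)


def estimated_rtt(inflight_segments: int, seg_bytes: int, link_bps: float) -> float:
    """Assume in-flight data equals the bandwidth-delay product (seconds)."""
    return inflight_segments * seg_bytes * 8 / link_bps


def suggested_tick(inflight_segments: int, seg_bytes: int, link_bps: float,
                   ticks_per_rtt: int = 100) -> float:
    """Hypothetical rule: size the tick so one RTT spans `ticks_per_rtt` ticks."""
    return estimated_rtt(inflight_segments, seg_bytes, link_bps) / ticks_per_rtt


LINK = 10e9   # 10 Gb/s
SEG = 1500    # bytes per segment
print(segments_per_tick(0.001, SEG, LINK))  # about 833 segments per 1 ms tick
print(suggested_tick(100_000, SEG, LINK))   # high delay receiver: about 1.2 ms
print(suggested_tick(100, SEG, LINK))       # low delay receiver: about 1.2 us
```

Under this rule, two receivers on the same link end up with granularities three orders of magnitude apart, which illustrates why a single clock granularity may not suit both.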
3. Implementation
An R&D environment with interfaces of up to 10Gb Ethernet was
heavily instrumented for a number of days. The number of in-flight
segments per major aggregated flow periodically exceeded 100k.
Because of this large number of possible in-flight segments, we
judged that Selective ACKs (SACK) would only complicate the
implementation. We later re-enabled the Timestamps option and SACK
support to validate our results.
We added and used the following parameters, among others:
* the number of in-flight segments,
* the average size of the in-flight segments,
* the approximate number of in-flight ACKs,
* the number of unambiguous ACKs,
* the number of ambiguous ACKs within each RTT interval,
* a set of pre-congestion metrics, and
* the current number of duplicate ACKs.
We used these values to identify the point at which an ACK was
highly likely to be for the original segment. Otherwise, without
timestamp support, we consider the ACK ambiguous.
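The specific decision rule is not given in this document, so the following Python sketch is only a hypothetical illustration of the idea stated in the abstract and conclusion. An ACK of a retransmitted segment cannot return sooner than about one minimum RTT after the retransmission, so an ACK that covers the retransmitted segment and arrives well before that must have been triggered by the original transmission. The 0.5 fraction of the minimum RTT and the 1000-segment in-flight threshold are arbitrary assumptions, as are all names.

```python
from dataclasses import dataclass


@dataclass
class AckEvent:
    retransmit_time: float   # when the fast retransmit was sent (s)
    ack_time: float          # when the ACK covering that segment arrived (s)
    min_rtt: float           # smallest RTT sample seen on the connection (s)
    inflight_segments: int   # segments in flight at the retransmission


def classify_ack(ev: AckEvent, fraction: float = 0.5,
                 min_inflight: int = 1000) -> str:
    """Hypothetical rule: 'original' if the ACK is too early to be for the
    retransmission, else 'ambiguous' (Karn's default)."""
    elapsed = ev.ack_time - ev.retransmit_time
    if ev.inflight_segments >= min_inflight and elapsed < fraction * ev.min_rtt:
        return "original"
    return "ambiguous"


print(classify_ack(AckEvent(10.000, 10.010, 0.100, 5000)))  # original
print(classify_ack(AckEvent(10.000, 10.200, 0.100, 5000)))  # ambiguous
print(classify_ack(AckEvent(10.000, 10.010, 0.100, 10)))    # ambiguous
```

A smaller fraction is more conservative; raising it toward 1.0 corresponds to the "aggressive" method discussed in the Conclusion, which risks treating genuinely ambiguous ACKs as non-ambiguous.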
If we determined that the ACK was for the original segment, the goal
of this project was to recover quickly to the pre-congestion state,
because the congestion event was false. Adjusting the number of
duplicate ACKs required before a fast retransmit was outside the
scope of this project.
This implementation resulted in the equivalent of a restart without
slow start, but performed while in congestion avoidance.
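A minimal sketch of such a recovery, assuming the sender simply saves its congestion state when it fast-retransmits and restores it once the event is judged false. This is similar in spirit to the Eifel response algorithm (RFC 4015) but is not a faithful implementation of it. The names are hypothetical, windows are in segments, and the reduction on fast retransmit is simplified to halving cwnd (standard TCP bases the reduction on the flight size and temporarily inflates cwnd during fast recovery).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CongestionState:
    cwnd: int       # congestion window, in segments
    ssthresh: int   # slow-start threshold, in segments
    saved: Optional[Tuple[int, int]] = None  # (cwnd, ssthresh) before the event

    def on_fast_retransmit(self) -> None:
        self.saved = (self.cwnd, self.ssthresh)
        # Simplified multiplicative decrease into congestion avoidance.
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh

    def on_false_congestion(self) -> None:
        """Undo the reduction: the congestion event was false."""
        if self.saved is not None:
            self.cwnd, self.ssthresh = self.saved
            self.saved = None


cc = CongestionState(cwnd=100, ssthresh=1000)
cc.on_fast_retransmit()
print(cc.cwnd, cc.ssthresh)   # 50 50
cc.on_false_congestion()
print(cc.cwnd, cc.ssthresh)   # 100 1000
```

Restoring cwnd in one step can release a large burst of segments, which is why the Conclusion stresses the need for a segment burst method.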
The specifics of the TCP modifications and the testing environment
were deemed Intellectual Property (IP) by the legal staff at the
client's site, so they have been removed from this document.
4. Conclusion
Without SACK or the Timestamps option, it should still be possible
to identify whether an ACK is for an original or a retransmitted
segment, provided the number of in-flight segments is large enough
and the ACK arrives shortly after the fast retransmit.
However, when the number of in-flight ACKs drops substantially, an
implementation needs to support a segment burst method and must be
able to determine that an ACK is still ambiguous. If an
implementation uses an aggressive method, it is highly likely that
some ACKs that really are ambiguous (ACKs of the retransmission)
will be treated as non-ambiguous.
Bandwidth recovery of up to 50% can result, depending on the number
of ACKs now treated as non-ambiguous. The amount of recovery depends
partly on how aggressive the segment burst method is, and also on
the number of ooo segments and how far those segments drift from
their original order. Some experimental TCP RFCs have suggested
methods to reduce the likelihood of generating localized congestion
when restoring or generating a number of in-flight segments.
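To illustrate one way of restoring in-flight segments without a line-rate burst, the following Python sketch limits how many new segments may be sent per arriving ACK. It is not taken from any specific RFC; the per-ACK limit (max_burst) and the model, in which each ACK removes exactly one segment from the network, are assumptions, and the function name is hypothetical.

```python
from typing import List


def burst_schedule(inflight: int, cwnd: int, max_burst: int = 4) -> List[int]:
    """Segments sent on each successive ACK until `inflight` reaches `cwnd`.

    Each ACK removes one segment from the network; the sender then sends at
    most `max_burst` segments without exceeding `cwnd`.
    """
    if max_burst < 2:
        # With max_burst == 1 each ACK only replaces the segment it freed,
        # so the in-flight count would never grow.
        raise ValueError("max_burst must be at least 2")
    if inflight < 1 and cwnd > 0:
        raise ValueError("no ACKs can arrive with nothing in flight")
    sends = []
    while inflight < cwnd:
        inflight -= 1                        # an ACK arrives
        n = min(max_burst, cwnd - inflight)  # burst-limited refill
        inflight += n
        sends.append(n)
    return sends


print(burst_schedule(inflight=50, cwnd=60, max_burst=4))  # [4, 4, 4, 2]
```

A smaller max_burst is less aggressive but needs more ACKs to refill the window, which is the trade-off between recovered bandwidth and localized congestion described above.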
5. References
[KARN] Karn, P. and C. Partridge, "Improving Round-Trip Time Estimates
in Reliable Transport Protocols", Proceedings of ACM SIGCOMM '87, August 1987.
[RFC2525] Paxson, V., Allman, M., et al., "Known TCP Implementation Problems", RFC 2525, March 1999.
[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for TCP", RFC 3522, April 2003.
6. Security Considerations
This memo does not create any new security issues for the
TCP protocol.
7. Author's Address
Mitchell Erblich erblichs@earthlink.net
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.