Internet DRAFT - draft-sabatini-tdlc
draft-sabatini-tdlc
Internet Engineering Task Force A. Sabatini
Internet-Draft Broker Communications Inc.
Intended status: Standards Track August 28, 2012
Expires: March 1, 2013
TDLC, An improved "Selective Repeat" protocol.
draft-sabatini-tdlc-00
Abstract
This paper describes an extremely efficient and reliable real-time
"Selective Repeat" protocol that was implemented as the backbone
transmission protocol for the largest financial information network
in the United States during the 1980's and 1990's. The protocol is
capable of working efficiently in all environments including those
with substantial delay and high error or congestion rates,
transmitting optimally down to the point that there is one error per
block, never resending a block that has been received correctly and
always sending the blocks in an order which is optimal for
expeditious delivery. This protocol represents a new way of looking
at "Selective Repeat" protocols and the concepts it introduces have
wide applicability in other protocols. It works equally well in
point to multi-point environments as it does point to point ones. It
has been used as both a link level and a transport level protocol.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 1, 2013.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
Sabatini Expires March 1, 2013 [Page 1]
Internet-Draft TDLC Theory August 2012
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
2. History . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Underlying Concepts . . . . . . . . . . . . . . . . . . . . . 4
4. The HDLC (or X.25) protocol . . . . . . . . . . . . . . . . . 4
5. Considerations in designing TDLC . . . . . . . . . . . . . . . 5
5.1. Re-Transmission . . . . . . . . . . . . . . . . . . . . . 5
5.2. Sequence state . . . . . . . . . . . . . . . . . . . . . . 5
5.3. Transmit sequence numbering . . . . . . . . . . . . . . . 6
5.4. Window Size . . . . . . . . . . . . . . . . . . . . . . . 7
5.5. Connection and Re-Connection . . . . . . . . . . . . . . . 7
5.6. Use of Receive Ready . . . . . . . . . . . . . . . . . . . 8
5.7. Point to Multipoint . . . . . . . . . . . . . . . . . . . 9
5.8. Calculation of round trip delay . . . . . . . . . . . . . 9
6. Performance . . . . . . . . . . . . . . . . . . . . . . . . . 9
7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 10
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
9. Security Considerations . . . . . . . . . . . . . . . . . . . 10
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
10.1. Normative References . . . . . . . . . . . . . . . . . . . 10
10.2. Informative References . . . . . . . . . . . . . . . . . . 11
Appendix A. Related Readings . . . . . . . . . . . . . . . . . . 11
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 12
Sabatini Expires March 1, 2013 [Page 2]
Internet-Draft TDLC Theory August 2012
1. Introduction
Although known and used in the financial community for over 20 years,
and the subject of presentations to both industry and academic groups
there has never been an academic paper written on this protocol. As
both the inventor of this protocol and the original implementer I
felt it was my responsibility to correct this oversight. This
protocol is a direct result of my studies of the Yu-Lin protocol and
the first implementation of it was in 1983. On the first anniversary
of its industry publication in 1984, Telerate decided not to assert
intellectual property rights and to add it to the public domain. Two
attempts were made to commercialize it outside of the financial
industry, once by AT&T in 1988 and by AT&T Paradyne in 1991 but lack
of an approved standard ultimately lead in both cases to AT&T
abandoning the project.
Since Telerate operated worldwide over an amazing number of
communications technologies, TDLC was born out of the need for a
protocol that would handle any communications environment no matter
how noisy, how slow or how much delay (including multiple satellite
hops) was in the path. In later years its properties were found
valuable in congestion situations where packets were dropped.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. History
Both major classes of automatic repeat request (ARQ) transport
protocols, represented by SDLC/HDLC/X.25 and TCP, have an implicit
assumption that the delay between sender and receiver is very small.
Because of this assumption when used over a link with considerable
delay their error recovery mechanisms are inadequate to insure
optimum use of the link or consistent delivery times.
Over the years a number of papers suggesting improved "GO BACK N",
and selective repeat (SR) procedures and a number of protocols (such
as SACK, XTP, NETBLT) have been proposed but none addressed the
underlying considerations of recreating the current state of the
receiver at the transmitter.
Sabatini Expires March 1, 2013 [Page 3]
Internet-Draft TDLC Theory August 2012
3. Underlying Concepts
In order for a sender to know how to optimally transmit messages to a
receiver the sender must recreate the state of the receiver as of the
last acknowledgement received and then "age" or modify that state by
updating it based upon the messages sent since the state implicit in
the acknowledgement was current. In order to do this the sender must
maintain a transmission order buffer which lists the message number
of each message as it is sent. In TDLC we called the index into the
transmission order buffer "Send State" and transmitted this state
variable with each message. The receiver, after correctly receiving
the message, saves this value and returns it (now called "Returned
State") and the list of outstanding messages and their status with
each acknowledgement. When the sender receives this information it
is then capable of constructing a list of missing messages by taking
the list of such messages from the received acknowledgement and then
removing from that list all messages that have been transmitted since
the message which caused the acknowledgement which is all messages
sent with indexes between the current "Send State" and the "Returned
State" in the acknowledgement message. The list of outstanding
messages is sent as two indexes and a bit array whose length is the
difference between these two indexes. The first index is the "Next
Confirmed", this value confirms that all messages up to (but not
including) this value have been received correctly. The second index
is the value of the "Next Highest" which is the highest message
number received plus one. The reason that these are both "plus one"
is to facilitate link start where there are no previous messages and
thus the values would become undefined. The bit array has a one bit
if the message is being acknowledged; a zero if has not been
received.
Thus by transmitting the complete acknowledgement information from
the receiver along with an indicator to the sender as to its state
current at the time of the acknowledgement the sender can accurately
recreate the current status of the receiver assuming all "in flight"
messages were received and thus only send the unacknowledged messages
starting with the oldest along with any new messages whose
retransmission is requested.
4. The HDLC (or X.25) protocol
All "Selective Repeat" protocols must recreate the state of the
receiver and ultimately implement, perhaps implicitly, the concepts
outlined above. It is therefore instructive to look at how HDLC
actually works. One of the least understood aspects of this protocol
is that the receive sequence number performs three functions
simultaneously, it provides a floor for the acknowledgement system
Sabatini Expires March 1, 2013 [Page 4]
Internet-Draft TDLC Theory August 2012
(it acknowledges all messages up to this one), it indicates the last
message received correctly (highest in sequence) and it establishes
the sequence of the messages received (and thus transmitted). This
is accomplished by the rules of its use. Since you only ever
acknowledge the last in sequence message you received, you implicitly
acknowledge all that preceded it. Since HDLC is a "GO BACK N"
protocol, the sequence of the messages is always identical to the
order of transmission (you discard those messages which are out of
message sequence). Even with the inclusion of a selective reject
construct you can only selectively reject the first possible missing
message (the message at "N").
Thus to provide any enhancements to this class of protocols, as we
have done in TDLC, we must first split these functions into three
separate entities, namely a "Confirmed" which indicates that all the
messages up through this one have been received properly, a "Highest
Received" which indicates the highest message number of any message
properly received that is within the window, and a sequence state
token which denotes the order in which the messages were sent.
5. Considerations in designing TDLC
5.1. Re-Transmission
Given that we must retransmit a series of messages whose number
certainly exceeds 8, we must find some convenient but compact
methodology to express which messages need to be re-transmitted. The
two most obvious are either a bit map or ordered pairs of indexes.
Since there is a high probability that the number of messages which
needs to be transmitted is fairly small, the most compact system, a
bit map, was chosen. Since it is inelegant to send more bits than
are actually needed we should only transmit those bits between the
limits "Next Confirmed" and "Next highest received". Although it has
been found more practical to transmit all of these bits, one need not
send the bit for "Next Confirmed", which by definition you know is
missing. "Highest Received" is a different case since it may have
been set by an "RR" message and thus the message itself may not have
been received.
5.2. Sequence state
One property of any transmission environment which is often ignored
is that of delay. Even on a locally attached link the time involved
in physically sending and receiving the message along with the delay
before it is actually processed introduces an element of uncertainty
as to what the other end of the link is doing, what messages it has
seen and whether it has received a request for retransmission. This
Sabatini Expires March 1, 2013 [Page 5]
Internet-Draft TDLC Theory August 2012
problem can be solved relatively easily by including a sequence state
index with each message sent. This token, which is returned with the
next transmission from the other end of the link, tells the
transmitter which messages were sent after the point that the
receiver is acknowledging. Thus each message being transmitted,
whether an original transmission or a re-transmission, increments the
sequence state. By maintaining a vector, whose index is sequence
state, of the message numbers in the order in which the messages were
transmitted or retransmitted, when the sequence state is returned to
the transmitting end, the transmitter knows which messages the
receiver should have seen and if the receiver still indicates that it
has not yet received a particular message that was transmitted then
it must be queued for transmission again.
In a practical application of this technology, a bit map
corresponding to all message numbers is kept by the transmitter. As
each message is added to the queue to be transmitted, its
corresponding bit in the bit map is turned off, and as it is
transmitted it is turned on. When a bit map representing message
status is received, the bits for all unacknowledged messages are
turned off in the transmitter bitmap. Then using the received state
as an index into the vector of transmitted messages, any messages
that the transmitter has sent since then are turned back on. The bit
map of outgoing messages is then rescanned starting at message "Next
Confirmed" and the message associated with the first zero bit is set
as the next message to transmit. The transmitting process then
continues processing the bit map transmitting messages which have a
zero bit until it reaches "Next Highest Queued".
A variant of TDLC for higher speed links was designed which used two
byte message numbers and a list of acknowledged message ranges in
place of the bit map in the original protocol
5.3. Transmit sequence numbering
One problem associated with SDLC which does not exist, for example,
in DECNET is that of losing the last message or series of messages
with no indication until the link times out. This is a small problem
at low data rates and short transmission delay environments, but can
severely degrade the link under any other circumstances. This is due
to the fact that only the "I" frame has a transmit sequence number as
well as a receive sequence number, thus there is no way to tell from
the "link idle" RR that a message is in fact missing. The obvious
solution to this problem is to include a second, transmit sequence
number on the "RR", "REJ" and "SREJ" packets (which are very short
anyway) to facilitate recovery. In TDLC we treat all messages,
whether data carrying or not, equally thus the complete state is
transmitted with each message. Since we are sending the complete
Sabatini Expires March 1, 2013 [Page 6]
Internet-Draft TDLC Theory August 2012
state only a "RR" message is needed, the "REJ" and "SREJ" are
implicit.
5.4. Window Size
One concept that has not been well explored for "GO BACK N" protocols
is the effect of window size on data recovery. When not explicitly
acknowledging all members in a sequence, but rather individual
messages you create a zone of uncertainty as to which message numbers
are merely late acknowledgments or retransmissions and which are
errored but CRC correct messages (either through the transmission
media or program problem). Thus because of these two features one
must assume a minimum of a "window-size" leading and a "window-size"
trailing interval in which the message number can legally reside.
Thus with NO protection from errored messages, the window size can be
at most half the message modulus and good practice, because it is
easier and faster to deal with powers of 2, dictates we should keep
it at one quarter of modulus, thus allowing us to know with certainty
that messages outside of this range should be ignored.
5.5. Connection and Re-Connection
One of the known problems in SDLC is that of link establishment. In
normal SDLC since there is no information passed in the SABM packet
other than the request for connection, it is impossible to explicitly
tie UA replies to their respective SABM other than by allowing
sufficient time to elapse such that it is impossible for more than
one SABM or UA to be outstanding (interestingly the same error
resolution by waiting for extended periods is the ultimate heart of
all Binary Synchronous Protocols). This leads to excessive line time
on long delay circuits and still does not guard against SABM
collisions where each side initializes separately and tries to send
SABM's simultaneously. This problem would just be of theoretical
interest if it only occurred when one side or the other accomplished
a physical restart of the link, however with the lack of a logical
restart mechanism the physical restart using SABM occurs much more
frequently in the real world and results in a total destruction and
re-establishment of the link every time there is a communications
interruption which exceeds the limits of it's timers as often happens
with noisy satellite environments or frame relay transmissions in a
congested environment. Another frightful by-product of this behavior
is that of the link layer of HDLC makes un-OSI demands on the layers
above; delivering restart indications all the way up through the
protocol stack. The answer to these problems is the introduction of
two new commands and two new replies, "INITIAL CONNECT" (ICN), "RE-
CONNECT" (RCN), "INITIAL CONNECT ACKNOWLEDGE" (ICA), and "RE-CONNECT
ACKNOWLEDGE" (RCA). These new commands allow the connected link
layers to differentiate between a Physical Restart by using an ICN
Sabatini Expires March 1, 2013 [Page 7]
Internet-Draft TDLC Theory August 2012
command or an ICA reply and a Logical Restart by using a RCN command
or a RCA reply. When a link is re-established after an interruption,
the exchange of RCN/RCA reconnects the link without affecting the
upper layers of the protocol. If either side needs to have the link
counters initialized or the message buffers flushed it need only send
an ICN or, in response to an RCN, an ICA to force link
initialization.
To guard against possible link re-establishment where the circuit may
have been looped back causing acknowledgment and disposal of messages
that weren't actually received and the possibility of the physical
switching of a circuit from one active port to another, the data
field in the ICN or RCN command will have a random identity stamp and
ICA or RCA will return these stamps so that replies can be matched.
By comparison of these stamps one can establish/determine the
absolute link session information.
5.6. Use of Receive Ready
Since the goal of a real time protocol is to be event driven ("ACK
Clocked" in TCP nomenclature) and thus much less dependent on timers
to correct missing message situations the rules of about using
Receive Ready have been broadened in order to maximize the
probability of error correction without timer expiration.
Note that receiving a "RR" packet with a changed send token is a
change in the state of the receiver so the receiver must return a
"RR" in response.
The first rule about using Receive Ready is that after the receipt of
an ICA or RCA packet the receiving end will immediately send an "RR"
packet. This is in order to allow the far end to start transmitting,
the far end, once it is in connection state, can not transmit until
an "RR" is received. Like wise if an end sends an ICA or RCA it must
immediately follow with a "RR" to allow the far end to start
transmitting data.
The second rule is that after the transmission of the last
information packet "I" a "RR" packet is always sent. Since a loss of
the last information packet will not be noticed until a "RR" or "I"
is received, if we send one immediately we can help insure the
detection of the loss of the last data packet immediately.
The third rule is that if there are messages that are unacknowledged
or undelivered and the link has gone idle, another "RR" packet is
sent after an interval T3 where T3 is typically one quarter of the
roundtrip delay between the two ends (or defaulted to one quarter of
T1) as a maximum. If there are no messages outstanding "RR" is sent
Sabatini Expires March 1, 2013 [Page 8]
Internet-Draft TDLC Theory August 2012
as a "heartbeat" message after interval T1 where T1 is typically
slightly greater than twice the total roundtrip delay with a default
of four seconds. T2, the link disconnected timer, is typically ten
times T1 or defaults to 40 seconds. On dedicated links where
transmitting a "RR" has no cost the preferred practice is simply to
repeatedly send "RR" packets when the link is idle.
5.7. Point to Multipoint
TDLC has been used in both polled multipoint and broadcast multipoint
with individual low speed acknowledgement channels. The inclusive
oring of all retransmission requests accommodates the retransmission
needs of all tributaries at once.
One major issue in multipoint is a single station making so many
retransmission requests that it impairs the entire channel. Since
TDLC had stringent real time delivery requirements an upper level
started canceling messages which were not received within 10 seconds,
causing the offending terminal to restart but protecting the
integrity of the overall facility. Since this is not a generalized
solution counts were maintained of the number of messages requested
for retransmission by station and an error threshold was established,
those stations exceeding the threshold were disconnected. Further
study is warranted to establish a better threshold mechanism.
5.8. Calculation of round trip delay
One pleasant benefit of having a token which is immediately returned
by the far end is the easy calculation of round trip delay. In TDLC
we can save a time stamp along with the message number in our
transmission order array. This allows us to calculate round trip
delay when we receive our transmission state value and use it to
access the timestamp. Since more than one received message might
have the same transmission state value we zero the timestamp after
use to indicate that the value should not be used again. Note that
if an acknowledgement is lost we will calculate a longer delay than
is accurate therefore we must smooth the returned values, typically
returning the smallest out of the last N (where N in TDLC was four).
6. Performance
The point to point performance of TDLC has been extensively tested
(it was a production protocol for twenty years in a worldwide network
of 88,000 computers), this testing confirming its near optimal
performance no matter what the combination of message loss and delay.
This was due to each reply having both the complete state of the
receiver and that of the sender at the time the reply was generated.
Sabatini Expires March 1, 2013 [Page 9]
Internet-Draft TDLC Theory August 2012
For each reply that got through the message stream could be re-
optimized to reflect the current state at the far end.
Since the point to point version was designed for dedicated bandwidth
situations it was normally configured to send "RR" messages at close
to continuous rates. Since the hardware generated and processed
these directly (later versions of the protocol were implemented
directly in the line interface) they imposed no overhead in a
dedicated bandwidth situation but provided the earliest possible
indication of message loss and thus the least overall latency.
7. Conclusion
TDLC represents an entirely new way at looking at selective
acknowledgement protocols. Using the concepts outlined a whole new
family of protocols optimized for long delay or high loss rates
(especially due to noise or congestion) is possible. TDLC itself
represents a large improvement over HDLC based protocols and can be
used as a replacement transport protocol with little additional
overhead anywhere that HDLC could be used.
8. IANA Considerations
This memo includes no request to IANA.
All drafts are required to have an IANA considerations section (see
the update of RFC 2434 [I-D.narten-iana-considerations-rfc2434bis]
for a guide). If the draft does not require IANA to do anything, the
section contains an explicit statement that this is the case (as
above). If there are no requirements for IANA, the section will be
removed during conversion into an RFC by the RFC Editor.
9. Security Considerations
All drafts are required to have a security considerations section.
See RFC 3552 [RFC3552] for a guide.
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
Sabatini Expires March 1, 2013 [Page 10]
Internet-Draft TDLC Theory August 2012
10.2. Informative References
[I-D.narten-iana-considerations-rfc2434bis]
Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs",
draft-narten-iana-considerations-rfc2434bis-09 (work in
progress), March 2008.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981.
[RFC0998] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A bulk data
transfer protocol", RFC 998, March 1987.
[RFC1072] Jacobson, V. and R. Braden, "TCP extensions for long-delay
paths", RFC 1072, October 1988.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
June 1999.
[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC
Text on Security Considerations", BCP 72, RFC 3552,
July 2003.
Appendix A. Related Readings
S. Lin and P. S. Yu, "An effective error control scheme for satellite
communications", IEEE Transactions on Communications, vol. COM-28,
no. 3, pp. 395--401, Mar. 1980.
P. S. Yu and S. Lin, "An efficient selective-repeat ARQ scheme for
satellite channels and its throughput analysis," IEEE Transactions on
Communications, vol. 29, no. 3, pp. 353--363, March 1981.
S. Lin and P. S. Yu, "A hybrid ARQ scheme with parity retransmission
for error control of satellite channels," IEEE Trans. Commun., vol.
30, pp. 1701-1719, July 1982.
M. J. Miller and S. L. Lin. "The analysis of some selective-repeat
ARQ schemes with finite receiver buffer", IEEE Transactions on
Communications, Vol. COM-29, No. 9, pages 1307--1315, September 1981.
S. R. Chandran and S. Lin, "A selective repeat ARQ scheme for point-
to-multipoint communications and its throughput analysis," ACM
Sabatini Expires March 1, 2013 [Page 11]
Internet-Draft TDLC Theory August 2012
Computer Communications Review, vol. 16, pp. 292--301, Aug. 1986.
S. R. Chandran and S. Lin, "Selective-repeat-ARQ schemes for
broadcast links," IEEE Trans. Commun., vol. 40, no. 1, pp. 12-19,
Jan. 1992.
H. M. de Lima, O. Carlos and M. B. Duarte, "A Point-to-Multipoint ARQ
Scheme with Multicopy Retransmission for High Speed Satellite
Communications" IEEE International Telecommunication Symposium
ITS'94, (Rio de Janeiro, 1994)
A. N. Netravali, W. D. Roome, and K. Sabnani, "Design and
implementation of a high-speed transport protocol," IEEE Transactions
on Communications, vol. COM38, pp. 2010.
K. K. Sabnani and A. N. Netravali, "A high speed transport protocol
for datagram/virtual circuit networks", ACM SIGCOMM Computer
Communication Review , Symposium proceedings on Communications
architectures & protocols SIGCOMM '89, Volume 19 Issue 4, Aug. 1989.
D. Towsley and S. Mithal, "A selective repeat ARQ protocol for a
point to multipoint channel," in IEEE International Conference on
Communications INFOCOM'87, (San Francisco,CA), pp. 521--526, Apr.
1987.
E. Weldon, "An improved selective repeat ARQ strategy", IEEE
Transactions on Communications, vol. COM-30, no. 3, pp. 480--486,
Mar. 1982.
J. L. Wang and J. A. Silvester, "Optimal adaptative multireceiver ARQ
protocols," IEEE Transactions on Communications, vol. COM-41, pp.
1816--1829, Dec. 1993.
T. Shikama and T. Mizuno, "Resequencing schemes for selective-repeat
ARQ and their performance", Advanced Information Networking and
Applications, 2005. AINA 2005. 19th International Conference on
Volume 2, Issue , Page(s): 491 - 494 vol.2, Mar. 2005.
R. Sanders and A. Weaver 1990. "The Xpress transfer protocol (XTP):
A tutorial", SIGCOMM Comp. Comm. Rev. 20, 5, Pg 67-80, Oct. 1990.
Sabatini Expires March 1, 2013 [Page 12]
Internet-Draft TDLC Theory August 2012
Author's Address
Anthony Sabatini
Broker Communications Inc.
200 West 20th Street
Apt. 1216
New York, NY 10011
US
Phone: +1 212 867 7179
Email: tsabatini@hotmail.com
URI: http://tsabatini.com/
Sabatini Expires March 1, 2013 [Page 13]