Internet DRAFT - draft-kuehlewind-tcpm-accurate-ecn
draft-kuehlewind-tcpm-accurate-ecn
TCP Maintenance & Minor Extensions (tcpm) B. Briscoe
Internet-Draft Simula Research Laboratory
Intended status: Experimental M. Kuehlewind
Expires: April 21, 2016 ETH Zurich
R. Scheffenegger
NetApp, Inc.
October 19, 2015
More Accurate ECN Feedback in TCP
draft-kuehlewind-tcpm-accurate-ecn-05
Abstract
Explicit Congestion Notification (ECN) is a mechanism where network
nodes can mark IP packets instead of dropping them to indicate
incipient congestion to the end-points. Receivers with an ECN-
capable transport protocol feed back this information to the sender.
ECN is specified for TCP in such a way that only one feedback signal
can be transmitted per Round-Trip Time (RTT). Recently, new TCP
mechanisms like Congestion Exposure (ConEx) or Data Center TCP
(DCTCP) need more accurate ECN feedback information whenever more
than one marking is received in one RTT. This document specifies an
experimental scheme to provide more than one feedback signal per RTT
in the TCP header. Given TCP header space is scarce, it overloads
the three existing ECN-related flags in the TCP header and provides
additional information in a new TCP option.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 21, 2016.
Briscoe, et al. Expires April 21, 2016 [Page 1]
Internet-Draft Accurate TCP-ECN Feedback October 2015
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 4
1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3. Experiment Goals . . . . . . . . . . . . . . . . . . . . 5
1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
1.5. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 6
2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 7
2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 8
2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 8
2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 9
2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 10
2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 10
3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 11
3.1. Negotiation during the TCP handshake . . . . . . . . . . 11
3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 14
3.2.1. The ACE Field . . . . . . . . . . . . . . . . . . . . 14
3.2.2. Safety against Ambiguity of the ACE Field . . . . . . 16
3.2.3. The AccECN Option . . . . . . . . . . . . . . . . . . 16
3.2.4. Path Traversal of the AccECN Option . . . . . . . . . 17
3.2.5. Usage of the AccECN TCP Option . . . . . . . . . . . 19
3.3. AccECN Compliance by TCP Proxies, Offload Engines and
other Middleboxes . . . . . . . . . . . . . . . . . . . . 20
4. Interaction with Other TCP Variants . . . . . . . . . . . . . 21
4.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 21
4.2. Compatibility with Other TCP Options and Experiments . . 21
4.3. Compatibility with Feedback Integrity Mechanisms . . . . 21
5. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 23
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25
7. Security Considerations . . . . . . . . . . . . . . . . . . . 25
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26
9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 26
Briscoe, et al. Expires April 21, 2016 [Page 2]
Internet-Draft Accurate TCP-ECN Feedback October 2015
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 26
10.1. Normative References . . . . . . . . . . . . . . . . . . 26
10.2. Informative References . . . . . . . . . . . . . . . . . 27
Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 29
A.1. Example Algorithm to Encode/Decode the AccECN Option . . 29
A.2. Example Algorithm for Safety Against Long Sequences of
ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 30
A.2.1. Safety Algorithm without the AccECN Option . . . . . 30
A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 32
A.3. Example Algorithm to Estimate Marked Bytes from Marked
Packets . . . . . . . . . . . . . . . . . . . . . . . . . 33
A.4. Example Algorithm to Beacon AccECN Options . . . . . . . 34
A.5. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 35
Appendix B. Alternative Design Choices (To Be Removed Before
Publication) . . . . . . . . . . . . . . . . . . . . 35
Appendix C. Open Protocol Design Issues (To Be Removed Before
Publication) . . . . . . . . . . . . . . . . . . . . 36
Appendix D. Changes in This Version (To Be Removed Before
Publication) . . . . . . . . . . . . . . . . . . . . 37
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 37
1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where
network nodes can mark IP packets instead of dropping them to
indicate incipient congestion to the end-points. Receivers with an
ECN-capable transport protocol feed back this information to the
sender. ECN is specified for TCP in such a way that only one
feedback signal can be transmitted per Round-Trip Time (RTT).
Recently, proposed mechanisms like Congestion Exposure (ConEx
[I-D.ietf-conex-abstract-mech]) or DCTCP [I-D.bensley-tcpm-dctcp]
need more accurate ECN feedback information whenever more than one
marking is received in one RTT. A fuller treatment of the motivation
for this specification is given in the associated requirements
document [RFC7560].
This documents specifies an experimental scheme for ECN feedback in
the TCP header to provide more than one feedback signal per RTT. It
will be called the more accurate ECN feedback scheme, or AccECN for
short. If AccECN progresses from experimental to the standards
track, it is intended to be a complete replacement for classic ECN
feedback, not a fork in the design of TCP. Thus, the applicability
of AccECN is intended to include all public and private IP networks
(and even any non-IP networks over which TCP is used today). Until
the AccECN experiment succeeds, [RFC3168] will remain as the
standards track specification for adding ECN to TCP. To avoid
confusion, in this document we use the term 'classic ECN' for the
pre-existing ECN specification [RFC3168].
Briscoe, et al. Expires April 21, 2016 [Page 3]
Internet-Draft Accurate TCP-ECN Feedback October 2015
AccECN is solely an (experimental) change to the TCP wire protocol.
It is completely independent of how TCP might respond to congestion
feedback. This specification overloads flags and fields in the main
TCP header with new definitions, so both ends have to support the new
wire protocol before it can be used. Therefore during the TCP
handshake the two ends use the three ECN-related flags in the TCP
header to negotiate the most advanced feedback protocol that they can
both support.
It is likely (but not required) that the AccECN protocol will be
implemented along with the following experimental additions to the
TCP-ECN protocol: ECN-capable SYN/ACK [RFC5562], ECN path-probing and
fall-back [I-D.kuehlewind-tcpm-ecn-fallback] and testing receiver
non-compliance [I-D.moncaster-tcpm-rcv-cheat].
1.1. Document Roadmap
The following introductory sections outline the goals of AccECN
(Section 1.2) and the goal of experiments with ECN (Section 1.3) so
that it is clear what success would look like. Then terminology is
defined (Section 1.4) and a recap of existing prerequisite technology
is given (Section 1.5).
Section 2 gives an informative overview of the AccECN protocol. Then
Section 3 gives the normative protocol specification. Section 4
assesses the interaction of AccECN with commonly used variants of
TCP, whether standardised or not. Section 5 summarises the features
and properties of AccECN.
Section 6 summarises the protocol fields and numbers that IANA will
need to assign and Section 7 points to the aspects of the protocol
that will be of interest to the security community.
Appendix A gives pseudocode examples for the various algorithms that
AccECN uses.
1.2. Goals
[RFC7560] enumerates requirements that a candidate feedback scheme
will need to satisfy, under the headings: resilience, timeliness,
integrity, accuracy (including ordering and lack of bias),
complexity, overhead and compatibility (both backward and forward).
It recognises that a perfect scheme that fully satisfies all the
requirements is unlikely and trade-offs between requirements are
likely. Section 5 presents the properties of AccECN against these
requirements and discusses the trade-offs made.
Briscoe, et al. Expires April 21, 2016 [Page 4]
Internet-Draft Accurate TCP-ECN Feedback October 2015
The requirements document recognises that a protocol as ubiquitous as
TCP needs to be able to serve as-yet-unspecified requirements.
Therefore an AccECN receiver aims to act as a generic (dumb)
reflector of congestion information so that in future new sender
behaviours can be deployed unilaterally.
1.3. Experiment Goals
TCP is critical to the robust functioning of the Internet, therefore
any proposed modifications to TCP need to be thoroughly tested. The
present specification describes an experimental protocol that adds
more accurate ECN feedback to the TCP protocol. The intention is to
specify the protocol sufficiently so that more than one
implementation can be built in order to test its function, robustness
and interoperability (with itself and with previous version of ECN
and TCP).
The experimental protocol will be considered successful if it
satisfies the requirements of [RFC7560] in the consensus opinion of
the IETF tcpm working group. In short, this requires that it
improves the accuracy and timeliness of TCP's ECN feedback, as
claimed in Section 5, while striking a balance between the
conflicting requirements of resilience, integrity and minimisation of
overhead. It also requires that it is not unduly complex, and that
it is compatible with prevalent equipment behaviours in the current
Internet, whether or not they comply with standards.
1.4. Terminology
AccECN: The more accurate ECN feedback scheme will be called AccECN
for short.
Classic ECN: the ECN protocol specified in [RFC3168].
Classic ECN feedback: the feedback aspect of the ECN protocol
specified in [RFC3168], including generation, encoding,
transmission and decoding of feedback, but not the Data Sender's
subsequent response to that feedback.
ACK: A TCP acknowledgement, with or without a data payload.
Pure ACK: A TCP acknowledgement without a data payload.
TCP client: The TCP stack that originates a connection.
TCP server: The TCP stack that responds to a connection request.
Briscoe, et al. Expires April 21, 2016 [Page 5]
Internet-Draft Accurate TCP-ECN Feedback October 2015
Data Receiver: The endpoint of a TCP half-connection that receives
data and sends AccECN feedback.
Data Sender: The endpoint of a TCP half-connection that sends data
and receives AccECN feedback.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
1.5. Recap of Existing ECN feedback in IP/TCP
ECN [RFC3168] uses two bits in the IP header. Once ECN has been
negotiated with the receiver at the transport layer, an ECN sender
can set two possible codepoints (ECT(0) or ECT(1)) in the IP header
to indicate an ECN-capable transport (ECT). If both ECN bits are
zero, the packet is considered to have been sent by a Not-ECN-capable
Transport (Not-ECT). When a network node experiences congestion, it
will occasionally either drop or mark a packet, with the choice
depending on the packet's ECN codepoint. If the codepoint is Not-
ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1),
the node can mark the packet by setting both ECN bits, which is
termed 'Congestion Experienced' (CE), or loosely a 'congestion mark'.
Table 1 summarises these codepoints.
+-----------------------+---------------+---------------------------+
| IP-ECN codepoint | Codepoint | Description |
| (binary) | name | |
+-----------------------+---------------+---------------------------+
| 00 | Not-ECT | Not ECN-Capable Transport |
| 01 | ECT(1) | ECN-Capable Transport (1) |
| 10 | ECT(0) | ECN-Capable Transport (0) |
| 11 | CE | Congestion Experienced |
+-----------------------+---------------+---------------------------+
Table 1: The ECN Field in the IP Header
In the TCP header the first two bits in byte 14 are defined as flags
for the use of ECN (CWR and ECE in Figure 1 [RFC3168]). A TCP client
indicates it supports ECN by setting ECE=CWR=1 in the SYN, and an
ECN-enabled server confirms ECN support by setting ECE=1 and CWR=0 in
the SYN/ACK. On reception of a CE-marked packet at the IP layer, the
Data Receiver starts to set the Echo Congestion Experienced (ECE)
flag continuously in the TCP header of ACKs, which ensures the signal
is received reliably even if ACKs are lost. The TCP sender confirms
that it has received at least one ECE signal by responding with the
congestion window reduced (CWR) flag, which allows the TCP receiver
to stop repeating the ECN-Echo flag. This always leads to a full RTT
Briscoe, et al. Expires April 21, 2016 [Page 6]
Internet-Draft Accurate TCP-ECN Feedback October 2015
of ACKs with ECE set. Thus any additional CE markings arriving
within this RTT cannot be fed back.
The ECN Nonce [RFC3540] is an optional experimental addition to ECN
that the TCP sender can use to protect against accidental or
malicious concealment of marked or dropped packets. The sender can
send an ECN nonce, which is a continuous pseudo-random pattern of
ECT(0) and ECT(1) codepoints in the ECN field. The receiver is
required to feed back a 1-bit nonce sum that counts the occurrence of
ECT(1) packets using the last bit of byte 13 in the TCP header, which
is defined as the Nonce Sum (NS) flag.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved | S | W | C | R | C | S | S | Y | I |
| | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 1: The (post-ECN Nonce) definition of the TCP header flags
2. AccECN Protocol Overview and Rationale
This section provides an informative overview of the AccECN protocol
that will be normatively specified in Section 3
Like the original TCP approach, the Data Receiver of each TCP half-
connection sends AccECN feedback to the Data Sender on TCP
acknowledgements, reusing data packets of the other half-connection
whenever possible.
The AccECN protocol has had to be designed in two parts:
o an essential part that re-uses ECN TCP header bits to feed back
the number of arriving CE marked packets. This provides more
accuracy than classic ECN feedback, but limited resilience against
ACK loss;
o a supplementary part using a new AccECN TCP Option that provides
additional feedback on the number of bytes that arrive marked with
each of the three ECN codepoints (not just CE marks). This
provides greater resilience against ACK loss than the essential
feedback, but it is more likely to suffer from middlebox
interference.
The two part design was necessary, given limitations on the space
available for TCP options and given the possibility that certain
incorrectly designed middleboxes prevent TCP using any new options.
Briscoe, et al. Expires April 21, 2016 [Page 7]
Internet-Draft Accurate TCP-ECN Feedback October 2015
The essential part overloads the previous definition of the three
flags in the TCP header that had been assigned for use by ECN. This
design choice deliberately replaces the classic ECN feedback
protocol, rather than leaving classic ECN feedback intact and adding
more accurate feedback separately because:
o this efficiently reuses scarce TCP header space, given TCP option
space is approaching saturation;
o a single upgrade path for the TCP protocol is preferable to a fork
in the design;
o otherwise classic and accurate ECN feedback could give conflicting
feedback on the same segment, which could open up new security
concerns and make implementations unnecessarily complex;
o middleboxes are more likely to faithfully forward the TCP ECN
flags than newly defined areas of the TCP header.
AccECN is designed to work even if the supplementary part is removed
or zeroed out, as long as the essential part gets through.
2.1. Capability Negotiation
AccECN is a change to the wire protocol of the main TCP header,
therefore it can only be used if both endpoints have been upgraded to
understand it. The TCP client signals support for AccECN on the
initial SYN of a connection and the TCP server signals whether it
supports AccECN on the SYN/ACK. The TCP flags on the SYN that the
client uses to signal AccECN support have been carefully chosen so
that a TCP server will interpret them as a request to support the
most recent variant of ECN feedback that it supports. Then the
client falls back to the same variant of ECN feedback.
An AccECN TCP client does not send the new AccECN Option on the SYN
as SYN option space is limited and successful negotiation using the
flags in the main header is taken as sufficient evidence that both
ends also support the AccECN Option. The TCP server sends the AccECN
Option on the SYN/ACK and the client sends it on the first ACK to
test whether the network path forwards the option correctly.
2.2. Feedback Mechanism
A Data Receiver maintains four counters initialised at the start of
the half-connection. Three count the number of arriving payload
bytes marked CE, ECT(1) and ECT(0) respectively. The fourth counts
the number of packets arriving marked with a CE codepoint (including
control packets without payload if they are CE-marked).
Briscoe, et al. Expires April 21, 2016 [Page 8]
Internet-Draft Accurate TCP-ECN Feedback October 2015
The Data Sender maintains four equivalent counters for the half
connection, and the AccECN protocol is designed to ensure they will
match the values in the Data Receiver's counters, albeit after a
little delay.
Each ACK carries the three least significant bits (LSBs) of the
packet-based CE counter using the ECN bits in the TCP header, now
renamed the Accurate ECN (ACE) field. The LSBs of each of the three
byte counters are carried in the AccECN Option.
2.3. Delayed ACKs and Resilience Against ACK Loss
With both the ACE and the AccECN Option mechanisms, the Data Receiver
continually repeats the current LSBs of each of its respective
counters. Then, even if some ACKs are lost, the Data Sender should
be able to infer how much to increment its own counters, even if the
protocol field has wrapped.
The 3-bit ACE field can wrap fairly frequently. Therefore, even if
it appears to have incremented by one (say), the field might have
actually cycled completely then incremented by one. The Data
Receiver is required not to delay sending an ACK to such an extent
that the ACE field would cycle. However cyling is still a
possibility at the Data Sender because a whole sequence of ACKs
carrying intervening values of the field might all be lost or delayed
in transit.
The fields in the AccECN Option are larger, but they will increment
in larger steps because they count bytes not packets. Nonetheless,
their size has been chosen such that a whole cycle of the field would
never occur between ACKs unless there had been an infeasibly long
sequence of ACK losses. Therefore, as long as the AccECN Option is
available, it can be treated as a dependable feedback channel.
If the AccECN Option is not available, e.g. it is being stripped by a
middlebox, the AccECN protocol will only feed back information on CE
markings (using the ACE field). Although not ideal, this will be
sufficient, because it is envisaged that neither ECT(0) nor ECT(1)
will ever indicate more severe congestion than CE, even though future
uses for ECT(0) or ECT(1) are still unclear. Because the 3-bit ACE
field is so small, when it is the only field available the Data
Sender has to interpret it conservatively assuming the worst possible
wrap.
Certain specified events trigger the Data Receiver to include an
AccECN Option on an ACK. The rules are designed to ensure that the
order in which different markings arrive at the receiver is
communicated to the sender (as long as there is no ACK loss).
Briscoe, et al. Expires April 21, 2016 [Page 9]
Internet-Draft Accurate TCP-ECN Feedback October 2015
Implementations are encouraged to send an AccECN Option more
frequently, but this is left up to the implementer.
2.4. Feedback Metrics
The CE packet counter in the ACE field and the CE byte counter in the
AccECN Option both provide feedback on received CE-marks. The CE
packet counter includes control packets that do not have payload
data, while the CE byte counter solely includes marked payload bytes.
If both are present, the byte counter in the option will provide the
more accurate information needed for modern congestion control and
policing schemes, such as DCTCP or ConEx. If the option is stripped,
a simple algorithm to estimate the number of marked bytes from the
ACE field is given in Appendix A.3.
Feedback in bytes is recommended in order to protect against the
receiver using attacks similar to 'ACK-Division' to artificially
inflate the congestion window, which is why [RFC5681] now recommends
that TCP counts acknowledged bytes not packets.
2.5. Generic (Dumb) Reflector
The ACE field provides information about CE markings on both data and
control packets. According to [RFC3168] the Data Sender is meant to
set control packets to Not-ECT. However, mechanisms in certain
private networks (e.g. data centres) set control packets to be ECN
capable because they are precisely the packets that performance
depends on most.
For this reason, AccECN is designed to be a generic reflector of
whatever ECN markings it sees, whether or not they are compliant with
a current standard. Then as standards evolve, Data Senders can
upgrade unilaterally without any need for receivers to upgrade too.
It is also useful to be able to rely on generic reflection behaviour
when senders need to test for unexpected interference with markings
(for instance [I-D.kuehlewind-tcpm-ecn-fallback] and
[I-D.moncaster-tcpm-rcv-cheat]).
The initial SYN is the most critical control packet, so AccECN
provides feedback on whether it is CE marked, even though it is not
allowed to be ECN-capable according to RFC 3168. However,
middleboxes have been known to overwrite the ECN IP field as if it is
still part of the old Type of Service (ToS) field. If a TCP client
has set the SYN to Not-ECT, but receives CE feedback, it can detect
such middlebox interference and send Not-ECT for the rest of the
connection (see [I-D.kuehlewind-tcpm-ecn-fallback] for the detailed
fall-back behaviour).
Briscoe, et al. Expires April 21, 2016 [Page 10]
Internet-Draft Accurate TCP-ECN Feedback October 2015
Today, if a TCP server receives CE on a SYN, it cannot know whether
it is invalid (or valid) because only the TCP client knows whether it
originally marked the SYN as Not-ECT (or ECT). Therefore, the
server's only safe course of action is to disable ECN for the
connection. Instead, the AccECN protocol allows the server to feed
back the CE marking to the client, which then has all the information
to decide whether the connection has to fall-back from supporting ECN
(or not).
Providing feedback of CE marking on the SYN also supports future
scenarios in which SYNs might be ECN-enabled (without prejudging
whether they ought to be). For instance, in certain environments
such as data centres, it might be appropriate to allow ECN-capable
SYNs. Then, if feedback showed the SYN had been CE marked, the TCP
client could reduce its initial window (IW). It could also reduce IW
conservatively if feedback showed the receiver did not support ECN
(because if there had been a CE marking, the receiver would not have
understood it). Note that this text merely motivates dumb reflection
of CE on a SYN, it does not judge whether a SYN ought to be ECN-
capable.
3. AccECN Protocol Specification
3.1. Negotiation during the TCP handshake
During the TCP handshake at the start of a connection, to request
more accurate ECN feedback the TCP client (host A) MUST set the TCP
flags NS=1, CWR=1 and ECE=1 in the initial SYN segment.
If a TCP server (B) that is AccECN enabled receives a SYN with the
above three flags set, it MUST set both its half connections into
AccECN mode. Then it MUST set the flags CWR=1 and ECE=0 on its
response in the SYN/ACK segment to confirm that it supports AccECN.
The TCP server MUST NOT set this combination of flags unless the
preceding SYN requested support for AccECN as above.
A TCP server in AccECN mode MUST additionally set the flag NS=1 on
the SYN/ACK if the SYN was CE-marked (see Section 2.5). If the
received SYN was Not-ECT, ECT(0) or ECT(1), it MUST clear NS (NS=0)
on the SYN/ACK.
Once a TCP client (A) has sent the above SYN to declare that it
supports AccECN, and once it has received the above SYN/ACK segment
that confirms that the TCP server supports AccECN, the TCP client
MUST set both its half connections into AccECN mode.
If after the normal TCP timeout the TCP client has not received a
SYN/ACK to acknowledge its SYN, the SYN might just have been lost,
Briscoe, et al. Expires April 21, 2016 [Page 11]
Internet-Draft Accurate TCP-ECN Feedback October 2015
e.g. due to congestion, or a middlebox might be blocking segments
with the AccECN flags. To expedite connection setup, the host SHOULD
fall back to NS=CWR=ECE=0 on the retransmission of the SYN. It would
make sense to also remove any other experimental fields or options on
the SYN in case a middlebox might be blocking them, although the
required behaviour will depend on the specification of the other
option(s) and any attempt to co-ordinate fall-back between different
modules of the stack. Implementers MAY use other fall-back
strategies if they are found to be more effective (e.g. attempting to
retransmit a second AccECN segment before fall-back, falling back to
classic ECN feedback rather than non-ECN, and/or caching the result
of a previous attempt to access the same host while negotiating
AccECN).
The fall-back procedure if the TCP server receives no ACK to
acknowledge a SYN/ACK that tried to negotiate AccECN is specified in
Section 3.2.4.
The three flags set to 1 to indicate AccECN support on the SYN have
been carefully chosen to enable natural fall-back to prior stages in
the evolution of ECN. Table 2 tabulates all the negotiation
possibilities for ECN-related capabilities that involve at least one
AccECN-capable host. To compress the width of the table, the
headings of the first four columns have been severely abbreviated, as
follows:
Ac: More *Ac*curate ECN Feedback
N: ECN-*N*once [RFC3540]
E: *E*CN [RFC3168]
I: Not-ECN (*I*mplicit congestion notification using packet drop).
Briscoe, et al. Expires April 21, 2016 [Page 12]
Internet-Draft Accurate TCP-ECN Feedback October 2015
+----+---+---+---+------------+--------------+----------------------+
| Ac | N | E | I | SYN A->B | SYN/ACK B->A | Feedback Mode |
+----+---+---+---+------------+--------------+----------------------+
| | | | | NS CWR ECE | NS CWR ECE | |
| AB | | | | 1 1 1 | 0 1 0 | AccECN |
| AB | | | | 1 1 1 | 1 1 0 | AccECN (CE on SYN) |
| | | | | | | |
| A | B | | | 1 1 1 | 1 0 1 | classic ECN |
| A | | B | | 1 1 1 | 0 0 1 | classic ECN |
| A | | | B | 1 1 1 | 0 0 0 | Not ECN |
| | | | | | | |
| B | A | | | 0 1 1 | 0 0 1 | classic ECN |
| B | | A | | 0 1 1 | 0 0 1 | classic ECN |
| B | | | A | 0 0 0 | 0 0 0 | Not ECN |
| | | | | | | |
| A | | | B | 1 1 1 | 1 1 1 | Not ECN (broken) |
| A | | | | 1 1 1 | 0 1 1 | Not ECN (see Appx B) |
| A | | | | 1 1 1 | 1 0 0 | Not ECN (see Appx B) |
+----+---+---+---+------------+--------------+----------------------+
Table 2: ECN capability negotiation between Originator (A) and
Responder (B)
Table 2 is divided into blocks each separated by an empty row.
1. The top block shows the case already described where both
endpoints support AccECN and how the TCP server (B) indicates
congestion feedback.
2. The second block shows the cases where the TCP client (A)
supports AccECN but the TCP server (B) supports some earlier
variant of TCP feedback, indicated in its SYN/ACK. Therefore, as
soon as an AccECN-capable TCP client (A) receives the SYN/ACK
shown it MUST set both its half connections into the feedback
mode shown in the rightmost column.
3. The third block shows the cases where the TCP server (B) supports
AccECN but the TCP client (A) supports some earlier variant of
TCP feedback, indicated in its SYN. Therefore, as soon as an
AccECN-enabled TCP server (B) receives the SYN shown, it MUST set
both its half connections into the feedback mode shown in the
rightmost column.
4. The fourth block displays combinations that are not valid or
currently unused and therefore both ends MUST fall-back to Not
ECN for both half connections. Especially the first case (marked
`broken') where all bits set in the SYN are reflected by the
receiver in the SYN/ACK, which happens quite often if the TCP
Briscoe, et al. Expires April 21, 2016 [Page 13]
Internet-Draft Accurate TCP-ECN Feedback October 2015
connection is proxied.{ToDo: Consider using the last two cases
for AccECN f/b of ECT(0) and ECT(1) on the SYN (Appendix B)}
The following exceptional cases need some explanation:
ECN Nonce: An AccECN implementation, whether client or server,
sender or receiver, does not need to implement the ECN Nonce
behaviour [RFC3540]. AccECN is compatible with an alternative ECN
feedback integrity approach that does not use up the ECT(1)
codepoint and can be implemented solely at the sender (see
Section 4.3).
Simultaneous Open: An originating AccECN Host (A), having sent a SYN
with NS=1, CWR=1 and ECE=1, might receive another SYN from host B.
Host A MUST then enter the same feedback mode as it would have
entered had it been a responding host and received the same SYN.
Then host A MUST send the same SYN/ACK as it would have sent had
it been a responding host (see the third block above).
3.2. AccECN Feedback
Each Data Receiver maintains four counters, r.cep, r.ceb, r.e0b and
r.e1b. The CE packet counter (r.cep), counts the number of packets
the host receives with the CE code point in the IP ECN field,
including CE marks on control packets without data. r.ceb, r.e0b and
r.e1b count the number of TCP payload bytes in packets marked
respectively with the CE, ECT(0) and ECT(1) codepoint in their IP-ECN
field. When a host first enters AccECN mode, it initialises its
counters to r.cep = 6, r.e0b = 1 and r.ceb = r.e1b.= 0 (see
Appendix A.5). Non-zero initial values are used to be distinct from
cases where the fields are incorrectly zeroed (e.g. by middleboxes).
A host feeds back the CE packet counter using the Accurate ECN (ACE)
field, as explained in the next section. And it feeds back all the
byte counters using the AccECN TCP Option, as specified in
Section 3.2.3. Whenever a host feeds back the value of any counter,
it MUST report the most recent value, no matter whether it is in a
pure ACK, an ACK with new payload data or a retransmission.
3.2.1. The ACE Field
After AccECN has been negotiated on the SYN and SYN/ACK, both hosts
overload the three TCP flags ECE, CWR and NS in the main TCP header
as one 3-bit field. Then the field is given a new name, ACE, as
shown in Figure 2.
Briscoe, et al. Expires April 21, 2016 [Page 14]
Internet-Draft Accurate TCP-ECN Feedback October 2015
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | | U | A | P | R | S | F |
| Header Length | Reserved | ACE | R | C | S | S | Y | I |
| | | | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 2: Definition of the ACE field within bytes 13 and 14 of the
TCP Header (when AccECN has been negotiated and SYN=0).
The original definition of these three flags in the TCP header,
including the addition of support for the ECN Nonce, is shown for
comparison in Figure 1. This specification does not rename these
three TCP flags, it merely overloads them with another name and
definition once an AccECN connection has been established.
A host MUST interpret the ECE, CWR and NS flags as the 3-bit ACE
counter on a segment with SYN=0 that it sends or receives if both of
its half-connections are set into AccECN mode having successfully
negotiated AccECN (see Section 3.1). A host MUST NOT interpret the 3
flags as a 3-bit ACE field on any segment with SYN=1 (whether ACK is
0 or 1), or if AccECN negotiation is incomplete or has not succeeded.
Both parts of each of these conditions are equally important. For
instance, even if AccECN negotiation has been successful, the ACE
field is not defined on any segments with SYN=1 (e.g. a
retransmission of an unacknowledged SYN/ACK, or when both ends send
SYN/ACKs after AccECN support has been successfully negotiated during
a simultaneous open).
The ACE field encodes the three least significant bits of the r.cep
counter, therefore its initial value will be 0b110 (decimal 6). This
non-zero initialization allows a TCP server to use a stateless
handshake (see Section 4.1) but still detect from the TCP client's
first ACK that the client considers it has successfully negotiated
AccECN. If the SYN/ACK was CE marked, the client MUST increase its
r.cep counter before it sends its first ACK, therefore the initial
value of the ACE field will be 0b111 (decimal 7). These values have
deliberately been chosen such that they are distinct from [RFC5562]
behaviour, where the TCP client would set ECE on the first ACK as
feedback for a CE mark on the SYN/ACK.
If the value of the ACE field on the first segment with SYN=0 in
either direction is anything other than 0b110 or 0b111, the Data
Receiver MUST disable ECN for the remainder of the half-connection by
marking all subsequent packets as Not-ECT.
Briscoe, et al. Expires April 21, 2016 [Page 15]
Internet-Draft Accurate TCP-ECN Feedback October 2015
3.2.2. Safety against Ambiguity of the ACE Field
If too many CE-marked segments are acknowledged at once, or if a long
run of ACKs is lost, the 3-bit counter in the ACE field might have
cycled between two ACKs arriving at the Data Sender.
Therefore an AccECN Data Receiver SHOULD immediately send an ACK once
'n' CE marks have arrived since the previous ACK, where 'n' SHOULD be
2 and MUST be no greater than 6.
If the Data Sender has not received AccECN TCP Options to give it
more dependable information, and it detects that the ACE field could
have cycled under the prevailing conditions, it SHOULD conservatively
assume that the counter did cycle. It can detect if the counter
could have cycled by using the jump in the acknowledgement number
since the last ACK to calculate or estimate how many segments could
have been acknowledged. An example algorithm to implement this
policy is given in Appendix A.2. An implementer MAY develop an
alternative algorithm as long as it satisfies these requirements.
If missing acknowledgement numbers arrive later (reordering) and
prove that the counter did not cycle, the Data Sender MAY attempt to
neutralise the effect of any action it took based on a conservative
assumption that it later found to be incorrect.
3.2.3. The AccECN Option
The AccECN Option is defined as shown below in Figure 3. It consists
of three 24-bit fields that provide the 24 least significant bits of
the r.e0b, r.ceb and r.e1b counters, respectively. The initial 'E'
of each field name stands for 'Echo'.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Kind = TBD1 | Length = 11 | EE0B field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE0B (cont'd) | ECEB field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE1B field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: The AccECN Option
The Data Receiver MUST set the Kind field to TBD1, which is
registered in Section 6 as a new TCP option Kind called AccECN. An
experimental TCP option with Kind=254 MAY be used for initial
experiments, with magic number 0xACCE.
Briscoe, et al. Expires April 21, 2016 [Page 16]
Internet-Draft Accurate TCP-ECN Feedback October 2015
Appendix A.1 gives an example algorithm for the Data Receiver to
encode its byte counters into the AccECN Option, and for the Data
Sender to decode the AccECN Option fields into its byte counters.
Note that there is no field to feedback Not-ECT bytes. Nonetheless
an algorithm for the Data Sender to calculate the number of payload
bytes received as Not-ECT is given in Appendix A.5.
Whenever a Data Receiver sends an AccECN Option, the rules in
Section 3.2.5 expect it to always send a full-length option. To cope
with option space limitations, it can omit unchanged fields from the
tail of the option, as long as it preserves the order of the
remaining fields and includes any field that has changed. The length
field MUST indicate which fields are present as follows:
Length=11: EE0B, ECEB, EE1B
Length=8: EE0B, ECEB
Length=5: EE0B
Length=2: (empty)
The empty option of Length=2 is provided to allow for a case where an
AccECN Option has to be sent (e.g. on the SYN/ACK to test the path),
but there is very limited space for the option. For initial
experiments, the Length field MUST be 2 greater to accommodate the
16-bit magic number.
All implementations of a Data Sender MUST be able to read in AccECN
Options of any of the above lengths. They MUST ignore an AccECN
Option of any other length.
3.2.4. Path Traversal of the AccECN Option
An AccECN host MUST NOT include the AccECN TCP Option on the SYN.
Nonetheless, if the AccECN negotiation using the ECN flags in the
main TCP header (Section 3.1) is successful, it implicitly declares
that the endpoints also support the AccECN TCP Option.
If the TCP client indicated AccECN support, a TCP server tha confirms
its support for AccECN (as described in Section 3.1) SHOULD also
include an AccECN TCP Option in the SYN/ACK. A TCP client that has
successfully negotiated AccECN SHOULD include an AccECN Option in the
first ACK at the end of the 3WHS. However, this first ACK is not
delivered reliably, so the TCP client SHOULD also include an AccECN
Option on the first data segment it sends (if it ever sends one). A
host need not include an AccECN Option in any of these three cases if
Briscoe, et al. Expires April 21, 2016 [Page 17]
Internet-Draft Accurate TCP-ECN Feedback October 2015
it has cached knowledge that the packet would be likely to be blocked
on the path to the other host if it included an AccECN Option.
If the TCP client has successfully negotiated AccECN but does not
receive an AccECN Option on the SYN/ACK, it switches into a mode that
assumes that the AccECN Option is not available for this half
connection. Similarly, if the TCP server has successfully negotiated
AccECN but does not receive an AccECN Option on the first ACK or on
the first data segment, it switches into a mode that assumes that the
AccECN Option is not available for this half connection.
While a host is in the mode that assumes the AccECN Option is not
available, it MUST adopt the conservative interpretation of the ACE
field discussed in Section 3.2.2. However, it cannot make any
assumption about support of the AccECN Option on the other half
connection, so it MUST continue to send the AccECN Option itself.
If after the normal TCP timeout the TCP server has not received an
ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been
lost, e.g. due to congestion, or a middlebox might be blocking the
AccECN Option. To expedite connection setup, the host SHOULD fall
back to NS=CWR=ECE=0 and no AccECN Option on the retransmission of
the SYN/ACK. Implementers MAY use other fall-back strategies if they
are found to be more effective (e.g. retransmitting a SYN/ACK with
AccECN TCP flags but not the AccECN Option; attempting to retransmit
a second AccECN segment before fall-back (most appropriate during
high levels of congestion); or falling back to classic ECN feedback
rather than non-ECN).
Similarly, if the TCP client detects that the first data segment it
sent was lost, it SHOULD fall back to no AccECN Option on the
retransmission. Again, implementers MAY use other fall-back
strategies such as attempting to retransmit a second segment with the
AccECN Option before fall-back, and/or caching the result of previous
attempts.
Either host MAY include the AccECN Option in a subsequent segment to
retest whether the AccECN Option can traverse the path.
Currently the Data Sender is not required to test whether the
arriving byte counters in the AccECN Option have been correctly
initialised. This allows different initial values to be used as an
additional signalling channel in future. If any inappropriate
zeroing of these fields is discovered during testing, this approach
will need to be reviewed.
Briscoe, et al. Expires April 21, 2016 [Page 18]
Internet-Draft Accurate TCP-ECN Feedback October 2015
3.2.5. Usage of the AccECN TCP Option
The following rules determine when a Data Receiver in AccECN mode
sends the AccECN TCP Option, and which fields to include:
Change-Triggered ACKs: If an arriving packet increments a different
byte counter to that incremented by the previous packet, the Data
Receiver SHOULD immediately send an ACK with an AccECN Option,
without waiting for the next delayed ACK. Certain offload
hardware might not be able to support change-triggered ACKs, but
otherwise it is important to keep exceptions to this rule to a
minimum so that Data Senders can generally rely on this behaviour;
Continual Repetition: Otherwise, if arriving packets continue to
increment the same byte counter, the Data Receiver can include an
AccECN Option on most or all (delayed) ACKs, but it does not have
to. If option space is limited on a particular ACK, the Data
Receiver MUST give precedence to SACK information about loss. It
SHOULD include an AccECN Option if the r.ceb counter has
incremented and it MAY include an AccECN Option if r.ec0b or
r.ec1b has incremented;
Full-Length Options Preferred: It SHOULD always use full-length
AccECN Options. It MAY use shorter AccECN Options if space is
limited, but it MUST include the counter(s) that have incremented
since the previous AccECN Option and it MUST only truncate fields
from the right-hand tail of the option to preserve the order of
the remaining fields (see Section 3.2.3);
Beaconing Full-Length Options: Nonetheless, it MUST include a full-
length AccECN TCP Option on at least three ACKs per RTT, or on all
ACKs if there are less than three per RTT (see Appendix A.4 for an
example algorithm that satisfies this requirement).
The following example series of arriving marks illustrates when a
Data Receiver will emit an ACK if it is using a delayed ACK factor of
2 segments and change-triggered ACKs: 01 -> ACK, 01, 01 -> ACK, 10 ->
ACK, 10, 01 -> ACK, 01, 11 -> ACK, 01 -> ACK.
For the avoidance of doubt, the change-triggered ACK mechanism
ignores the arrival of a control packet with no payload, because it
does not alter any byte counters. The change-triggered ACK approach
will lead to some additional ACKs but it feeds back the timing and
the order in which ECN marks are received with minimal additional
complexity.
Implementation note: sending an AccECN Option each time a different
counter changes and including a full-length AccECN Option on every
Briscoe, et al. Expires April 21, 2016 [Page 19]
Internet-Draft Accurate TCP-ECN Feedback October 2015
delayed ACK will satisfy the requirements described above and might
be the easiest implementation, as long as sufficient space is
available in each ACK (in total and in the option space).
Appendix A.3 gives an example algorithm to estimate the number of
marked bytes from the ACE field alone, if the AccECN Option is not
available.
If a host has determined that segments with the AccECN Option always
seem to be discarded somewhere along the path, it is no longer
obliged to follow the above rules.
3.3. AccECN Compliance by TCP Proxies, Offload Engines and other
Middleboxes
A large class of middleboxes split TCP connections. Such a middlebox
would be compliant with the AccECN protocol if the TCP implementation
on each side complied with the present AccECN specification and each
side negotiated AccECN independently of the other side.
Another large class of middleboxes intervene to some degree at the
transport layer, but attempts to be transparent (invisible) to the
end-to-end connection. A subset of this class of middleboxes
attempts to `normalise' the TCP wire protocol by checking that all
values in header fields comply with a rather narrow interpretation of
the TCP specifications. To comply with the present AccECN
specification, such a middlebox MUST NOT change the ACE field or the
AccECN Option and it MUST attempt to preserve the timing of each ACK
(for example, if it coalesced ACKs it would not be AccECN-compliant).
A middlebox claiming to be transparent at the transport layer MUST
forward the AccECN TCP Option unaltered, whether or not the length
value matches one of those specified in Section 3.2.3, and whether or
not the initial values of the byte-counter fields are correct. This
is because blocking apparently invalid values does not improve
security (because AccECN hosts are required to ignore invalid values
anyway), while it prevents the standardised set of values being
extended in future (because outdated normalisers would block updated
hosts from using the extended AccECN standard).
Hardware to offload certain TCP processing represents another large
class of middleboxes, even though it is often a function of a host's
network interface and rarely in its own 'box'. Leeway has been
allowed in the present AccECN specification in the expectation that
offload hardware could comply and still serve its function.
Nonetheless, such hardware MUST attempt to preserve the timing of
each ACK (for example, if it coalesced ACKs it would not be AccECN-
compliant).
Briscoe, et al. Expires April 21, 2016 [Page 20]
Internet-Draft Accurate TCP-ECN Feedback October 2015
4. Interaction with Other TCP Variants
This section is informative, not normative.
4.1. Compatibility with SYN Cookies
A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to
protect itself from SYN flooding attacks. It places minimal commonly
used connection state in the SYN/ACK, and deliberately does not hold
any state while waiting for the subsequent ACK (e.g. it closes the
thread). Therefore it cannot record the fact that it entered AccECN
mode for both half-connections. Indeed, it cannot even remember
whether it negotiated the use of classic ECN [RFC3168].
Nonetheless, such a server can determine that it negotiated AccECN as
follows. If a TCP server using SYN Cookies supports AccECN and if
the first ACK it receives contains an ACE field with the value 0b110
or 0b111, it can assume that:
o the TCP client must have requested AccECN support on the SYN
o it (the server) must have confirmed that it supported AccECN
Therefore the server can switch itself into AccECN mode, and continue
as if it had never forgotten that it switched itself into AccECN mode
earlier.
4.2. Compatibility with Other TCP Options and Experiments
AccECN is compatible (at least on paper) with the most commonly used
TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is
also compatible with the recent promising experimental TCP options
TCP Fast Open (TFO [RFC7413]) and Multipath TCP (MPTCP [RFC6824]).
AccECN is friendly to all these protocols, because space for TCP
options is particularly scarce on the SYN, where AccECN consumes zero
additional header space.
When option space is under pressure from other options, Section 3.2.5
provides guidance on how important it is to send an AccECN Option and
whether it needs to be a full-length option.
4.3. Compatibility with Feedback Integrity Mechanisms
The ECN Nonce [RFC3540] is an experimental IETF specification
intended to allow a sender to test whether ECN CE markings (or
losses) introduced in one network are being suppressed by the
receiver or anywhere else in the feedback loop, such as another
network or a middlebox. The ECN nonce has not been deployed as far
Briscoe, et al. Expires April 21, 2016 [Page 21]
Internet-Draft Accurate TCP-ECN Feedback October 2015
as can be ascertained. The nonce would now be nearly impossible to
deploy retrospectively, because to catch a misbehaving receiver it
relies on the receiver volunteering feedback information to
incriminate itself. A receiver that has been modified to misbehave
can simply claim that it does not support nonce feedback, which will
seem unremarkable given so many other hosts do not support it either.
With minor changes AccECN could be optimised for the possibility that
the ECT(1) codepoint might be used as a nonce. However, given the
nonce is now probably undeployable, the AccECN design has been
generalised so that it ought to be able to support other possible
uses of the ECT(1) codepoint, such as a lower severity or a more
instant congestion signal than CE.
Three alternative mechanisms are available to assure the integrity of
ECN and/or loss signals. AccECN is compatible with any of these
approaches:
o The Data Sender can test the integrity of the receiver's ECN (or
loss) feedback by occasionally setting the IP-ECN field to a value
normally only set by the network (and/or deliberately leaving a
sequence number gap). Then it can test whether the Data
Receiver's feedback faithfully reports what it expects
[I-D.moncaster-tcpm-rcv-cheat]. Unlike the ECN Nonce, this
approach does not waste the ECT(1) codepoint in the IP header, it
does not require standardisation and it does not rely on
misbehaving receivers volunteering to reveal feedback information
that allows them to be detected. However, setting the CE mark by
the sender might conceal actual congestion feedback from the
network and should therefore only be done sparsely.
o Networks generate congestion signals when they are becoming
congested, so they are more likely than Data Senders to be
concerned about the integrity of the receiver's feedback of these
signals. A network can enforce a congestion response to its ECN
markings (or packet losses) using congestion exposure (ConEx)
audit [I-D.ietf-conex-abstract-mech]. Whether the receiver or a
downstream network is suppressing congestion feedback or the
sender is unresponsive to the feedback, or both, ConEx audit can
neutralise any advantage that any of these three parties would
otherwise gain.
ConEx is a change to the Data Sender that is most useful when
combined with AccECN. Without AccECN, the ConEx behaviour of a
Data Sender would have to be more conservative than would be
necessary if it had the accurate feedback of AccECN.
Briscoe, et al. Expires April 21, 2016 [Page 22]
Internet-Draft Accurate TCP-ECN Feedback October 2015
o The TCP authentication option (TCP-AO [RFC5925]) can be used to
detect any tampering with AccECN feedback between the Data
Receiver and the Data Sender (whether malicious or accidental).
The AccECN fields are immutable end-to-end, so they are amenable
to TCP-AO protection, which covers TCP options by default.
However, TCP-AO is often too brittle to use on many end-to-end
paths, where middleboxes can make verification fail in their
attempts to improve performance or security, e.g. by
resegmentation or shifting the sequence space.
5. Protocol Properties
This section is informative not normative. It describes how well the
protocol satisfies the agreed requirements for a more accurate ECN
feedback protocol [RFC7560].
Accuracy: From each ACK, the Data Sender can infer the number of new
CE marked segments since the previous ACK. This provides better
accuracy on CE feedback than classic ECN. In addition if the
AccECN Option is present (not blocked by the network path) the
number of bytes marked with CE, ECT(1) and ECT(0) are provided.
Overhead: The AccECN scheme is divided into two parts. The
essential part reuses the 3 flags already assigned to ECN in the
IP header. The supplementary part adds an additional TCP option
consuming up to 11 bytes. However, no TCP option is consumed in
the SYN.
Ordering: The order in which marks arrive at the Data Receiver is
preserved in AccECN feedback, because the Data Receiver is
expected to send an ACK immediately whenever a different mark
arrives.
Timeliness: While the same ECN markings are arriving continually at
the Data Receiver, it can defer ACKs as TCP does normally, but it
will immediately send an ACK as soon as a different ECN marking
arrives.
Timeliness vs Overhead: Change-Triggered ACKs are intended to enable
latency-sensitive uses of ECN feedback by capturing the timing of
transitions but not wasting resources while the state of the
signalling system is stable. The receiver can control how
frequently it sends the AccECN TCP Option and therefore it can
control the overhead induced by AccECN.
Resilience: All information is provided based on counters.
Therefore if ACKs are lost, the counters on the first ACK
Briscoe, et al. Expires April 21, 2016 [Page 23]
Internet-Draft Accurate TCP-ECN Feedback October 2015
following the losses allows the Data Sender to immediately recover
the number of the ECN markings that it missed.
Resilience against Bias: Because feedback is based on repetition of
counters, random losses do not remove any information, they only
delay it. Therefore, even though some ACKs are change-triggered,
random losses will not alter the proportions of the different ECN
markings in the feedback.
Resilience vs Overhead: If space is limited in some segments (e.g.
because more option are need on some segments, such as the SACK
option after loss), the Data Receiver can send AccECN Options less
frequently or truncate fields that have not changed, usually down
to as little as 5 bytes. However, it has to send a full-sized
AccECN Option at least three times per RTT, which the Data Sender
can rely on as a regular beacon or checkpoint.
Resilience vs Timeliness and Ordering: Ordering information and the
timing of transitions cannot be communicated in three cases: i)
during ACK loss; ii) if something on the path strips the AccECN
Option; or iii) if the Data Receiver is unable to support Change-
Triggered ACKs.
Complexity: An AccECN implementation solely involves simple counter
increments, some modulo arithmetic to communicate the least
significant bits and allow for wrap, and some heuristics for
safety against fields cycling due to prolonged periods of ACK
loss. Each host needs to maintain eight additional counters. The
hosts have to apply some additional tests to detect tampering by
middleboxes, but in general the protocol is simple to understand,
simple to implement and requires few cycles per packet to execute.
Integrity: AccECN is compatible with at least three approaches that
can assure the integrity of ECN feedback. If the AccECN Option is
stripped the resolution of the feedback is degraded, but the
integrity of this degraded feedback can still be assured.
Backward Compatibility: If only one endpoint supports the AccECN
scheme, it will fall-back to the most advanced ECN feedback scheme
supported by the other end.
Backward Compatibility: If the AccECN Option is stripped by a
middlebox, AccECN still provides basic congestion feedback in the
ACE field. Further, AccECN can be used to detect mangling of the
IP ECN field; mangling of the TCP ECN flags; blocking of ECT-
marked segments; and blocking of segments carrying the AccECN
Option. It can detect these conditions during TCP's 3WHS so that
Briscoe, et al. Expires April 21, 2016 [Page 24]
Internet-Draft Accurate TCP-ECN Feedback October 2015
it can fall back to operation without ECN and/or operation without
the AccECN Option.
Forward Compatibility: The behaviour of endpoints and middleboxes is
carefully defined for all reserved or currently unused codepoints
in the scheme, to ensure that any blocking of anomalous values is
always at least under reversible policy control.
6. IANA Considerations
This document defines a new TCP option for AccECN, assigned a value
of TBD1 (decimal) from the TCP option space. This value is defined
as:
+------+--------+-----------------------+-----------+
| Kind | Length | Meaning | Reference |
+------+--------+-----------------------+-----------+
| TBD1 | N | Accurate ECN (AccECN) | RFC XXXX |
+------+--------+-----------------------+-----------+
[TO BE REMOVED: This registration should take place at the following
location: http://www.iana.org/assignments/tcp-parameters/tcp-
parameters.xhtml#tcp-parameters-1]
Early implementation before the IANA allocation MUST follow [RFC6994]
and use experimental option 254 and magic number 0xACCE (16 bits)
{ToDo register this with IANA}, then migrate to the new option after
the allocation.
7. Security Considerations
If ever the supplementary part of AccECN based on the new AccECN TCP
Option is unusable (due for example to middlebox interference) the
essential part of AccECN's congestion feedback offers only limited
resilience to long runs of ACK loss (see Section 3.2.2). These
problems are unlikely to be due to malicious intervention (because if
an attacker could strip a TCP option or discard a long run of ACKs it
could wreak other arbitrary havoc). However, it would be of concern
if AccECN's resilience could be indirectly compromised during a
flooding attack. AccECN is still considered safe though, because if
the option is not presented, the AccECN Data Sender is then required
to switch to more conservative assumptions about wrap of congestion
indication counters (see Section 3.2.2 and Appendix A.2).
Section 4.1 describes how a TCP server can negotiate AccECN and use
the SYN cookie method for mitigating SYN flooding attacks.
Briscoe, et al. Expires April 21, 2016 [Page 25]
Internet-Draft Accurate TCP-ECN Feedback October 2015
There is concern that ECN markings could be altered or suppressed,
particularly because a misbehaving Data Receiver could increase its
own throughput at the expense of others. Given the experimental ECN
nonce is now probably undeployable, AccECN has been generalised for
other possible uses of the ECT(1) codepoint to avoid obsolescence of
the codepoint even if the nonce mechanism is obsoleted. AccECN is
compatible with the three other schemes known to assure the integrity
of ECN feedback (see Section 4.3 for details). If the AccECN Option
is stripped by an incorrectly implemented middlebox, the resolution
of the feedback will be degraded, but the integrity of this degraded
information can still be assured.
The AccECN protocol is not believed to introduce any new privacy
concerns, because it merely counts and feeds back signals at the
transport layer that had already been visible at the IP layer.
8. Acknowledgements
We want to thank Koen De Schepper, Praveen Balasubramanian and
Michael Welzl for their input and discussion. The idea of using the
three ECN-related TCP flags as one field for more accurate TCP-ECN
feedback was first introduced in the re-ECN protocol that was the
ancestor of ConEx.
Bob Briscoe was part-funded by the European Community under its
Seventh Framework Programme through the Reducing Internet Transport
Latency (RITE) project (ICT-317700) and through the Trilogy 2 project
(ICT-317756). The views expressed here are solely those of the
authors.
9. Comments Solicited
Comments and questions are encouraged and very welcome. They can be
addressed to the IETF TCP maintenance and minor modifications working
group mailing list <tcpm@ietf.org>, and/or to the authors.
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
Briscoe, et al. Expires April 21, 2016 [Page 26]
Internet-Draft Accurate TCP-ECN Feedback October 2015
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<http://www.rfc-editor.org/info/rfc5681>.
[RFC6994] Touch, J., "Shared Use of Experimental TCP Options",
RFC 6994, DOI 10.17487/RFC6994, August 2013,
<http://www.rfc-editor.org/info/rfc6994>.
10.2. Informative References
[I-D.bensley-tcpm-dctcp]
Bensley, S., Eggert, L., Thaler, D., Balasubramanian, P.,
and G. Judd, "Microsoft's Datacenter TCP (DCTCP): TCP
Congestion Control for Datacenters", draft-bensley-tcpm-
dctcp-05 (work in progress), July 2015.
[I-D.ietf-conex-abstract-mech]
Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
Concepts, Abstract Mechanism and Requirements", draft-
ietf-conex-abstract-mech-13 (work in progress), October
2014.
[I-D.kuehlewind-tcpm-ecn-fallback]
Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path
Probing and Fallback", draft-kuehlewind-tcpm-ecn-
fallback-01 (work in progress), September 2013.
[I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to
Allow Senders to Identify Receiver Non-Compliance", draft-
moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, DOI 10.17487/RFC3540, June 2003,
<http://www.rfc-editor.org/info/rfc3540>.
[RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
<http://www.rfc-editor.org/info/rfc4987>.
Briscoe, et al. Expires April 21, 2016 [Page 27]
Internet-Draft Accurate TCP-ECN Feedback October 2015
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
DOI 10.17487/RFC5562, June 2009,
<http://www.rfc-editor.org/info/rfc5562>.
[RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP
Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
June 2010, <http://www.rfc-editor.org/info/rfc5925>.
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
<http://www.rfc-editor.org/info/rfc6824>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<http://www.rfc-editor.org/info/rfc7413>.
[RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe,
"Problem Statement and Requirements for Increased Accuracy
in Explicit Congestion Notification (ECN) Feedback",
RFC 7560, DOI 10.17487/RFC7560, August 2015,
<http://www.rfc-editor.org/info/rfc7560>.
Briscoe, et al. Expires April 21, 2016 [Page 28]
Internet-Draft Accurate TCP-ECN Feedback October 2015
Appendix A. Example Algorithms
This appendix is informative, not normative. It gives example
algorithms that would satisfy the normative requirements of the
AccECN protocol. However, implementers are free to choose other ways
to implement the requirements.
A.1. Example Algorithm to Encode/Decode the AccECN Option
The example algorithms below show how a Data Receiver in AccECN mode
could encode its CE byte counter r.ceb into the ECEB field within the
AccECN TCP Option, and how a Data Sender in AccECN mode could decode
the ECEB field into its byte counter s.ceb. The other counters for
bytes marked ECT(0) and ECT(1) in the AccECN Option would be
similarly encoded and decoded.
It is assumed that each local byte counter is an unsigned integer
greater than 24b (probably 32b), and that the following constant has
been assigned:
DIVOPT = 2^24
Every time a CE marked data segment arrives, the Data Receiver
increments its local value of r.ceb by the size of the TCP Data.
Whenever it sends an ACK with the AccECN Option, the value it writes
into the ECEB field is
ECEB = r.ceb % DIVOPT
where '%' is the modulo operator.
On the arrival of an AccECN Option, the Data Sender uses the TCP
acknowledgement number and any SACK options to calculate newlyAckedB,
the amount of new data that the ACK acknowledges in bytes. If
newlyAckedB is negative it means that a more up to date ACK has
already been processed, so this ACK has been superseded and the Data
Sender has to ignore the AccECN Option. Then the Data Sender
calculates the minimum difference d.ceb between the ECEB field and
its local s.ceb counter, using modulo arithmetic as follows:
if (newlyAckedB >= 0) {
d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT
s.ceb += d.ceb
}
For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal),
then
Briscoe, et al. Expires April 21, 2016 [Page 29]
Internet-Draft Accurate TCP-ECN Feedback October 2015
s.ceb % DIVOPT = 1
d.ceb = (1461 + 2^24 - 1) % 2^24
= 1460
s.ceb = 33,554,433 + 1460
= 33,555,893
A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss
The example algorithms below show how a Data Receiver in AccECN mode
could encode its CE packet counter r.cep into the ACE field, and how
the Data Sender in AccECN mode could decode the ACE field into its
s.cep counter. The Data Sender's algorithm includes code to
heuristically detect a long enough unbroken string of ACK losses that
could have concealed a cycle of the congestion counter in the ACE
field of the next ACK to arrive.
Two variants of the algorithm are given: i) a more conservative
variant for a Data Sender to use if it detects that the AccECN Option
is not available (see Section 3.2.2 and Section 3.2.4); and ii) a
less conservative variant that is feasible when complementary
information is available from the AccECN Option.
A.2.1. Safety Algorithm without the AccECN Option
It is assumed that each local packet counter is a sufficiently sized
unsigned integer (probably 32b) and that the following constant has
been assigned:
DIVACE = 2^3
Every time a CE marked packet arrives, the Data Receiver increments
its local value of r.cep by 1. It repeats the same value of ACE in
every subsequent ACK until the next CE marking arrives, where
ACE = r.cep % DIVACE.
If the Data Sender received an earlier value of the counter that had
been delayed due to ACK reordering, it might incorrectly calculate
that the ACE field had wrapped. Therefore, on the arrival of every
ACK, the Data Sender uses the TCP acknowledgement number and any SACK
options to calculate newlyAckedB, the amount of new data that the ACK
acknowledges. If newlyAckedB is negative it means that a more up to
date ACK has already been processed, so this ACK has been superseded
and the Data Sender has to ignore the AccECN Option. If newlyAckedB
is zero, to break the tie the Data Sender could use timestamps (if
present) to work out newlyAckedT, the amount of new time that the ACK
acknowledges. Then the Data Sender calculates the minimum difference
Briscoe, et al. Expires April 21, 2016 [Page 30]
Internet-Draft Accurate TCP-ECN Feedback October 2015
d.cep between the ACE field and its local s.cep counter, using modulo
arithmetic as follows:
if ((newlyAckedB > 0) || (newlyAckedB == 0 && newlyAckedT > 0))
d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE
Section 3.2.2 requires the Data Sender to assume that the ACE field
did cycle if it could have cycled under prevailing conditions. The
3-bit ACE field in an arriving ACK could have cycled and become
ambiguous to the Data Sender if a row of ACKs goes missing that
covers a stream of data long enough to contain 8 or more CE marks.
We use the word `missing' rather than `lost', because some or all the
missing ACKs might arrive eventually, but out of order. Even if some
of the lost ACKs are piggy-backed on data (i.e. not pure ACKs)
retransmissions will not repair the lost AccECN information, because
AccECN requires retransmissions to carry the latest AccECN counters,
not the original ones.
The phrase `under prevailing conditions' allows the Data Sender to
take account of the prevailing size of data segments and the
prevailing CE marking rate just before the sequence of ACK losses.
However, we shall start with the simplest algorithm, which assumes
segments are all full-sized and ultra-conservatively it assumes that
ECN marking was 100% on the forward path when ACKs on the reverse
path started to all be dropped. Specifically, if newlyAckedB is the
amount of data that an ACK acknowledges since the previous ACK, then
the Data Sender could assume that this acknowledges newlyAckedPkt
full-sized segments, where newlyAckedPkt = newlyAckedB/MSS. Then it
could assume that the ACE field incremented by
dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE),
For example, imagine an ACK acknowledges newlyAckedPkt=9 more full-
size segments than any previous ACK, and that ACE increments by a
minimum of 2 CE marks (d.cep=2). The above formula works out that it
would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) =
2). However, if ACE increases by a minimum of 2 but acknowledges 10
full-sized segments, then it would be necessary to assume that there
could have been 10 CE marks (because 10 - ((10-2) % 8) = 10).
Implementers could build in more heuristics to estimate prevailing
average segment size and prevailing ECN marking. For instance,
newlyAckedPkt in the above formula could be replaced with
newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing
segment size and p is the prevailing ECN marking probability.
However, ultimately, if TCP's ECN feedback becomes inaccurate it
still has loss detection to fall back on. Therefore, it would seem
safe to implement a simple algorithm, rather than a perfect one.
Briscoe, et al. Expires April 21, 2016 [Page 31]
Internet-Draft Accurate TCP-ECN Feedback October 2015
The simple algorithm for dSafer.cep above requires no monitoring of
prevailing conditions and it would still be safe if, for example,
segments were on average at least 5% of full-sized as long as ECN
marking was 5% or less. Assuming it was used, the Data Sender would
increment its packet counter as follows:
s.cep += dSafer.cep
If missing acknowledgement numbers arrive later (due to reordering),
Section 3.2.2 says "the Data Sender MAY attempt to neutralise the
effect of any action it took based on a conservative assumption that
it later found to be incorrect". To do this, the Data Sender would
have to store the values of all the relevant variables whenever it
made assumptions, so that it could re-evaluate them later. Given
this could become complex and it is not required, we do not attempt
to provide an example of how to do this.
A.2.2. Safety Algorithm with the AccECN Option
When the AccECN Option is available on the ACKs before and after the
possible sequence of ACK losses, if the Data Sender only needs CE-
marked bytes, it will have sufficient information in the AccECN
Option without needing to process the ACE field. However, if for
some reason it needs CE-marked packets, if dSafer.cep is different
from d.cep, it can calculate the average marked segment size that
each implies to determine whether d.cep is likely to be a safe enough
estimate. Specifically, it could use the following algorithm, where
d.ceb is the amount of newly CE-marked bytes (see Appendix A.1):
SAFETY_FACTOR = 2
if (dSafer.cep > d.cep) {
s = d.ceb/d.cep
if (s <= MSS) {
sSafer = d.ceb/dSafer.cep
if (sSafer < MSS/SAFETY_FACTOR)
dSafer.cep = d.cep % d.cep is a safe enough estimate
} % else
% No need for else; dSafer.cep is already correct,
% because d.cep must have been too small
}
The chart below shows when the above algorithm will consider d.cep
can replace dSafer.cep as a safe enough estimate of the number of CE-
marked packets:
Briscoe, et al. Expires April 21, 2016 [Page 32]
Internet-Draft Accurate TCP-ECN Feedback October 2015
^
sSafer|
|
MSS+
|
| dSafer.cep
| is
MSS/2+--------------+ safest
| |
| d.cep is safe|
| enough |
+-------------------->
MSS s
The following examples give the reasoning behind the algorithm,
assuming MSS=1,460 [B]:
o if d.cep=0, dSafer.cep=8 and d.ceb=1,460, then s=infinity and
sSafer=182.5.
Therefore even though the average size of 8 data segments is
unlikely to have been as small as MSS/8, d.cep cannot have been
correct, because it would imply an average segment size greater
than the MSS.
o if d.cep=2, dSafer.cep=10 and d.ceb=1,460, then s=730 and
sSafer=146.
Therefore d.cep is safe enough, because the average size of 10
data segments is unlikely to have been as small as MSS/10.
o if d.cep=7, dSafer.cep=15 and d.ceb=10,200, then s=1,457 and
sSafer=680.
Therefore d.cep is safe enough, because the average data segment
size is more likely to have been just less than one MSS, rather
than below MSS/2.
If pure ACKs were allowed to be ECN-capable, missing ACKs would be
far less likely. However, because [RFC3168] currently precludes
this, the above algorithm assumes that pure ACKs are not ECN-capable.
A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets
If the AccECN Option is not available, the Data Sender can only
decode CE-marking from the ACE field in packets. Every time an ACK
arrives, to convert this into an estimate of CE-marked bytes, it
needs an average of the segment size, s_ave. Then it can add or
subtract s_ave from the value of d.ceb as the value of d.cep
increments or decrements.
Briscoe, et al. Expires April 21, 2016 [Page 33]
Internet-Draft Accurate TCP-ECN Feedback October 2015
To calculate s_ave, it could keep a record of the byte numbers of all
the boundaries between packets in flight (including control packets),
and recalculate s_ave on every ACK. However it would be simpler to
merely maintain a counter packets_in_flight for the number of packets
in flight (including control packets), which it could update once per
RTT. Either way, it would estimate s_ave as:
s_ave ~= flightsize / packets_in_flight,
where flightsize is the variable that TCP already maintains for the
number of bytes in flight. To avoid floating point arithmetic, it
could right-bit-shift by lg(packets_in_flight), where lg() means log
base 2.
An alternative would be to maintain an exponentially weighted moving
average (EWMA) of the segment size:
s_ave = a * s + (1-a) * s_ave,
where a is the decay constant for the EWMA. However, then it is
necessary to choose a good value for this constant, which ought to
depend on the number of packets in flight. Also the decay constant
needs to be power of two to avoid floating point arithmetic.
A.4. Example Algorithm to Beacon AccECN Options
Section 3.2.5 requires a Data Receiver to beacon a full-length AccECN
Option at least 3 times per RTT. This could be implemented by
maintaining a variable to store the number of ACKs (pure and data
ACKs) since a full AccECN Option was last sent and another for the
approximate number of ACKs sent in the last round trip time:
if (acks_since_full_last_sent > acks_in_round / BEACON_FREQ)
send_full_AccECN_Option()
For optimised integer arithmetic, BEACON_FREQ = 4 could be used,
rather than 3, so that the division could be implemented as an
integer right bit-shift by lg(BEACON_FREQ).
In certain operating systems, it might be too complex to maintain
acks_in_round. In others it might be possible by tagging each data
segment in the retransmit buffer with the number of ACKs sent at the
point that segment was sent. This would not work well if the Data
Receiver was not sending data itself, in which case it might be
necessary to beacon based on time instead, as follows:
if (time_now > time_last_option_sent + RTT / BEACON_FREQ)
send_full_AccECN_Option()
Briscoe, et al. Expires April 21, 2016 [Page 34]
Internet-Draft Accurate TCP-ECN Feedback October 2015
However, this time-based approach does not work well when all the
ACKs are sent early in each round trip, as is the case during slow-
start.
{ToDo: A simple and robust beaconing algorithm for all circumstances
is still work-in-progress.}
A.5. Example Algorithm to Count Not-ECT Bytes
A Data Sender in AccECN mode can infer the amount of TCP payload data
arriving at the receiver marked Not-ECT from the difference between
the amount of newly ACKed data and the sum of the bytes with the
other three markings, d.ceb, d.e0b and d.e1b. Note that, because
r.e0b is initialised to 1 and the other two counters are initialised
to 0, the initial sum will be 1, which matches the initial offset of
the TCP sequence number on completion of the 3WHS.
For this approach to be precise, it has to be assumed that spurious
(unnecessary) retransmissions do not lead to double counting. This
assumption is currently correct, given that RFC 3168 requires that
the Data Sender marks retransmitted segments as Not-ECT. However,
the converse is not true; necessary transmissions will result in
under-counting.
However, such precision is unlikely to be necessary. The only known
use of a count of Not-ECT marked bytes is to test whether equipment
on the path is clearing the ECN field (perhaps due to an out-dated
attempt to clear, or bleach, what used to be the ToS field). To
detect bleaching it will be sufficient to detect whether nearly all
bytes arrive marked as Not-ECT. Therefore there should be no need to
keep track of the details of retransmissions.
Appendix B. Alternative Design Choices (To Be Removed Before
Publication)
This appendix is informative, not normative. It records alternative
designs that the authors chose not to include in the normative
specification, but which the IETF might wish to consider for
inclusion:
Feedback all four ECN codepoints on the SYN/ACK: The last two
negotiation combinations in Table 2 could also be used to indicate
AccECN support and to feedback that the arriving SYN was ECT(0) or
ECT(1). This could be used to probe the client to server path for
incorrect forwarding of the ECN field
[I-D.kuehlewind-tcpm-ecn-fallback]. Note, however, that it would
be unremarkable if ECN on the SYN was zeroed by security devices,
Briscoe, et al. Expires April 21, 2016 [Page 35]
Internet-Draft Accurate TCP-ECN Feedback October 2015
given RFC 3168 prohibited ECT on SYN because it enables DoS
attacks.
Feedback all four ECN codepoints on the First ACK: To probe the
server to client path for incorrect ECN forwarding, it could be
useful to have four feedback states on the first ACK from the TCP
client. This could be achieved by assigning four combinations of
the ECN flags in the main TCP header, and only initialising the
ACE field on subsequent segments.
Empty AccECN Option: It might be useful to allow an empty (Length=2)
AccECN Option on the SYN/ACK and first ACK. Then if a host had to
omit the option because there was insufficient space for a larger
option, it would not give the impression to the other end that a
middlebox had stripped the option.
Appendix C. Open Protocol Design Issues (To Be Removed Before
Publication)
1. Currently it is specified that the receiver `SHOULD' use Change-
Triggered ACKs. It is controversial whether this ought to be a
`MUST' instead. A `SHOULD' would leave the Data Sender uncertain
whether it can rely on the timing and ordering information in
ACKs. If the sender guesses wrongly, it will probably introduce
at least 1RTT of delay before it can use this timing information.
Ironically it will most likely be wanting this information to
reduce ramp-up delay. A `MUST' could make it hard to implement
AccECN in offload hardware. However, it is not known whether
AccECN would be hard to implement in such hardware even with a
`SHOULD' here. For instance, was it hard to offload DCTCP to
hardware because of change-triggered ACKs, or was this just one
of many reasons? The choice between MUST and SHOULD here is
critical. Before that choice is made, a clear use-case for
certainty of timing and ordering information is needed, plus
well-informed discussion about hardware offload constraints.
2. There is possibly a concern that a receiver could deliberately
omit the AccECN Option pretending that it had been stripped by a
middlebox. No known way can yet be contrived to take advantage
of this downgrade attack, but it is mentioned here in case
someone else can contrive one.
3. The s.cep counter might increase even if the s.ceb counter does
not (e.g. due to a CE-marked control packet). The sender's
response to such a situation is considered out of scope, because
this ought to be dealt with in whatever future specification
allows ECN-capable control packets. However, it is possible that
the situation might arise even if the sender has not sent ECN-
Briscoe, et al. Expires April 21, 2016 [Page 36]
Internet-Draft Accurate TCP-ECN Feedback October 2015
capable control packets, in which case, this draft might need to
give some advice on how the sender should respond.
Appendix D. Changes in This Version (To Be Removed Before Publication)
The difference between any pair of versions can be displayed at
<http://datatracker.ietf.org/doc/draft-kuehlewind-tcpm-accurate-ecn/
history/>
From 04 to 05::
* Corrected ambiguity between Classic ECN and Classic ECN
feedback throughout
* Changed MUST to SHOULD send AccECN option on SYN/ACK last ACK
of 3WHS and first data segment from client, to allow for cached
knowledge of option traversal problems.
* Removed duplication of normative language about sending a full-
length option in the sections on "The AccECN Option" and "Usage
of the AccECN Option", and mutually cross referenced.
* Acknowledged Koen De Schepper and Praveen Balasubramanian
* Noted in Appendix that algo to beacon a full-length option is
work-in-progress
* Editorial corrections and clarifications throughout
Authors' Addresses
Bob Briscoe
Simula Research Laboratory
EMail: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/
Mirja Kuehlewind
ETH Zurich
Gloriastrasse 35
Zurich 8092
Switzerland
EMail: mirja.kuehlewind@tik.ee.ethz.ch
Briscoe, et al. Expires April 21, 2016 [Page 37]
Internet-Draft Accurate TCP-ECN Feedback October 2015
Richard Scheffenegger
NetApp, Inc.
Am Euro Platz 2
Vienna 1120
Austria
Phone: +43 1 3676811 3146
EMail: rs@netapp.com
Briscoe, et al. Expires April 21, 2016 [Page 38]