Network Working Group                                         K. Nielsen
Internet-Draft                                              R. De Santis
Intended status: Experimental                                   Ericsson
Expires: April 21, 2016                                     A. Brunstrom
                                                     Karlstad University
                                                               M. Tuexen
                                         Muenster Univ. of Appl. Science
                                                              R. Stewart
                                                           Netflix, Inc.
                                                        October 19, 2015


                  SCTP Tail Loss Recovery Enhancements
                  draft-nielsen-tsvwg-sctp-tlr-02.txt

Abstract

   Loss Recovery by means of T3-Retransmission has a significant
   detrimental impact on the delays experienced through an SCTP
   association.  The throughput achievable over an SCTP association is
   also negatively impacted by the occurrence of T3-Retransmissions.
   The present SCTP Fast Recovery algorithms as specified by [RFC4960]
   are not able to recover losses adequately or in a timely manner in
   certain situations, and thus resort to loss recovery by lengthy
   T3-Retransmissions or by non-timely activation of Fast Recovery.  In
   this document we specify a number of enhancements to the SCTP Loss
   Recovery algorithms which amend some of these deficiencies, with a
   particular focus on Loss Recovery for drops in Traffic Tails.  The
   enhancements supplement the existing algorithms of [RFC4960] with
   proactive probing and timer-driven activation of the Fast
   Retransmission algorithm, and with a number of enhancements of the
   Fast Retransmission algorithm itself.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."




Nielsen, et al.          Expires April 21, 2016                 [Page 1]

Internet-Draft                  SCTP TLR                    October 2015


   This Internet-Draft will expire on April 21, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  The SCTP TLR Function . . . . . . . . . . . . . . . . . .   4
       1.1.1.  Dependencies  . . . . . . . . . . . . . . . . . . . .   5
     1.2.  Relation to other work  . . . . . . . . . . . . . . . . .   5
       1.2.1.  Early Retransmit and RTO Restart  . . . . . . . . . .   5
       1.2.2.  TCP applicability . . . . . . . . . . . . . . . . . .   6
       1.2.3.  Packet Re-ordering  . . . . . . . . . . . . . . . . .   6
       1.2.4.  Congestion Control  . . . . . . . . . . . . . . . . .   7
       1.2.5.  CMT-SCTP Applicability  . . . . . . . . . . . . . . .   7
   2.  Conventions and Terminology . . . . . . . . . . . . . . . . .   8
   3.  Description of Algorithms . . . . . . . . . . . . . . . . . .   9
     3.1.  SCTP Scoreboard and miss indication Counting Enhancement    9
       3.1.1.  Multi-Path Considerations . . . . . . . . . . . . . .  11
     3.2.  RFC6675 nextseg() Tail Loss Enhancements for SCTP FR  . .  11
       3.2.1.  Multi-Path Considerations . . . . . . . . . . . . . .  14
     3.3.  SCTP-TLR Description  . . . . . . . . . . . . . . . . . .  15
       3.3.1.  Principles  . . . . . . . . . . . . . . . . . . . . .  15
       3.3.2.  SCTP - TLR Statemachine . . . . . . . . . . . . . . .  19
       3.3.3.  TLPP Transmission Rules . . . . . . . . . . . . . . .  24
       3.3.4.  Masking of TLPP Recovered Losses  . . . . . . . . . .  28
       3.3.5.  Elimination of unnecessary DELAY-ACK delays . . . . .  30
   4.  Confirmation of support for Immediate SACK  . . . . . . . . .  31
   5.  Socket API Considerations . . . . . . . . . . . . . . . . . .  31
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .  31
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  32
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  32
   9.  Discussion and Evaluation of function . . . . . . . . . . . .  32
   10. References  . . . . . . . . . . . . . . . . . . . . . . . . .  32
     10.1.  Normative References . . . . . . . . . . . . . . . . . .  32
     10.2.  Informative References . . . . . . . . . . . . . . . . .  33
   Appendix A.  Unambiguous SACK . . . . . . . . . . . . . . . . . .  35
     A.1.  TSN Retransmission ID in Data Chunk Header  . . . . . . .  35
       A.1.1.  Sender side behaviour . . . . . . . . . . . . . . . .  36
       A.1.2.  Receiver side behaviour . . . . . . . . . . . . . . .  36
     A.2.  Unambiguous SACK Chunk  . . . . . . . . . . . . . . . . .  36
       A.2.1.  Receiver side behaviour . . . . . . . . . . . . . . .  40
     A.3.  Unambiguous SACK return . . . . . . . . . . . . . . . . .  40
     A.4.  Negotiation . . . . . . . . . . . . . . . . . . . . . . .  41
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  41

1.  Introduction

   Loss Recovery by means of T3-Retransmission has a significant impact
   on the delays experienced through, as well as the throughput
   achievable over, an SCTP association.  Loss Recovery by Fast
   Retransmission is in many situations superior to T3-Retransmission
   from both a latency and a throughput perspective.

   The present SCTP Fast Retransmission algorithm, as specified by
   [RFC4960], is driven solely by a missing TSN reaching DupThresh miss
   indication counts stemming from returned SACKs.  As such it is not
   able to recover losses adequately or in a timely manner in traffic
   tails, where a sufficient number of such SACKs may not be generated,
   and it thereby resorts to loss recovery by T3-Retransmissions or by
   non-timely activation of Fast Recovery.  Non-timely activation here
   refers to the situation where activation of Fast Recovery for packets
   lost within one data burst needs to await the arrival of SACKs from a
   subsequent data burst.

   By drops in traffic tails (or tail drops) we refer to the following
   situations:

   1.  Drops of the last SCTP packets of an SCTP association or, more
       generally, drops of packets at the end of an SCTP association
       which are not followed by more than DupThresh number of packets
       which are not dropped.

   2.  Drops among packets sent at the end of bursts spaced by pauses
       of time equal to or greater than the T3-timeout (approximately).
       It is noted that such bursts (pauses in between bursts) may
       result from application limitations, from congestion control
       limitations or from receiver side limitations.

   3.  Drops among packets sent so sparsely that each dropped packet
       constitutes a tail drop, in that DupThresh number of packets
       would not be sent (would not be available for sending) prior to
       expiry of the T3-timeout.

   It shall be noted that while the above traffic drop criteria describe
   drops among the forward data packets only, drops among forward data
   packets combined with drops of the returned SACKs may together result
   in an insufficient number of SACKs being returned to the traffic
   sender for the Fast Retransmission algorithm to be activated prior to
   the T3-timeout occurring.  The tail traffic situations for which SCTP
   Fast Retransmission is not able to recover the losses are thus in
   general broader than the exact situations listed above.  The
   improvements specified include enhancement of SCTP to deduce the miss
   indication counts from enhanced scoreboard information, thus removing
   some of the vulnerability of the present SCTP miss indication
   counting to loss of SACKs.

1.1.  The SCTP TLR Function

   The function proposed for enhancement of the SCTP Loss Recovery
   operation for Traffic Tail Losses is divided into two parts:

   o  Enhancements of SCTP Fast Retransmission (SCTP FR) algorithm by
      means of the following Tail Loss Recovery improving functions
      inspired by or specified by [RFC6675] for TCP:

      *  miss indication counting for a missing (non-SACK'ed) TSN will
         be based on augmented scoreboard information such that the miss
         indications will be based not on the number of returned SACKs
         but on the number of SACK'ed SCTP packets carrying data chunks
         of higher TSNs.  The mechanism is specified both in terms of
         packets, the book-keeping of which requires new logic, as well
         as in terms of a less implementation demanding byte based
         variant following the Islost() approach of [RFC6675].  We shall
         refer to this improvement as Extended miss indication Counting.

      *  Fast Recovery operation is extended to include the "last
         resort" retransmission, Nextseg 3) and Nextseg 4), operations
         of [RFC6675], thus supporting conditional proactive fast
         retransmissions of missing, but not yet classified as lost,
         TSNs within the Fast Recovery Exit Point.

   o  New SCTP Tail Loss Recovery State machine with proactive timer
      driven activation of (the enhanced) Fast Recovery operation.
      Timer driven activation of Fast Recovery is initiated for
      outstanding data whenever a certain time, shorter than the T3
      timeout, has elapsed from the transmittal of the lowest
      outstanding TSN and network responsiveness, in the form of SACKs
      of packets ahead of the TSN, has been proven since the transmittal
      of the lowest outstanding TSN.  The SCTP TLR mechanism implements
      a new timer, the Tail Loss Probe timer (PTO), and it works in
      parts by:

      *  Forced activation of Fast Recovery when network responsiveness
         has been proven, and the PTO timer has expired, since
         transmittal of the lowest outstanding TSN, but additional
         traffic sent (SACKs of TSNs ahead of the TSN) has not served to
         activate Fast Recovery based on the Extended miss indication
         Counting.

      *  Probing for network responsiveness, by transmittal of a TLR
         probe packet, when no network responsiveness information is
         available (no SACKs have been received for any packets ahead of
         line of the TSN) at expiration of the PTO timer relative to the
         lowest outstanding TSN.

      *  Activation of T3-retransmission Loss Recovery only when the
         network remains unresponsive (no SACKs are received) also after
         transmittal, and subsequent timeout, of a TLR probe packet.
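
   As a non-normative illustration, the three PTO-driven actions listed
   above can be summarized by a single decision function.  This is a
   simplified sketch only: the names PtoAction and on_pto_expiry and
   the two boolean inputs are assumptions made for illustration and are
   not defined by this document.  The sketch also ignores the case where
   Fast Recovery is already active.

```python
from enum import Enum


class PtoAction(Enum):
    ENTER_FAST_RECOVERY = "forced activation of Fast Recovery"
    SEND_TLR_PROBE = "transmit a TLR probe packet"
    T3_RETRANSMIT = "fall back to T3-retransmission"


def on_pto_expiry(sack_ahead_of_low_tsn: bool, probe_timed_out: bool) -> PtoAction:
    """Pick the action taken when the PTO timer expires for lowTSN.

    sack_ahead_of_low_tsn: a SACK for a packet ahead of the lowest
        outstanding TSN has arrived since that TSN was transmitted,
        i.e. network responsiveness has been proven.
    probe_timed_out: a TLR probe was already sent and has itself timed
        out without any SACK being received.
    """
    if sack_ahead_of_low_tsn:
        # Responsive network, but miss indications never reached DupThresh.
        return PtoAction.ENTER_FAST_RECOVERY
    if not probe_timed_out:
        # No responsiveness information yet: probe before giving up.
        return PtoAction.SEND_TLR_PROBE
    # Still silent after the probe: only now resort to T3.
    return PtoAction.T3_RETRANSMIT
```

   The ordering of the checks reflects the intent that
   T3-retransmission is the last option, reached only after both the
   SACK-based and the probe-based paths have failed.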

1.1.1.  Dependencies

   The SCTP TLR procedures proposed apply as add-on supplements to any
   SCTP implementation based on [RFC4960].  The SCTP TLR procedures in
   their core are sender-side only and do not impact the SCTP receiver.

   Exploitation of the SCTP immediate SACK feature, [RFC7053], and usage
   of the new (to be defined) Unambiguous Selective Acknowledgement
   feature of SCTP require support of these SCTP extensions in both
   sender and receiver.

1.2.  Relation to other work

1.2.1.  Early Retransmit and RTO Restart

   It is noted that the Early Retransmit algorithm, [RFC5827], addresses
   activation of Fast Recovery for a particular subset of the tail drop
   situations targeted by the SCTP TLR function.  The solution proposed
   embeds (as a special case) the Early Retransmit algorithm in the
   delayed variant, experimented with for TCP in [DUKKIPATI02], in which
   Early Retransmission is only activated provided a certain time has
   elapsed since the lowest outstanding TSN was transmitted.  The delay
   adds robustness towards spurious retransmissions caused by "mild"
   packet re-ordering as documented for TCP in [DUKKIPATI02].

   It is further noted that, depending on the exact situation (e.g.,
   drop pattern, congestion window and amount of data in flight),
   T3-retransmission procedures need not be inferior to Fast
   Retransmission procedures.  Rather, in some situations
   T3-retransmission will indeed be superior, as T3-retransmissions
   allow for ramp up of the congestion window during the recovery
   process.

   The changes proposed in this document focus on improving the Loss
   Recovery operation of SCTP by enforcing timely activation of
   (improved) Fast Retransmission algorithms.  With the purpose of
   reducing the latency of the TCP and SCTP Loss Recovery operation,
   [HURTIG] has taken the alternative approach of accelerating the
   activation of T3-retransmission processes when Fast Recovery is not
   able to kick in to recover the loss.  [HURTIG] only addresses a
   subset of the Tail loss scenarios in scope of the work presented
   here.  The ideas of [HURTIG] for accurate RTO restart are drawn on in
   the solution proposed here for accurate restart of the new tail loss
   probe timer (PTO-timer) as well as for accurate setting of the
   T3-timer under certain conditions, thus harvesting some of the same
   latency optimizations as [HURTIG].  The same approach has recently
   been exploited for TCP by the invention of the TLPR function by the
   authors of [Rajiullah].

1.2.2.  TCP applicability

   SCTP Loss Recovery operation in its core is based on the design of
   Loss Recovery for TCP with SACK enabled.  The enhancements of SCTP
   Tail Loss Recovery proposed here are applicable for TCP.

   Note: The - to be determined - exploitation of the SCTP immediate
   SACK feature, [RFC7053], and the - to be determined - usage of the
   new unambiguous selective acknowledgement feature of SCTP may not be
   readily applicable to TCP at present.  ISSUE: Need to follow up on
   [zimmermann02], [zimmermann03].

   It is noted that while the SCTP TLR algorithms and SCTP TLR state
   machine defined are inspired by the timer driven tail loss probe
   approach specified in [DUKKIPATI01] for TCP, the solution defined
   here differs in the approach taken.  The approach here is a clean
   slate approach defining a new comprehensive SCTP TLR state machine as
   an add-on to the (at least conceptually) existing Fast Recovery and
   T3-Retransmission state machines of SCTP.  Thereby the SCTP TLR
   algorithm is able to address all tail loss patterns, whereas the
   approach of [DUKKIPATI01] relies on a number of experimental
   mechanisms ([DUKKIPATI02], [MATHIS], [RFC5827]) defined for TCP in
   IETF or in Research, with ad hoc extension to support selected tail
   loss patterns by addition of the tail loss probe mechanism and the
   activation of the mechanisms driven therefrom.

1.2.3.  Packet Re-ordering

   The solution proposed is an enhancement of the existing miss
   indication counting based Fast Recovery operation of SCTP, [RFC4960],
   and as such the solution inherits the fundamental vulnerability to
   packet re-ordering that the SCTP Fast Retransmission algorithm of
   [RFC4960] embeds.

   For deployment of SCTP in environments where the Fast Retransmission
   algorithm of [RFC4960] gives rise to spurious entering of Fast
   Recovery, it would be relevant to look into remedies which may detect
   such spurious entries and undo their effects, possibly following the
   approaches taken for TCP (and SCTP) in this area.

   OPEN ISSUE: In severe packet re-ordering situations where the second
   of two subsequently sent packets outraces the first packet in
   arrival by more than the PTO time, this may trigger the SCTP TLR
   function to enter spurious Fast Recovery.  It is conjectured that
   this situation does not significantly increase the vulnerability of
   Loss Recovery to packet re-ordering.  To be determined and evaluated.

1.2.4.  Congestion Control

   By its very nature of prompting for activation of Fast Recovery
   instead of T3-Retransmission Recovery, the benefit of the solution
   proposed versus the existing solution of [RFC4960] will depend on the
   Congestion Control (CC) operation not only during the recovery
   process but also after exit of the recovery process.  In this context
   it is noted that the prior approach taken for TCP, [DUKKIPATI01], has
   been documented for a TCP implementation running CUBIC, e.g., see
   [zimmermann01], whereas SCTP runs a CC algorithm more similar to TCP
   Reno CC as defined by [RFC5681].

   The solution at present is defined within the constraints of existing
   Congestion Control principles of SCTP as defined by [RFC4960].  It is
   anticipated that Congestion Control improvements are desirable for
   SCTP in general as well as for the functions defined here in
   particular.

1.2.5.  CMT-SCTP Applicability

   The SCTP TLR specification in this document applies to an SCTP
   implementation following the [RFC4960] principles of using one shared
   SACK clock spanning the data transfer over multiple paths.  It is
   noted that, by maintaining the common SACK clock principles of
   [RFC4960], the SCTP TLR mechanism specified here retains some of the
   vulnerabilities from [RFC4960] to spurious (or delayed) entering of
   Fast Recovery operation caused by path changes in inhomogeneous
   environments (change of data transfer among paths of significantly
   different RTTs).  This choice is motivated by the fact that
   concurrent data transfer on multiple paths is the exception case in
   [RFC4960] multi-homed (MH) SCTP and remains the exception also with
   the enhancements of [RFC4960] specified here.

   It is envisaged that the SCTP TLR mechanism specified is readily
   applicable also to an SCTP implementation supporting concurrent multi
   path transfer in line with the specification of [CMT-SCTP].  It is
   emphasized, though, that SCTP-TLR, when applied to [CMT-SCTP], needs
   some adjustments as it should be applied in a split manner following
   the principles of SFR of [CMT-SCTP].

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   For the purposes of defining the SCTP TLR function, we use the
   following terms and concepts:

      "DupThresh": The number of miss indication counts on an
      outstanding TSN upon reaching which SCTP declares the TSN as lost
      and enters Fast Recovery for the TSN if not in Fast Recovery
      already.

      "Flight size": At any given time we define the "Flight size" to be
      the number of bytes that an SCTP sender considers to be in flight
      in the network from the sender to the receiver.  It is noted that
      the bytes of a message which is considered lost and which has not
      been retransmitted are not contained in the Flight size.  Further
      it is noted that the bytes of a message which has been
      retransmitted (once) will count either once or twice in the Flight
      size depending on whether SCTP considers the first transmission of
      the message as having been lost (dropped) in the network.

      "Outstanding TSN": A TSN (and the associated DATA chunk) that has
      been sent by the SCTP sender for which it has not yet received an
      acknowledgement and which the SCTP sender has not abandoned (e.g.,
      abandoned as a result of [RFC3758]).

      "highTSN": The highest outstanding TSN at this point in time.

      "lowTSN": The lowest outstanding TSN at this point in time.

      "Scoreboard": An SCTP sender need maintain a data structure to
      store various information on a per outstanding TSN basis.  This
      includes the selective acknowledgment information, miss indication
      counts, byte counts and other information defined in [RFC4960], in
      this document and in other SCTP specifications.  We refer to this
      data structure as the "scoreboard".  The specifics of the
      scoreboard data structure are out of scope for this document (as
      long as the implementation can perform all functions required by
      this specification).
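
   The terms above can be illustrated by a minimal, non-normative
   sketch of a scoreboard entry and of the Flight size accounting.  The
   class ScoreboardEntry, its field names, and the function flight_size
   are assumptions made for illustration; as stated above, the actual
   data structure is out of scope for this document.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScoreboardEntry:
    tsn: int
    size: int               # bytes of the DATA chunk
    sacked: bool = False    # selectively acknowledged by the peer
    miss_count: int = 0     # miss indication count
    copies_sent: int = 1    # original transmission plus retransmissions
    copies_lost: int = 0    # transmissions the sender considers lost


def flight_size(scoreboard: Iterable[ScoreboardEntry]) -> int:
    """Bytes the sender considers to be in flight towards the receiver."""
    total = 0
    for entry in scoreboard:
        if entry.sacked:
            continue  # acknowledged data is no longer in flight
        # Each copy not considered lost counts once, so a retransmitted
        # chunk counts once or twice depending on the first copy's fate.
        total += entry.size * (entry.copies_sent - entry.copies_lost)
    return total
```

   For example, a 500-byte chunk that was retransmitted once after its
   first copy was declared lost contributes 500 bytes, whereas the same
   chunk retransmitted without the first copy being declared lost
   contributes 1000 bytes.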

3.  Description of Algorithms

3.1.  SCTP Scoreboard and miss indication Counting Enhancement

   Entering of Fast Recovery in SCTP, as specified by [RFC4960], is
   driven by miss indication counts.  When a TSN has received
   DupThresh=3 miss indication counts, the TSN is declared lost and will
   be eligible for fast retransmission via the Fast Recovery procedure.

   In [RFC4960] SCTP, miss indication counts are driven entirely by
   receipt of SACKs in accordance with the Highest TSN Newly
   Acknowledged algorithm (section 7.2.4 of [RFC4960]):

      Highest TSN Newly Acknowledged (HTNA): For each incoming SACK,
      miss indications are incremented only for missing TSNs prior to
      the highest TSN newly acknowledged in the SACK.  A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK.
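
   A minimal, non-normative sketch of the HTNA rule is given below.  The
   function name htna_update and the dictionary representation of the
   miss indication counts are assumptions made for illustration; TSN
   wrap-around (serial number arithmetic) is ignored for brevity.

```python
from typing import Dict, Set


def htna_update(miss_counts: Dict[int, int], newly_acked: Set[int]) -> None:
    """Process one SACK under the HTNA rule.

    miss_counts maps each missing (not yet acknowledged) TSN to its miss
    indication count.  newly_acked holds the TSNs acknowledged for the
    first time by this SACK.
    """
    if not newly_acked:
        return  # a SACK acknowledging nothing new gives no miss indication
    highest = max(newly_acked)
    for tsn in miss_counts:
        if tsn < highest:
            miss_counts[tsn] += 1  # exactly one indication per SACK
```

   With TSN 10 missing, three SACKs newly acknowledging TSN 11, 12 and
   14 in turn give TSN 10 three miss indications.  If only one SACK
   covering all three TSNs reaches the sender, TSN 10 receives a single
   miss indication.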

   An evident issue with the HTNA algorithm is that it is vulnerable to
   loss of SACKs.  In many situations loss of SACKs will result only in
   a slightly delayed entering of Fast Recovery for a dropped TSN.
   Generally, however, when relying on the HTNA algorithm only, loss of
   SACKs will further broaden the traffic tail situations where Fast
   Recovery is either not activated in a timely manner or not activated
   at all, due to the receipt of an insufficient number of SACKs.

   In order to make SCTP Fast Recovery more robust towards drop of
   SACKs, the following extension of the HTNA algorithm SHOULD be
   supported by an SCTP implementation:

      Newly Acked Packets ahead-of-line (NAPahol): For each incoming
      SACK, miss indications are incremented only for missing TSNs prior
      to the highest TSN newly acknowledged in the SACK.  A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK.  For each missing TSN thus potentially eligible for
      additional miss indication counts, the number of miss indications
      to be given shall follow the number of newly acknowledged packets
      ahead of line of the packet of the missing TSN.

   The solution is robust towards split SACK.  The solution requires the
   SCTP implementation to keep track of the relationship between data
   chunks (TSN numbers) and packets.  One solution is for the SCTP
   implementation to maintain a packet id as a monotonically
   incrementing packet sequence number to map chunks to packets and, for
   each outstanding chunk, to keep state of the packet id that the chunk
   was sent in as well as (incrementally updated) the packet ids of up
   to DupThresh-1 (=2) packets ahead of line for which chunks have been
   SACKed.
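
   The packet id based book-keeping can be sketched as follows.  This is
   a non-normative illustration under simplifying assumptions: the names
   napahol_update, miss_indications, packet_of and acked_ahead are
   hypothetical, packet ids refer to the original transmission only, TSN
   wrap-around is ignored, and the set of acknowledged packet ids is
   kept unbounded rather than capped at DupThresh-1 entries.

```python
from typing import Dict, Iterable, Set


def napahol_update(
    missing: Iterable[int],
    packet_of: Dict[int, int],
    newly_acked: Set[int],
    acked_ahead: Dict[int, Set[int]],
) -> None:
    """Process one SACK under the NAPahol rule.

    packet_of maps each TSN to the id of the packet it was sent in.
    acked_ahead maps each missing TSN to the ids of packets ahead of
    line of it for which at least one chunk has been acknowledged.
    """
    if not newly_acked:
        return
    highest = max(newly_acked)
    for tsn in missing:
        if tsn >= highest:
            continue  # HTNA gate: only TSNs below the highest newly acked
        seen = acked_ahead.setdefault(tsn, set())
        for acked in newly_acked:
            if packet_of[acked] > packet_of[tsn]:
                seen.add(packet_of[acked])  # each packet counts once


def miss_indications(tsn: int, acked_ahead: Dict[int, Set[int]]) -> int:
    """Miss indication count: acknowledged packets ahead of line."""
    return len(acked_ahead.get(tsn, set()))
```

   Because packet ids are collected in a set, two SACKs acknowledging
   different chunks of the same packet contribute one miss indication
   only, and a single SACK covering three packets contributes three.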

   For accurate PTO-timer management, using the restart principles of
   [HURTIG] and [Rajiullah], see Section 3.3, an SCTP TLR implementation
   is required to keep track of the time at which packets/TSNs are
   transmitted (or strictly speaking to be able to deduce the time since
   a packet/a TSN was last transmitted).  An implementation may exploit
   timestamps for the generation of (part of) the packet id as well as
   for the mentioned time management thereby limiting the additional
   overhead required for the packet id storage.

   As an alternative to the above accurate packet counting, an SCTP
   implementation MAY, to reduce implementation complexity, instead
   support the following byte counting based extension of the [RFC4960]
   HTNA algorithm:

      Highest Bytes Newly Acknowledged (HBNA): For each incoming SACK,
      miss indications are incremented only for missing TSNs prior to
      the highest TSN newly acknowledged in the SACK.  A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK.  For each missing TSN thus eligible for additional miss
      indication counts, the number of miss indications to be given
      shall follow the number of newly acknowledged bytes in the SACK
      ahead of line of the missing TSN in the following manner:

         Add-miss-indication-count(TSN) =
            Ceiling((Newly acked bytes ahead of line(TSN))/PMTU).

   The HBNA approach as specified above is vulnerable to split of SACK.
   An implementation choice which is robust to split of SACK is to
   recalculate the total amount of selectively acknowledged bytes ahead
   of line of an outstanding TSN and update the miss indication count of
   the TSN as Ceiling((Selectively Acked bytes ahead of line
   (TSN))/PMTU).  This more robust implementation choice, however,
   demands either maintenance of additional state per TSN, namely the
   Selectively Acked bytes ahead of line (TSN), or extensive repeated
   computations.  The risk of split SACK may not be weighty enough to be
   worth such implementation complexity.
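
   The difference between the two byte counting variants can be shown
   with a small, non-normative sketch for a single missing TSN.  The
   class names HbnaIncremental and HbnaRecomputed and the helper
   ceil_div are assumptions made for illustration.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division, avoiding floating point rounding."""
    return -(-numerator // denominator)


class HbnaIncremental:
    """Per-SACK rule: add Ceiling(newly acked bytes ahead / PMTU)."""

    def __init__(self, pmtu: int) -> None:
        self.pmtu = pmtu
        self.miss_count = 0

    def on_sack(self, newly_acked_bytes_ahead: int) -> None:
        self.miss_count += ceil_div(newly_acked_bytes_ahead, self.pmtu)


class HbnaRecomputed:
    """Split-robust rule: Ceiling(total SACKed bytes ahead / PMTU)."""

    def __init__(self, pmtu: int) -> None:
        self.pmtu = pmtu
        self.acked_bytes_ahead = 0  # the additional per-TSN state

    def on_sack(self, newly_acked_bytes_ahead: int) -> None:
        self.acked_bytes_ahead += newly_acked_bytes_ahead

    @property
    def miss_count(self) -> int:
        return ceil_div(self.acked_bytes_ahead, self.pmtu)
```

   With a PMTU of 1500 bytes, two SACKs that each newly acknowledge 600
   bytes ahead of line yield a count of 2 under the per-SACK rule but 1
   under the recomputed rule, which is also what a single SACK
   acknowledging all 1200 bytes would yield under either rule.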

   The HBNA approach follows the approach taken for TCP, Islost(), in
   [RFC6675].  It is noted, however, that due to the message based
   approach of SCTP, a byte based approach generally will be a less
   accurate measure of the number of packets received ahead of line than
   it is for byte stream based TCP.

3.1.1.  Multi-Path Considerations

   In multi-homed [RFC4960] SCTP, data that potentially will be subject
   to fast retransmission may be in flight on multiple paths.  This
   (exception) situation can occur as a result of a change of the data
   transfer path, which may come about, e.g., as a result of a
   switchback operation performed autonomously by SCTP or as a result of
   a management operation setting a new primary path.  The situation can
   also occur as a result of destination directed data transfer where
   the destination address specified is different from the present data
   transfer path destination.  In an [RFC4960] SCTP implementation,
   SACKs of data sent on one path will increase the miss indication
   counts of data with lower TSN in flight on a different path.  As such
   SACKs of data sent on one path may actually result in generation of
   (potentially spurious) loss event reactions on a different path.
   This fundamental aspect of [RFC4960] miss indication counting is not
   changed in this document.  That is, the miss indication counting
   improvements defined above, i.e., the NAPahol and the HBNA
   mechanisms, are not intended to discriminate among the paths on which
   the SACK'ed data contributing to the miss indication counting has
   been sent.

3.2.  RFC6675 nextseg() Tail Loss Enhancements for SCTP FR

   The Fast Retransmission algorithm for TCP as specified in [RFC6675]
   implements some differences compared to the Fast Retransmission
   algorithm specified for SCTP by [RFC4960].  Of particular
   significance for recovery of losses in traffic tail scenarios is the
   fact that the [RFC6675] algorithm, once Fast Recovery has been
   activated, takes two "last resort" retransmission measures, step 3)
   and step 4) of Nextseg() of [RFC6675].  These measures facilitate the
   recovery of losses in situations where too few SACKs would otherwise
   be generated to complete the Fast Recovery process without resorting
   to T3-timeout.  For SCTP Fast Recovery we formulate the equivalent
   measures as follows:

   Last Resort Retransmission:  If the following conditions are met:

      *  there are no outstanding TSNs eligible for fast retransmission
         due to DupThresh or more miss indications

      *  there is no new data available for transmission

      then an outstanding TSN less than or equal to the Fast Recovery
      Exit Point, for which there exist SACKs of chunks ahead of line of
      the TSN, may be retransmitted provided the CWND allows.  The bytes
      of a TSN which is retransmitted in this manner are not subtracted
      from the Flight size, neither prior to this action being taken nor
      as a result of this action.  If the miss indication count of the
      TSN subsequently reaches the DupThresh value, the bytes of the TSN
      shall be subtracted from the Flight size.  Once the TSN is
      acknowledged, the remaining contribution of this TSN in the Flight
      size (whether it be counted there once or twice at this point in
      time) is subtracted.  A TSN which is retransmitted in this manner
      will be marked as ineligible for a subsequent fast retransmit (see
      considerations on Multiple Fast Retransmission operation in
      Section 3.3.1.3).

      An SCTP implementation which implements the Unambiguous SACK
      feature of Appendix A may implement a more accurate calculation of
      the Flight size when doing Last Resort Retransmission.  That is,
      instead of subtracting the contribution from the retransmitted TSN
      from the Flight size once the acknowledgement of the TSN arrives,
      the SCTP implementation may distinguish whether the acknowledgment
      is for the original TSN or for the retransmitted TSN.  In case the
      acknowledgement is not for the retransmitted TSN, SCTP should
      delay the subtraction of the bytes of the retransmitted TSN from
      the Flight size until either an acknowledgement of the
      retransmitted TSN is received (see Appendix A) or until
      PTO2-T_latest(TSN) time has elapsed (see Section 3.3.1).
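
   The Last Resort Retransmission conditions can be sketched as the
   following non-normative selection logic.  Several details are
   assumptions made for illustration and are not prescribed here: the
   Chunk fields, the function names, the reading of "provided the CWND
   allows" as Flight size being smaller than CWND, and the choice of the
   lowest qualifying TSN when several qualify.  The refined Flight size
   handling with Unambiguous SACK is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

DUP_THRESH = 3


@dataclass
class Chunk:
    tsn: int
    size: int
    miss_count: int = 0
    sack_ahead: bool = False     # a chunk ahead of line has been SACKed
    retransmitted: bool = False  # already (fast) retransmitted


def last_resort_candidate(
    outstanding: List[Chunk],
    exit_point: int,
    new_data_available: bool,
    cwnd: int,
    flight: int,
) -> Optional[Chunk]:
    """Return the chunk to retransmit as a last resort, or None."""
    if new_data_available or flight >= cwnd:
        return None
    if any(c.miss_count >= DUP_THRESH and not c.retransmitted
           for c in outstanding):
        return None  # ordinary fast retransmission takes precedence
    for c in sorted(outstanding, key=lambda c: c.tsn):
        if c.tsn <= exit_point and c.sack_ahead and not c.retransmitted:
            return c
    return None


def last_resort_retransmit(chunk: Chunk, flight: int) -> int:
    """Mark the chunk and return the updated Flight size."""
    chunk.retransmitted = True  # ineligible for a later fast retransmit
    # The original copy is not declared lost, so its bytes stay counted
    # and the retransmitted copy is added on top.
    return flight + chunk.size
```

   After such a retransmission the chunk counts twice in the Flight
   size until it is either acknowledged or reaches DupThresh miss
   indications.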

   Rescue:  If all of the following conditions are met:

      *  there are no outstanding TSNs eligible for fast retransmission
         due to DupThresh or more miss indications

      *  there is no new data available for transmission and no data is
         outstanding on the association beyond the Fast Recovery Exit
         Point

      *  there are no outstanding TSNs eligible for Last Resort
         Retransmission

      *  the cumack has progressed since Fast Recovery was entered

      and there exist non-SACKed, non fast retransmitted TSNs within
      the Fast Recovery Exit Point, then for this entry of Fast
      Recovery, provided that the CWND allows, we allow for fast
      retransmission of one packet of consecutive outstanding non fast
      retransmitted TSNs up to PMTU size, the highest TSN of which MUST
      be the highest outstanding TSN within the Fast Recovery Exit
      Point.  The bytes of a TSN which is retransmitted in this manner
      are not subtracted from the Flight size prior to this action
      being taken nor as a result of this action.  If the miss
      indication count of the TSN subsequently reaches the DupThresh
      value, the bytes of the TSN shall be subtracted from the Flight
      size.  Once the TSN is acknowledged, the remaining contribution
      of this TSN to the Flight size (whether it is counted there once
      or twice at that point in time) is subtracted.  A TSN which is
      retransmitted in this manner will be marked as ineligible for a
      subsequent fast retransmit (see considerations on Multiple Fast
      Retransmission operation in
      Section 3.3.1.3).
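
   As an illustration only, the Rescue conditions above can be sketched
   in Python as follows.  The Tsn record, the rescue_packet() helper,
   the DUP_THRESH value of 3 and the boolean inputs are hypothetical
   names introduced for this sketch and are not defined by this
   document.  "Consecutive" is read here as adjacent among the
   non-SACKed TSNs, which is an assumption, and TSN wrap-around is
   ignored.

```python
from dataclasses import dataclass
from typing import List

DUP_THRESH = 3  # assumed value for this sketch


@dataclass
class Tsn:
    """Hypothetical per-chunk record used only in this sketch."""
    number: int
    size: int
    sacked: bool = False
    fast_retransmitted: bool = False
    miss_count: int = 0


def rescue_packet(outstanding: List[Tsn], exit_point: int, pmtu: int, *,
                  new_data_available: bool, last_resort_eligible: bool,
                  cumack_progressed: bool, cwnd_allows: bool) -> List[Tsn]:
    """Return the TSNs to bundle into the single Rescue packet, or []."""
    unacked = sorted((t for t in outstanding if not t.sacked),
                     key=lambda t: t.number)
    # TSNs that still await an ordinary DupThresh-driven fast retransmit.
    dupthresh_pending = any(t.miss_count >= DUP_THRESH
                            and not t.fast_retransmitted for t in unacked)
    beyond_exit = any(t.number > exit_point for t in unacked)
    if (dupthresh_pending or new_data_available or beyond_exit
            or last_resort_eligible or not cumack_progressed
            or not cwnd_allows):
        return []
    packet: List[Tsn] = []
    used = 0
    # Walk down from the highest outstanding TSN within the exit point
    # and stop at the first fast retransmitted TSN or when PMTU is full.
    for t in reversed(unacked):
        if t.fast_retransmitted or used + t.size > pmtu:
            break
        packet.append(t)
        used += t.size
    packet.reverse()
    return packet
```

   The caller is expected to evaluate the Last Resort eligibility and
   the CWND check itself; they are passed in as booleans to keep the
   sketch short.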

   An implementation of the Rescue operation may be accomplished by
   maintaining a RescueRTX parameter as described for TCP in [RFC6675].

   An SCTP implementation which implements the Unambiguous SACK feature
   of Appendix A may implement a more accurate calculation of the
   flightsize when performing the Rescue operation.  That is, instead
   of subtracting the contribution of the retransmitted TSN from the
   flightsize once the acknowledgement of the TSN arrives, the SCTP
   implementation may distinguish whether the acknowledgement is for
   the original TSN or for the retransmitted TSN.  In case the
   acknowledgement is not for the retransmitted TSN, SCTP should delay
   the subtraction of the bytes of the retransmitted TSN from the
   flightsize until either an acknowledgement of the retransmitted TSN
   is received (see Appendix A) or until PTO2-T_latest(TSN) time has
   elapsed (see Section 3.3.1).
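
   The basic Flight size bookkeeping described for the Last Resort
   Retransmission and Rescue operations (without the Unambiguous SACK
   refinement) might be sketched as below.  The Chunk and FlightSize
   names are hypothetical.  The sketch assumes that the retransmitted
   copy is added to the Flight size, which is how the "counted once or
   twice" wording is read here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Bookkeeping for one outstanding TSN (hypothetical structure)."""
    tsn: int
    size: int
    flight_count: int = 1          # how many copies are counted in flight
    fast_rtx_eligible: bool = True


class FlightSize:
    """Tracks the flight size under the accounting rules sketched above."""

    def __init__(self) -> None:
        self.bytes = 0

    def send_new(self, chunk: Chunk) -> None:
        self.bytes += chunk.size

    def retransmit_without_loss_declaration(self, chunk: Chunk) -> None:
        # Last Resort or Rescue: the original copy is NOT subtracted,
        # the retransmitted copy is counted in addition (assumption).
        self.bytes += chunk.size
        chunk.flight_count += 1
        chunk.fast_rtx_eligible = False

    def miss_count_reached_dupthresh(self, chunk: Chunk) -> None:
        # The original copy is now considered lost: subtract it once.
        if chunk.flight_count > 0:
            self.bytes -= chunk.size
            chunk.flight_count -= 1

    def acknowledged(self, chunk: Chunk) -> None:
        # Remove whatever contribution remains (counted once or twice).
        self.bytes -= chunk.size * chunk.flight_count
        chunk.flight_count = 0
```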

   DISCUSSION: In addition to the HTNA algorithm, [RFC4960] demands
   that additional miss indication counting be performed during Fast
   Recovery according to the following prescription (Section 7.2.4 of
   [RFC4960]):

   (#)  If an endpoint is in Fast Recovery and a SACK arrives that
      advances the Cumulative TSN Ack Point, the miss indications are
      incremented for all TSNs reported missing in the SACK.

   It is noted that under special circumstances (#) makes SCTP Fast
   Recovery complete in situations where TCP Fast Recovery would only
   complete by virtue of the measures 3) or 4) of [RFC6675].  As such,
   these measures are more critically demanded for TCP Fast Recovery
   operation than for SCTP Fast Recovery operation.  However, as
   documented by (OPEN ISSUE: to be filled in), the Last Resort
   Retransmission operation and the Rescue operation significantly
   improve the Loss Recovery operation for SCTP as well, both in terms
   of the latency of the individual loss recovery operation and in
   terms of the ability of the operation to complete without resorting
   to T3-timeout.  Consequently this document prescribes that SCTP TLR
   implement these procedures.  Conversely, even when the measures 3)
   and 4) of [RFC6675] are implemented, (#) gives benefits in terms of
   releasing flight size space, allowing Fast Recovery to progress.


   As the algorithm extensions are limited by the existing congestion
   control algorithm of SCTP, these extensions of SCTP Fast Recovery do
   not compromise the TCP fairness of the SCTP Fast Recovery operation.

3.2.1.  Multi-Path Considerations

   In multi-homed [RFC4960] SCTP, data that potentially will be subject
   to Fast Retransmission may be in flight on multiple paths.  This
   (exceptional) situation can occur in particular when the data
   transfer path changes as a result of a switchback operation to a
   primary path.  Here SACKs of data sent on one path (e.g., the new
   data transfer path) may result in generation of (potentially
   spurious) loss event reactions on a different path (the prior data
   transfer path).  The [RFC4960] miss indication counting based on a
   common SACK clock is not changed in this document.  Nevertheless,
   the protocol operation in this situation, here the operation of the
   Last Resort Retransmission and the Rescue operation, needs to be
   specified.

   The specification in this document is based on the following
   fundamental goals:

   o  An [RFC4960] SCTP implementation must appropriately react to loss
      events observed by means of miss indication counting, by
      performing appropriate adjustments of CWND and ssthresh, on all
      paths where such loss events are observed.

   o  For [RFC4960] SCTP MH, the observation of a loss event on one path
      should not impact the congestion control operation on a different
      path.

   For the implementation of the Last Resort Retransmission and the
   Rescue operations for [RFC4960] MH SCTP, the following
   specifications are given:

   o  For a TSN to be eligible for Last Resort Retransmission a loss
      event MUST have been observed on the path on which this TSN is in
      flight.

   o  For a TSN to be eligible for the Rescue operation a loss event
      MUST have been observed on the path on which this TSN is in
      flight.

   The above may be implemented by maintaining a Fast Recovery state
   and a Fast Recovery Exit Point on a per path basis with the
   following particulars:


   o  A path enters the Fast Recovery state based on loss event
      observation of TSNs in flight on the path.

   o  When a loss event is observed on a path, the Fast Recovery Exit
      Point on the path is set to the highest TSN in flight on the path.

   o  Fast Retransmission of TSNs in flight on the path terminates once
      the Fast Recovery Exit Point on the path has been reached (i.e.,
      has been cumulatively SACK'ed), at which point the Fast Recovery
      process on the path is terminated.

   o  The eligibility of a TSN for the Last Resort Retransmission and
      the Rescue operation shall follow the prescriptions given above
      with adherence to the Fast Recovery Exit point set on the path on
      which the TSN is in flight.

   The retransmission of data chunks itself is prescribed to happen on
   the present data transfer path of the association, regardless of
   which path the data chunks were in flight on when they became
   eligible for Fast Retransmission.  This follows [RFC4960] and the
   preceding [CARO02].

   With the above per path modelling of the Fast Recovery operation,
   SCTP may have multiple Fast Recovery Exit Points at any given time
   (though at most one per path) and the Fast Recovery operation may
   terminate at different times on the different paths.  Further, it is
   noted that a path may be in Fast Recovery even if no data is in
   flight on the path or even if the only data in flight on the path is
   beyond the Fast Recovery Exit Point of the path.  The latter can
   occur in the very peculiar case where fast retransmission of data
   declared lost on the path happens on a different path while the user
   performs a data directed data transfer on the path in question.

   An SCTP implementation fulfilling the goals described above may also
   be achieved by means other than maintaining a per path Fast Recovery
   Exit Point.  For example, it might be achieved by maintaining a
   common association Fast Recovery Point spanning multiple paths, but
   still the implementation must ensure appropriate per destination
   address congestion control operation.
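
   A minimal sketch of the per path Fast Recovery state described above
   is given below.  The PathState and PerPathRecovery names are
   hypothetical, TSN wrap-around is ignored, and only the per path
   precondition is modelled; the remaining Last Resort Retransmission
   and Rescue conditions of Section 3.2 are not re-checked here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PathState:
    in_fast_recovery: bool = False
    exit_point: Optional[int] = None


class PerPathRecovery:
    """Per path Fast Recovery state and Exit Point (illustrative only)."""

    def __init__(self) -> None:
        self.paths: Dict[str, PathState] = {}
        self.in_flight: Dict[int, str] = {}  # TSN -> path it is in flight on

    def sent(self, tsn: int, path: str) -> None:
        self.paths.setdefault(path, PathState())
        self.in_flight[tsn] = path

    def loss_event(self, path: str) -> None:
        state = self.paths.setdefault(path, PathState())
        tsns = [t for t, p in self.in_flight.items() if p == path]
        if state.in_fast_recovery or not tsns:
            return
        # Exit Point is the highest TSN in flight on this path.
        state.in_fast_recovery = True
        state.exit_point = max(tsns)

    def cumulative_ack(self, cum_tsn: int) -> None:
        for tsn in [t for t in self.in_flight if t <= cum_tsn]:
            del self.in_flight[tsn]
        for state in self.paths.values():
            # Recovery ends on a path once its Exit Point is cum-acked.
            if state.in_fast_recovery and cum_tsn >= state.exit_point:
                state.in_fast_recovery = False
                state.exit_point = None

    def eligible(self, tsn: int) -> bool:
        """Per path precondition for Last Resort / Rescue eligibility."""
        path = self.in_flight.get(tsn)
        if path is None:
            return False
        state = self.paths[path]
        return state.in_fast_recovery and tsn <= state.exit_point
```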

3.3.  SCTP-TLR Description

3.3.1.  Principles

   The SCTP TLR function is based on the following principles.


3.3.1.1.  Retransmission Timers Management

   This document is specified as if there is a single retransmission
   timer per destination transport address, but implementations MAY have
   a retransmission timer for each DATA chunk.

   This document specifies the use of a new PTO timer for SCTP TLR.  The
   document is specified as if the PTO timer functions are implemented
   by means of the existing retransmission timer of [RFC4960] SCTP,
   i.e., under certain conditions the retransmission timer is activated
   with special PTO values rather than with the standard T3-timer value.
   The document is specified as if there is a single PTO timer per
   destination transport address, equivalently a single PTO timer per
   path.  Implementations MAY choose to implement a PTO timer per DATA
   chunk.

   For an outstanding TSN we define the time T_latest(TSN) to be the
   time that has elapsed since the TSN was last sent.  When a TSN is
   first sent, or when it is retransmitted, T_latest(TSN)=0.  An SCTP
   TLR implementation must be able to deduce this value for any
   outstanding TSN.
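
   One possible way of deducing T_latest(TSN) is to record the time of
   the latest (re)transmission of each outstanding TSN, as in the sketch
   below.  The SendClock class and its method names are hypothetical;
   the injectable clock argument exists only to make the sketch
   testable.

```python
import time
from typing import Callable, Dict


class SendClock:
    """Records when each outstanding TSN was last sent."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_sent: Dict[int, float] = {}

    def on_send(self, tsn: int) -> None:
        # Called for first transmissions and for retransmissions alike,
        # so T_latest(TSN) restarts from 0 in both cases.
        self._last_sent[tsn] = self._clock()

    def t_latest(self, tsn: int) -> float:
        """Seconds elapsed since the TSN was last sent."""
        return self._clock() - self._last_sent[tsn]

    def on_ack(self, tsn: int) -> None:
        self._last_sent.pop(tsn, None)
```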

3.3.1.2.  Timer driven entering of Fast Recovery

   Timer driven entering of Fast Recovery in SCTP TLR is based on the
   following principles:

   o  Maintenance of a Tail Loss Probe Timer (PTO) which in certain
      situations (generally when retransmission is not performed) is
      running on a path.  At any given time the value of the PTO timer
      is related to the lowest TSN in flight on the path.  The PTO timer
      value used will depend on the situation:

         By default the following timer value is used:

              PTO1:  PTO=MIN(RTO, 1.5*SRTT+MAX(RTTVAR, DELAY_ACK))

         Whereas the following value is used:

              PTO2:  PTO=MIN(RTO, 1.5*SRTT+RTTVAR)

         when it is known that subsequent SACKs not acknowledging the
         TSN for which the PTO is running will be (or will have been)
         returned immediately.  For more details see Section 3.3.2.

         By design the probe timer is kept lower than or equal to the
         RTO, thereby aiming to prevent a potentially unnecessary and
         damaging RTO, as well as generally larger than an anticipated
         RTT, thereby preventing it from expiring prematurely.  I.e.,
         the timer only expires at a time where one would have expected
         to have received a SACK of the lowest TSN in flight were there
         no problems.

         A minimal PTO value, PTO_MIN, is applied to the above formulas
         (particularly important for PTO2).  I.e., the effective PTO1 =
         MAX(PTO_MIN, PTO1) and the effective PTO2 = MAX(PTO_MIN, PTO2).
         The suggested value of PTO_MIN is 10 msec.  In the following
         when referring to PTO1 and PTO2 we refer to the effective PTO1
         and PTO2 values.

         For an SCTP implementation which performs RTT measurements
         during the association set-up, the PTO set on the path on which
         the first data chunk is sent shall be initialized from the RTT
         measured on the path during the association set-up.  If no such
         RTT measurement is performed or is available on the particular
         path in question, the PTO shall be initialized as RTO_INIT.

   o  PTO timer driven transmittal of Tail Loss Probe Packet: Once data
      is outstanding on a path and the PTO timer of the path expires and
      no SACKs of any chunks with higher TSN number have arrived, a
      probe packet, denoted a Tail Loss Probe Packet (TLPP), is sent to
      probe for network responsiveness (i.e., for SACK of the TLPP) in
      order to potentially drive proactive entering of Fast Recovery.

      *  For an SCTP sender that supports the Immediate SACK feature,
         [RFC7053], the I-bit MUST be set on chunks sent in a TLPP.

   o  PTO timer driven entering of Fast Recovery: Fast Recovery is
      entered when network responsiveness is proven (i.e., a SACK of
      data sent later than the lowest TSN in flight on the path is
      available) and (at least) PTO time has elapsed since the
      transmittal of this lowest TSN in
      flight on the path.
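
   The effective PTO1 and PTO2 values given above, including the PTO_MIN
   floor, can be computed as in the following sketch.  All times are in
   seconds; the function names pto1() and pto2() are chosen for this
   example only.

```python
PTO_MIN = 0.010  # seconds; the suggested minimum of 10 msec


def pto1(rto: float, srtt: float, rttvar: float, delay_ack: float) -> float:
    """Effective PTO1 = MAX(PTO_MIN, MIN(RTO, 1.5*SRTT+MAX(RTTVAR, DELAY_ACK)))."""
    return max(PTO_MIN, min(rto, 1.5 * srtt + max(rttvar, delay_ack)))


def pto2(rto: float, srtt: float, rttvar: float) -> float:
    """Effective PTO2 = MAX(PTO_MIN, MIN(RTO, 1.5*SRTT+RTTVAR))."""
    return max(PTO_MIN, min(rto, 1.5 * srtt + rttvar))
```

   For example, with RTO = 1 s, SRTT = 100 ms, RTTVAR = 20 ms and an
   assumed DELAY_ACK of 200 ms, PTO1 is about 350 ms and PTO2 is about
   170 ms.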

   Comment: The lowest outstanding TSN on an association may under
   special circumstances not be in flight on any path of the
   association.  This can happen when the lowest outstanding TSN has
   been declared lost but the transmittal of the TSN is prevented due to
   congestion window limitations (e.g., during Fast Recovery).  In this
   case, as well as generally for TSNs that are being retransmitted due
   to fast retransmission or T3-timeout, no PTO timer is running on the
   TSN.  Conversely when the lowest outstanding TSN on a path is not
   subject to Fast Recovery or T3-Recovery, then this lowest outstanding
   TSN is also in flight on the path.


3.3.1.3.  Fast-Recovery and Loss Detection

   Fast Recovery and miss indication counting for the SCTP TLR function
   MUST embed the enhancements described in Section 3.2.  In addition
   SCTP TLR implements the following loss detection during Fast
   Recovery:

   o  If in Fast Recovery, then an outstanding TSN in flight on the
      path, with TSN lower than the Fast Recovery Exit Point on the
      path, is declared lost when the following conditions are
      satisfied:

      *  The TSN has not been fast retransmitted.

      *  T_latest(TSN) > PTO2.

      *  The TSN is lower than the highest outstanding SACK'ed TSN.

   When declared lost by this procedure, the bytes of the TSN are
   subtracted from the flight size and the TSN becomes eligible for
   fast retransmission as if it had been declared lost by reaching
   DupThresh miss indication counts.

   Such loss detection during SCTP TLR Fast Recovery shall at a minimum
   be done at receipt of SACK as well as at times where the possibility
   to transmit new data is being evaluated.  An implementation
   maintaining PTO timers on a per data chunk basis may make further
   evaluation based on timer expiration.
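
   The timer based loss declaration during Fast Recovery can be
   expressed as a single predicate, sketched below.  The function name
   lost_by_timer() and its parameter names are hypothetical, and TSN
   wrap-around is ignored.

```python
def lost_by_timer(tsn: int, *, exit_point: int, highest_sacked_tsn: int,
                  fast_retransmitted: bool, t_latest: float,
                  pto2: float) -> bool:
    """True if an in-flight TSN is declared lost during Fast Recovery."""
    return (tsn < exit_point                 # below the path's Exit Point
            and not fast_retransmitted       # not yet fast retransmitted
            and t_latest > pto2              # T_latest(TSN) > PTO2
            and tsn < highest_sacked_tsn)    # a higher TSN has been SACK'ed
```

   Such a predicate would be evaluated at least on receipt of a SACK and
   whenever the possibility to transmit new data is evaluated.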

   Following [RFC4960] it is assumed that a data chunk should only be
   fast retransmitted once.  I.e., subsequent retransmissions of the
   data chunk must proceed as T3-retransmissions.  An SCTP TLR
   implementation MAY implement Multiple Fast Retransmission operation
   following the principles described in [CARO01] extended to include
   the Last Resort Retransmission and Rescue operations.  This,
   however, is not covered by the specification given here.

3.3.1.4.  T3-Recovery

   [RFC4960] does not explicitly specify that a T3-Recovery phase be
   supported for SCTP, nor does [RFC4960] explicitly demand that a data
   chunk which has been T3-retransmitted cannot undergo fast
   retransmission.  It can be an advantage that a lost T3-retransmitted
   data chunk may be recovered by timely fast retransmission rather than
   by a subsequent, potentially backed-off T3-retransmission.  For
   [RFC4960] MH SCTP, however, reliable implementation of such fast
   recovery of lost T3-retransmitted data is difficult to achieve given
   the usage of one common SACK clock, as new data on one path may
   trigger spurious fast retransmission of data that has been/is being
   T3-retransmitted on a different path.  Here it is important to
   emphasize that concurrent T3-retransmission and new data transmission
   on different paths is the standard operation of MH SCTP [RFC4960].
   (Implementations might possibly mitigate such effects by only
   sending new data after completion of the T3-retransmission
   operation.  The implementation of SCTP-PF, [SCTP-PF], would further
   decrease the likelihood of such concurrent data transfer occurring.)

   In this document we assume that an SCTP implementation follows one
   of the following implementation choices:

   o  A data chunk which has undergone T3-retransmission cannot
      subsequently be subject to Fast Retransmission, whether such
      entering of Fast Recovery be driven by miss indication counting
      alone or by the SCTP TLR mechanism.  This implementation choice
      corresponds to implementing a T3-Recovery phase for SCTP
      equivalent to the RTO-recovery phase of TCP.

   o  A data chunk which has undergone T3-retransmission will be
      eligible for subsequent Fast Retransmission if such is driven by
      miss indication counts from SACKs of new data chunks sent after
      all data outstanding for T3-retransmission have been sent and the
      new data is sent on the same path as the T3-retransmission data.

   One implementation choice may be to follow the first implementation
   choice for SCTP MH and the second implementation choice for SCTP SH.
   Regardless of this implementation choice, in SCTP TLR a data chunk
   that has been subject to T3-retransmission SHOULD NOT be subject to
   the timer driven entering of Fast Recovery specified below.  The
   motivation for this choice is that the SRTT may not be appropriately
   refreshed during the T3-retransmission process.  OPEN ISSUE/TO DO:
   Ideally the PTO timer used after the exit of the T3-recovery phase
   should be updated based on a fresh RTT measurement, e.g., from the
   last acknowledged TSN.  If no new SRTT calculation is made based on
   a scheduled RTT measurement, then the PTO timer values could be
   appropriately adjusted, if necessary, by a last measured RTT:
   1.5*SRTT + RTTVAR --> MAX(1.5*RTT, 1.5*SRTT + RTTVAR).

3.3.2.  SCTP - TLR Statemachine

   The SCTP Tail Loss Recovery function defines 3 states: The SCTP TLR
   OPEN state, the SCTP TLR PROBE WAIT state and the SCTP TLR DELAY WAIT
   state.  At any given time the SCTP transmission logic for the lowest
   outstanding TSN on a path will be in one of these 3 states, or the
   TSN is in the process of being recovered by means of Fast Recovery or
   T3-Recovery.


   Figure 1 illustrates the states and the state transitions.

   (to be inserted)



          Figure 1, Enhanced Loss Recovery State Machine Diagram

   In the following we describe the states and the actions taken.

3.3.2.1.  SCTP TLR OPEN STATE

   This is the state the SCTP transmission logic is in on any path when
   no TSN is outstanding on the association.  It is also the state when
   SCTP sends the first data on a path after idle/no TSN outstanding.
   More generally, it is the state the transmission logic is in when
   there are no gaps in the SACK scoreboard beyond the lowest
   outstanding TSN on the path.

   In this state SCTP is performing neither Fast Recovery nor
   T3-Recovery on the lowest TSN outstanding on the path and no SACKs of
   any chunks with higher TSN number have arrived.  In this state, when
   SCTP has outstanding data on the path, a PTO timer is running
   relative to the lowest TSN outstanding on the path.

   The PTO set on a (new) lowest outstanding TSN on the path in this
   state will follow PTO1 when fewer than 2 packets are outstanding
   beyond the TSN at the time when the timer is set.  It will follow
   PTO2 when 2 or more packets are outstanding beyond the TSN when the
   PTO timer is set, or when the Immediate SACK feature is known to be
   supported by both sender and receiver (see Section 4) and the I-bit
   has been set
   on the TSN or on an outstanding TSN of higher number.
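
   The choice between PTO1 and PTO2 in the OPEN state might be sketched
   as follows.  The function and parameter names are hypothetical; the
   sketch assumes that the PTO2 conditions take precedence whenever they
   hold.

```python
def open_state_pto(pto1: float, pto2: float, *, packets_beyond: int,
                   immediate_sack_both_ends: bool, i_bit_set: bool) -> float:
    """Select the PTO value for a (new) lowest outstanding TSN in OPEN."""
    # PTO2 applies when SACKs are expected to be returned immediately:
    # either 2 or more packets follow the TSN, or the I-bit is in use.
    if packets_beyond >= 2 or (immediate_sack_both_ends and i_bit_set):
        return pto2
    return pto1
```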

   In the OPEN state the following may happen:

   o  A SACK cumulatively acknowledging the lowest outstanding TSN and
      resulting in no gaps in the SACK scoreboard may arrive.  In this
      case the state remains OPEN.  If there still is
      outstanding data on the path, the PTO timer is set on the new
      lowest outstanding TSN.  The PTO timer value set will be the value
      PTO - T_latest(TSN) where the PTO value is calculated either from
      PTO1 or PTO2 according to the evaluation criteria given above.

   o  A SACK with gap(s) may arrive, thus proving network responsiveness
      while still not cumulatively acknowledging all lower (than the
      SACK'ed gap) outstanding TSNs on the path.  The SACK may or may
      not move the cumulative ACK point.  This indicates that either
      packets are being re-ordered or the (new) lowest outstanding TSN
      on the path has been lost.

      *  If the SACK makes the miss indication count on the (new) lowest
         outstanding TSN reach Dupthresh, the SCTP TLR OPEN state is
         terminated and Fast Recovery is started.

      *  If the Dupthresh miss indication count is not reached on the
         (new) lowest outstanding TSN, the state will transit to SCTP
         TLR DELAY WAIT state.  From there SCTP TLR driven Fast Recovery
         may be entered if the PTO timer expires before the (new) lowest
         outstanding TSN has been acknowledged, or Fast Recovery may be
         entered later by reaching Dupthresh miss indication counts.
         When transiting to SCTP TLR DELAY WAIT the PTO timer relative
         to the (new) lowest outstanding TSN is reset to PTO2 -
         T_latest(TSN).  In case PTO2 - T_latest(TSN) <= 0, the DELAY
         WAIT state is immediately terminated, the packet containing the
         lowest outstanding TSN is declared lost, and Fast Recovery is
         started.

   o  The PTO timer relative to the lowest outstanding TSN may expire,
      in which case SCTP TLR will send a TLPP, reset the PTO timer
      relative to the lowest outstanding TSN to a T3 timer and transit
      to SCTP TLR PROBE WAIT state.  In this state SCTP awaits either
      the expiry of the T3 timer relative to the lowest outstanding TSN
      (network is persistently unresponsive) or proof of network
      responsiveness.  The latter potentially leads to entering of SCTP
      TLR driven Fast Recovery, unless the proof of network
      responsiveness comes in the form of a cumulative acknowledgement
      of the TSN.  The T3-value set relative to the lowest outstanding
      TSN when sending the TLPP probe and entering this state shall be:

      *  MAX(PTO1, RTO - T_latest(TSN)), when receiver side support for
         Immediate SACK has not been confirmed for the association, see
         Section 4.

      *  MAX(PTO2, RTO - T_latest(TSN)), when receiver side support for
         Immediate SACK has been confirmed for the association, see
         Section 4, and the SCTP sender itself deploys the Immediate
         SACK feature.

      For further details on the TLPP transmission see Section 3.3.3.
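
   The OPEN state transitions described above can be sketched as two
   small functions that return the next state together with the timer
   value to arm, if any.  The State enumeration, the function names and
   the DUP_THRESH value of 3 are assumptions made for this sketch.

```python
from enum import Enum, auto
from typing import Optional, Tuple

DUP_THRESH = 3  # assumed value for this sketch


class State(Enum):
    OPEN = auto()
    PROBE_WAIT = auto()
    DELAY_WAIT = auto()
    FAST_RECOVERY = auto()


def open_on_sack(*, gaps_reported: bool, miss_count: int, pto2: float,
                 t_latest: float) -> Tuple[State, Optional[float]]:
    """SACK received in OPEN; t_latest refers to the (new) lowest TSN."""
    if not gaps_reported:
        # Caller re-arms the PTO on the new lowest outstanding TSN.
        return State.OPEN, None
    if miss_count >= DUP_THRESH:
        return State.FAST_RECOVERY, None
    remaining = pto2 - t_latest
    if remaining <= 0:
        # DELAY WAIT terminates immediately; the packet is declared lost.
        return State.FAST_RECOVERY, None
    return State.DELAY_WAIT, remaining


def open_on_pto_expiry(*, rto: float, t_latest: float, pto1: float,
                       pto2: float,
                       immediate_sack_in_use: bool) -> Tuple[State, float]:
    """PTO expired in OPEN: a TLPP is sent and a T3 value is armed."""
    floor = pto2 if immediate_sack_in_use else pto1
    return State.PROBE_WAIT, max(floor, rto - t_latest)
```

   Here immediate_sack_in_use stands for the case where receiver side
   support for Immediate SACK has been confirmed and the sender itself
   deploys the feature.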

3.3.2.2.  SCTP TLR PROBE WAIT STATE

   In this state the lowest outstanding TSN has remained unSACK'ed for
   more than PTO time and no indication of network responsiveness has
   been received (no SACK of higher outstanding TSNs has arrived), thus
   resulting in the transmittal of a TLPP to probe for network
   responsiveness.


   The T3-value set relative to the lowest outstanding TSN when sending
   the TLPP probe and entering this state is:

   o  MAX(PTO1, RTO - T_latest(TSN)), when receiver side support for
      Immediate SACK has not been confirmed for the association, see
      Section 4.

   o  MAX(PTO2, RTO - T_latest(TSN)), when receiver side support for
      Immediate SACK has been confirmed for the association, see
      Section 4, and the SCTP sender itself deploys the Immediate SACK
      feature.

   For further details on the TLPP transmission see Section 3.3.3.
   Observe that in some special cases no TLPP is sent even if this state
   is entered; conceptually this is handled as if a TLPP had been sent.

   In the PROBE WAIT state the following may happen:

   o  SACKs may arrive that make the miss indication count on the
      lowest outstanding TSN/lowest TSN in flight reach Dupthresh, in
      which case the PROBE WAIT state is terminated and Fast Recovery is
      started.

   o  A SACK cumulatively acknowledging all holes including the lowest
      outstanding TSN may bring the SCTP TLR STM state back to SCTP TLR
      OPEN state.  In this case a new PTO timer will be started on the
      new lowest outstanding TSN following the PTO timer setting in the
      SCTP TLR OPEN state.  In this situation "PTO restart principles"
      (i.e., yielding PTO-T_latest(TSN)) shall not be deployed.
      Spurious entering of PROBE WAIT state can happen if the PTO is too
      short; in such a situation it would not be prudent to deploy PTO
      restart principles when returning to OPEN state.  OPEN ISSUE:
      Possibly PTO restart principles shall be refrained from until new
      RTT measurements are available.

   o  A SACK may arrive for a higher outstanding TSN with the lowest
      outstanding TSN on the path remaining unSACK'ed.  This will result
      in declaration of the packet of the lowest outstanding TSN as lost
      and will make SCTP enter Fast Recovery.

   o  A SACK may arrive that acknowledges the lowest outstanding TSN
      and also acknowledges data of higher TSN than the new lowest
      outstanding TSN.  In this case there is indication that either
      packet re-ordering has occurred or the new lowest outstanding TSN
      has been lost.  The state will now transit to SCTP TLR DELAY WAIT
      state for potential entering of SCTP TLR driven Fast Recovery if
      the PTO timer expires before the new lowest outstanding TSN has
      been acknowledged.  The PTO timer set on the new lowest
      outstanding TSN will be PTO2 - T_latest(TSN).  In case PTO2 -
      T_latest(TSN) <= 0, the DELAY WAIT state is immediately
      terminated, the packet containing the lowest outstanding TSN is
      declared lost, and Fast Recovery is started.

   o  The T3-timer may expire.  In this case the PROBE WAIT state is
      terminated and T3-recovery will start on non-SACK'ed outstanding
      data.
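
   The PROBE WAIT transitions listed above can be sketched as follows.
   The State enumeration and the function and parameter names are
   hypothetical.  The flag sack_above_lowest is taken to mean that the
   SACK reports TSNs above the (new) lowest outstanding TSN, and
   dupthresh_reached that the miss indication count on that TSN has
   reached Dupthresh.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class State(Enum):
    OPEN = auto()
    PROBE_WAIT = auto()
    DELAY_WAIT = auto()
    FAST_RECOVERY = auto()
    T3_RECOVERY = auto()


def probe_wait_on_sack(*, dupthresh_reached: bool, lowest_acked: bool,
                       sack_above_lowest: bool, pto2: float,
                       t_latest_new_lowest: float
                       ) -> Tuple[State, Optional[float]]:
    """SACK received in PROBE WAIT; returns (next state, timer to arm)."""
    if dupthresh_reached:
        return State.FAST_RECOVERY, None
    if not lowest_acked:
        # A SACK of a higher TSN proves responsiveness: declare loss.
        if sack_above_lowest:
            return State.FAST_RECOVERY, None
        return State.PROBE_WAIT, None
    if not sack_above_lowest:
        # All holes filled: back to OPEN with a fresh PTO and without
        # applying PTO restart principles.
        return State.OPEN, None
    remaining = pto2 - t_latest_new_lowest
    if remaining <= 0:
        return State.FAST_RECOVERY, None
    return State.DELAY_WAIT, remaining


def probe_wait_on_t3_expiry() -> State:
    """T3 expired in PROBE WAIT: T3-recovery starts."""
    return State.T3_RECOVERY
```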

3.3.2.3.  SCTP TLR DELAY WAIT STATE

   In this state proof of network responsiveness has been received (in
   the form of a SACK of a higher TSN than the lowest outstanding TSN)
   and the PTO timer relative to the lowest outstanding TSN is running
   for potential entering of SCTP TLR driven Fast Recovery.

   The PTO set on a new lowest outstanding TSN in this state will be
   according to PTO2 in form of PTO2-T_latest(TSN).

   In the DELAY WAIT state the following may happen:

   o  SACKs may arrive that make the miss indication count on the
      lowest TSN in flight reach Dupthresh, in which case the DELAY WAIT
      state is terminated and SCTP enters Fast Recovery.

   o  The PTO timer relative to the lowest outstanding TSN may expire.
      This will result in declaration of the packet of the lowest
      outstanding TSN as lost and will make SCTP enter Fast Recovery.

   o  A SACK cumulatively acknowledging all holes including the lowest
      outstanding TSN may arrive and bring the SCTP TLR STM state back
      to SCTP TLR OPEN state and the PTO timer will be restarted on the
      new lowest outstanding TSN.  The PTO timer value set will be the
      value PTO - T_latest(TSN) where the PTO value is calculated either
      from PTO1 or PTO2 according to the evaluation criteria given for
      the OPEN state.

   o  A SACK may arrive that acknowledges the lowest outstanding TSN
      and also acknowledges data of higher TSN than the new lowest
      outstanding TSN.  In this case there is indication that either
      packet re-ordering has occurred or the new lowest outstanding TSN
      has been lost.  The state will remain SCTP TLR DELAY WAIT for
      potential entering of SCTP TLR driven Fast Recovery if the PTO
      timer expires before the new lowest outstanding TSN has been
      acknowledged.  The PTO timer set on the new lowest outstanding TSN
      will be PTO2 - T_latest(TSN).  In case PTO2 - T_latest(TSN) <= 0,
      the DELAY WAIT state is terminated, the packet containing the
      lowest outstanding TSN is declared lost and Fast Recovery is
      started.

   o  A SACK may arrive that does not acknowledge the lowest outstanding
      TSN and still does not make the miss indication count reach the
      Dupthresh value.  In this situation no changes are made to the
      running PTO timer and the state will remain SCTP TLR DELAY WAIT
      for potential entering of SCTP TLR driven Fast Recovery if the
      PTO timer expires before the lowest outstanding TSN has been
      acknowledged.
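
   The DELAY WAIT transitions can be sketched in the same style.  As
   before, the State enumeration and the function and parameter names
   are hypothetical; running_timer is the time remaining on the PTO
   timer that is already running.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class State(Enum):
    OPEN = auto()
    DELAY_WAIT = auto()
    FAST_RECOVERY = auto()


def delay_wait_on_sack(*, dupthresh_reached: bool, lowest_acked: bool,
                       gaps_remain: bool, pto2: float,
                       t_latest_new_lowest: float, running_timer: float
                       ) -> Tuple[State, Optional[float]]:
    """SACK received in DELAY WAIT; returns (next state, timer value)."""
    if dupthresh_reached:
        return State.FAST_RECOVERY, None
    if not lowest_acked:
        # No change to the running PTO timer.
        return State.DELAY_WAIT, running_timer
    if not gaps_remain:
        # Caller re-arms PTO - T_latest(TSN) per the OPEN state rules.
        return State.OPEN, None
    remaining = pto2 - t_latest_new_lowest
    if remaining <= 0:
        return State.FAST_RECOVERY, None
    return State.DELAY_WAIT, remaining


def delay_wait_on_pto_expiry() -> State:
    """PTO expired in DELAY WAIT: the packet is declared lost."""
    return State.FAST_RECOVERY
```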

3.3.2.4.  Exit of Loss Recovery

   After exit of Fast Recovery or completion of T3-retransmission, if
   data is outstanding, a PTO timer is started relative to the lowest
   outstanding TSN on the path and the state transits to either SCTP TLR
   OPEN state or to SCTP TLR DELAY WAIT state depending on the status of
   the SACK scoreboard (i.e., whether gaps exist or not).  The PTO timer
   set will follow the rules described above.  PTO restart principles
   shall not be deployed in this situation as fresh RTT measurements
   might not be available.  OPEN ISSUE: Possibly PTO restart principles
   shall be refrained from until new RTT measurements are available.

3.3.2.5.  RTO-Restart Principles for the T3-timer

   When the lowest TSN in flight on a path is undergoing Fast Recovery
   or T3-retransmission, a T3-timer is running on the path (relative to
   this lowest TSN in flight).  For SCTP TLR the RTO-restart principles
   of [HURTIG] SHOULD unconditionally be applied to the T3-timer.
   Thus the T3-timer set on a path in this case SHOULD be the value RTO-
   T_latest(TSN) relative to the lowest TSN in flight on the path.
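
   The T3-timer value under RTO-restart can be sketched as below.  The
   function name is hypothetical, and clamping the result at zero is an
   assumption of this sketch that is not stated in the text above.

```python
def t3_restart_value(rto: float, t_latest: float) -> float:
    """T3 value RTO - T_latest(TSN) for the lowest TSN in flight."""
    # Clamp at zero (assumption) so a negative duration is never armed.
    return max(0.0, rto - t_latest)
```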

3.3.3.  TLPP Transmission Rules

   The transmission of a Tail Loss Probe Packet (TLPP), done just prior
   to entering the SCTP TLR PROBE WAIT state from the SCTP TLR OPEN
   state, is governed by the following details:

   o  A TLPP of new data is always preferred if such is available for
      transmission.  If such exists, the TLPP sent is chosen as the
      lowest unsent TSNs that fit into one packet.

   o  Alternatively if no new data is available for transmission, either
      due to application or receiver side limitations, the presently
      outstanding packet with highest TSN number is chosen as the TLPP.

   o  TLPP of retransmission data counts twice in the in-flight until
      acknowledged or detected as lost.



Nielsen, et al.          Expires April 21, 2016                [Page 24]

Internet-Draft                  SCTP TLR                    October 2015


   o  The transmittal of a TLPP of sub-PMTU size is not blocked by
      Nagle-like bundling.

   The highest (new) outstanding TSN is chosen for probing in order to
   interface as well as possible with standard Fast Recovery, i.e., to
   create a loss pattern that corresponds as closely as possible to how
   the Fast Recovery algorithm retransmits, and is invoked to
   retransmit, lost packets.
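
   The selection rules above can be sketched in Python as follows.  The
   sketch is illustrative only: the function name, the representation
   of unsent data as (TSN, size) pairs, the payload budget parameter
   and the flag for receiver side limitations are assumptions, a probe
   of retransmission data is simplified to the single highest
   outstanding TSN, and TSN wrap-around is ignored.

```python
from typing import List, Optional, Tuple


def select_tlpp(
    unsent: List[Tuple[int, int]],
    outstanding: List[int],
    payload_budget: int,
    rwnd_allows_new_data: bool = True,
) -> Optional[Tuple[str, List[int]]]:
    """Pick the content of a TLPP.

    unsent         -- (tsn, size_in_bytes) of chunks not yet sent
    outstanding    -- TSNs sent but not yet acknowledged
    payload_budget -- bytes of chunk data that fit into one packet
    Returns ("new", tsns), ("retransmission", [tsn]) or None.
    """
    if unsent and rwnd_allows_new_data:
        chosen: List[int] = []
        used = 0
        # New data is preferred: take the lowest unsent TSNs that fit.
        for tsn, size in sorted(unsent):
            if used + size > payload_budget:
                break
            chosen.append(tsn)
            used += size
        if chosen:
            return ("new", chosen)
    if outstanding:
        # Otherwise probe with the highest outstanding TSN.
        return ("retransmission", [max(outstanding)])
    return None
```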

   TLPP Transmission conditions:

   A TLPP is not sent unconditionally when SCTP enters PROBE WAIT state
   on a path.

   No explicit limit is applied to the number of TLPPs (i.e., the number
   of unacknowledged packets sent as TLPP) that may be outstanding at
   any given time, but in most situations the number is effectively
   limited to very few (very often only one) by the following rules,
   which are based on latency and congestion control principles.
   Generally a TLPP is not allowed to breach the CWND more than once per
   RTT, and a TLPP is not sent if an already outstanding packet is
   considered to serve "good enough" from a network probing
   perspective.  In addition, special considerations are given to the
   transmittal of a TLPP consisting of retransmission data in order to
   ease loss masking detection (see Section 3.3.4).  It is further noted
   that the frequency of TLPP transmittal is limited by how often a
   transition can happen out of and back into the PROBE WAIT state.

   The conditional transmission of a TLPP is specified as follows:

   o  If the highest outstanding TSN was sent only a little while ago,
      this TSN effectively serves as a probe and no TLPP needs to be
      sent.  This condition aims to prevent unnecessary retransmission
      of just sent data and unnecessary transmittal of small sub-PMTU
      packets of new data.  The exact condition to apply is:

      *  If T_latest(highTSN) < gamma * SRTT

      then no TLPP is sent.  gamma = 1/2 is recommended.  A special
      condition arises when so little data is outstanding that the
      acknowledgement of the outstanding data may be lost by the loss of
      a single SACK.  In this case the transmittal of a TLPP makes the
      SACK return robust towards the loss of a single SACK.  For added
      robustness of the SACK return an SCTP TLR implementation MAY
      disregard the above condition if only 2 packets are outstanding.

   o  If no TLPP is outstanding, a probe is sent regardless of CWND.

   o  If a TLPP is outstanding, a probe is sent only if there is room in
      CWND.  Otherwise no TLPP is sent.  I.e., the CWND is not breached
      when a TLPP is outstanding.

   o  If no new data exists, a probe of retransmission data is sent only
      if no TLPP of retransmission data is already outstanding.  I.e.:

      *  If no TLPP of retransmission data is outstanding, send a TLPP
         consisting of the highest outstanding TSN.

      *  If a TLPP of retransmission data is outstanding, no TLPP is
         sent.

   The above rules on probes of retransmission data are defined to ease
   the detection of TLPP recovered losses by the algorithm described in
   Section 3.3.4.
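
   The conditional transmission rules can be summarized by the
   following Python sketch.  It is a non-normative illustration: the
   PathState fields, the function name and the relax_for_two_packets
   switch (modelling the MAY clause for exactly 2 outstanding packets)
   are assumptions, and T_latest(highTSN) is taken to be the time
   elapsed since the highest outstanding TSN was last sent.

```python
from dataclasses import dataclass

GAMMA = 0.5  # recommended value of gamma


@dataclass
class PathState:
    srtt: float                 # smoothed RTT, seconds
    t_latest_high_tsn: float    # seconds since highest TSN was last sent
    packets_outstanding: int
    tlpp_outstanding: bool      # any TLPP still unacknowledged
    rtx_tlpp_outstanding: bool  # a retransmission TLPP still unacknowledged
    cwnd: int                   # bytes
    flight_size: int            # bytes


def may_send_tlpp(path: PathState, probe_size: int, probe_is_new_data: bool,
                  relax_for_two_packets: bool = False) -> bool:
    """Return True if the rules allow a TLPP to be sent on this path."""
    # Rule 1: recently sent data already serves as a probe.
    recently_sent = path.t_latest_high_tsn < GAMMA * path.srtt
    relaxed = relax_for_two_packets and path.packets_outstanding == 2
    if recently_sent and not relaxed:
        return False
    # Rules 2 and 3: CWND may only be breached if no TLPP is outstanding.
    if path.tlpp_outstanding and path.flight_size + probe_size > path.cwnd:
        return False
    # Rule 4: at most one TLPP of retransmission data at a time.
    if not probe_is_new_data and path.rtx_tlpp_outstanding:
        return False
    return True
```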

3.3.3.1.  Multi-Path Considerations for TLPP Transmission

   In multi-homed SCTP [RFC4960], multiple paths may have a PTO timer
   running on data in flight.  E.g., two paths may be in the SCTP TLR
   OPEN state and SCTP will have two PTO timers running, each relative
   to the lowest outstanding TSN on the respective path.  This
   (exceptional) situation can in particular occur when the data
   transfer path changes as a result of a switchback operation to a
   primary path.  The handling of TLPP transmission for multi-homed SCTP
   is described in the following.  The underlying philosophy of the
   solution is, as far as possible, to have the SCTP TLR probing
   mechanism be undertaken on, and by, the data transfer path, thus
   avoiding as far as possible the conflicts that may arise due to
   concurrent data transfers on multiple paths.  The rules are as
   follows:

   o  When the PTO timer kicks on a path in the SCTP TLR OPEN state and
      the TLPP selected by the rules above consists of new data, then,
      if the path is the present data transfer path of the association,
      the TLPP is sent on that path.  When in this situation the path is
      not the present data transfer path of the association, then

      *  if there is no outstanding data on the present data transfer
         path, the TLPP of new data is sent there.

      *  if there is outstanding data on the data transfer path, the
         TLPP is not sent.  Instead the potential transmittal of a TLPP
         is deferred to be driven by a later kick of the PTO timer on
         the data transfer path.

      The first situation, in which data is available for transmittal
      on the data transfer path but has not been sent, is unlikely, but
      it might occur in some implementations.

   o  When the PTO timer kicks on a path in the SCTP TLR OPEN state and
      the TLPP selected by the rules above consists of a retransmission
      of the presently highest outstanding TSNs on the association, the
      TLPP is allowed to be sent if and only if these TSNs are
      outstanding on the path in question.  The following guidelines are
      given for the path selection for the TLPP:

      *  An SCTP implementation which does not implement the Unambiguous
         SACK feature of Appendix A should send the TLPP on the path on
         which the TSNs are presently outstanding (i.e., on the path on
         which the PTO kicked).

      *  An SCTP implementation which implements the Unambiguous SACK
         feature of Appendix A may send the TLPP on the data transfer
         path of the association.

      The reason a TLPP of retransmitted data in the first case above is
      sent on the path on which the data was first sent, even if this
      path is not the present data transfer path (a special corner case
      with change of data transfer path or destination address directed
      data transfer), is that the TLPP loss masking detection mechanism
      (see Section 3.3.4) could not infer on which path to perform a
      congestion window reduction if the TLPP and the original data were
      sent on different paths.  An SCTP implementation which implements
      the Unambiguous SACK feature of Appendix A can better distinguish
      the SACK of the original TSN from that of the retransmitted TSN
      and can therefore operate differently.  The choice of sending the
      TLPP on the data transfer path may be motivated by the fact that
      the Fast Recovery procedure, which the SACK of the TLPP may result
      in, would use the data transfer path.  On the other hand,
      differences in the RTT of the different paths may make it
      suboptimal to send the TLPP on the data transfer path, and it can
      give rise to uncertainty in the TLPP loss masking detection and
      reaction process (see Section 3.3.4).

   It is emphasized that the deferral of the transmission of a TLPP does
   not prevent entering of the PROBE WAIT state on the path where the
   PTO kicked.
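
   The path selection described in this section can be sketched in
   Python as follows.  The sketch is illustrative only: the function
   and parameter names are assumptions, paths are identified by
   arbitrary string labels, and prefer_data_path models the local
   choice that an implementation with Unambiguous SACK support may
   make.  A return value of None means that no TLPP is sent from this
   PTO kick (the PROBE WAIT state is still entered).

```python
from typing import Optional


def tlpp_path(
    pto_path: str,
    data_path: str,
    *,
    probe_is_new_data: bool,
    data_path_has_outstanding: bool,
    high_tsn_on_pto_path: bool,
    unambiguous_sack: bool = False,
    prefer_data_path: bool = False,
) -> Optional[str]:
    """Return the path on which to send the TLPP, or None."""
    if probe_is_new_data:
        # New data always goes on the data transfer path, but only if
        # the PTO kicked there or nothing is outstanding there.
        if pto_path == data_path or not data_path_has_outstanding:
            return data_path
        return None  # deferred to a later PTO kick on the data path
    # Retransmission probe: the highest outstanding TSNs must be
    # outstanding on the path where the PTO kicked.
    if not high_tsn_on_pto_path:
        return None
    if unambiguous_sack and prefer_data_path:
        return data_path
    return pto_path
```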

3.3.4.  Masking of TLPP Recovered Losses

   If a single SCTP packet is lost, there is a risk that the TLPP itself
   repairs the loss if that particular lost packet is used as the
   probe.  The masking problem is only present if the TLPP is based on
   retransmission data.  The TLPP might mask the loss and thus interfere
   with the congestion control principle that requires CWND halving when
   a loss is detected.

   At present the solution in this document uses the algorithm defined
   for this purpose in [DUKKIPATI01], adjusted to SCTP so as to rely on
   the D-SACK (duplicate TSN received) information available from the
   SCTP SACK or alternatively on the Unambiguous SACK information of
   Appendix A.  The solution operates with a conceptual TLPP
   Retransmission Episode, as follows:

   o  Once a TLPP packet consisting of retransmission data is sent a
      TLPP Retransmission Episode is started.

   o  A TLPP Retransmission Episode is abruptly terminated if Fast
      Recovery or T3-Recovery is entered.

   o  For an SCTP implementation which does not implement the
      Unambiguous SACK feature of Appendix A, as well as for an SCTP
      association where the Unambiguous SACK feature of Appendix A is
      not in use, the TLPP Retransmission Episode terminates when an
      incoming SACK cumulatively acknowledges a TSN higher than the TSN
      of the TLPP with retransmission data.  If at this point in time
      the number of times the TLPP TSN has been received, according to
      the D-SACK information received, is lower than the number of times
      the TLPP TSN has been sent, CWND halving is done on the unique
      path on which the retransmission TLPP TSN has been sent.  Further,
      at this point in time the contribution from the TSN is subtracted
      from the flight size in accordance with the number of times the
      TSN has been sent.

   o  For an SCTP implementation which implements the Unambiguous SACK
      feature of Appendix A the following actions are taken at the time
      of acknowledgement of the TSN used as TLPP:

      *  If the TLPP TSN is first cumulatively acknowledged in a SACK
         with CUMACK TSN = TLPP TSN and with no SACK (or CUMACK) of
         higher TSNs, then from the Unambiguous SACK information the
         SCTP sender can distinguish the following cases:

         +  The original TSN has not (yet) been received, the
            retransmission TSN (the TLPP) has been received.

            -  In this case the original TSN is judged as lost, CWND
               halving is performed on the path on which the original
               TSN was sent and the sent TSNs are subtracted from the
               flight size(s).  This concludes the TLPP Retransmission
               Episode.

         +  Both the original transmission as well as the retransmission
            (the TLPP) have been received.

            -  In this case the sent TSNs are subtracted from the flight
               size(s).  This concludes the TLPP Retransmission Episode.

         +  The original TSN has been received, the retransmission TSN
            (the TLPP) has not yet been received:

            -  In this case a special timer is started with value
               PTO - T_latest(TSN) and the bytes of the retransmitted
               TSN (the TLPP) remain in the flightsize of the path on
               which it was sent until one of the following happens,
               whichever happens first:

               o  Unambiguous SACK of the TSN is received in which case
                  the TSN is subtracted from the flightsize and the
                  timer is stopped.  This concludes the TLPP
                  Retransmission Episode.

               o  A SACK of a higher TSN than the TLPP arrives with
                  Unambiguous SACK information indicating that the TLPP
                  has not been received.  The path is then marked so
                  that, if the TSN has still not been acknowledged when
                  the timer kicks, the TSN is judged as lost, CWND
                  halving is done and the TSN is subtracted from the
                  flightsize.  This then concludes the TLPP
                  Retransmission Episode.

               o  The timer kicks, the TSN is subtracted from the
                  flightsize (but no CWND halving is done).  This
                  concludes the TLPP Retransmission Episode.

      *  If the TLPP TSN is first cumulatively acknowledged in a SACK
         with highest SACK'ed (or CUMACK'ed) TSN > TLPP TSN, then from
         the Unambiguous SACK information the SCTP sender can
         distinguish the same cases as above and take the corresponding
         actions.  One additional case can arise in this situation:

         +  Only one of the transmissions of the TSN has been received,
            but the SACK gives no clear Unambiguous SACK indication of
            which one was received.  This uncertainty can
            only result from situations where SACKs are lost,
            potentially in combination with more data chunks than the
            TSN itself having been outstanding at the time when the TLPP
            was sent and some of this data having arrived later at the
            receiver than the original TSN or the TLPP.

            -  In this case the original TSN is judged as having been
               received and it is subtracted from the flightsize of the
               path on which it was sent.  The timer PTO - T_latest(TSN)
               is set and a potential CWND reduction caused by loss of
               the TLPP is handled following the principles described
               above.

   DISCUSSION of Unambiguous SACK Case Handling: CWND halving is not
   prescribed for a potentially lost retransmitted TSN used as TLPP in
   all cases above, as there is no guarantee that a SACK confirming a
   potential arrival of the retransmitted TSN will arrive in time (i.e.,
   this SACK may be lost).  CWND halving is done if a SACK of a higher
   TSN than the TLPP TSN has arrived, PTO time has elapsed since the
   transmittal of the TLPP and the TLPP itself cannot be determined to
   have been received from the Unambiguous SACK information.
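
   For the case where Unambiguous SACK is not in use, the D-SACK based
   episode handling of this section can be sketched in Python as
   follows.  This is a non-normative illustration with several
   assumptions: the class and function names are hypothetical, the
   number of times the TLPP TSN has been received is taken as one plus
   the number of duplicate reports for that TSN, "CWND halving" is
   taken to mean cwnd = max(cwnd/2, 4*MTU) as for loss detection in
   [RFC4960], ssthresh is not tracked, TSN wrap-around is ignored and
   abrupt termination on entry to Fast Recovery or T3-Recovery is not
   modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TlppEpisode:
    tlpp_tsn: int          # TSN retransmitted as TLPP
    chunk_size: int        # bytes of that chunk
    times_sent: int = 2    # original transmission plus the TLPP
    dup_reports: int = 0   # duplicate reports seen for tlpp_tsn
    active: bool = True


def on_sack(ep: TlppEpisode, cum_ack: int, duplicate_tsns: List[int],
            cwnd: int, flight_size: int, mtu: int) -> Tuple[int, int]:
    """Process one SACK; return the updated (cwnd, flight_size)."""
    if not ep.active:
        return cwnd, flight_size
    ep.dup_reports += duplicate_tsns.count(ep.tlpp_tsn)
    if cum_ack <= ep.tlpp_tsn:
        return cwnd, flight_size  # episode continues
    # Episode ends: a TSN higher than the TLPP TSN is cumulatively acked.
    ep.active = False
    times_received = 1 + ep.dup_reports
    if times_received < ep.times_sent:
        # One copy was lost and the TLPP masked it: reduce CWND.
        cwnd = max(cwnd // 2, 4 * mtu)
    # Remove every transmission of the TSN from the flight size.
    flight_size = max(flight_size - ep.times_sent * ep.chunk_size, 0)
    return cwnd, flight_size
```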

3.3.5.  Elimination of unnecessary DELAY_ACK delays

   The negative impact of DELAY_ACK on the loss recovery delay is
   partially mitigated by setting the I-bit on the TLPP.

   OPEN ISSUES:

   o  It is to be determined if the Immediate SACK feature shall be
      relied on more aggressively.  Possible options are:

      *  Immediate SACK flag to be set on all retransmitted TSNs.

      *  Immediate SACK flag to be set on all TSNs that are sent where
         the transmittal of an immediately following packet cannot be
         foreseen.  This would effectively result in the I-bit being set
         on a sent TSN whenever any of the following is true:

         +  no more chunks can be sent right after this chunk due to
            CWND limitations.

         +  no more chunks can be sent right after this due to RCV
            window limitations.

         +  no more chunks can be sent right after this as no more
            chunks are available in the SND buffer.

         +  no more chunks can be sent right after this due to Nagle.
            (May depend on the exact Nagle-like implementation).

      For the second choice it would be relevant to use PTO1 setting for
      the PTO timer on all TSNs sent with the I-bit set, when the
      receiver is known to support the Immediate SACK feature.  The
      downside of this choice is that it very severely limits the
      effectiveness of the DELAY_ACK feature.

   o  Ideally the PTO timer relative to the lowest outstanding TSN
      should be adjusted to follow PTO2 when a subsequent packet is
      transmitted.  The downside of this choice is the implementation
      impacts of such detailed - potentially per packet transmission -
      logic.  To be elaborated further.

4.  Confirmation of support for Immediate SACK

   Confirmation of receiver support of the Immediate SACK function
   [RFC7053] is established by an SCTP TLR sender by the following
   means:

   o  In case the data chunk of [RFC4960] is in use on the association,
      confirmation of [RFC7053] support by the SCTP receiver is assumed
      if the SCTP TLR sender receives a data chunk with the I-bit flag
      set.

   o  [TO BE CONFIRMED:] In case the I-data chunk of [SCTP-IDATA] is in
      use on the association, the SCTP sender can by [SCTP-IDATA] assume
      that the SCTP receiver supports [RFC7053].

5.  Socket API Considerations

   This section will describe how the socket API defined in [RFC6458] is
   extended to provide a way for the application to control the
   retransmission algorithms in operation in the SCTP layer.

   Socket option for control of the features is yet to be defined.

   Please note that this section is informational only.

6.  Security Considerations

   There are no new security considerations introduced by the functions
   defined in this document.

7.  Acknowledgements

   The authors acknowledge Henrik Jensen for his very significant
   contribution to the definition of, the implementation of and the
   experiments with the function.

   The work heavily draws on prior art work done for TCP, [DUKKIPATI01]
   in particular.  The contributors of that work should be credited for
   many of the ideas put forward here for SCTP.

8.  IANA Considerations

   This document does not create any new registries or modify the rules
   for any existing registries managed by IANA.

9.  Discussion and Evaluation of function

   Experiments in progress.  Details to be filled in.

   Right now we use this section to retain a number of issues that are
   to be further elaborated on:

   o  A significant number of spurious TLR probes have been observed in
      tests.  It is to be determined whether this is inherent to the
      function or whether it may be improved by adjusting the PTO timer
      calculations.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
              RFC 4960, DOI 10.17487/RFC4960, September 2007,
              <http://www.rfc-editor.org/info/rfc4960>.

   [RFC5061]  Stewart, R., Xie, Q., Tuexen, M., Maruyama, S., and M.
              Kozuka, "Stream Control Transmission Protocol (SCTP)
              Dynamic Address Reconfiguration", RFC 5061,
              DOI 10.17487/RFC5061, September 2007,
              <http://www.rfc-editor.org/info/rfc5061>.

   [RFC5062]  Stewart, R., Tuexen, M., and G. Camarillo, "Security
              Attacks Found Against the Stream Control Transmission
              Protocol (SCTP) and Current Countermeasures", RFC 5062,
              DOI 10.17487/RFC5062, September 2007,
              <http://www.rfc-editor.org/info/rfc5062>.

   [RFC7053]  Tuexen, M., Ruengeler, I., and R. Stewart, "SACK-
              IMMEDIATELY Extension for the Stream Control Transmission
              Protocol", RFC 7053, DOI 10.17487/RFC7053, November 2013,
              <http://www.rfc-editor.org/info/rfc7053>.

   [SCTP-IDATA]
              Stewart, R., et al., "Stream Schedulers and User Message
              Interleaving for the Stream Control Transmission
              Protocol", draft-ietf-tsvwg-sctp-ndata-04, IETF Work in
              Progress, July 2015.

10.2.  Informative References

   [CARO01]   Caro, A., et al., "Retransmission Policies with Transport
              Layer Multihoming", ICON, 2003.

   [CARO02]   Caro, A., et al., "Retransmission Schemes for End-to-end
              Failover with Transport Layer Multihoming", GLOBECOM,
              November 2004.

   [CMT-SCTP]
              Amer, P., et al., "Load Sharing for the Stream Control
              Transmission Protocol (SCTP)",
              draft-tuexen-tsvwg-sctp-multipath-10, IETF Work in
              Progress, May 2015.

   [DUKKIPATI01]
              Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", Work Expired, February 2013.

   [DUKKIPATI02]
              Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi,
              "Proportional Rate Reduction for TCP", Proceedings of the
              11th ACM SIGCOMM Conference on Internet Measurement,
              November 2011.

   [HURTIG]   Hurtig, P., et al., "TCP and SCTP RTO Restart",
              draft-ietf-tcpm-rtorestart-08, IETF Work in Progress,
              March 2015.

   [MATHIS]   Mathis, M., "FACK", ACM SIGCOMM Computer Communication
              Review 26,4, October 1996.

   [Rajiullah]
              Rajiullah, M., et al., "An Evaluation of Tail Loss
              Recovery Mechanisms for TCP", ACM SIGCOMM Computer
              Communication Review 45,1, January 2015.

   [RFC3758]  Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P.
              Conrad, "Stream Control Transmission Protocol (SCTP)
              Partial Reliability Extension", RFC 3758,
              DOI 10.17487/RFC3758, May 2004,
              <http://www.rfc-editor.org/info/rfc3758>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <http://www.rfc-editor.org/info/rfc5681>.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
              P. Hurtig, "Early Retransmit for TCP and Stream Control
              Transmission Protocol (SCTP)", RFC 5827,
              DOI 10.17487/RFC5827, May 2010,
              <http://www.rfc-editor.org/info/rfc5827>.

   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
              Yasevich, "Sockets API Extensions for the Stream Control
              Transmission Protocol (SCTP)", RFC 6458,
              DOI 10.17487/RFC6458, December 2011,
              <http://www.rfc-editor.org/info/rfc6458>.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP",
              RFC 6675, DOI 10.17487/RFC6675, August 2012,
              <http://www.rfc-editor.org/info/rfc6675>.

   [SCTP-PF]  Nishida, Y., et al., "SCTP-PF: Quick Failover Algorithm in
              SCTP", draft-ietf-tsvwg-sctp-failover-13, IETF Work in
              Progress, September 2015.

   [zimmermann01]
              Zimmermann, A., "CUBIC for Fast Long-Distance Networks",
              draft-ietf-tcpm-cubic-00, IETF Work in Progress, June
              2015.

   [zimmermann02]
              Zimmermann, A., "The TCP Echo and TCP Echo Reply Option",
              draft-zimmermann-tcpm-echo-option-00, IETF Work in
              Progress, June 2015.

   [zimmermann03]
              Zimmermann, A., "Using the TCP Echo Option for Spurious
              Retransmission Detection",
              draft-zimmermann-tcpm-spurious-rxmit-00, IETF Work in
              Progress, July 2015.

Appendix A.  Unambiguous SACK

   When receiving a SACK of a TSN it is not possible to unambiguously
   determine whether the receiver hereby acknowledges the first
   transmission of the TSN or a possible subsequent retransmission of
   the TSN, when such multiple transmissions of the same TSN have been
   made.  The duplicate TSN information in the SCTP SACK chunk does help
   to provide information about how many times the same TSN has been
   received at the receiver side, but it is still not possible to
   unequivocally link the SACK information to the different
   transmissions of the same TSN.  An additional source of ambiguity
   comes from the fact that packets may be duplicated in the network.

   Unambiguous SACK information is generally beneficial for many SCTP
   protocol aspects, e.g., for improved RTT measurements, for more
   accurate loss detection, for maintenance of the flightsize and for
   congestion control operation.

   Providing fully accurate SACK information from receiver to sender
   side requires a reliable (and ordered) SACK feedback channel, thus
   overcoming the information gap that may arise from loss (or from
   reordering) of SACKs.  The establishment of such a reliable feedback
   channel is not proposed, but the proposal implements measures that
   allow for some robustness towards information loss due to SACK loss.

   NOTE for AUTHORS: The solution is independent from a potential split
   of the SACK TSN Gap information in SACK and NR-SACK gaps respectively
   following [CMT-SCTP].

A.1.  TSN Retransmission ID in Data Chunk Header

   It is a prerequisite that the SCTP association deploys, and has
   negotiated usage of, the new I-data chunk of [SCTP-IDATA].

   We define a new 4-bit Retransmission ID (RTX ID) in the I-data chunk
   header.  The 4 bits consume 4 bits of the new reserved 16-bit field
   of the I-data chunk header.  See Figure 1.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Type = 64   |  Res  |I|U|B|E|           Length              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                              TSN                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |        Stream Identifier      |   Reserved            | RTX-ID|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      Message Identifier                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Payload Protocol Identifier / Fragment Sequence Number     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      \                                                               \
      /                           User Data                           /
      \                                                               \
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



                  Figure 1: RTX-ID in I-DATA chunk format
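
   As an illustration of the layout in Figure 1, the following Python
   sketch encodes and decodes the 32-bit word holding the Stream
   Identifier, the reserved bits and the RTX-ID.  It assumes network
   byte order, that the RTX-ID occupies the four least significant bits
   of the formerly reserved 16-bit field as drawn, and that the
   remaining 12 reserved bits are sent as zero and ignored on receipt.
   The function names are hypothetical.

```python
import struct
from typing import Tuple


def pack_sid_rtx(stream_id: int, rtx_id: int) -> bytes:
    """Encode Stream Identifier (16 bits) + Reserved (12) + RTX-ID (4)."""
    if not 0 <= stream_id <= 0xFFFF:
        raise ValueError("stream_id must fit in 16 bits")
    if not 0 <= rtx_id <= 0xF:
        raise ValueError("rtx_id must fit in 4 bits")
    # Reserved bits are left as zero in this sketch.
    return struct.pack("!HH", stream_id, rtx_id)


def unpack_sid_rtx(word: bytes) -> Tuple[int, int]:
    """Decode the 4-byte word; reserved bits are masked off."""
    stream_id, field = struct.unpack("!HH", word)
    return stream_id, field & 0x000F
```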

A.1.1.  Sender side behaviour

   New data MUST be sent with RTX ID = 0.  Whenever SCTP retransmits a
   data chunk it SHOULD step up the RTX ID.  The highest RTX ID = 15 is
   used for all retransmissions of the same TSN beyond the 15th
   retransmission or when the RTX ID last used for this TSN is 15.  An
   SCTP sender MAY step the RTX ID up by more than one when
   retransmitting TSNs in order to have all TSNs within the SCTP packet
   use one and the same RTX ID.
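
   The sender side stepping rule can be sketched in Python as follows.
   The function names are hypothetical, and the choice of "highest last
   used RTX ID plus one" as the common value for a packet of
   retransmitted TSNs is one possible reading of the MAY clause above,
   not a prescribed algorithm.

```python
from typing import Iterable

RTX_ID_MAX = 15  # highest value of the 4-bit RTX ID


def next_rtx_id(last_rtx_id: int, step: int = 1) -> int:
    """RTX ID for the next retransmission of a TSN, saturating at 15."""
    if not 0 <= last_rtx_id <= RTX_ID_MAX:
        raise ValueError("last_rtx_id out of range")
    if step < 1:
        raise ValueError("the RTX ID must be stepped up")
    return min(last_rtx_id + step, RTX_ID_MAX)


def common_rtx_id(last_rtx_ids: Iterable[int]) -> int:
    """One RTX ID usable for all retransmitted TSNs bundled in a packet.

    Every TSN is stepped up by at least one, some possibly by more.
    """
    return next_rtx_id(max(last_rtx_ids))
```

   New data bundled into the same packet would still carry RTX ID = 0,
   so the common value applies to the retransmitted chunks only.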

A.1.2.  Receiver side behaviour

   An SCTP receiver supporting this feature MUST process the RTX ID for
   all received TSNs in accordance with the prescriptions for
   Unambiguous SACK return below.

A.2.  Unambiguous SACK Chunk

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   Type = x    |Chunk  Flags   |      Chunk Length             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Cumulative TSN RTX (CUMACK TSN)                    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |          Advertised Receiver Window Credit (a_rwnd)           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Number of Gap Ack Blocks = N  |  Reserved (future NR-SACK ?)  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | NewlyCACK RTX ID Blocks = N   |   CACK Dupl TSN Blocks = N    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | NewlySACK RTX ID Blocks = N   |   SACK Dupl TSN Blocks = N    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Number of RTX SACK Blocks = N |  Reserved                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Highest CUMACK 'ed TSN received duplicated                    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  Gap Ack Block #1 Start       |   Gap Ack Block #1 End        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       /                                                               /
       \    format to be changed to cover more than 16-bits ?          \
       /                                                               /
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   Gap Ack Block #N Start      |  Gap Ack Block #N End         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       /                                                               /
       \      New Blocks in order set above ... to be filled in        \
       /                                                               /
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



                 Figure 2: Unambiguous SACK chunk format

   Newly CACK RTX ID block:

      This block provides information on the newly acknowledged TSNs
      that were cumulatively acked in this SACK and for which the
      following hold:

      *  The TSN is newly acked in this SACK.  I.e., the TSN has not
         been received before (or if it has been received before it was
         since reneged).

      *  The newly acknowledged TSN was received with RTX ID different
         from zero.

      The RTX ID received with the TSN is returned in this block.  The
      information returned in a CACK RTX ID block is a consecutive range
      of TSN fulfilling the above for which identical RTX ID has been
      received.  Proposed format is off-set from CUMACK TSN (lower than
      CUMACK TSN), length of range and RTX ID.
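
      As an illustration only, the grouping of newly cumulatively
      acknowledged TSNs into CACK RTX ID blocks could be sketched in
      Python as follows.  The function name, the dictionary input, and
      the choice to count the offset downwards from the CUMACK TSN to
      the lowest TSN of each range are assumptions made for this
      sketch; the exact encoding is still open in this document.  TSN
      wrap-around is ignored for brevity.

```python
def cack_rtx_id_blocks(cumack_tsn, newly_acked):
    """Group newly cum-acked TSNs into (offset, length, rtx_id) blocks.

    newly_acked maps TSN -> RTX ID for the TSNs newly covered by the
    cumulative acknowledgement of this SACK (hypothetical input shape).
    """
    runs = []  # each run is [first_tsn, length, rtx_id]
    for tsn in sorted(newly_acked):
        rtx_id = newly_acked[tsn]
        if rtx_id == 0:
            continue  # TSNs received with RTX ID zero are not reported
        last = runs[-1] if runs else None
        if last and last[0] + last[1] == tsn and last[2] == rtx_id:
            last[1] += 1  # extends the current consecutive range
        else:
            runs.append([tsn, 1, rtx_id])
    # Assumed encoding: offset = CUMACK TSN minus lowest TSN of the range.
    return [(cumack_tsn - first, length, rtx_id)
            for first, length, rtx_id in runs]
```

      For example, with CUMACK TSN 105 and TSNs 101 and 102 both newly
      received with RTX ID 1, the sketch yields a single block with
      offset 4, length 2, and RTX ID 1.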

   Newly SACK RTX ID block:

      This block provides information on the newly acknowledged TSNs
      that were selectively acknowledged in this SACK and for which the
      following hold:

      *  The TSN is newly acked in this SACK, i.e., the TSN has not
         been received before (or, if it has been received before, it
         has since been reneged).

      *  The newly acknowledged TSN was received with an RTX ID
         different from zero.

      The RTX ID received with the TSN is returned in this block.  Each
      SACK RTX ID block describes a consecutive range of TSNs that
      fulfill the above conditions and were received with an identical
      RTX ID.  The proposed format is the offset from the CUMACK TSN
      (higher than the CUMACK TSN), the length of the range, and the
      RTX ID.  Alternatively, the format of the present SACK blocks
      could be used, with the offset from the CUMACK TSN bounded to 16
      bits.

   Newly CACK Dupl TSN block:

      This block provides information on the TSNs received since the
      last returned SACK for which the following hold:

      *  The TSN is lower than or equal to the CUMACK TSN.

      *  The TSN is a duplicate, meaning that a data chunk with the same
         TSN, but possibly a different RTX ID, has been received.

      The RTX ID received with the TSN is returned in this block.  Each
      CACK Dupl TSN block describes a consecutive range of TSNs that
      fulfill the above conditions and were received with an identical
      RTX ID.  The proposed format is the offset from the CUMACK TSN
      (lower than the CUMACK TSN), the length of the range, and the RTX
      ID.  The RTX ID may be zero.

   Newly SACK Dupl TSN block:

      This block provides information on the TSNs received since the
      last returned SACK for which the following hold:

      *  The TSN is higher than the CUMACK TSN.

      *  The TSN is a duplicate, meaning that a data chunk with the same
         TSN, but possibly a different RTX ID, has been received.

      The RTX ID received with the TSN is returned in this block.  Each
      SACK Dupl TSN block describes a consecutive range of TSNs that
      fulfill the above conditions and were received with an identical
      RTX ID.  The proposed format is the offset from the CUMACK TSN
      (higher than the CUMACK TSN), the length of the range, and the
      RTX ID.  Alternatively, the format of the present SACK blocks
      could be used, with the offset from the CUMACK TSN bounded to 16
      bits.  The RTX ID may be zero.

   Together with the existing SACK information, the Newly CACK/SACK RTX
   ID and the CACK/SACK Dupl TSN blocks provide unambiguous SACK
   information for all received TSNs, differentiated by the RTX ID
   received with the TSN.  This information may be partially lost on
   the way from the receiver to the sender if a SACK is lost.  The RTX
   SACK block and the Highest CUMACK'ed TSN received Duplicated field
   are returned to provide a means of recovering part of the
   information that can be lost when a SACK is lost.
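
   The following Python sketch illustrates how a receiver might decide
   which of the four blocks above reports a data chunk that arrived
   since the last returned SACK.  The function name, its arguments, and
   the returned labels are assumptions made for illustration; the
   CUMACK TSN passed in is the one carried in the SACK being built, and
   TSN wrap-around is ignored.

```python
def classify_arrival(tsn, rtx_id, is_duplicate, cumack_tsn):
    """Return the name of the block reporting this arrival, or None.

    is_duplicate is True when a data chunk with the same TSN had
    already been received (hypothetical flag kept by the receiver).
    """
    cum_acked = tsn <= cumack_tsn
    if is_duplicate:
        # Duplicates are always reported; their RTX ID may be zero.
        return "CACK Dupl TSN" if cum_acked else "SACK Dupl TSN"
    if rtx_id != 0:
        # Newly acked TSN that arrived as a retransmission.
        return "Newly CACK RTX ID" if cum_acked else "Newly SACK RTX ID"
    # Newly acked original: the existing SACK information suffices.
    return None
```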

   RTX SACK block:

      This block provides information on the TSNs for which the
      following hold:

      *  The TSN has been received and has been selectively acked in
         prior SACKs (OPEN: alternatively in SACKs including this one).

      *  The TSN is higher than the CUMACK TSN.

      *  The TSN has been received only with RTX IDs different from
         zero.

      Each RTX SACK block describes a consecutive range of TSNs that
      fulfill the above conditions.  The proposed format is the offset
      from the CUMACK TSN (higher than the CUMACK TSN) and the length
      of the range.  Alternatively, the format of the present SACK
      blocks could be used, with the offset from the CUMACK TSN bounded
      to 16 bits.

   Highest CUMACK'ed TSN received Duplicated:

      This field carries the highest TSN that fulfills both of the
      following conditions:

      *  The TSN has been received duplicated.

      *  The TSN is lower than or equal to the CUMACK TSN.

      When no duplicates have been seen, or when no duplicates have been
      seen within the last window of 2^31 cumulatively acknowledged
      TSNs, CUMACK TSN + 1 is returned.
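
      A minimal Python sketch of how this field value might be
      computed is given below.  It assumes 32-bit TSNs compared modulo
      2^32, and that the receiver remembers the highest cumulatively
      acknowledged TSN seen duplicated (or nothing at all); the
      function and argument names are hypothetical.

```python
TSN_MOD = 1 << 32     # TSNs are 32-bit and wrap around
DUP_WINDOW = 1 << 31  # window of cumulatively acknowledged TSNs


def highest_cumacked_dup_field(cumack_tsn, highest_dup_tsn):
    """Value for the 'Highest CUMACK'ed TSN received Duplicated' field.

    highest_dup_tsn is None when no duplicate has been seen; otherwise
    it is assumed to be at or below the CUMACK TSN.
    """
    if highest_dup_tsn is not None:
        # Distance from the duplicate up to the CUMACK TSN, modulo 2^32.
        distance = (cumack_tsn - highest_dup_tsn) % TSN_MOD
        if distance < DUP_WINDOW:
            return highest_dup_tsn
    # No duplicate inside the window: signal with CUMACK TSN + 1.
    return (cumack_tsn + 1) % TSN_MOD
```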

   By means of the RTX SACK block, an SCTP sender may recover the
   information that a SACK'ed TSN does not represent the original TSN
   first sent, i.e., the TSN sent with RTX ID = 0.

   By means of the "Highest CUMACK'ed TSN received Duplicated" field, an
   SCTP sender may recover the information that more than one
   incarnation of a TSN has been received when the SACK that
   cumulatively acknowledged the arrival of the different incarnations
   of the TSN was itself lost.  Of special interest is the case where
   one and the same SACK would contain information on the receipt of
   both the original TSN and a spurious retransmission of the TSN.
   This can happen in scenarios where DELAY_ACK handling at the
   receiver side delays the return of SACK information and a SACK is
   lost, even if the original data and the spurious retransmission
   were sent with reasonable spacing in time.

A.2.1.  Receiver side behaviour

   The RTX SACK block and the Highest CUMACK'ed TSN received Duplicated
   information returned in SACKs require an SCTP receiver to keep
   track of the following state on a per-association basis:

   o  A list (or ranges) of TSNs that have been SACK'ed, but not yet
      cumulatively acknowledged and for which RTX ID = 0 has not been
      seen.  It is noted that the TSN data chunk itself may have been
      delivered to the application.

   o  The highest TSN lower than CUMACK TSN for which a duplicate has
      been received.
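
   The per-association state listed above could, for instance, be kept
   as in the following Python sketch.  The class, its attribute names,
   and the extra per-TSN duplicate flag are assumptions made for
   illustration and are not mandated by this document.  The sketch
   treats a duplicated TSN equal to the CUMACK TSN as eligible, in line
   with the field definition in Appendix A.2.  It ignores TSN
   wrap-around, and it does not track whether a TSN has already been
   reported in a prior SACK.

```python
class UnambiguousSackState:
    """Per-association receiver state (hypothetical layout, sketch only)."""

    def __init__(self, cumack_tsn):
        self.cumack_tsn = cumack_tsn
        # TSN above the CUMACK TSN -> [original_seen, duplicate_seen]
        self.above = {}
        # Highest cumulatively acked TSN that was received duplicated
        self.highest_dup_cumacked = None

    def _note_duplicate(self, tsn):
        if self.highest_dup_cumacked is None or tsn > self.highest_dup_cumacked:
            self.highest_dup_cumacked = tsn

    def on_data(self, tsn, rtx_id):
        if tsn <= self.cumack_tsn:
            self._note_duplicate(tsn)  # duplicate of a cum-acked TSN
            return
        entry = self.above.get(tsn)
        if entry is None:
            self.above[tsn] = [rtx_id == 0, False]
        else:
            entry[0] = entry[0] or rtx_id == 0
            entry[1] = True
        # Advance the CUMACK TSN over any now-contiguous TSNs.
        while self.cumack_tsn + 1 in self.above:
            self.cumack_tsn += 1
            _, duplicated = self.above.pop(self.cumack_tsn)
            if duplicated:
                self._note_duplicate(self.cumack_tsn)

    def rtx_only_tsns(self):
        """TSNs above the CUMACK TSN never seen with RTX ID = 0."""
        return sorted(t for t, (orig_seen, _) in self.above.items()
                      if not orig_seen)
```

   In this sketch, rtx_only_tsns() yields the candidates for the RTX
   SACK block, and highest_dup_cumacked feeds the Highest CUMACK'ed TSN
   received Duplicated field.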

A.3.  Unambiguous SACK return

   Whenever Unambiguous SACKs are in use on an association and an SCTP
   endpoint receives a valid data chunk with an RTX-ID different from
   zero, it shall not delay the return of the Unambiguous SACK.
   Otherwise, Unambiguous SACKs are returned at any time when an
   [RFC4960] implementation would return a SACK.

   A window opener MUST include Unambiguous SACK information.

A.4.  Negotiation

   An SCTP receiver MUST NOT send an Unambiguous SACK chunk unless both
   peers have indicated their support of the Unambiguous SACK feature
   within the Supported Extensions Parameter as defined in [RFC5061].
   If Unambiguous SACK has been negotiated on an association,
   Unambiguous SACKs MUST be returned whenever an SCTP receiver would
   return SACK information.  If Unambiguous SACK has not been negotiated
   on an association, the RTX-ID field in the chunk header of incoming
   data chunks MUST be ignored and the [RFC4960] SACK format and return
   policies MUST be adhered to.
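
   The negotiation rule, combined with the return rule of Appendix A.3,
   can be sketched in Python as follows.  The function name, its
   boolean inputs, and the returned labels are hypothetical; the sketch
   only shows the decision and does not build any chunk.

```python
def sack_decision(local_supports, peer_supports, rtx_id, rfc4960_would_delay):
    """Return (chunk_type, delay_allowed) for an arriving valid data chunk.

    rfc4960_would_delay states whether an RFC 4960 implementation would
    be allowed to delay the SACK at this point (assumed to be known).
    """
    if not (local_supports and peer_supports):
        # Not negotiated: the RTX-ID field is ignored, RFC 4960 rules apply.
        return ("SACK", rfc4960_would_delay)
    # Negotiated: a non-zero RTX-ID forbids delaying the Unambiguous SACK.
    return ("UNAMBIGUOUS_SACK", rfc4960_would_delay and rtx_id == 0)
```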

Authors' Addresses

   Karen E. E. Nielsen
   Ericsson
   Kistavaegen 25
   Stockholm  164 80
   Sweden

   Email: karen.nielsen@tieto.com


   Rafaelle De Santis
   Ericsson
   xx
   xx  xx
   Italy

   Email: rafaele.de.santis@ericsson.com


   Anna Brunstrom
   Karlstad University
   Universitetsgatan 2
   Karlstad  651 88
   Sweden

   Email: anna.brunstrom@kau.se


   Michael Tuexen
   Muenster Univ. of Appl. Science
   Stegerwaldstrasse 39
   Steinfurt  48565
   Germany

   Email: tuexen@fh-muenster.de


   Randall Stewart
   Netflix, Inc.
   xx
   Chapin  29036 SC
   United States

   Email: randall@lakerest.net