Internet DRAFT - draft-stein-pwe3-congcons

draft-stein-pwe3-congcons






PWE3                                                           YJ. Stein
Internet-Draft                                   RAD Data Communications
Intended status: Informational                                  D. Black
Expires: January 16, 2013                                EMC Corporation
                                                              B. Briscoe
                                                                      BT
                                                           July 15, 2012


                      PW Congestion Considerations
                      draft-stein-pwe3-congcons-01

Abstract

   Pseudowires (PWs) have become a common mechanism for tunneling
   traffic, and may be found competing for network resources both with
   other PWs and with non-PW traffic, such as TCP/IP flows.  It is thus
   worthwhile specifying under what conditions such competition is safe,
   i.e., the PW traffic does not significantly harm other traffic or
   contribute more than it should to congestion.  We conclude that PWs
   transporting responsive traffic behave as desired without the need
   for additional mechanisms.  For inelastic PWs (such as TDM PWs) we
   derive a bound under which such PWs consume no more network capacity
   than a TCP flow.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal



Stein, et al.           Expires January 16, 2013                [Page 1]

Internet-Draft                PW-CONGESTION                    July 2012


   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  PWs Comprising Elastic Flows . . . . . . . . . . . . . . . . .  4
   3.  PWs Comprising Inelastic Flows . . . . . . . . . . . . . . . .  5
   4.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   6.  Informative References . . . . . . . . . . . . . . . . . . . . 10
   Appendix A.  Loss Probabilities for TDM PWs  . . . . . . . . . . . 11
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12































Stein, et al.           Expires January 16, 2013                [Page 2]

Internet-Draft                PW-CONGESTION                    July 2012


1.  Introduction

   A pseudowire (PW) is a construct for tunneling a native service over
   a Packet Switched Network (PSN)(see [RFC3985]), such as IPv4, IPv6,
   or MPLS.  The PW packet encapsulates a unit of native service
   information by prepending the headers required for transport in the
   particular PSN (which must include a demultiplexer field to
   distinguish the different PWs) and preferably the 4 byte PWE3 control
   word.  PWs have no bandwidth reservation mechanism, meaning that when
   multiple PWs are transported in parallel there is no defined means
   for guaranteeing network resources for any particular PW.  This
   competition for resources may translate to a particular PW not being
   able to deliver the QoS required to emulate the native service.  For
   example, MPLS-TE enables achieving a particular desired allocation of
   resources between multiple LSPs; however, when multiple Ethernet PWs
   are placed in a single MPLS tunnel, there is no way to similarly
   divide resources amongst them (although DiffServ QoS prioritization
   may be available for PWs).  The use of PWs in service provider MPLS
   networks is well understood and will not be discussed further here.

   While PWs are most often placed in MPLS tunnels, there are several
   mechanisms that enable transporting PWs over an IP infrastructure.
   These include:
      TDM PWs ([RFC4553][RFC5086][RFC5087]) that define UDP/IP
      encapsulations,
      L2TPv3 PWs,
      MPLS PWs directly over IP according to RFC 4023 [RFC4023],
      MPLS PWs over GRE over IP according to RFC 4023 [RFC4023].
   Whenever PWs are transported over IP, they may compete with
   congestion-responsive flows (e.g., TCP flows).  Hence in order to
   prevent congestion collapse the PWs MUST behave in a fashion that
   does not cause undue damage to the throughput of such congestion-
   responsive flows [RFC2914].

   At first glance one may think that this would require a PW
   transported over IP to be considered as a single flow, on a par with
   a single TCP flow.  Were we to accept this tenet, we would require a
   PW to back off under congestion to consume no more bandwidth than a
   single TCP flow under such conditions (see [RFC5348]).  However,
   since PWs may carry traffic from many users, it makes more sense to
   consider each PW to be equivalent to multiple TCP flows.  We will
   discuss whether PWs consisting of elastic flows need a back-off
   strategy in Section 2.

   TDM PWs ([RFC4553][RFC5086][RFC5087]) represent inelastic constant
   bit-rate (CBR) flows that may require lower or higher throughput than
   that consumed by an otherwise-unconstrained TCP flow would under the
   same network conditions.  In any case a TDM PW is not able to respond



Stein, et al.           Expires January 16, 2013                [Page 3]

Internet-Draft                PW-CONGESTION                    July 2012


   to congestion in a TCP-like manner; on the other hand, the total
   bandwidth they consume remains constant and does not increase to
   consume additional bandwidth as TCP rates back off.  If the bandwidth
   consumed by a TDM PW is considered detrimental, the only available
   remedy is to completely shut down the PW.  Such a shutdown would
   impact multiple users, and the service restoration time would in
   general be lengthy.  We will discuss when the shut down of inelastic
   PWs can be avoided in Section 3.


2.  PWs Comprising Elastic Flows

   In this section we consider Ethernet PWs that primarily carry
   congestion-responsive traffic.  We will show that we automatically
   obtain the desired congestion avoidance behavior, and that additional
   mechanisms are not needed.

   Let us assume that an Ethernet PW aggregating several TCP flows is
   flowing alongside several TCP/IP flows.  Each Ethernet PW packet
   carries a single Ethernet frame that carries a single IP packet that
   carries a single TCP segment.  Thus, if congestion is signaled by an
   intermediate router dropping a packet, a single end-user TCP/IP
   packet is dropped, whether or not that packet is encapsulated in the
   PW.

   The result is that the individual TCP flows inside the PW experience
   the same drop probability as the non-PW TCP flows.  Thus the behavior
   of a TCP sender (retransmitting the packet and appropriately reducing
   its sending rate) is the same for flows directly over IP and for
   flows inside the PW.  In other words, individual TCP flows are
   neither rewarded nor penalized for being carried over the PW.  On the
   other hand, the PW does not behave as a single TCP flow; it will
   consume the aggregated bandwidth of its component flows, and backs
   off much less sharply than a single flow would.

   We claim that this is precisely the desired behavior.  Any fairness
   considerations should be applied to the individual TCP flows, and not
   to the aggregate.  Were individual TCP flows rewarded for being
   carried over a PW, this would create an incentive to create PWs for
   no operational reason.  Were individual flows penalized, there would
   be a deterrence that could impede pseudowire deployment.

   There have been proposals to add additional TCP-friendly mechanisms
   to PWs, for example by carrying PWs over DCCP.  In light of the above
   arguments, it is clear that this would force the PW to behave as a
   single flow, rather than N flows, and penalize the constituent TCP
   flows.  In addition, the individual TCP flows would still back off
   due to their end points being oblivious to the fact that they are



Stein, et al.           Expires January 16, 2013                [Page 4]

Internet-Draft                PW-CONGESTION                    July 2012


   carried over a PW.  This will further degrade the flow's throughput
   as compared to a non-PW-encapsulated flow.  Thus, such additional
   mechanisms contradict the behavior previously described as desirable.


3.  PWs Comprising Inelastic Flows

   TDM PWs ([RFC4553][RFC5086][RFC5087]) are more problematic than the
   elastic PWs of the previous section.  Being constant bit-rate (CBR),
   they can not be made responsive to congestion.  On the other hand,
   being CBR, they also do not attempt to capture additional bandwidth
   when TCP flows back off.

   Since a TDM PW continuously consumes a constant amount of bandwidth,
   if the bandwidth occupied by a TDM PW endangers the network as a
   whole, the only recourse is to shut it down, denying service to all
   customers of the TDM native service.  We should mention in passing
   that under certain conditions it may be possible to reduce the
   bandwidth consumption of a TDM PW.  A prevalent case is that of a TDM
   native service that carries voice channels that may not all be
   active.  Using the AAL2 mode of [RFC5087] (perhaps along with
   connection admission control) can enable bandwidth adaptation, at the
   expense of more sophisticated native service processing (NSP).

   In the following we will show that for many cases of interest a TDM
   PW, treated as a single flow, will behave in a reasonable manner
   without any additional mechanisms.  We will focus on structure-
   agnostic TDM PWs [RFC4553] although our analysis can be readily
   applied to structure-aware PWs (see Appendix A).

   There are two network parameters relevant to our discussion, namely
   the one-way delay D and the loss probability p.  The one-way delay of
   a native TDM service consists of the physical time-of-flight plus 125
   microseconds for each TDM switch traversed.  This is very small as
   compared to PSN network-crossing latencies.  Many protocols and
   applications running over TDM circuits thus require low delay, and we
   need thus only consider delays of up to about 32 milliseconds.

   The TDM PW RFCs specify the egress behavior upon experiencing packet
   loss.  Structure-agnostic transport has no alternative to outputting
   an "all-ones" AIS pattern towards the TDM circuit, which if long
   enough in duration is recognized by the receiving TDM device as a
   fault indication (see Appendix A).  International standards place
   stringent limits on the number of such faults tolerated.
   Calculations presented in the appendix show that only loss
   probabilities in the realm of fractions of a percent are relevant for
   structure-agnostic transport (see Appendix A).




Stein, et al.           Expires January 16, 2013                [Page 5]

Internet-Draft                PW-CONGESTION                    July 2012


   Structure-aware transport regenerates frame alignment signals thus
   hiding AIS indications resulting from infrequent packet loss.
   Furthermore, for TDM circuits carrying voice channels the use of
   packet loss concealment algorithms is possible (such algorithms have
   been previously described for TDM PWs).  However, even structure-
   aware transport ceases to provide a useful service at about 2 percent
   loss probability.

   RFC 5348 on TCP Friendly Rate Control (TFRC) [RFC5348] provides the
   following simplified formula for throughput that is used as the basis
   for TFRC's sending rate control.

                                    S
       X_Bps = ------------------------------------------------
                 R  ( sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2) )

   where
      X_Bps is average sending rate in Bytes per second,
      S is the segment (packet payload) size in Bytes,
      R is the round-trip time in seconds,
      p is the loss probability.

   We can use this formula to determine when a TDM PW consumes no more
   bandwidth than a TCP flow between the same endpoints would consume
   under the same conditions.  Replacing the round-trip delay with twice
   the one-way delay D, setting the bandwidth to that of the TDM service
   BW, and the segment size to be the TDM fragment TDM plus 4 Bytes to
   account for the PWE3 control word, we obtain the following condition
   for a TDM PW.

              (TDM + 4)
       D < ---------------
             BW f(p) / 4

   where
      D is the one-way delay,
      TDM is the TDM segment size in Bytes,
      BW is TDM service bandwidth in bits per second,
      f(p) = sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2).

   One may view this condition as defining a safe operating envelope for
   a TDM PW, as a TDM PW that consumes no more bandwidth than a TCP flow
   would not affect congestion more than were it to be TCP traffic.
   Under this condition it should hence be safe to mix the TDM PW with
   congestion-responsive traffic such as TCP, without causing
   significant additional congestion problems.  Were the TDM PW to
   consume significantly more bandwidth a TCP flow, it could contribute
   disproportionately to congestion, and its mixture with congestion-



Stein, et al.           Expires January 16, 2013                [Page 6]

Internet-Draft                PW-CONGESTION                    July 2012


   responsive traffic may be inappropriate.

   We derived the condition assuming steady-state conditions, and thus
   two caveats are in order.  First, the condition does not specify how
   to treat a TDM PW that initially satisfies the condition, but is then
   faced with a deteriorating network environment.  In such cases one
   additionally needs to analyze the reaction times of the responsive
   flows to congestion events.  Second, the derivation assumed that the
   TDM PW was competing with long-lived TDM flows, because under this
   assumption it was straightforward to obtain a quantitative comparison
   with something widely considered to offer a safe response to
   congestion.  Short-lived TCP flows may find themselves disadvantaged
   as compared to a long-lived TDM PW satisfying the condition.  These
   dynamic cases will be considered in future versions of this draft.

   The results are displayed in the accompanying figures (available only
   in the PDF version of this document).  TCP compatible behavior is
   obtained for the area under curves appropriate for each TDM fragment
   size.


   --------------------------------------------------------------------
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                    E1 compatibility regions                      I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                     (only in PDF version)                        I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   --------------------------------------------------------------------


   Figure 1 TCP Compatibility areas for E1 SAToP










Stein, et al.           Expires January 16, 2013                [Page 7]

Internet-Draft                PW-CONGESTION                    July 2012


   --------------------------------------------------------------------
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                    E3 compatibility regions                      I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                     (only in PDF version)                        I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   I                                                                  I
   --------------------------------------------------------------------


   Figure 2 TCP Compatibility areas for E3 SAToP































Stein, et al.           Expires January 16, 2013                [Page 8]

Internet-Draft                PW-CONGESTION                    July 2012


   We see in Figure 1 that a TDM PW carrying an E1 native service (2.048
   Mbps) satisfies the condition for all parameters of interest if each
   packet carries at least S=512 Bytes of TDM data.  For the SAToP
   default of 256 Bytes, as long as the one-way delay is less than 10
   milliseconds, the loss probability can exceed 0.3 percent.  For
   packets containing 128 or 64 Bytes the constraints are more
   troublesome, but there are still parameter ranges where the TDM PW
   consumes less than a TCP flow under similar conditions.  Similarly,
   Figure 2 demonstrates that an E3 native service (34.368 Mbps) with
   the SAToP default of 1024 Bytes of TDM per packet satisfies the
   condition for delays up to about 5 milliseconds.

   Note that violating the condition for a short amount of time is not
   sufficient justification for shutting down the TDM PW.  While TCP
   flows react within a round trip time, PW commissioning and
   decommissioning are time consuming processes that should only be
   undertaken when it becomes clear that the congestion is not
   transient.  Future versions of this draft will provide guidance as to
   when a TDM PW should be terminated.


4.  Security Considerations

   This document does not introduce any new congestion-specific
   mechanisms and thus does not introduce any new security
   considerations above those present for PWs in general.


5.  IANA Considerations

   This document requires no IANA actions.




















Stein, et al.           Expires January 16, 2013                [Page 9]

Internet-Draft                PW-CONGESTION                    July 2012


6.  Informative References

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, September 2000.

   [RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
              Edge (PWE3) Architecture", RFC 3985, March 2005.

   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
              MPLS in IP or Generic Routing Encapsulation (GRE)",
              RFC 4023, March 2005.

   [RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
              Division Multiplexing (TDM) over Packet (SAToP)",
              RFC 4553, June 2006.

   [RFC5086]  Vainshtein, A., Sasson, I., Metz, E., Frost, T., and P.
              Pate, "Structure-Aware Time Division Multiplexed (TDM)
              Circuit Emulation Service over Packet Switched Network
              (CESoPSN)", RFC 5086, December 2007.

   [RFC5087]  Stein, Y(J)., Shashoua, R., Insler, R., and M. Anavi,
              "Time Division Multiplexing over IP (TDMoIP)", RFC 5087,
              December 2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, September 2008.

   [G775]     International Telecommunications Union, "Loss of Signal
              (LOS), Alarm Indication Signal (AIS) and Remote Defect
              Indication (RDI) defect detection and clearance criteria
              for PDH signals", ITU Recommendation G.775, October 1998.

   [G826]     International Telecommunications Union, "Error Performance
              Parameters and Objectives for International Constant Bit
              Rate Digital Paths at or above Primary Rate",
              ITU Recommendation G.826, December 2002.













Stein, et al.           Expires January 16, 2013               [Page 10]

Internet-Draft                PW-CONGESTION                    July 2012


Appendix A.  Loss Probabilities for TDM PWs

   ITU-T Recommendation G.826 [G826] specifies limits on the Errored
   Second Ratio (ESR) and the Severely Errored Second Ratio (SESR).  For
   our purposes, we will simplify the definitions and understand an
   Errored Second (ES) to be a second of time during which a TDM bit
   error occurred or a defect indication was detected.  A Severely
   Errored Second (SES) is an ES second during which the Bit Error Rate
   (BER) exceeded one in one thousand (10^-3).  Note that if the error
   condition AIS was detected according to the criteria of ITU-T
   Recommendation G.775 [G826] a SES was considered to have occurred.
   The respective ratios are the fraction of ES or SES to the total
   number of seconds in the measurement interval.

   For both E1 and T1 TDM circuits, G.826 allows ESR of 4% (0.04), and
   SESR of 1/5% (0.002).  For E3 and T3 the ESR must be no more than
   7.5% (0.075), while the SESR is unchanged.

   Focusing on E1 circuits, the ESR of 4% translates, assuming the worst
   case of isolated exactly periodic packet loss, to a packet loss event
   no more than every 25 seconds.  However, once a packet is lost,
   another packet lost in the same second doesn't change the ESR,
   although it may contribute to the ES becoming a SES.  Assuming an
   integer number of TDM frames per PW packet, the number of packets per
   second is given by packets per second = 8000 / (frames per packet),
   where prevalent cases are 1, 2, 4 and 8 frames per packet.  Since for
   these cases there will be 8000, 4000, 2000, and 1000 packets per
   second, respectively, the maximum allowed packet loss probability is
   0.0005%, 0.001%, 0.002%, and 0.004% respectively.

   These extremely low allowed packet loss probabilities are only for
   the worst case scenario.  In reality, when packet loss is above
   0.001%, it is likely that loss bursts will occur.  If the lost
   packets are sufficiently close together (we ignore the precise
   details here) then the permitted packet loss rate increases by the
   appropriate factor, without G.826 being cognizant of any change.
   Hence the worst-case analysis is expected to be extremely pessimistic
   for real networks.  Next we will go to the opposite extreme and
   assume that all packet loss events are in periodic loss bursts.  In
   order to minimize the ESR we will assume that the burst lasts no more
   than one second, and so we can afford to lose no more than packet per
   second packets in each burst.  As long as such one-second bursts do
   not exceed four percent of the time, we still maintain the allowable
   ESR.  Hence the maximum permissible packet loss rate is 4%.  Of
   course, this estimate is extremely optimistic, and furthermore does
   not take into consideration the SESR criteria.

   As previously explained, a SES is declared whenever AIS is detected.



Stein, et al.           Expires January 16, 2013               [Page 11]

Internet-Draft                PW-CONGESTION                    July 2012


   There is a major difference between structure-aware and structure-
   agnostic transport in this regards.  When a packet is lost SAToP
   outputs an "all-ones" pattern to the TDM circuit, which is
   interpreted as AIS according to G.775 [G775].  For E1 circuits, G.775
   specifies for AIS to be detected when four consecutive TDM frames
   have no more than 2 alternations.  This means that if a PW packet or
   consecutive packets containing at least four frames are lost, and
   four or more frames of "all-ones" output to the TDM circuit, a SES
   will be declared.  Thus burst packet loss, or packets containing a
   large number of TDM frames, lead SAToP to cause high SESR, which is
   20 times more restricted than ESR.  On the other hand, since
   structure-aware transport regenerates the correct frame alignment
   pattern, even when the corresponding packet has been lost, packet
   loss will not cause declaration of SES.  This is the main reason that
   SAToP is much more vulnerable to packet loss than the structure-aware
   methods.

   For realistic networks, the maximum allowed packet loss for SAToP
   will be intermediate between the extremely pessimistic estimates and
   the extremely optimistic ones.  In order to numerically gauge the
   situation, we have modeled the network as a four-state Markov model,
   (corresponding to a successfully received packet, a packet received
   within a loss burst, a packet lost within a burst, and a packet lost
   when not within a burst).  This model is an extension of the widely
   used Gilbert model.  We set the transition probabilities in order to
   roughly correspond to anecdotal evidence, namely low background
   isolated packet loss, and infrequent bursts wherein most packets are
   lost.  Such simulation shows that up to 0.5% average packet loss may
   occur and the recovered TDM still conform to the G.826 ESR and SESR
   criteria.


Authors' Addresses

   Yaakov (Jonathan) Stein
   RAD Data Communications
   24 Raoul Wallenberg St., Bldg C
   Tel Aviv  69719
   ISRAEL

   Phone: +972 (0)3 645-5389
   Email: yaakov_s@rad.com









Stein, et al.           Expires January 16, 2013               [Page 12]

Internet-Draft                PW-CONGESTION                    July 2012


   David L. Black
   EMC Corporation
   176 South St.
   Hopkinton, MA  69719
   USA

   Phone: +1 (508) 293-7953
   Email: david.black@emc.com


   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/































Stein, et al.           Expires January 16, 2013               [Page 13]