RTGWG C. Villamizar
Internet-Draft OCCNC
Intended status: Informational March 14, 2014
Expires: September 15, 2014
Stability of Interior Routing and Load Balancing Techniques
draft-villamizar-rtgwg-route-stability-00
Abstract
This is an informational document intended to outline what is known
about routing stability of interior gateway protocols (IGPs), traffic
engineering techniques (TE) used in MPLS, and load balancing
techniques. Known stable and unstable conditions provide bounds of
known behavior. Network operators and equipment suppliers have
relied heavily on operational experience as existence proof of
instability or evidence of apparent stability. Between the bounds of
known stable and unstable behavior there may be conditionally stable
protocols and deployment scenarios. There is some discussion of the
difficulty of mathematical stability proof for some techniques and a
challenge to the research community.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 15, 2014.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
   1.1. Terminology
   1.2. Document Scope
2. Background
   2.1. Interior Routing
      2.1.1. Distance Vector (DV) Protocols
      2.1.2. Link State (LS) Protocols
   2.2. MPLS Traffic Engineering (TE)
   2.3. Label Distribution Protocol (LDP)
   2.4. IP Fast Reroute (IPFRR) and LDP Fast Reroute (LDPFRR)
   2.5. Protection and Restoration
   2.6. Multipath Load Balancing
   2.7. Common Deployments
3. Problems to Avoid
   3.1. Transient Loops and Black Holes
   3.2. Long Convergence Times
   3.3. Impact of Multiple Failures
   3.4. Oscillations and Slosh
   3.5. Positive Feedback Loops
   3.6. Scaling Limitations
4. Link State (LS) Protocol Behavior
   4.1. Transient Black Holes and Route Loops
   4.2. Excessive Link Up and Link Down Transitions
   4.3. LSPDU Fragment Boundary Change
   4.4. Convergence Time
   4.5. Scaling Issues
      4.5.1. Flooding and Dense Mesh Topologies
      4.5.2. SPF Computation
      4.5.3. External Routes
      4.5.4. Forwarding Entry Installation
   4.6. LDP Impact
      4.6.1. IPFRR and LDPFRR Impact
   4.7. Delay, Jitter, and Loss Metrics
5. MPLS-TE Behavior
   5.1. Transient Behavior
   5.2. Time Complexity of TE Restoration
   5.3. Scaling Issues
   5.4. Delay, Jitter, and Loss Metrics
6. Multipath Load Balancing
   6.1. Existing Multipath Techniques
   6.2. Dynamic Load Balancing Feedback Loop
7. Summary of Factors in Load Balancing Stability
   7.1. Load Balance Granularity
   7.2. Load Balance Adjustment Certainty
   7.3. Load Balance Measurement Compatibility
   7.4. Load Balancing Decision Locality
      7.4.1. Local Decision Load Balance
      7.4.2. Distributed Load Balance
   7.5. Load Balance Timing and Traffic Adjustment
   7.6. Preventing Feedback Problems
8. Open Area for Research
9. Acknowledgements
10. IANA Considerations
11. Security Considerations
12. Informative References
Author's Address
1. Introduction
The consequences of deploying a marginally stable solution can be a
sudden and catastrophic transition to an unstable state. Events
related to instability vary in severity. Network meltdowns are often
documented only in standards forum or operational forum discussions
and, being otherwise undocumented, become the topic of Internet
folklore.
Route oscillations can have a very detrimental effect on the network
performance perceived by end users. In the mid-1990s, severe route
oscillations could cause routers to fail, resulting in more route
oscillation and eventually a provider meltdown. While router
software today is more robust and meltdown is therefore very
unlikely, performance degradation due to route oscillation as
perceived by user traffic can be quite severe. Slow route
oscillation, also known as route slosh, results in performance
degradation, though not as severe.
This document outlines what the routing community knows and doesn't
know about the topic of route stability in the types of networks
deployed today and in potential deployment based on existing IETF
work. A brief section also mentions areas for research (Section 8).
This document tries to avoid straying from well understood routing
protocol behaviors.
1.1. Terminology
Related terms are grouped together.
Villamizar Expires September 15, 2014 [Page 3]
Internet-Draft Routing and Load Balance Stability March 2014
The terms listed below describe undesirable routing behaviors. The
terms are listed in order of decreasing severity.
routing meltdown
A routing meltdown is a sustained period when a very
substantial number of routers either repeatedly crash and
reboot or repeatedly lose routing adjacencies.
route instability
Route instability is a sustained period when route change has a
cascading effect and causes widespread transient problems
such as route black holes or routing loops.
route loop
A route loop occurs when a cycle exists in forwarding
entries. Some protocols are prone to creation of transient
route loops.
black hole
A black hole exists when forwarding entries do not get
packets to their destination. Transient black holes may
occur during route change in some routing protocols.
route oscillation
Route oscillation is a condition where routes keep changing
in the absence of change in any conditions which should
legitimately cause route change.
route slosh
Route slosh is a proper subset of route oscillation. When
route slosh occurs, routes keep switching among two or more
states, with limited brief stability between state changes
due to protocol timers or other factors. Route slosh
is essentially just slow route oscillation.
The set of terms below describe normal routing behaviors and other
routing terms. Related terms are grouped together within this list.
route change
Route change is, as the name implies, when routes and the
corresponding forwarding entries change. Route change may
occur for many reasons.
shared risk link group (SRLG)
An SRLG is a group of links which, due to a shared physical
resource are likely to fail simultaneously or nearly
simultaneously.
network fault
A network fault occurs when a node or link fails. Node
protection is modeled using SRLGs covering all of the links
adjacent to a given node.
multiple fault
A multiple fault refers to the failure of multiple resources
causing links that are not covered by a common SRLG to fail.
link metric
Interior gateway protocols (IGPs) commonly use link metrics,
which are synonymous with link costs, to determine a best
route. See Section 2.1 and Section 4 for discussion of IGPs.
traffic engineering
Traffic engineering (TE) involves capacity information as
well as the simple link metrics used in link state protocols.
The primary focus here is the type of traffic engineering
commonly practiced in RSVP-TE networks. See Section 2.2 and
Section 5 for discussion of MPLS TE.
route recovery
Route recovery refers to getting traffic flowing after some
event such as a network fault disrupts forwarding. Route
recovery may involve protection and/or restoration.
route protection
Route protection refers to a form of recovery using an
alternate path, generally involving no signaling after the
fault to begin using the alternate path, and involving either
local detection of the fault or a simple proactive
connectivity verification.
zero bandwidth protection
In MPLS TE, the use of protection paths with zero bandwidth
allocation is common. This is discussed in Section 2.5.
route restoration
Route restoration refers to recovery that involves exchange
of routing information and/or some form of signaling to set
up an alternate path.
multipath
A definition of the term "multipath" can be found in
[I-D.ietf-rtgwg-cl-requirement]. Common forms of multipath
include Equal Cost Multipath (ECMP), Ethernet Link
Aggregation, and MPLS Link Bundling. See
[I-D.ietf-rtgwg-cl-use-cases] for discussion of existing
forms of multipath.
local decision
A local decision is one in which only a single network
element is involved in the decision.
distributed decision
A distributed decision is one where the independent decisions
of multiple network elements have an effect on the outcome.
offline route computation
Offline route computation is where a centralized global
routing decision is made, with the results of the decision
passed to all affected network elements. Global optimization is often
cited as its advantage, though in IP networks it is seldom
used due to very slow route restoration.
1.2. Document Scope
This document focuses primarily on routing stability of the interior
protocols used in service provider (SP) networks and on load
balancing techniques used in SP networks.
Service providers almost exclusively use link state (LS) routing
protocols as interior gateway protocols (IGPs) rather than distance
vector (DV) protocols. The two link state routing protocols commonly
used are OSPF [RFC2328] [RFC5340] and IS-IS [ISO.10589.1992]
[RFC5308]. The MPLS LDP protocol [RFC5036] creates MPLS forwarding
entries which either follow IGP routes or behave in a manner similar
to a DV protocol. Techniques such as IPFRR and LDPFRR [RFC5714]
[RFC6981] alter the transient behavior of the IGPs or LDP and are
discussed briefly. MPLS traffic engineering (TE) as practiced using
RSVP-TE [RFC3209] makes use of IGP extensions based on [RFC3630]
[RFC5305] to carry TE information. MPLS-TE has inherently different
behavior than IGP routes alone or MPLS using LDP.
There has been activity in the IETF related to using secondary link
metrics related to delay, jitter, and loss
[I-D.ietf-ospf-te-metric-extensions]
[I-D.ietf-isis-te-metric-extensions]. This includes use of these
secondary metrics for MPLS TE path selection
[I-D.ietf-mpls-te-express-path].
Load balancing using hashing techniques dates back to at least the
mid-1980s, but was not formally documented until [RFC2991] and
[RFC2992], and even then the focus was solely on an already outdated
use of hashing in load balancing. It was not until documents such as
[I-D.ietf-rtgwg-cl-use-cases] and [I-D.ietf-mpls-forwarding] that
these load balancing techniques were well documented. There is also
interest in dynamic load balancing which will also make use of delay,
jitter, and loss information, either locally available or IGP
advertised [I-D.ietf-rtgwg-cl-requirement]. The topic of stability
with regard to this type of work is also discussed.
Much has been written about the stability of the BGP routing
protocol. However, since the focus of this document is on interior
routing stability, discussion of BGP stability is out of scope.
There is some brief discussion of the impact of interior routing
stability on BGP routes in Section 4.4 and Section 4.5.
Load balancers typically used by enterprises and to some extent in
data centers are not within scope. This type of load balancer is not
used in SP networks, and its balancing algorithms are largely
proprietary.
2. Background
The greatest concern regarding stability and oscillations has been in
service provider (SP) networks, simply because any instability which
occurs more readily at large scale has in the past affected SP
networks first. It may be that today's very large data centers now
pose unique scalability and stability problems as well, but the focus
remains on SP networks. Some forms of instability can occur in
enterprise networks and even very small networks; however, common
practice has evolved to make such occurrences achievable only through
severe misconfiguration.
This section provides a brief introduction to a set of protocols in
use in SP networks today. The focus is mainly on link-state routing
protocols and LDP, on MPLS TE, and on multipath load balancing.
These are the common underpinnings of SP networks and a recent topic
of stability concerns due to proposed extensions to today's common
practice.
2.1. Interior Routing
Interior routing protocols are commonly referred to as interior
gateway protocols (IGPs). IGP is somewhat of a historic term but
still widely used. There are two types of IGPs in widespread use in
IP networks. There are numerous alternate techniques described in
research, but with no deployment. The two common types are Distance
Vector (DV) and Link State (LS). In later sections only the behavior
of LS protocols is discussed since DV protocols are seldom if ever
used in large SP networks.
2.1.1. Distance Vector (DV) Protocols
Distance vector (DV) routing protocols are rarely if ever used by
service providers. Distance vector routing protocols exchange
distances to remote destinations.
One of the earliest DV protocols in widespread use was Routing
Information Protocol (RIP). The first version is now known as RIP-1
[RFC1058]. RIP-1 was used in the 1980s and 1990s but replaced by
RIP-2 [RFC2453] due to RIP-1 shortcomings, most severe at the time
being lack of CIDR [RFC4632] support. RIPng [RFC2080] provides an
extension to RIP-2 which supports IPv6.
RIP is generally only used in small networks such as homes, small
offices, and small enterprise. RIP's advantage is simplicity. RIP's
disadvantages are many. Among them is slow convergence as networks
become large.
Another DV routing protocol is Enhanced Interior Gateway Routing
Protocol (EIGRP). EIGRP is a much more advanced DV routing protocol
than RIP, eliminating most if not all of RIP's shortcomings. EIGRP
was a Cisco proprietary protocol roughly until 2013 when Cisco gave
permission for others to use EIGRP, recognizing that no SP would use
EIGRP due to the vendor lock-in.
Yet another DV routing protocol is Babel [RFC6126], which is well
suited for wired and wireless networks and is particularly well
suited for ad-hoc wireless networks. Babel is not expected to see any
deployment on SP networks, but could perhaps displace RIP on small
networks such as home, small office, and enterprise.
2.1.2. Link State (LS) Protocols
Link state (LS) routing protocols build a topology database based on
each router advertising a set of neighbor adjacencies and providing
information about the router's capabilities and about each adjacency.
This topology database is known as a link state database (LSDB).
Among the advantages of LS protocols is the ability to carry
additional topology information useful for other purposes, such as
MPLS traffic engineering (TE) (see Section 2.2).
The two major LS protocols are open shortest path first (OSPF)
[RFC2328][RFC5340], defined by the IETF, and intermediate system to
intermediate system (IS-IS) [ISO.10589.1992], defined by ISO but
extended by IETF for IP use and for MPLS TE use.
Differences between OSPF and IS-IS have sparked heated discussion,
though most experts concede that use of either is more a matter of
preference than strong technical merit. OSPF sends a single router
link state advertisement (LSA) per router with multiple opaque LSAs
[RFC5250] carrying additional information. IS-IS sends a single link
state protocol data unit (LSPDU) per router with TLVs packed into
multiple LSPDU fragments. Multiple LSPDU fragments are needed
because the size of the complete LSPDU often far exceeds the link
MTU.
Arguments regarding the technical merit of one of these two LS
protocols over the other are usually focused on information packing
and flooding efficiency. In both protocols the link state database
(LSDB) can become very large in large networks, particularly those
using MPLS TE. Nearly identical MPLS TE extensions exist in OSPF and
IS-IS, with information exchanged in relatively small type-length-
value (TLV) extensions carried in OSPF opaque LSAs or in the IS-IS
LSPDU fragments. In OSPF, related TLVs are packed into an LSA. In
IS-IS many related and unrelated TLVs are packed into a given LSPDU
fragment.
The behavior of LS protocols is discussed in detail in Section 4.
2.2. MPLS Traffic Engineering (TE)
MPLS traffic engineering (TE) makes use of additional information
carried in the link state (LS) routing protocols OSPF and IS-IS (see
Section 2.1.2). MPLS label switched paths (LSPs) are set up using TE
extensions to RSVP, known collectively as RSVP-TE. The base RSVP-TE
protocol specification is [RFC3209].
The behavior of MPLS TE is discussed in detail in Section 5.
2.3. Label Distribution Protocol (LDP)
The MPLS Label Distribution Protocol (LDP) [RFC5036] can either make
use of the best path as computed by the IGP or act as an independent
DV protocol. IETF has decided not to add TE extensions to LDP
[RFC3468]. MPLS provides a light weight tunneling encapsulation and
LDP provides a relatively light weight protocol to support MPLS
forwarding and therefore to efficiently carry pseudowires (PW) and
virtual private network (VPN) services. LDP creates multipoint to
point (MP2P) LSPs directed to a specific forwarding equivalence
class (FEC) rather than the point to point (P2P) LSPs commonly used
in RSVP-TE.
2.4. IP Fast Reroute (IPFRR) and LDP Fast Reroute (LDPFRR)
IP/LDP Fast Reroute (IP/LDP FRR) [RFC5714] is also applicable in MPLS
networks. ECMP and Loop-Free Alternates (LFA) [RFC5286] are well
established IP/LDP FRR techniques and were the first methods to be
widely deployed. Work on IP/LDP FRR is ongoing within the IETF
RTGWG. Two topics actively discussed in RTGWG are microloops and
partial coverage of the established techniques in some network
topologies. [RFC5715] covers the topic of IP/LDP Fast Reroute
microloops and microloops prevention. RTGWG has developed additional
IP/LDP FRR techniques to handle coverage concerns. RTGWG is
extending LFA through the use of remote LFA
[I-D.ietf-rtgwg-remote-lfa]. Other techniques that require new
forwarding paths to be established are also under consideration,
including the IPFRR "not-via" technique defined in [RFC6981] and
maximally redundant trees (MRT)
[I-D.ietf-rtgwg-mrt-frr-architecture]. ECMP, LFA (but not remote
LFA) and MRT swap the top label to an alternate MPLS label. The
other methods operate in a similar manner to RFC 4090 facility backup
and push an additional label. IP/LDP FRR methods which push more
than one label have been suggested but are in early discussion.
The IPFRR techniques apply to LDP fast reroute (LDPFRR). Some
techniques make use of tunneling to improve coverage. The tunneling
could in principle be provided by GRE but in practice MPLS with LDP
is most often used to avoid a potential failure of a specific next-
hop or specific SRLG. The techniques which use tunneling require
that the router know the path that other routers will be using, so it
can verify that the potential failure will be avoided; they therefore
require an LS IGP rather than a DV IGP or the use of LDP as a DV
protocol with no IGP.
2.5. Protection and Restoration
Network protection and restoration are collectively known as recovery.
Getting traffic to flow again after a fault is network recovery,
regardless of how it is accomplished.
Route protection is often handled largely in hardware or forwarding
card firmware rather than control plane software and can achieve
recovery within a few tens of milliseconds. While protection is
fast, protection against multiple faults is highly impractical.
Restoration may involve flooding updated routing information and may
involve signaling. Therefore restoration is inherently slower than
protection, but it covers all variations of multiple faults except
those multiple faults which cause network partitioning and are
inherently irreparable.
The practice of zero bandwidth protection in MPLS FRR [RFC4090]
relies on traffic class (TC) markings [RFC5462] and diffserv
[RFC3270] [RFC4124] to provide very fast recovery for high priority
traffic with no cost due to resource reservation of the protection
paths. This protection of high priority traffic may occur at the
expense of transient congestion for lower priority traffic. When
combined with restoration, particularly if restoration is relatively
fast, any congestion of lower priority traffic is eliminated or at
least minimized shortly after protection goes into effect.
A very cost effective deployment, and therefore quite common, is to
use MPLS FRR [RFC4090] for very fast (tens of msec) single fault
protection and to optimize router code for fast (well under a second)
restoration. Zero bandwidth reservation on the protection paths is
commonly and perhaps exclusively used today. The fast restoration
covers arbitrary multiple faults and relieves congestion after
initial zero bandwidth FRR protection. This type of deployment is
aimed at providing low cost and high availability for preferred
services and also providing low cost and reasonably short disruptions
for lower priority services when faults occur.
2.6. Multipath Load Balancing
Multipath load balancing in use today includes Equal Cost Multipath
(ECMP), Ethernet Link Aggregation [IEEE-802.1AX], and MPLS Link
Bundling [RFC4201]. In each case the load balancing involves a local
decision and generally uses static load balancing. In all of these
cases the load balancing is based on IP header and MPLS label stack
information as described in [I-D.ietf-mpls-forwarding]. Ethernet
Link Aggregation using Ethernet MAC addresses is entirely useless in
SP networks or any routed network. Therefore Ethernet Link
Aggregation as implemented by routers also uses IP header and MPLS
label stack information rather than Ethernet MAC addresses.
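As an illustration of how such a local, static decision preserves
packet order within a microflow, the following Python sketch hashes
an assumed IP 5-tuple and selects a multipath member by modulo
placement. This is a minimal sketch only: real routers use vendor-
specific hash functions and key selection over IP header or MPLS
label stack fields, and the CRC32 hash, member names, and addresses
here are illustrative assumptions.

   import zlib

   def select_member(five_tuple, members):
       # Hash the microflow key so all packets of a flow map to the
       # same multipath member, preserving packet order per microflow.
       key = "|".join(str(field) for field in five_tuple).encode()
       return members[zlib.crc32(key) % len(members)]

   members = ["member-0", "member-1", "member-2", "member-3"]
   # Assumed key: source IP, destination IP, protocol, ports.
   flow = ("192.0.2.1", "198.51.100.7", 6, 49152, 443)
   print(select_member(flow, members))  # a flow always picks one member

Because the hash is a pure function of the packet header, a flow's
placement changes only when the member list changes.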
Static load balancing, being static, is inherently stable. Dynamic
load balancing is still a local decision, but where measurements are
made and the load balance is adjusted based on those measurements.
Measurements can be based on output link utilization, queue occupancy
or queuing delay, or queuing loss including active queue management
(AQM) loss, or some combination of these. Dynamic load balancing has
the potential to be unstable but has been successfully deployed in SP
networks, providing existence proof that stable implementations are
possible.
The behavior of multipath load balancing is discussed in Section 6.
2.7. Common Deployments
Rarely if ever are service provider (SP) network deployments exactly
alike, but most have a great deal in common. Nearly all major SPs
(possibly all) use either OSPF or IS-IS. Some use no MPLS at all,
though this subset is shrinking. Many use LDP but not RSVP-TE. Some
use both LDP and RSVP-TE. Some use RSVP-TE but not LDP though this
may be less common than a decade ago. The use of MPLS FRR [RFC4090]
is thought to be more common than IPFRR or LDPFRR at least among
larger providers.
Protection is not always provided. Fast restoration is considered
essential even where protection is used, due to the high occurrence
of multiple faults despite efforts to account for shared resources
using SRLGs.
All large SPs use multipath techniques, typically ECMP and Link
Aggregation. Static load balancing has generally been used. Though
dynamic load balancing has been successfully deployed, today static
load balancing may be exclusively used due to the disappearance of
the one core router supplier supporting dynamic load balancing.
Given only this set of tools and given fixed link metrics, it is
difficult to build a network which is unstable. It is possible to
build an unstable network, even with fixed link metrics, if the
scaling limitations described in Section 3 are ignored and exceeded,
but as mentioned in Section 2, with today's common practices this
situation is achievable only through severe misconfiguration.
3. Problems to Avoid
Various existing routing protocols exhibit different transient
behaviors and different tendencies toward specific problems. Any
problem which is transient is generally considered less severe than a
problem that is persistent. This section describes some common
problems that new protocols, implementations, and deployments need to
avoid where possible. Section 4, Section 5, and Section 6 discuss
the tendency of some routing protocols or techniques to exhibit these
various problems.
3.1. Transient Loops and Black Holes
After a fault a transient black hole at the failed resource may be
unavoidable. Some routing protocols have a tendency to form
transient black holes elsewhere when a topology change occurs and can
also form transient loops in some circumstances. Transient loops can
cause severe congestion within the loop, affecting traffic which
should not have been impacted by the initial fault.
3.2. Long Convergence Times
The most severe long convergence time problems are almost always due
to poor implementations or badly designed networks. However, even
among competent routing protocol implementations and competent
network designs, some routing protocols are more prone to long
convergence times than others.
While IGP protocols are not entirely immune to long convergence
times, MPLS TE is particularly prone to long convergence times unless
care is taken to avoid scaling issues. The shortest path first
(SPF) computation, sometimes called the shortest path tree (SPT)
computation, in an LS protocol is of time complexity order NlogL,
where N is the number of
nodes and L is the number of links. For MPLS TE, one CSPF
computation is needed for each LSP and each CSPF computation is order
NlogL. A fault may affect the majority of the LSPs for which a given
LSR is ingress and therefore may involve order N^2logL (N squared log
L) computation for that LSR.
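The following Python sketch illustrates why the cost multiplies: one
Dijkstra SPF serves all destinations for an IGP, but a TE ingress
must repeat a path computation per affected LSP. The graph, costs,
and LSP list are illustrative assumptions, and a real CSPF would
additionally prune links with insufficient available bandwidth before
the shortest path computation.

   import heapq

   def spf(graph, src):
       # Dijkstra shortest path first over {node: {neighbor: cost}}.
       dist = {src: 0}
       heap = [(0, src)]
       while heap:
           d, u = heapq.heappop(heap)
           if d > dist.get(u, float("inf")):
               continue
           for v, cost in graph[u].items():
               if d + cost < dist.get(v, float("inf")):
                   dist[v] = d + cost
                   heapq.heappush(heap, (d + cost, v))
       return dist

   graph = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1, "D": 3},
            "C": {"A": 4, "B": 1, "D": 1}, "D": {"B": 3, "C": 1}}
   lsps = [("A", "D"), ("A", "C"), ("A", "B")]  # LSPs with ingress A
   for ingress, egress in lsps:
       # One CSPF per LSP: a fault affecting most LSPs at an ingress
       # makes the total work order N^2 log L for that LSR.
       print(ingress, "->", egress, "cost", spf(graph, ingress)[egress])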
3.3. Impact of Multiple Failures
There are many situations in which the most common mode of failure or
a common mode of failure causes simultaneous failure of multiple
nodes or more often multiple links. For example, many links may
traverse a common physical fiber or fiber segment using wave division
multiplexing (WDM). These shared resources can be taken into account
using shared risk link groups (SRLG) when computing protection paths.
A multiple failure is one where network elements (nodes and/or links)
which do not share any common SRLG fail simultaneously.
Multiple failures occur on a regular basis. These occur as a result
of a potential shared resource failure which was either unknown,
overlooked, or considered so improbable as to make it more practical
to ignore. Examples of completely unavoidable multiple failures
are fibers on the same earthquake fault line, fibers crossing the
same bridge (where only bridge collapse would cause simultaneous
failure), fibers on both sides of railroad tracks (where only a
train derailment would cause simultaneous failure), and widespread
flooding. There are many other less dramatic failure scenarios which
are more common.
Protection does not cover multiple faults. Covering even double
faults using protection adds significant network complexity. Where
protection requires resource reservation, such as in transport
networks, the need to reserve a multiple of the resource being
protected makes covering multiple faults prohibitively expensive in
most cases.
Using protection covering only single faults and having no
restoration capability is unacceptable for IP networks and
unacceptable for many of today's services. Protection covering only
single faults with no restoration is common practice for transport
protected services, largely accounting for the steep decline in use of
protected transport services. Using restoration with excessively
long convergence times is also unacceptable for IP and for many of
today's services.
3.4. Oscillations and Slosh
An extremely severe problem is persistent route oscillations. Other
than buggy implementations or exceeding implementation scaling
limits, producing oscillations in today's SP networks would require
misuse of link metrics which vary with traffic load or some other
form of positive feedback (see Section 3.5).
When protocol timers or other protocol features limit oscillation to
a fairly long period, the resulting low frequency route oscillation
is known as route slosh. Route slosh is also a severe problem if it
is persistent. If damped such that the slosh impact fairly quickly
reduces to zero, then some slosh may be acceptable.
3.5. Positive Feedback Loops
Any system with positive feedback is unstable. For example, consider
a simple electronic amplifier. With DC positive feedback, any small
signal would cause a saturated output. Amplifiers employ negative
feedback to improve linearity. However, if the feedback is delayed a
little, then positive feedback can occur at high frequencies, which
can result in a fried amplifier.
The classic case of positive feedback in routing is the delay based
link metrics used briefly in the 1980s [DBSPRA]. Various forms of
damping are discussed in the paper describing this use of delay based
link metrics, some with limited success, though in the end ARPANET
discontinued using delay based link metrics.
Use of delay, jitter, or loss has the potential for creating a
situation where a positive feedback loop exists and therefore where
oscillation occurs. This problem is discussed in Section 4.7,
Section 7.4.2 and Section 7.
A similar feedback may occur in MPLS TE if an "auto-bandwidth"
feature is used, though such an occurrence is unlikely. This feature
varies the requested LSP resource reservation based on observed
traffic measured at the LSP ingress. Increasing the LSP resource
reservation could put an LSP on a longer delay path. The additional
delay may reduce the offered load slightly and allow the LSP to be
moved back to its original path. In practice this problem is avoided
by using only very long term filtered measurements, though the result
may be very low frequency route slosh.
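A minimal sketch of such long term filtering follows, assuming an
exponentially weighted moving average with a deliberately small gain
and a hysteresis threshold before re-signaling; the parameter values
and the demonstration traffic are illustrative assumptions, not
recommendations.

   class AutoBandwidthFilter:
       # Very slow EWMA plus hysteresis: brief traffic shifts cannot
       # move the reservation, damping the feedback loop described
       # above at the cost of very slow adaptation.
       def __init__(self, alpha=0.01, threshold=0.10):
           self.alpha = alpha          # small gain => long term filter
           self.threshold = threshold  # re-signal only on >10% drift
           self.filtered = None
           self.reserved = None

       def sample(self, measured_bps):
           if self.filtered is None:
               self.filtered = self.reserved = measured_bps
               return self.reserved
           self.filtered += self.alpha * (measured_bps - self.filtered)
           drift = abs(self.filtered - self.reserved)
           if drift > self.threshold * self.reserved:
               self.reserved = self.filtered  # adjust the reservation
           return self.reserved

   f = AutoBandwidthFilter()
   for rate in [100e6, 140e6, 100e6, 100e6]:  # a brief burst
       print(f.sample(rate))                  # reservation barely moves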
3.6. Scaling Limitations
Routers have finite resources. Scaling problems can occur in the
following situations.
1. When protocols are poorly designed and inherently consume
excessive resources.
2. When specific types of deployments are used with protocols which
consume excessive resources in those configurations.
3. When implementations provide inadequate resources to support what
today are considered normal deployments.
Resources which may become exhausted include specialized buffering,
storage, and computational resources.
Specific examples of scaling problems are given in Section 4.5 and
Section 5.3.
4. Link State (LS) Protocol Behavior
Link state (LS) protocols are simple and effective. Transient
behavior during route change is somewhat prone to route loops but
with reasonably good implementations convergence is fast and
therefore transient problems are short lived.
The two prominent link state (LS) protocols, OSPF and IS-IS, exhibit
very similar behavior. Only where there is a significant difference
will either of the two protocols be mentioned in this section.
4.1. Transient Black Holes and Route Loops
Transient problems arise from the hop by hop nature of forwarding
with direct application of LS protocols. Each router makes a decision
to change routes independently, and the order in which routers learn
of an LSDB change and make a decision based on changes can vary.
Transient situations can arise where routers in a subset of the
topology have installed routes based on a slightly different LSDB and
either route traffic in a two router loop which is suppressed,
resulting in a black hole, or in a multirouter loop, causing a
forwarding loop.
As long as convergence is fast, the impact of transient problems is
small. A forwarding loop can create very severe but temporary
congestion. The temporary congestion created by a forwarding loop
will often affect traffic to destinations that should not have been
affected by the route change. LS convergence, including flooding,
routing decision, and installation of forwarding entries today can be
well under a second but does not rival the few tens of milliseconds
achieved by protection mechanisms.
4.2. Excessive Link Up and Link Down Transitions
Links are considered in an "up" or "down" state depending on whether
some criterion is met. The simplest criterion, successful delivery
of routing protocol hello packets, is among the most problematic.
Under some conditions, a link may undergo excessive state changes
unless the implementation takes some action to prevent this
situation. For example, hysteresis in the criterion may be used, or
link state may be held in a "down" state as a result of frequent
change. Excessive transitions may occur when link state "down"
transitions are quickly triggered by some number of errors and are
quickly reverted to the "up" state if the error rate falls below the
target rate. This can be solved by one or more of a number of means:
1. Hold the link in the down state for a minimum time period which
is long compared to the measurement interval.
2. Make use of hysteresis, such as a long period of completely error
free operation before changing link state back to "up".
3. Make use of a technique which progressively keeps the link down
for longer periods depending on the rate at which the error
thresholds are crossed. For example, [RFC2439] can be applied
to IGP link state at the originating router as well as to BGP,
as sketched after this list.
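The following Python sketch of the third approach borrows the penalty
and decay model of [RFC2439] and applies it to link state advertised
by the originating router; the penalty, suppress, reuse, and
half-life figures are illustrative assumptions.

   import time

   class LinkDamping:
       # Each down transition adds a penalty; the penalty decays
       # exponentially; the link may be advertised up again only once
       # the penalty falls below the reuse threshold.
       def __init__(self, penalty=1000.0, suppress=2000.0,
                    reuse=750.0, half_life=60.0):
           self.value = 0.0
           self.suppressed = False
           self.last = time.monotonic()
           self.penalty, self.suppress = penalty, suppress
           self.reuse, self.half_life = reuse, half_life

       def _decay(self):
           now = time.monotonic()
           self.value *= 0.5 ** ((now - self.last) / self.half_life)
           self.last = now

       def link_down(self):
           self._decay()
           self.value += self.penalty
           if self.value >= self.suppress:
               self.suppressed = True  # stop advertising the link up

       def may_advertise_up(self):
           self._decay()
           if self.value < self.reuse:
               self.suppressed = False
           return not self.suppressed

A link that flaps twice in quick succession accumulates a penalty
above the suppress threshold and is held down for several half-lives,
while a single isolated transition never reaches it.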
If link state changes are frequent, even just for a single link, then
a network running a LS protocol may spend a significant amount of
time in a transient state where route loops and black holes are
possible.
4.3. LSPDU Fragment Boundary Change
A problem unique to IS-IS is related to LSPDU fragment packing and
shifting boundaries. If an implementation maintains a sorted order
among TLVs, maintains tight packing of TLVs into LSPDU fragments,
and moves TLVs from one LSPDU fragment to another to maintain the
tight packing, then a transient condition can occur where the LSPDU
fragment from which a set of TLVs is withdrawn arrives first and an
SPF is triggered before the change to the LSPDU fragment where the
set of TLVs has been added is learned.
A worst case implementation strategy tries to pack the set of LSPDU
fragments as tightly as possible and, in keeping them tightly packed,
frequently moves TLVs across LSPDU boundaries when changes occur.
The boundary shift due to a link changing may result in erroneous
withdrawal of a set of unrelated TLVs. In addition, a larger set of
LSPDU fragments has to be advertised if boundaries are shifted than
if less tight packing is targeted.
There are numerous strategies to avoid moving fragment boundaries and
to eliminate the erroneous withdrawal of TLVs. One strategy is to set
a target in which LSPDU fragments are packed within a fractional
range of full. For example, a target of 1/2 to 3/4 full might be
used. A second strategy is to separate the action of updating TLVs
from the action of moving boundaries to keep within the target
packing range, and when moving boundaries, to delay sending the
withdrawal of the TLVs relative to their addition in the adjacent
fragment. Other strategies to avoid this problem are possible.
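A Python sketch of the first two strategies follows, assuming the
8 KB fragment size and the 1/2 to 3/4 packing target mentioned above;
the flooding stand-in and the withdrawal delay value are illustrative
assumptions.

   import threading

   MTU = 8192              # assumed LSPDU fragment size
   LOW, HIGH = 0.50, 0.75  # target packing range from the text

   def flood_fragment(frag):
       print("flooding fragment with", len(frag), "TLVs")  # stand-in

   def needs_repack(fragments):
       # Repack a fragment only when its fill leaves the target range,
       # not on every TLV change, so boundaries rarely move.
       return [i for i, frag in enumerate(fragments)
               if not LOW <= sum(len(t) for t in frag) / MTU <= HIGH]

   def move_tlv(fragments, tlv, src, dst, delay=1.0):
       # Advertise the addition first; delay the withdrawal so a
       # receiver never sees the TLV vanish before it reappears.
       fragments[dst].append(tlv)
       flood_fragment(fragments[dst])
       def withdraw():
           fragments[src].remove(tlv)
           flood_fragment(fragments[src])
       threading.Timer(delay, withdraw).start()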
4.4. Convergence Time
Link state (LS) convergence time includes flooding, routing decision
(SPF plus changes to inter-area or inter-level routes plus changes to
external routes), and forwarding information base (FIB) update. The
forwarding entries used by external routes carried within the IGP and
by other routing protocols such as BGP are often affected by the
underlying IGP intra-area or intra-level routes. This second set of
routes may have to be changed, though with multistage forwarding
lookups used in many routers today most secondary changes are
completed automatically.
Operational experience has shown that flooding should be prioritized
over route recomputation. The SPF computation on fast hardware with
a small topology may occur in under 10 msec, and in some cases under
1 msec, however on slower hardware and with larger topologies, an SPF
may exceed 100 msec. If reflooding at each hop is delayed by SPF
time, the overall convergence is slowed. In addition, a single
resource fault may result in multiple topology changes (see
Section 3.3 for background). Prioritizing reflooding and slightly
delaying SPF helps ensure that the next SPF will be based on the full
set of changes.
With a decision to prioritize reflooding and to delay SPF, the
question arises as to how long to delay SPF. For most networks an
approximate worst case fault detection time is known. This detection
time often has a lower bound dictated by speed of light propagation
delay (about 1 msec per 200 km or 125 miles of fiber) and a practical
upper bound that is close to the lower bound. For long haul
terrestrial networks a lower bound of 10-20 msec is common. For
trans-oceanic fiber this can be 50-100 msec. A common practice is to
set a timer when the first IGP change is received, reflood as quickly
as possible, and perform an SPF when the timer elapses. With a timer
that is a small multiple of worst case fault detection time the SPF
will tend to occur with a full set of LSDB changes rather than a
partial set of changes.
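The timer logic above can be sketched as follows in Python, assuming
a 100 msec delay as a small multiple of an assumed worst case
detection time; reflooding happens immediately, and one SPF runs over
the whole batch of changes when the timer fires.

   import threading

   def reflood(change):
       print("reflooding", change)  # stand-in for actual flooding

   class SpfScheduler:
       def __init__(self, spf, delay=0.1):  # assumed 100 msec delay
           self.spf, self.delay = spf, delay
           self.pending, self.timer = [], None
           self.lock = threading.Lock()

       def lsdb_change(self, change):
           reflood(change)                  # flooding is prioritized
           with self.lock:
               self.pending.append(change)
               if self.timer is None:       # arm on first change only
                   self.timer = threading.Timer(self.delay, self._run)
                   self.timer.start()

       def _run(self):
           with self.lock:
               batch, self.pending, self.timer = self.pending, [], None
           self.spf(batch)                  # one SPF over all changes

   sched = SpfScheduler(spf=lambda batch: print("SPF over", batch))
   sched.lsdb_change("link A-B down")
   sched.lsdb_change("link A-C down")  # coalesced into the same SPF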
In a multithreaded control processor architecture, it is possible to
continue reflooding while an SPF is in progress, as long as newly
arriving changes do not affect the SPF in progress. In such a case a
delay in starting an SPF is often useful, but the delay can be made
shorter, particularly with a very fast SPF relative to worst case
fault detection time. With a very fast SPF, the limiting factor may
be forwarding entry install time.
Convergence time is also impacted by scaling issues discussed in
Section 4.5.
4.5. Scaling Issues
Scaling issues result from exceeding the practical capacity of a given
resource. Some of the affected resources include the following:
socket buffer
Severe scaling problems can occur in full mesh or very dense
mesh networks due to limited short term buffering of flooded
data. See Section 4.5.1.
computation
Computation issues seldom occur except with poor
implementations or networks containing a very large single
OSPF area or IS-IS level. See Section 4.5.2.
LSDB size
An extremely large number of external routes carried by a LS
protocol can cause scaling problems. See Section 4.5.3.
FIB install speed
The amount of time required to install an extremely large
number of forwarding entries can cause problems. See
Section 4.5.4.
Any of these problems can be made slightly better by an exceptionally
good implementation or much worse by an unusually bad implementation.
Each of the subsections that follow illustrates a type of scaling
issue capable of overwhelming even the best implementations through
careless network design. Particularly bad implementations can have
their own unique scaling limitations. The set of scaling issues
described here is almost certain to be incomplete.
4.5.1. Flooding and Dense Mesh Topologies
A physical fault may affect a number of neighbor adjacencies
advertised by a number of routers. This is generally a relatively
small number, perhaps tens of adjacencies and a smaller number of
routers. In OSPF each router advertises a new router LSA plus it may
advertise opaque LSAs with additional information about the affected
links. In IS-IS the equivalent of the router LSA is generally packed
into one LSPDU fragement. One or more additional LSPDU fragements
may carry the TLVs with additional information about the affected
links. IS-IS passes each LSPDU fragment as a full MTU packet, which
for SP core networks is often in excess of 8 KB.
Each router originating a set of changes passes this updated
information, perhaps a few 10s of KB, to each remaining neighbor. If
the information is not already known from some other source, then any
given neighbor will then reflood to each of its neighbors.
Most SP networks have an average of about 2.5-4 core neighbors per
core router. For this type of topology, known as a sparse mesh, LS
protocol flooding (absent any bad implementation) is fast, efficient,
and free of scaling problems.
A very dense mesh network or a full mesh network represents a
pathologic case. Attempts to use an overlay model often result in a
full mesh or a very dense mesh.
In a full mesh of N routers, each router originating a set of changes
is adjacent to every other router, or N-1 routers. Each of the N-1
routers will reflood to its N-2 neighbors. The original 10s of KB
originated as a result of a single fault is multiplied by a factor
that is order N^2 (N squared). Any given LSR may receive N feeds of
the same information. For example, if N is 1000 and 40 KB of change
was originated, a given router may receive 40 MB of route updates
when a single link changes state. This can overflow the memory
allocated for socket buffers and cause a lot of this flooded
information to drop and have to be retransmitted, lengthening
convergence. Any part of the feed that is dropped at one router from
one neighbor but successfully received from another neighbor will be
sent the other way toward the router from which the information was
dropped.
If a router fails in a full mesh network of N nodes, the situation is
much worse than for a link failure. That one router is adjacent to
N-1 other routers and each must originate at least one LSA or LSPDU
fragment and likely a few. We'll call the number of LSAs or LSPDU
fragments originated per router K. Each of these N-1 routers
originates K LSAs or LSPDU fragments and sends these to their N-2
neighbors, which will reflood to N-3 of their neighbors. The total
number of LSAs or LSPDU fragments sent is order K*N^3. Each router
could receive a feed of K*(N-1) LSAs or LSPDU fragments from each of
N-2 neighbors in a short time period. For example, for K=5, an MTU
of 8 KB (IS-IS), and N=1000, a router can receive a set of 40 MB
feeds totaling about 40 GB. No router today or in the foreseeable
future has that amount of socket buffer space.
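The arithmetic behind these figures can be reproduced with a short
sketch; the 40 KB change size, K=5, 8 KB MTU, and N=1000 are the same
assumed values used in the text.

   def rx_per_router_link_change(n, change_bytes):
       # A router in a full mesh can receive the same change from each
       # of its N-1 neighbors.
       return (n - 1) * change_bytes

   def rx_per_router_node_failure(n, k, mtu):
       # After a node failure, each of N-2 neighbors can deliver a
       # feed of K*(N-1) full MTU fragments: roughly K * N^2 * MTU.
       return (n - 2) * (n - 1) * k * mtu

   print(rx_per_router_link_change(1000, 40000))     # ~40 MB received
   print(rx_per_router_node_failure(1000, 5, 8192))  # ~40 GB received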
Flooding provides a fast way to distribute route changes and works
extremely well on a typical SP network with a sparse mesh. Flooding
simply does not work well in large full mesh topologies or even large
very dense mesh topologies. For example, some attempts to overlay IP
networks with a LS protocol over an ATM full mesh in the mid 1990s
yielded severely unstable (and therefore short lived) networks. Full
mesh or dense mesh topologies should be avoided.
4.5.2. SPF Computation
The shortest path first (SPF) computation used in LS protocols
requires time order NlogL, where N is the number of nodes and L is
the number of links. In network designs with a node degree that is
relatively independent of N, for example a node degree of 3-5 on
average, computation time can be expressed as KNlogN, where K is a
constant. A rule of thumb which provided a reasonable estimator
around the year 2000 used KNlogN as an estimator with K=1usec. This
was actually much faster than most SPF of that day, but today K might
be smaller. For K=1usec and N=1000, KNlogN would be about 10,000
usec or 10 msec.
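The rule of thumb is easy to restate as a function; the value of K
and the use of a base 2 logarithm are assumptions consistent with the
estimate above.

   import math

   def spf_time_estimate(n, k=1e-6):
       # SPF time ~ K * N * log(N), with K = 1 usec circa 2000.
       return k * n * math.log2(n)

   print(spf_time_estimate(1000))  # ~0.01 sec, i.e. about 10 msec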
From the prior example, it is clear that a network with close to
1,000 nodes in a single OSPF area or IS-IS level was supportable at
that time (circa 2000). Some five to ten years earlier and with
slower processors and some poorly written and therefore slower SPF
code, a rule of thumb of 400 nodes was commonly cited as the upper
limit for the size of an OSPF area. The figure 400 was said to be
empirically determined.
Though processors are getting faster and SPF code gradually improves
as well, there will always remain a practical limit to the number of
nodes that can be supported in a given OSPF area or IS-IS level. At
some point convergence times become excessive. In extreme cases
protocol timers of other protocols such as BGP (configured to use
faster timers than the BGP defaults) can expire due to excessive
convergence times and cause further network disruption.
4.5.3. External Routes
Due to OSPF and IS-IS periodic route refresh and due to the way in
which both protocols implement flooding, a very large number of
external routes should not be carried by the IGP. Implementations
have been known to tolerate up to tens of thousands of external
routes, but generally will become unstable with a BGP global routing
feed or worse yet, multiple BGP feeds into the IGP.
4.5.4. Forwarding Entry Installation
Operational experience supports implementations installing forwarding
entries for OSPF intra-area or IS-IS intra-level routes first, then
installing forwarding entries for any inter-area routes or routes
from another level, and then external routes. External routes may be
carried in the IGP or carried by another protocol, most commonly BGP.
BGP may carry a very large number of routes.
By installing intra-area or intra-level forwarding entries first,
accurate forwarding within an area or level is better ensured. By
giving the next priority to inter-area or inter-level routes, accurate
forwarding within the infrastructure of the autonomous system (AS) is
better ensured. This ensures that protocols that depend on this
infrastructure will continue to operate in times of high route change.
Many forwarding engines allow either two stage, multistage, or fully
recursive forwarding lookup. Supporting multiple stages allows the
forwarding for external routes which depend on internal routes to
automatically change if only the internal route has changed. This
can dramatically decrease the number of forwarding entries that need
to be changed and reduce effective convergence time.
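A minimal sketch of such a two stage lookup follows, with assumed
prefixes and interface names: external routes resolve through an
internal entry, so one write to the internal entry implicitly
redirects every dependent external route.

   class Fib:
       def __init__(self):
           self.internal = {}  # IGP prefix -> outgoing interface
           self.external = {}  # external prefix -> IGP next hop prefix

       def lookup(self, prefix):
           via = self.external.get(prefix)
           if via is not None:
               # Second stage resolves through the internal route.
               return self.internal[via]
           return self.internal[prefix]

   fib = Fib()
   fib.internal["10.0.0.1/32"] = "eth0"
   fib.external["203.0.113.0/24"] = "10.0.0.1/32"
   print(fib.lookup("203.0.113.0/24"))   # eth0
   fib.internal["10.0.0.1/32"] = "eth1"  # IGP reroute: one FIB write
   print(fib.lookup("203.0.113.0/24"))   # eth1, no per-route update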
In a particularly bad implementation, if forwarding entry
installation is slow, if only single stage forwarding entries are
supported, and if IGP routes are not prioritized, forwarding entries
can remain
incorrect long enough for IBGP peerings to go down (particularly if
BGP timers are reduced from the BGP default values). In such cases
it is possible for the route change resulting from IBGP peerings to
cause further churn.
Not since the late 1990s have there been implementations with severe
forwarding entry install time problems deployed in SP networks. The
problem is only worth mentioning in hopes that new vendors don't
repeat these mistakes.
4.6. LDP Impact
The behavior of MPLS using LDP is very similar to the behavior of LS
protocols. Some DV protocols support loop suppression through
maintaining a sequence of hops and this too is supported by LDP. The
LDP loop suppression avoids creating transient loops in LDP when
transient loops occur in the LS routing protocol. LDP therefore has
better transient behavior than a LS routing protocol alone. Services
which are carried only by LDP, such as PW and VPN services, cannot
loop. When IP is carried by LDP and where there is a fallback to IP
forwarding, transient loops will remain a problem for that IP
traffic.
4.6.1. IPFRR and LDPFRR Impact
IPFRR and LDPFRR are described in Section 2.4.
IPFRR provides protection against single faults. IPFRR can include
SRLG consideration. Timer mechanisms help avoid microloops by
delaying SPF based on distance from a fault, in that way sequencing
the transition from protection to restoration [RFC6976]. If multiple
faults occur IPFRR has the potential to be slightly
counterproductive. IPFRR may have a greater potential to create
transient loops in the event of a multiple fault. For example, LDP
without LDPFRR can effectively suppress loops, but with tunnels and
label stacking can create transient loops. In addition any loops
that form when a multiple fault occurs may be made to persist longer
due to the timers introduced to suppress microloops.
The protection against single faults offered by IPFRR and LDPFRR
greatly outweighs any transient misbehavior that may occur when
multiple faults occur.
4.7. Delay, Jitter, and Loss Metrics
Past use of delay link metrics in LS protocols has resulted in
oscillations [DBSPRA] (see Section 3.5 for background). The impact
of path change due to link metric change has a major effect on
traffic levels and therefore an effect on the link metric. The
interdependency of path, traffic levels, and link metric creates the
potential for strong positive feedback and severe oscillations.
Oscillations due to the use of link metrics based on delay, jitter,
or loss in LS protocols may be unavoidable. This is discussed
further in Section 7.
5. MPLS-TE Behavior
MPLS traffic engineering (TE) forwarding is not decided hop by hop.
MPLS TE Label Switched Path (LSP) path selection is determined
entirely by the ingress LSR and nearly always includes only strict
hops in the explicit route object (ERO) which dictates the LSP path in
RSVP-TE signaling [RFC3209]. The ingress LSR makes its path decision
based on estimated bandwidth requirements of the LSP and link
bandwidth availability advertised in the IGP. MPLS TE traffic cannot
flow on an LSP until the LSP is set up using RSVP-TE signaling. At
setup time, the estimated bandwidth requirements serve as a
bandwidth reservation. Due to the bandwidth accounting and the
ability of the ingress LSR to direct traffic, the behavior of MPLS TE
is significantly different from that of LS protocols.
MPLS TE makes use of resource accounting, avoiding attempts to
utilize more capacity than any part of the topology can support. In
this way networks using RSVP-TE differ from those using LDP. In
addition, some MPLS TE implementations allow multiple LSP between a
given ingress and egress, each potentially taking different paths,
with load split across these LSP in proportion to their reserved
bandwidth. This is a form of static load balancing (see Section 6
and Section 7).
5.1. Transient Behavior
MPLS TE LSP are created using an explicit path from ingress LSR to
egress LSR. There is no opportunity for differences in the LSDB to
cause forwarding loops. MPLS TE therefore has better transient
response than LS protocols in that it is always loop free.
MPLS FRR [RFC4090] provides protection against single faults. Since
it is initiated at the point where the fault is detected, known as
the point of local repair (PLR), MPLS FRR is very fast, typically a
few tens of milliseconds. This provides an excellent response to
single faults, with recovery time often only slightly longer than
the fault detection time.
When multiple faults occur, MPLS FRR may provide only partial
protection, unable to protect LSP where the fault affects both the
primary path and protection path. In this case recovery relies
entirely on restoration. In either case of single fault or multiple
fault, restoration is used to return to a close to optimal traffic
layout and restore protection to LSP for which protection is needed.
Restoration times can be long if care is not taken in network design
or when using implementations which have not been optimized for very
fast restoration.
5.2. Time Complexity of TE Restoration
MPLS TE relies heavily on fast restoration, though fast restoration
is not easily achieved. Only the best implementations achieve
restoration times in the low number of seconds for regions of MPLS TE
deployment employing a full mesh of LSP with a hundred to a few
hundred nodes.
The reason that achieving fast MPLS TE restoration is difficult is
the computational complexity of TE restoration. This is
discussed in Section 3.2.
Providers can subdivide their topology into multiple interconnected
regions with a full mesh of LSP within each region, greatly improving
convergence time, even if the topology is served by a single OSPF
area or IS-IS level. For example, an OSPF area of 1,000 nodes can be
roughly subdivided into 10 regions of about 100 nodes, plus a core
which interconnects the 10 regions using two or more LSR from each
region and where the core may have well under 100 nodes. Being ten
times smaller, the convergence in each region can be expected to be
more than a hundred times faster. If edge-to-edge LSP are needed, as
they often are, either LDP can be run over the RSVP-TE LSP or RSVP-TE
hierarchy [RFC4206] can be used.
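The expected speedup follows from the restoration complexity
discussed in Section 3.2; the sketch below uses an assumed
per-ingress cost of order N^2 log N (one CSPF of order N log N for
each of order N affected LSPs in a full mesh).

   import math

   def restoration_work(n):
       # Assumed per-ingress worst case: order N LSPs, each needing a
       # CSPF of order N log N.
       return n * n * math.log2(n)

   print(restoration_work(1000) / restoration_work(100))
   # ~150: a region a tenth the size converges well over a hundred
   # times faster, as stated above.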
Further compounding MPLS TE convergence time problems is the
possibility of crankback. Generally proposals in IETF which give an
ingress LSR too little information to compute a path and rely too
heavily on crankback are rejected for that reason. However in a
situation where a fault has occurred and multiple ingress LSR may try
to make use of a next best resource, crankback is likely to occur if
the resource is too limited to accommodate all of the LSP setups.
The impact of crankback can be reduced by implementations which
sequence LSP setups, setting up the larger LSP first and delaying
slightly those LSP setups for which a viable path still exists. The
amount of crankback that occurs can be greatly reduced by these small
delays between non-urgent LSP setup requests, prompt origination of
LSDB changes, particularly for links which are approaching capacity
depletion, and quick reflooding. This short list only touches on the
types of optimizations that have been implemented. A full treatment
of the topic of MPLS-TE optimizations is beyond the scope of this
document.
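As one example of such an optimization, the following sketch orders
LSP setups largest first and applies small jittered delays to
non-urgent setups; the LSP records and delay values are hypothetical,
and a real implementation would interleave this with signaling and
LSDB flooding.

   import random

   def sequence_setups(lsps, base_delay_s=0.05):
       """Order LSP setup requests largest first and assign a small
       jittered delay to non-urgent setups, reducing the chance that
       many ingress LSR race for the same nearly depleted
       resource."""
       ordered = sorted(lsps, key=lambda l: l["bandwidth_mbps"],
                        reverse=True)
       schedule = []
       for i, lsp in enumerate(ordered):
           delay = 0.0 if lsp["urgent"] else (
               i * base_delay_s * random.uniform(0.5, 1.5))
           schedule.append((delay, lsp["name"]))
       return schedule

   lsps = [{"name": "A", "bandwidth_mbps": 100, "urgent": False},
           {"name": "B", "bandwidth_mbps": 400, "urgent": True},
           {"name": "C", "bandwidth_mbps": 50, "urgent": False}]
   for delay, name in sequence_setups(lsps):
       print(f"setup {name} after {delay:.3f}s")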
5.3. Scaling Issues
The most common scaling problem in RSVP-TE is due to the CSPF
computational complexity described in Section 3.2 and Section 5.2.
A less common problem is overload of resources related to flooding,
similar to the scaling problems described in Section 4.5.1 but
potentially made more severe by overly aggressive origination of LSDB
changes, in the worst case sending an LSDB change for each LSP setup.
The scaling limitations of RSVP-TE are dramatically impacted by
network topology. Poor network design can therefore create scaling
problems, the result being highly excessive convergence times. MPLS
regions supporting a full mesh of LSP should generally not exceed
100-200 nodes, and keeping well under 100 where possible is
advisable. Building an RSVP-TE full mesh over a
lower layer full mesh overlay (for example, a layer-2 overlay) will
cause scaling problems related to flooding. If the overlay is large,
CSPF related flooding problems will also occur.
5.4. Delay, Jitter, and Loss Metrics
The problems with link metrics based on delay, jitter, and loss are
not as severe with MPLS TE as they are with link state (LS) protocols
(see Section 3.5 and Section 4.7) if individual LSP are slow to move
relative to readvertisement. One or perhaps a few of many LSP may
change a path to avoid a congested link, unlike LS protocols where a
change in metric causes all traffic for which another path has become
a lower cost to move.
Where there is only a single LSP from any ingress to egress the
problem of coarse granularity of traffic adjustment remains. This
is discussed further in Section 7.
6. Multipath Load Balancing
Multipath load balancing is briefly introduced in Section 2.6. The
minimal requirement for multipath load balancing is that packet order
is maintained within any given microflow, as called for in the
diffserv architecture [RFC2475]. MPLS-TP adds further
requirements that make it necessary to treat an MPLS-TP LSP as a
microflow. Carrying MPLS-TP LSPs within an MPLS LSP is discussed in
[I-D.ietf-mpls-multipath-use].
6.1. Existing Multipath Techniques
Common practice today involves static multipath load balancing based
on the assumption that the hash that is performed on incoming traffic
will evenly distribute the traffic over the hash space. For the case
where a very large number of small microflows exists, this is a valid
assumption and under 5% imbalance is common. In static load
balancing a hash is performed over the input traffic and the hash
space is divided proportionally to the capacity of component links.
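The following sketch illustrates static hash based load balancing;
CRC32 stands in for whatever hash the forwarding hardware uses, and
the link names, capacities, and table size are hypothetical examples.

   import zlib

   # Component links and their capacities (hypothetical figures).
   links = ["ge-0/0/1", "ge-0/0/2", "xe-1/0/0"]
   capacities = [1, 1, 10]  # Gb/s

   # Divide the hash space in proportion to capacity: build a table
   # mapping each hash bucket to a component link.
   NUM_BUCKETS = 64
   total = sum(capacities)
   table = []
   for link, cap in zip(links, capacities):
       table += [link] * round(NUM_BUCKETS * cap / total)

   def select_link(src, dst, proto, sport, dport):
       """Hash the microflow 5-tuple and index the bucket table, so
       all packets of a microflow stay on one link, preserving
       packet order within the microflow."""
       key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
       return table[zlib.crc32(key) % len(table)]

   print(select_link("192.0.2.1", "198.51.100.2", 6, 4242, 443))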
[I-D.ietf-rtgwg-cl-requirement] uses the term flow rather than
microflow to acknowledge that the hash used in load balancing quite
often is not based on the defined characteristics of an IP microflow
in [RFC2475]. Often only label stack information is used, which only
works well if entropy labels [RFC6790] or PW flow labels [RFC6391]
are used. Using a hash does not generally identify single flows but
rather identifies sets of flows which all hash to the same value.
Static load balancing has the advantage of being simple and
inherently stable. When a small number of very large flows exist,
usually mixed with a very large number of very small flows, static
load balancing will perform poorly. These large flows may be due to
large PW flows, high capacity IPsec gateways, or other applications.
Dynamic load balancing can accommodate these large flows but has the
potential to be unstable. Dynamic load balance stability is
discussed further in Section 6.2 and Section 7.
6.2. Dynamic Load Balancing Feedback Loop
In dynamic load balancing, the same problem exists as for failed
prior techniques [DBSPRA], but with significant differences. In both
cases a reaction to excessive load moves traffic to another part of
the network, and in both cases moving traffic away from a congested
part of the network changes the parameters being measured. There
are differences between dynamic load balancing used in multipath and
the load shifts in delay based link metrics in a LS protocol. These
differences make it possible to ensure stability in dynamic load
balancing where stability may not be possible for delay based link
metrics in a LS protocol.
Briefly, some differences between delay based link metrics in a LS
protocol and dynamic load balancing are the following.
1. In a LS routing protocol a change to a link metric moves a lot of
traffic. In dynamic load balancing based on hashing, the amount
of traffic that can be moved at a time can be as little as the
traffic in one hash bucket.
2. In dynamic load balancing it is possible to determine the amount
of traffic that will be moved before making an adjustment if
counters are available in the load balancing hardware.
3. When dynamic load balancing is used solely in local decisions,
the measurement and adjustment is made within the same router.
No protocol exchange outside the router is required.
Dynamic load balancing used in local decisions has been successfully
deployed, and stability is more easily proven for local decisions
than for dynamic load balancing used in a distributed decision. See
Section 7.4.
The topic of preventing feedback problems in dynamic load balancing
is discussed further in Section 7.
7. Summary of Factors in Load Balancing Stability
Delay, jitter, loss, and link utilization based link metrics used in
LS protocols (Section 4) or in MPLS TE (Section 5), and multipath
load balancing (Section 6), are all forms of load balancing. All of
these forms of load balancing can be categorized in terms of the
following characteristics.
static or dynamic
Static load balancing is inherently stable. Dynamic load
balancing is not inherently stable but neither is it
inherently unstable.
balance granularity
Dynamic load balancing granularity may be uncontrolled, as
in LS routing protocols, or controlled. A controlled load
balancing granularity may be very coarse such as MPLS TE
where entire LSP are moved, or fine such as hash based
multipath techniques.
adjustment certainty
The amount of traffic moved in an adjustment to dynamic load
balancing may be unknown or known. For example, a link
metric change in a LS protocol forces a load adjustment but
the magnitude of the adjustment is unknown until after the
adjustment is made. When moving an LSP, the amount of
traffic moved is known. When an adjustment is made in a hash
based technique, the amount of traffic moved is unknown if
there is no measurement available to determine the amount of
traffic falling within a group of hash values, but can be
known if counters have been provided in the forwarding
hardware.
measurement compatibility
The units of measurement and the units of load balance
traffic adjustment may not be entirely compatible. For
example, if delay, jitter, and loss are measured at the point
of congestion and the measurements at the point of traffic
adjustment are offered traffic load, it is hard to predict
how movement of traffic measured in Mb/s or Gb/s will impact
delay, jitter, or loss, particularly given the behavior of
TCP congestion avoidance.
decision locality
Measurements may be made locally and traffic adjustments may
be applied locally in which case no protocol extensions are
needed. Prior implementations of dynamic multipath operated
in this way and were deployed successfully. Measurements may
also be made and advertised by a LS protocol such that
decisions made by numerous other nodes affect the traffic at
the links being measured. Stability is more difficult to
ensure with this type of distributed load balance decision.
timing and traffic adjustment
Any dynamic load balancing technique has timing factors that
can impact stability. For example, the frequency of
measurements and any filtering or hysteresis applied to
measurement can be dictated by specifications or made
advisory and modified in specific deployments. The frequency
of adjustments to load balance and changes to the adjustment
amounts can also be dictated by specifications or made
advisory and modified in specific deployments.
Each of the above characteristics affects whether a specific load
balancing technique is stable. Since static load balancing is
inherently stable, the remainder of this section focuses on dynamic
load balancing. The following subsections focus on the
characteristics listed above. Section 7.6 discusses the interaction
of these characteristics and discusses factors in protocol design
which affect the tendency toward feedback problems and potential
instability.
7.1. Load Balance Granularity
It is not possible to ensure stability in a technique which uses
uncontrolled load balancing, such as LS routing protocols. It may be
possible to introduce timing changes or other changes which tend
toward stability or reduce the impact somewhat (such as long timers).
This seemed to be a goal of [DBSPRA]. With no way to ensure
stability and very limited success in mitigating instability, prior
efforts to load balance in LS routing protocols were abandoned and
future attempts strongly discouraged.
Load balancing can generally achieve better results with a fine
granularity than with a coarse or very coarse granularity. However,
techniques which use a coarse granularity can be made stable. MPLS
TE is inherently a load balancing technique which is somewhat static
by virtue of the normally configured bandwidth reservations and with
the coarse granularity of any given LSP bandwidth reservation.
Stability is ensured if LSP remain on their current path even if
better paths become available, subject to relatively long timers
preferably with timer jitter (timer jitter is a random component
included in the timer value).
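A minimal sketch of such a jittered timer follows; the base interval
and jitter fraction are hypothetical tuning values, not
recommendations.

   import random

   def reoptimize_interval(base_s=3600.0, jitter_fraction=0.25):
       """Next re-optimization timer value: a relatively long base
       interval plus a random component (timer jitter), so that LSP
       across the network do not all re-evaluate paths at once."""
       return base_s * (1.0 + random.uniform(-jitter_fraction,
                                             jitter_fraction))

   print(f"next re-optimization in {reoptimize_interval():.0f}s")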
Hash based load balancing is a fine granularity load balancing with
the granularity determined by the number of usable bits in the hash.
For example, entropy labels [RFC6790] or PW flow labels [RFC6391] are
limited to 20 bits of usable hash. This allows dividing traffic into
potentially a million groups of flows. Most implementations today
make use of tables which are much smaller than 20 bits and therefore
provide a somewhat more coarse granularity.
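The relationship between usable hash bits and balance granularity can
be made concrete with a small computation; the table sizes below are
hypothetical examples.

   # Granularity of hash based load balancing: with 2**bits buckets,
   # the smallest unit of traffic that can be moved is one bucket,
   # on average 1/2**bits of the total traffic.
   for bits in (8, 12, 20):
       buckets = 2 ** bits
       print(f"{bits:2d} hash bits -> {buckets:>7} buckets, "
             f"~{100.0 / buckets:.4f}% of traffic per bucket")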
7.2. Load Balance Adjustment Certainty
The amount of traffic that will be moved when an adjustment to
dynamic load balancing is made may be either unknown or known prior
to making the adjustment.
Dynamic load balancing with an unknown traffic adjustment amount has
been deployed successfully. This success relies on there being a
reasonably good correlation between the fraction of the hash space
moved and the fraction of total traffic which falls into that hash
space.
When the traffic adjustment amount is unknown, very large flows
occasionally cause a large change in traffic when only a small change
was intended. In such cases a single hash value may have a
disproportionately large amount of traffic. A pathological behavior
would be to continuously move a small range of hash values back and
forth as a result. This problem is avoidable.
A far better situation exists when the traffic adjustment amount is
known. For MPLS TE LSP, the reservable bandwidth is known and most
LSR can measure traffic on a per LSP basis. For hash based
multipath, few existing implementations have measurements over the
hash space for a particular multipath. Making such measurements is
not difficult but requires that the designer of the forwarding
hardware had the foresight to provide counters in the appropriate
places.
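Given such counters, the amount of traffic an adjustment would move
is known before the move is made, as in the following sketch; the
per-bucket rate dictionary stands in for counters in forwarding
hardware, and all figures are hypothetical.

   def plan_move(bucket_rates_mbps, bucket_to_link, from_link,
                 to_link, target_mbps):
       """Select hash buckets on from_link whose measured rates sum
       to no more than target_mbps; return the buckets and the
       (known) amount of traffic that would actually move."""
       moved, chosen = 0.0, []
       # Consider the smallest buckets first to limit overshoot.
       candidates = sorted(
           (b for b, l in bucket_to_link.items() if l == from_link),
           key=lambda b: bucket_rates_mbps[b])
       for b in candidates:
           if moved + bucket_rates_mbps[b] > target_mbps:
               break
           chosen.append(b)
           moved += bucket_rates_mbps[b]
       return chosen, moved

   rates = {0: 40.0, 1: 5.0, 2: 90.0, 3: 12.0}
   mapping = {0: "L1", 1: "L1", 2: "L1", 3: "L2"}
   # Moves buckets 1 and 0, a known 45.0 Mb/s, toward a 50 Mb/s goal.
   print(plan_move(rates, mapping, "L1", "L2", target_mbps=50.0))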
7.3. Load Balance Measurement Compatibility
Measurements of delay, jitter, and loss at a point of congestion
cannot be readily translated into measurements of utilization at a
point where an adjustment to traffic load balance is made. For
example, one or more large TCP microflows may be in the TCP slow
start phase, where traffic flow increases exponentially until a loss
or ECN notification is detected. Prior to congestion, rising
utilization can be measured directly. After the onset of congestion,
utilization hits 100% and can no longer be used as a measure of
loading at the point of congestion. The amount of rise in delay and
jitter or the rate of rise in delay cannot easily be used to
determine the amount of traffic to move. If loss is occurring, the
amount of loss cannot easily be used to determine the amount of
traffic to move, though increasing loss may be correlated to the
degree of overutilization.
The measurement of delay and jitter is a measure of time, generally
measured in milliseconds (msec). Loss is measured as a fraction of
traffic. There is no reliable means to determine how much traffic
should be moved from one link with a given queuing delay to another
link with a different amount of queuing delay. The same applies to
two links with different amounts of loss or one link with queuing
delay but no loss and another link with loss.
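One way to see the difficulty is with a simple M/M/1 queueing model,
used here purely as an illustration; real traffic rarely satisfies
its assumptions, which is precisely the problem. Even under this
model, inverting a delay measurement into an amount of traffic to
move requires knowing where on a steeply nonlinear curve the link
sits.

   def mm1_delay_ms(utilization, service_time_ms=0.1):
       """Mean M/M/1 queueing delay; an illustrative model only."""
       assert 0 <= utilization < 1
       return service_time_ms / (1.0 - utilization)

   # The same rise in measured delay corresponds to very different
   # amounts of traffic depending on the operating point.
   for u in (0.5, 0.9, 0.95, 0.99):
       print(f"utilization {u:.2f} -> mean delay "
             f"{mm1_delay_ms(u):.2f} ms")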
Jitter is the most problematic of these measures. Jitter is a
function both of the link utilization and the burstiness of traffic
sources. A rise in jitter can occur at very low link utilizations.
Jitter has never been used as the basis of load balancing and
questions remain about the feasibility of doing so.
7.4. Load Balancing Decision Locality
Delay, jitter, and loss based link metrics used in LS protocols
(Section 4) or in MPLS TE (Section 5), and multipath load balancing
(Section 6), are all forms of load balancing. Load balancing can be
applied in two ways.
1. Some types of load balancing can be applied locally, using only
measurements made at local interfaces to impact the proportion of
traffic going out those local interfaces.
2. Load balancing can be applied in a distributed fashion where
measurements are advertised, usually using additional information
carried by a LS protocol, and action is taken by many routers
which make use of the resources for which measurements have been
made.
Some techniques are inherently local or inherently distributed.
Delay, jitter, and loss based link metrics used in LS protocols
(Section 4) or in MPLS TE (Section 5) are inherently distributed.
Ethernet Link Aggregation [IEEE-802.1AX] is a multipath technique
which is inherently local. Other forms of multipath, such as MPLS
Link Bundling [RFC4201] or any multipath technique which can make use
of MPLS LSP as component links, are neither inherently local nor
inherently distributed.
7.4.1. Local Decision Load Balance
It is easier to ensure stability when load balancing is entirely a
local decision. There is no protocol exchange required to do this
sort of load balancing and there are no interoperability issues with
neighbor routers. Therefore there has been no need for IETF to
dictate how this is done.
Common practice has evolved which is well known within the industry,
widely deployed, and very similar across implementations. It has
remained poorly documented in the IETF, being a local decision.
Load balancing in Ethernet Link Aggregation [IEEE-802.1AX] is
inherently a local decision. As commonly practiced today, ECMP and
Link Bundling also make use of local decisions.
7.4.2. Distributed Load Balance
Few attempts to make use of distributed load balancing have reached
the IETF. A prerequisite for distributed load balancing is a means
to advertise the measurement results. This prerequisite is met by
[I-D.ietf-ospf-te-metric-extensions] and
[I-D.ietf-isis-te-metric-extensions] but no means of applying these
metrics has been defined.
A very early attempt to make use of distributed load balancing was
OSPF-OMP, ISIS-OMP, and MPLS-OMP [I-D.ietf-ospf-omp]
[I-D.ietf-isis-omp] [I-D.villamizar-mpls-omp]. These drafts were
abandoned due to provider lack of interest at the time. This was
partially due to the difficulty in providing mathematical proof of
stability.
There has been a renewed though cautious interest in distributed load
balancing as evidenced by [I-D.ietf-rtgwg-cl-requirement] and related
work. Any use of [I-D.ietf-ospf-te-metric-extensions] and
[I-D.ietf-isis-te-metric-extensions] is a distributed load balancing
technique. With these extensions, measurements based on delay may
optionally include queuing delay. Measurements of jitter are
inherently queuing based. Measurements of loss are also queuing
based except for lossy links such as wireless links. The authors of
these documents have intentionally put stability concerns out of
scope due to the difficulty in ensuring stability and even greater
difficulty of mathematically proving stability.
7.5. Load Balance Timing and Traffic Adjustment
In any system with feedback, careful attention to timing is needed
to ensure that the system remains stable. For load balancing this
may be easier to accomplish where measurements and traffic
adjustments are entirely local.
The following timing considerations are related to making
measurements and propagating measurement results (a sketch combining
several of these follows the list):
1. Utilization, delay, jitter, and loss measurement interval for a
direct measurement.
2. Type of filtering of direct measurements, if any, applied to
produce a reportable measurement.
3. Amount of positive or negative change to reportable measurement
needed before updating the reportable measurement. This amount
should be a function of the time since the last reported
measurement and a function of the importance of making an
adjustment at that measurement level.
4. The time necessary to propagate changes must be considered when
trying to determine if a technique is stable. This may be a
small but non-zero time.
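A sketch combining items 1 through 3 above follows; the filter
constant and thresholds are hypothetical tuning values, not
recommendations.

   class MeasurementReporter:
       """Low-pass filter raw utilization samples (item 2) and
       report only when the filtered value has moved enough (item
       3), with the required movement shrinking as the last report
       ages. All constants are hypothetical tuning values."""

       def __init__(self, alpha=0.2, base_threshold=0.10):
           self.alpha = alpha
           self.base_threshold = base_threshold
           self.filtered = None
           self.last_reported = None
           self.intervals = 0

       def sample(self, raw_utilization):
           # Exponentially weighted filtering of each raw sample.
           if self.filtered is None:
               self.filtered = raw_utilization
           else:
               self.filtered += self.alpha * (raw_utilization -
                                              self.filtered)
           self.intervals += 1
           # Report on sufficient change; the threshold relaxes as
           # time since the last report grows.
           threshold = self.base_threshold / self.intervals
           if (self.last_reported is None or
                   abs(self.filtered - self.last_reported)
                   >= threshold):
               self.last_reported = self.filtered
               self.intervals = 0
               return self.filtered  # would then be flooded
           return None

   r = MeasurementReporter()
   for u in (0.50, 0.55, 0.80, 0.82, 0.83):
       print(r.sample(u))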
Upon receiving an updated measurement result, any traffic adjustment
should be deferred for a target time period with jitter applied to
the timer. The target time period should be a function of the
following:
1. the percent contribution to the traffic at the link or path from
which traffic would be moved,
2. the difference between the conditions at the link or path from
which traffic would be moved and the link or path to which the
traffic would be moved,
3. any other factor that the protocol specification deems relevant
or that the implementation deems relevant and that is allowable
by the protocol.
Upon receiving an updated measurement result, any traffic adjustment
amount should be a function of the following:
1. the amount of traffic that needs to be moved from all traffic
sources to achieve balance,
2. the amount of traffic that can be moved from all traffic sources
to achieve balance,
3. the percent contribution to the traffic at the link or path from
which traffic would be moved,
4. the difference between the conditions at the link or path from
which traffic would be moved and the link or path to which the
traffic would be moved,
5. any other factor that the protocol specification deems relevant
or that the implementation deems relevant and that is allowable
by the protocol,
6. any prior traffic adjustment between the set of links or paths
and the time that has elapsed since that adjustment.
It is important to note that for a given update there may be multiple
more lightly loaded links or paths to which traffic can be moved and
multiple more heavily loaded links or paths from which traffic can be
moved. The amount of traffic for which there is
already a scheduled adjustment from or to a given link or path must
be taken into consideration when computing the amount of a new
traffic adjustment.
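The two factor lists above might be combined as in the following
sketch; every weight and constant is a hypothetical placeholder for
values a specification or deployment would choose.

   import random

   def defer_time_s(contribution, imbalance, base_s=10.0):
       """Deferral before acting on an updated measurement: sources
       contributing little traffic, or facing a small imbalance,
       wait longer; jitter desynchronizes sources."""
       t = base_s * (1.0 - contribution) * (1.0 - imbalance)
       return t * random.uniform(0.8, 1.2)

   def adjustment_mbps(needed_mbps, movable_mbps, contribution):
       """Move only this source's proportional share of what is
       needed, bounded by what this source can actually move."""
       return min(needed_mbps * contribution, movable_mbps)

   # An ingress carrying 20% of the traffic on a congested link that
   # needs 100 Mb/s moved, with 40 Mb/s of movable flows:
   print(f"defer {defer_time_s(0.2, 0.3):.1f}s, "
         f"then move {adjustment_mbps(100.0, 40.0, 0.2):.0f} Mb/s")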
7.6. Preventing Feedback Problems
The goal of this section is not to produce an algorithm for traffic
adjustments. The goal is simply to point out factors that will
either increase or decrease the tendency toward feedback problems and
potential instability.
The most severe problem is positive feedback at some frequency (or
time period) causing a growing oscillation until saturation is
reached. Also very severe is an undamped oscillation, which may not
grow but will continue forever.
Oscillations often occur due to overshoot in making an adjustment.
Overshoot can be caused in a distributed load balancing environment
by multiple traffic sources each making an adjustment that
individually would bring the network closer to balance but together
cause overshoot. This is why the contribution of a given traffic
source to the total traffic must be considered.
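The overshoot mechanism is easy to demonstrate with an abstract
example, not a model of any particular protocol: five sources each
observe the same 100 Mb/s imbalance.

   imbalance_mbps = 100.0
   contributions = [0.2] * 5  # five sources, each carrying 20%

   # Each source moving the full observed imbalance overshoots
   # fivefold; scaling by contribution matches the imbalance.
   naive = sum(imbalance_mbps for _ in contributions)
   scaled = sum(imbalance_mbps * c for c in contributions)
   print(f"naive total moved:  {naive:.0f} Mb/s (5x overshoot)")
   print(f"scaled total moved: {scaled:.0f} Mb/s")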
If there is no tendency to overshoot at all, there will be no
oscillation at all. If, in the event of overshoot, subsequent
adjustments are always less aggressive, then there will be a damped
oscillation, known as "ringing". Attempts to respond
quickly to a traffic imbalance may result in some overshoot. Damping
subsequent responses will reduce subsequent overshoot and reduce
ringing. Small overshoot and damped ringing are undesirable but
might be acceptable if there is some other gain such as faster
reaction time.
What follows are some suggestions to help ensure stability and
improve damping.
1. If there was a prior traffic adjustment between two links or
paths, then the following types of bounds should be applied (see
the sketch after this list).
a. If the new traffic adjustment is in the same direction, the
adjustment amount should be no more than the prior adjustment
amount multiplied by some factor (greater than one) that is
determined by the protocol to help ensure stability. For
example, if the prior traffic adjustment was "recent" this
factor might be two or less (a doubling in the next
adjustment or less) and rise proportionately if the elapsed
time was above the "recent" threshold.
b. If the new traffic adjustment is in the opposite direction,
the adjustment amount should be no more than the prior
adjustment amount multiplied by some fraction that is
determined by the protocol to adequately damp any ringing.
For example, if the prior traffic adjustment was "recent"
this factor might be one half and rise proportionately with
time since the last adjustment if the elapsed time was above
the "recent" threshold, reaching unity at some very long time
where prior history may be less relevant.
2. When an updated measurement result arrives and a traffic
adjustment is already scheduled, the computation of the timer
value used to defer taking action and of the adjustment amount should
be repeated. The updated adjustment amount should be used. The
lesser of the remaining time on the prior timer and the new
computed timer value should be used.
3. Analysis of traffic adjustment scheduled time and traffic
adjustment amount should be repeated each time there is a change
to either side of an already scheduled traffic adjustment. When
traffic adjustment is strictly local, this should not impose a
high computational burden. For distributed load balancing, the
computational burden in a large mesh network will have to be taken
into account when choosing protocol timer default value guidance and
safe bounds of timer values.
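A sketch of the bound in item 1 follows; the factors and the "recent"
threshold are hypothetical placeholders for values a protocol would
specify.

   def bound_adjustment(new_amount, prior_amount, same_direction,
                        elapsed_s, recent_s=30.0):
       """Bound a traffic adjustment by the prior adjustment between
       the same links or paths (item 1 above)."""
       age = max(elapsed_s / recent_s, 1.0)  # 1.0 while "recent"
       if same_direction:
           # 1.a: at most double a recent prior move, with the bound
           # rising in proportion to elapsed time after that.
           limit = prior_amount * 2.0 * age
       else:
           # 1.b: at most half of a recent prior move, approaching
           # the full prior amount as the prior move becomes old.
           limit = prior_amount * min(0.5 * age, 1.0)
       return min(new_amount, limit)

   print(bound_adjustment(50.0, 10.0, True, elapsed_s=10.0))   # 20.0
   print(bound_adjustment(50.0, 10.0, False, elapsed_s=10.0))  # 5.0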
It is very difficult to mathematically prove that the specification
for a particular dynamic load balancing method is stable. So far, at
best an argument can be made that specific provisions within a
particular dynamic load balancing method strongly tend toward
stability. See Section 8 for further discussion.
8. Open Area for Research
So far there is no mathematical proof of stability for any given
dynamic load balancing method. Having such a proof would go a long
way toward encouraging deployment of new methods.
Absent a mathematical proof of stability, some dynamic load
balancing methods for which a strong argument could be made for a
tendency toward stability have been deployed. Successful deployments
provide an existence proof that networks can be stable, but this
existence proof falls short of ensuring that all deployments will
always be stable.
The discrete nature of traffic adjustment events, as opposed to
continuous small adjustments, prevents the use of linear systems
analysis. This is unfortunate since stability proofs for linear
systems (or proofs of instability) are relatively simple.
The mesh topology and analog nature of traffic adjustment amount make
it difficult or impossible to apply techniques based on discrete
states, such as Markov chains.
Proof of stability may be easier for dynamic load balancing applied
as a local decision rather than a distributed decision (see
Section 7.4). For a local decision, a set of inputs to a node are
balanced over a set of outputs. Absent routing loops, the output
will not affect the inputs.
For dynamic load balancing with distributed decisions a proof must be
applicable to an arbitrary mesh topology. This is a more difficult
problem than the problem of local decision.
There may be characteristics of the mesh that make a proof for
dynamic load balancing with distributed decisions somewhat less
difficult. It may help that there exists a non-cyclic relationship
between more heavily loaded links and less loaded links, where given
that all routers have the same information, traffic is only moved
from more heavily loaded links to less loaded links. Other
characteristics, such as the non-cyclic nature of the traffic flow,
may help as well.
The problems of providing stability proofs for existing and proposed
mechanisms are clearly difficult and not yet solved. As such this
remains a good topic for research.
9. Acknowledgements
While there was interest in OMP in the 1990s, there was some
discussion of stability on the OSPF, ISIS, and MPLS WG mailing lists
as well as off list. Particularly helpful was discussion with Matt
Mathis (PSC) and Teunis Ott (Bellcore).
10. IANA Considerations
This memo includes no request to IANA.
11. Security Considerations
This document discusses both the known stability or instability of
previously deployed routing and load balancing techniques and the
unknown stability or instability of techniques which may be deployed
in the future. The difficulty in proving stability is also
discussed. Given the scope of the document, with no proposal for new
protocols and only discussion of factors that impact stability, there
are no security concerns raised by this document.
12. Informative References
[DBSPRA] Bertsekas, D., "Dynamic Behavior of Shortest Path Routing
Algorithms for Communication Networks", IEEE Trans. Auto.
Control, 1982.
[I-D.ietf-isis-omp]
Villamizar, C. and T. Li, "IS-IS Optimized Multipath
(ISIS-OMP)", draft-ietf-isis-omp-01 (work in progress),
February 1999.
[I-D.ietf-isis-te-metric-extensions]
Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas,
A., Filsfils, C., and W. Wu, "IS-IS Traffic Engineering
(TE) Metric Extensions", draft-ietf-isis-te-metric-
extensions-01 (work in progress), October 2013.
[I-D.ietf-mpls-forwarding]
Villamizar, C., Kompella, K., Amante, S., Malis, A., and
C. Pignataro, "MPLS Forwarding Compliance and Performance
Requirements", draft-ietf-mpls-forwarding-09 (work in
progress), March 2014.
[I-D.ietf-mpls-multipath-use]
Villamizar, C., "Use of Multipath with MPLS and MPLS-TP",
draft-ietf-mpls-multipath-use-04 (work in progress),
January 2014.
[I-D.ietf-mpls-te-express-path]
Atlas, A., Drake, J., Giacalone, S., Ward, D., Previdi,
S., and C. Filsfils, "Performance-based Path Selection for
Explicitly Routed LSPs using TE Metric Extensions", draft-
ietf-mpls-te-express-path-00 (work in progress), October
2013.
[I-D.ietf-ospf-omp]
Villamizar, C., "OSPF Optimized Multipath (OSPF-OMP)",
draft-ietf-ospf-omp-02 (work in progress), February 1999.
[I-D.ietf-ospf-te-metric-extensions]
Giacalone, S., Ward, D., Drake, J., Atlas, A., and S.
Previdi, "OSPF Traffic Engineering (TE) Metric
Extensions", draft-ietf-ospf-te-metric-extensions-05 (work
in progress), December 2013.
[I-D.ietf-rtgwg-cl-requirement]
Villamizar, C., McDysan, D., Ning, S., Malis, A., and L.
Yong, "Requirements for Advanced Multipath in MPLS
Networks", draft-ietf-rtgwg-cl-requirement-16 (work in
progress), February 2014.
[I-D.ietf-rtgwg-cl-use-cases]
Ning, S., Malis, A., McDysan, D., Yong, L., and C.
Villamizar, "Advanced Multipath Use Cases and Design
Considerations", draft-ietf-rtgwg-cl-use-cases-05 (work in
progress), November 2013.
[I-D.ietf-rtgwg-mrt-frr-architecture]
Atlas, A., Kebler, R., Envedi, G., Csaszar, A., Tantsura,
J., Konstantynowicz, M., and R. White, "An Architecture
for IP/LDP Fast-Reroute Using Maximally Redundant Trees",
draft-ietf-rtgwg-mrt-frr-architecture-03 (work in
progress), July 2013.
[I-D.ietf-rtgwg-remote-lfa]
Bryant, S., Filsfils, C., Previdi, S., Shand, M., and S.
Ning, "Remote LFA FRR", draft-ietf-rtgwg-remote-lfa-04
(work in progress), November 2013.
[I-D.villamizar-mpls-omp]
Villamizar, C., "OSPF Optimized Multipath (OSPF-OMP)",
draft-villamizar-mpls-omp-01 (work in progress), February
1999.
[IEEE-802.1AX]
IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE
Standard for Local and Metropolitan Area Networks - Link
Aggregation", 2006, <http://standards.ieee.org/getieee802/
download/802.1AX-2008.pdf>.
[ISO.10589.1992]
International Organization for Standardization,
"Intermediate system to intermediate system intra-domain-
routing routine information exchange protocol for use in
conjunction with the protocol for providing the
connectionless-mode Network Service (ISO 8473)", ISO
Standard 10589, 1992.
[RFC1058] Hedrick, C., "Routing Information Protocol", RFC 1058,
June 1988.
[RFC2080] Malkin, G. and R. Minnear, "RIPng for IPv6", RFC 2080,
January 1997.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
[RFC2439] Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
Flap Damping", RFC 2439, November 1998.
[RFC2453] Malkin, G., "RIP Version 2", STD 56, RFC 2453, November
1998.
[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
and W. Weiss, "An Architecture for Differentiated
Services", RFC 2475, December 1998.
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast Next-Hop Selection", RFC 2991, November 2000.
[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm", RFC 2992, November 2000.
[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
Tunnels", RFC 3209, December 2001.
[RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-
Protocol Label Switching (MPLS) Support of Differentiated
Services", RFC 3270, May 2002.
[RFC3468] Andersson, L. and G. Swallow, "The Multiprotocol Label
Switching (MPLS) Working Group decision on MPLS signaling
protocols", RFC 3468, February 2003.
[RFC3630] Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering
(TE) Extensions to OSPF Version 2", RFC 3630, September
2003.
[RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May
2005.
[RFC4124] Le Faucheur, F., "Protocol Extensions for Support of
Diffserv-aware MPLS Traffic Engineering", RFC 4124, June
2005.
[RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.
[RFC4206] Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
Hierarchy with Generalized Multi-Protocol Label Switching
(GMPLS) Traffic Engineering (TE)", RFC 4206, October 2005.
[RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing
(CIDR): The Internet Address Assignment and Aggregation
Plan", BCP 122, RFC 4632, August 2006.
[RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP
Specification", RFC 5036, October 2007.
[RFC5250] Berger, L., Bryskin, I., Zinin, A., and R. Coltun, "The
OSPF Opaque LSA Option", RFC 5250, July 2008.
[RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast
Reroute: Loop-Free Alternates", RFC 5286, September 2008.
[RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic
Engineering", RFC 5305, October 2008.
[RFC5308] Hopps, C., "Routing IPv6 with IS-IS", RFC 5308, October
2008.
[RFC5340] Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
for IPv6", RFC 5340, July 2008.
[RFC5462] Andersson, L. and R. Asati, "Multiprotocol Label Switching
(MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic
Class" Field", RFC 5462, February 2009.
[RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC
5714, January 2010.
[RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free
Convergence", RFC 5715, January 2010.
[RFC6126] Chroboczek, J., "The Babel Routing Protocol", RFC 6126,
April 2011.
[RFC6391] Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
J., and S. Amante, "Flow-Aware Transport of Pseudowires
over an MPLS Packet Switched Network", RFC 6391, November
2011.
[RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and
L. Yong, "The Use of Entropy Labels in MPLS Forwarding",
RFC 6790, November 2012.
[RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
Francois, P., and O. Bonaventure, "Framework for Loop-Free
Convergence Using the Ordered Forwarding Information Base
(oFIB) Approach", RFC 6976, July 2013.
[RFC6981] Bryant, S., Previdi, S., and M. Shand, "A Framework for IP
and MPLS Fast Reroute Using Not-Via Addresses", RFC 6981,
August 2013.
Author's Address
Curtis Villamizar
Outer Cape Cod Network Consulting, LLC
Email: curtis@occnc.com