<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="no" ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc rfcedstyle="yes" ?>
<?rfc subcompact="no" ?>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
  submissionType="IETF"
  docName="draft-ietf-rtgwg-bgp-pic-23"
  category="info"
  ipr="trust200902"
  consensus="true"
  version="3">
  <!-- xml2rfc v2v3 conversion 3.28.1 -->
  <!-- Generated by id2xml 1.5.2 on 2025-05-05T02:28:52Z -->
	<front>
    <title>BGP Prefix Independent Convergence</title>
    <seriesInfo name="Internet-Draft" value="draft-ietf-rtgwg-bgp-pic-23"/>
    <author initials="A." surname="Bashandy" fullname="Ahmed Bashandy" role="editor">
      <organization>HPE</organization>
      <address>
        <postal>
          <country>USA</country>
        </postal>
        <email>abashandy.ietf@gmail.com</email>
      </address>
    </author>
    <author initials="C." surname="Filsfils" fullname="Clarence Filsfils">
      <organization>Cisco Systems</organization>
      <address>
        <postal>
          <country>Brussels, Belgium</country>
        </postal>
        <email>cfilsfil@cisco.com</email>
      </address>
    </author>
    <author initials="P." surname="Mohapatra" fullname="Pradosh Mohapatra">
      <organization>Sproute Networks</organization>
      <address>
        <postal>
          <country>USA</country>
        </postal>
        <email>mpradosh@yahoo.com</email>
      </address>
    </author>
    <author fullname="Yingzhen Qu" initials="Y" surname="Qu" role="editor">
      <organization>Futurewei Technologies</organization>
      <address>
        <postal>
          <country>USA</country>
        </postal>
        <phone/>
        <email>yingzhen.ietf@gmail.com</email>
      </address>
    </author>
    <date/>
    <area>Routing</area>
    <workgroup>Routing Area</workgroup>    

    <abstract>
    <t>
      In a network comprising thousands of BGP peers exchanging millions of routes, it is desirable to restore traffic after a failure in a time period that does not depend on the number of BGP prefixes.
    </t>
    <t>
      This document describes an architecture by which traffic can be re-routed to Equal Cost Multi-Path (ECMP) or pre-calculated backup paths in a timeframe that does not depend on the number of BGP prefixes. The objective is achieved by organizing the forwarding data structures hierarchically and by sharing forwarding elements among the maximum possible number of routes. The described technique yields prefix independent convergence while ensuring incremental deployment, complete automation, and zero management and provisioning effort. Note that the benefits of BGP Prefix Independent Convergence (BGP-PIC) hinge on the existence of more than one path, whether as ECMP or primary-backup.
    </t>
    </abstract>
  </front>
  <middle>
    <section anchor="sect-1" numbered="true" toc="default">
      <name>Introduction</name>
      <t>
   BGP speakers exchange reachability information about prefixes
   <xref target="RFC4271" format="default"/>. For labeled address families, an edge router assigns
   local labels to prefixes and associates the local label with each
   advertised prefix using technologies such as L3VPN <xref target="RFC4364" format="default"/>, 6PE
   <xref target="RFC4798" format="default"/>, and Softwire <xref target="RFC5565" format="default"/> using BGP label unicast (BGP-LU)
   technique <xref target="RFC8277" format="default"/>. A BGP speaker then applies the path selection
   steps to choose the best route. In modern networks, it is not
   uncommon to have a prefix reachable via multiple edge routers.
   Multiple techniques have been described to allow BGP to
   advertise more than one path for a given prefix <xref target="I-D.ietf-idr-best-external"/><xref target="RFC7911" format="default"/><xref target="RFC6774" format="default"/>, whether in the form of equal cost
   multipath or primary-backup. Another common and widely deployed
   scenario is L3VPN with multi-homed VPN sites using unique Route
   Distinguishers.</t>

      <t>
   This document describes a hierarchical and shared forwarding chain
   organization that allows traffic to be restored to a pre-calculated
   alternative equal cost path or backup path in a
   time period that does not depend on the number of BGP prefixes.
   The technique relies on internal router behavior that is
   completely transparent to the operator and can be incrementally
   deployed and enabled with zero operator intervention. In other
   words, once it is implemented and deployed on a router, nothing is
   required from the operator to make it work. Note that this
   document describes a Forwarding Information Base
   (FIB) architecture that can be implemented in hardware and/or
   software, although we refer to hardware implementations in most
   cases because of the additional complexity and performance
   requirements associated with them.</t>

   <t>
    Although BGP is used for route calculation in this document, the
    underlying principles of hierarchical forwarding and recursive
    resolution are not BGP specific. These mechanisms apply equally to
    routes computed by other routing protocols. The benefits of BGP-PIC
    are tied to the forwarding plane design rather than to the BGP
    protocol.
   </t>

      <section anchor="sect-1.1" numbered="true" toc="default">
        <name>Terminology</name>
        <t>
   This section defines the terms used in this document.</t>
        <ul spacing="normal">
          <li>
            <t>BGP-LU: BGP Label Unicast. Refers to using BGP to advertise the binding of an address
              prefix to one or more MPLS labels as in <xref target="RFC8277" format="default"/>.</t>
          </li>
          <li>
            <t>BGP prefix: A set of destinations represented as an IP prefix whose route is learned through BGP as described in <xref target="RFC4271" format="default"/>.</t>
          </li>
          <li>
            <t>IGP prefix: A prefix that is learned via an Interior Gateway
      Protocol (IGP), such as OSPF and IS-IS.</t>
          </li>
          <li>
            <t>ePE: Egress PE <xref target="RFC4364" format="default"/>.</t>
          </li>
          <li>
            <t>iPE: Ingress PE <xref target="RFC4364" format="default"/>.</t>
          </li>
          <li>
            <t>Path: One specific candidate way to reach the destination in a route <xref target="RFC4271"/>. It is a sequence of nodes or links from the source to the destination. The nodes may not be directly
      connected.</t>
          </li>
          <li>
            <t>Recursive path: A path whose next-hop is an IP address
      without an outgoing interface. It requires the router to look up
      the next-hop IP address in the routing table (recursion) until a
      directly connected next-hop is found.
            </t>
          </li>
          <li>
            <t>Non-recursive path: A path consisting of the IP address
      of a directly connected next-hop and the outgoing interface.</t>
          </li>
          <li>
            <t>Adjacency: The layer 2 encapsulation leading to the layer 3
      directly connected next-hop. An adjacency is identified by a
      next-hop and an outgoing interface.</t>
          </li>
          <li>
            <t>Primary path: A recursive or non-recursive path that
      can be used for forwarding. A prefix can have more than one
      primary path.</t>
          </li>
          <li>
            <t>Backup path: A recursive or non-recursive path that can
      be used only after some or all primary paths become
      unreachable.</t>
          </li>
          <li>
            <t>Leaf: A container data structure for a prefix or local label.
      Alternatively, it is the data structure that contains prefix
      specific information.</t>
          </li>
          <li>
            <t>IP leaf: The leaf corresponding to an IPv4 or IPv6 prefix.</t>
          </li>
          <li>
            <t>Label leaf: The leaf corresponding to a locally allocated label
      such as the VPN label on an egress PE <xref target="RFC4364" format="default"/>.</t>
          </li>
          <li>
            <t>Pathlist: An array of paths used by one or more prefixes to
      forward traffic to destination(s) covered by an IP prefix. Each
      path in the pathlist carries its "path-index" that identifies
      its position in the array of paths. In general, the value of the
      path-index in a path is the same as its position in the
      pathlist, except in the case outlined in <xref target="sect-5" format="default"/>. For example,
      the 3rd path may carry a path-index value of 1. A pathlist
      may contain a mix of primary and backup paths.</t>
          </li>
          <li>
            <t>OutLabel-List: Each labeled prefix is associated with an
      OutLabel-List. The OutLabel-List is an array of one or more
      outgoing labels and/or label actions where each label or label
      action has 1-to-1 correspondence to a path in the pathlist.
      Label actions are: push (add) the label as specified in
      <xref target="RFC3031" format="default"/>, pop (remove) the label as specified in <xref target="RFC3031" format="default"/>,
      swap (replace) the incoming label with the label in the
      OutLabel-List entry, or don't push anything at all in case of
      "unlabeled". The prefix may be an IGP or BGP prefix.</t>
          </li>
          <li>
            <t>Forwarding chain: A compound data structure consisting of
      multiple connected blocks that a forwarding engine walks one
      block at a time to forward the packet out of an interface.
      <xref target="sect-2.2" format="default"/> explains an example of a forwarding chain.
      Subsequent sections provide additional examples.</t>
          </li>
          <li>
            <t>Dependency: An object X is said to be a dependent or child of
      object Y if there is at least one forwarding chain where the
      forwarding engine must visit the object X before visiting the
      object Y in order to forward a packet. Note that if object X is
      a child of object Y, then Y cannot be deleted unless object X
      is no longer a dependent/child of object Y.</t>
          </li>
          <li>
            <t>ASN: Autonomous System Number.</t>
          </li>
        </ul>
      </section>
    </section>
    <section anchor="sect-2" numbered="true" toc="default">
      <name>Overview</name>
      <t>
        The idea of BGP-PIC is based on the following two pillars to
        make convergence independent of the number of prefixes:
      </t>
      <ul spacing="normal">
        <li>
          <t>A shared hierarchical forwarding chain: Multiple prefixes
            reference common next-hop and path objects arranged in a
            hierarchy, so that changes to a single shared object affect
            all dependent prefixes simultaneously.</t>
        </li>
        <li>
          <t>A forwarding plane with multiple levels of indirection:
            The forwarding plane supports recursive resolution and pointer-based
            forwarding entries, allowing failover by updating a small
            number of shared objects rather than per-prefix state.
          </t>
        </li>
      </ul>
      <t>
       A forwarding plane with shared, hierarchical forwarding chains with maximal object reuse can reroute a large number of destinations by modifying only a small set of shared objects. This enables convergence in a time frame that does not depend on the number of affected destinations. For example, if an IGP prefix used to resolve a recursive next-hop changes, there is no need to update the potentially large number of BGP NLRIs that reference that next-hop.</t>
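      <t>
   As a non-normative illustration (the class and object names below are
   hypothetical), the following sketch shows the effect of the two
   pillars: many leaves reference a single shared pathlist, so a failure
   is repaired with one update to the shared object, independent of the
   number of dependent prefixes.</t>

```python
# Non-normative sketch: many prefixes share one pathlist object, so a
# single edit to the shared object reroutes all dependent prefixes.

class Pathlist:
    """Shared object holding the candidate next-hops for many prefixes."""
    def __init__(self, paths):
        self.paths = list(paths)          # e.g. ["BGP-NH1", "BGP-NH2"]

class Fib:
    def __init__(self):
        self.leaves = {}                  # prefix -> shared Pathlist

    def add_prefix(self, prefix, pathlist):
        self.leaves[prefix] = pathlist    # leaf points at the shared object

    def lookup(self, prefix):
        return self.leaves[prefix].paths[0]   # first usable path

# One shared pathlist referenced by 100,000 hypothetical VPN prefixes.
pl = Pathlist(["BGP-NH1", "BGP-NH2"])
fib = Fib()
for i in range(100_000):
    fib.add_prefix(f"prefix-{i}", pl)

# BGP-NH1 fails: a single edit to the shared pathlist reroutes every
# dependent prefix, independent of the number of prefixes.
pl.paths.remove("BGP-NH1")

assert fib.lookup("prefix-0") == "BGP-NH2"
assert fib.lookup("prefix-99999") == "BGP-NH2"
```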
      <section anchor="sect-2.1" numbered="true" toc="default">
        <name>Dependency</name>
        <t>
   This section describes the required functionalities in the
   forwarding and control planes to support BGP-PIC as described in
   this document.</t>
        <section anchor="sect-2.1.1" numbered="true" toc="default">
          <name>Hierarchical Hardware FIB (Forwarding Information Base)</name>
          <t>
            BGP-PIC requires forwarding hardware that supports a hierarchical
            FIB. When a packet’s destination address matches a BGP prefix,
            the forwarding plane performs recursive lookups through successive
            levels of indirection until a resolving adjacency is reached.
            <xref target="sect-4"/> provides further details on the packet
            forwarding process.</t>

          <t>
            For platforms that support only a limited number of levels of
            indirection, a necessary trade-off is to flatten forwarding
            dependencies when programming BGP destinations into the hardware
            FIB. In this case, recursion is resolved at programming
            time, potentially eliminating both BGP pathlist and IGP pathlist
            lookups during forwarding.</t>
          <t>
            While flattening reduces the number of memory accesses per packet,
            it comes at the cost of increased hardware FIB memory usage, since
            flattening reduces sharing and results in greater duplication of
            forwarding entries. It also weakens the ECMP and BGP-PIC properties
            because fewer shared pathlists are available.</t>
          <t>
            <xref target="sect-a"/> describes the flattening approach in more detail for hardware
            platforms with a limited number of supported indirection levels.
          </t>
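          <t>
   The trade-off can be illustrated with a non-normative sketch
   (hypothetical names): once recursion is resolved at programming
   time, an IGP-level change requires per-prefix reprogramming instead
   of one update to a shared object.</t>

```python
# Non-normative sketch of flattening: the IGP-level result is copied
# into every BGP entry at programming time, so a later IGP change
# touches O(prefixes) entries instead of one shared object.

def program_flattened(bgp_prefixes, bgp_nh, igp_resolution):
    # Recursion resolved at programming time: each prefix stores the
    # final adjacency directly, duplicating the IGP result per prefix.
    return {p: igp_resolution[bgp_nh] for p in bgp_prefixes}

prefixes = [f"prefix-{i}" for i in range(1000)]
igp = {"BGP-NH1": "adjacency1"}
flat_fib = program_flattened(prefixes, "BGP-NH1", igp)

# IGP reroute: with flattening, every affected entry is now stale and
# must be reprogrammed individually.
igp["BGP-NH1"] = "adjacency2"
updates = [p for p, adj in flat_fib.items() if adj != igp["BGP-NH1"]]
assert len(updates) == len(prefixes)   # per-prefix work, unlike shared chains
```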
        </section>

        <section anchor="sect-2.1.2" numbered="true" toc="default">
          <name>Availability of Precomputed Backup Paths</name>
          <t>
            BGP-PIC requires backup paths so that traffic can be immediately
            redirected in the forwarding plane when a next hop fails, without
            reprocessing individual BGP prefixes.</t>
          <t>
            Backup paths are calculated before any failure and installed in
            the FIB along with the primary paths. Because many prefixes share
            the same next hop, a failure only requires switching that next
            hop to its backup.</t>
          <t>
   The BGP distribution of multiple paths is available thanks to
   the following BGP mechanisms: Add-Path <xref target="RFC7911" format="default"/>, BGP Best-External
   <xref target="I-D.ietf-idr-best-external"/>, diverse path <xref target="RFC6774" format="default"/>, and the
   frequent use in VPN deployments of different VPN RDs per PE.
   Another option to learn multiple BGP next-hops/paths is to receive
   IBGP paths from multiple BGP RRs <xref target="RFC9107" format="default"/>, each selecting a different
   path as best. Note that the availability of
   another BGP path does not mean that all failure scenarios can be
   covered by simply forwarding traffic to the available secondary
   path. The discussion of how to cover various failure scenarios is
   beyond the scope of this document.</t>
        </section>
      </section>
      <section anchor="sect-2.2" numbered="true" toc="default">
        <name>BGP-PIC Illustration</name>
        <t>
   To illustrate the two pillars above as well as the platform
   dependency, this document will use an example of a multihomed L3VPN
   prefix in a BGP-free core running LDP <xref target="RFC5036" format="default"/> or segment routing
   over MPLS forwarding plane <xref target="RFC8660" format="default"/>.</t>
        <artwork name="" type="" align="left" alt=""><![CDATA[
 +--------------------------------+
 |                                |
 |                               ePE1 (IGP-IP1 192.0.2.1, Loopback)
 |                                |  \
 |                                |   \
 |                                |    \
iPE                               |    CE....VRF "Blue", ASN 65000
 |                                |    /   (VPN-IP1 198.51.100.0/24)
 |                                |   /    (VPN-IP2 203.0.113.0/24)
 |   LDP/Segment-Routing Core     |  /
 |                               ePE2 (IGP-IP2 192.0.2.2, Loopback)
 |                                |
 +--------------------------------+
Figure 1: VPN prefix reachable via multiple PEs
]]></artwork>
        <t>
   Referring to Figure 1, suppose the iPE (the ingress PE) receives
   NLRIs for the VPN prefixes VPN-IP1 and VPN-IP2 from two egress PEs,
   ePE1 and ePE2, with next-hops BGP-NH1 (192.0.2.1) and BGP-NH2
   (192.0.2.2), respectively. Assume that ePE1 advertises the VPN labels
   VPN-L11 and VPN-L12 while ePE2 advertises the VPN labels VPN-L21 and
   VPN-L22 for VPN-IP1 and VPN-IP2, respectively. Suppose that BGP-NH1
   and BGP-NH2 are resolved via the IGP prefixes IGP-IP1 and IGP-IP2,
   where each happens to have 2 equal cost paths with IGP-NH1 and
   IGP-NH2 reachable via the interfaces I1 and I2 on iPE, respectively.
   Suppose that the local labels (whether LDP <xref target="RFC5036" format="default"/> or segment routing
   <xref target="RFC8660" format="default"/>) on the downstream LSRs for IGP-IP1 are IGP-L11 and
   IGP-L12 while those for IGP-IP2 are IGP-L21 and IGP-L22. As such, the
   routing table at iPE is as follows:</t>
        <artwork name="" type="" align="left" alt=""><![CDATA[
       65000: 198.51.100.0/24
            via ePE1 (192.0.2.1), VPN Label: VPN-L11
            via ePE2 (192.0.2.2), VPN Label: VPN-L21

       65000: 203.0.113.0/24
            via ePE1 (192.0.2.1), VPN Label: VPN-L12
            via ePE2 (192.0.2.2), VPN Label: VPN-L22
]]></artwork>
        <ul empty="true" spacing="normal">
          <li>
            <dl newline="true" spacing="normal" indent="5">
              <dt>192.0.2.1/32 (ePE1)</dt>
              <dd>
                via I1, Label: IGP-L11
                via I2, Label: IGP-L12
              </dd>
              <dt>192.0.2.2/32 (ePE2)</dt>
              <dd>
                via I1, Label: IGP-L21
                via I2, Label: IGP-L22
              </dd>
            </dl>
          </li>
        </ul>
        <t>
   Based on the above routing table, a hierarchical forwarding
   chain can be constructed as shown in Figure 2.</t>
        <figure anchor="ure-shared-hierarchical-forwarding-chain-at-ipe">
          <name>Shared Hierarchical Forwarding Chain at iPE</name>
          <artwork name="" type="" align="left" alt=""><![CDATA[
IP Leaf:  pathlist:       IP Leaf:       pathlist:
--------  +-----------+   --------
          |           |                 +-------------+
          |BGP-NH1------->IGP-IP1 ----->|             |
VPN-IP1-->|           |       |         | IGP-NH1,I1----->adjacency1
  |       |BGP-NH2------->... |         |             |
  |       |           |       |         | IGP-NH2,I2----->adjacency2
  |       +-----------+       |         |             |
  |                           |         +-------------+
  |                           |
  v                           v
OutLabel-List:             OutLabel-List:
+--------+                 +--------+
|VPN-L11 |                 |IGP-L11 |
|VPN-L21 |                 |IGP-L12 |
+--------+                 +--------+
]]></artwork>
        </figure>
        <t>
   The forwarding chain depicted in Figure 2 illustrates the first
   pillar, which is sharing and hierarchy. It can be seen that the BGP
   pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs
   reachable via ePE1 and ePE2. As such, it is possible to make changes
   to the pathlist without having to make changes to the NLRIs. For
   example, if BGP-NH2 becomes unreachable, there is no need to modify
   any of the possibly large number of NLRIs. Instead only the shared
   pathlist needs to be modified. Likewise, due to the hierarchical
   structure of the forwarding chain, it is possible to make
   modifications to the IGP routes without having to make any
   changes to the BGP NLRIs. For example, if the interface "I2" goes
   down, only the shared IGP pathlist needs to be updated, but none of
   the IGP prefixes sharing the IGP pathlist nor the BGP NLRIs using
   the IGP prefixes for resolution need to be modified.</t>
        <t>
   Figure 2 can also be used to illustrate the second BGP-PIC pillar.
   Having a deep forwarding chain such as the one illustrated in Figure
   2 requires a forwarding plane that is capable of accessing multiple
   levels of indirection in order to calculate the outgoing
   interface(s) and next-hops(s). While a deeper forwarding chain
   minimizes the re-convergence time on topology change, there will
   always exist platforms with limited capabilities and hence imposing
   a limit on the depth of the forwarding chain. <xref target="sect-5" format="default"/> describes
   how to gracefully trade off convergence speed with the number of
   hierarchical levels to support platforms with different
   capabilities.</t>
        <t>
   Another example using IPv6 addresses is the following:</t>
   <artwork name="" type="" align="left" alt=""><![CDATA[
      65000: 2001:DB8:1::/48
          via ePE1 (65000: 2001:DB8:192::1), VPN Label: VPN6-L11
          via ePE2 (65000: 2001:DB8:192::2), VPN Label: VPN6-L21
	     
      65000: 2001:DB8:2::/48
          via ePE1 (65000: 2001:DB8:192::1), VPN Label: VPN6-L12
          via ePE2 (65000: 2001:DB8:192::2), VPN Label: VPN6-L22

      65000: 2001:DB8:192::1/128
          via Core, Label:    IGP6-L11
          via Core, Label:    IGP6-L12

      65000: 2001:DB8:192::2/128
          via Core, Label:    IGP6-L21
          via Core, Label:    IGP6-L22

    ]]></artwork>
        <t>
   The same hierarchical forwarding chain described above can be
   constructed for IPv6 addresses/prefixes.</t>
      </section>
    </section>
    <section anchor="sect-3" numbered="true" toc="default">
      <name>Constructing the Shared Hierarchical Forwarding Chain</name>
      <t>
        This section describes how the forwarding chain is constructed using a hierarchical shared model, as introduced in
        <xref target="sect-2" format="default"/>. <xref target="sect-3.1"/>
        details the construction steps, and <xref target="sect-3.2"/> provides
        an illustrative example.</t>
      <section anchor="sect-3.1" numbered="true" toc="default">
        <name>Constructing the BGP-PIC Forwarding Chain</name>
        <t>
          The forwarding chain is built using the following steps:</t>
          <ol type="(%d)">
            <li>
              Prefix arrival in FIB. The prefix contains one or more outgoing paths. For certain labeled
              prefixes, such as L3VPN <xref target="RFC4364" format="default"/> prefixes, each path may be
              associated with an outgoing label and the prefix itself may be
              assigned a local label. The list of outgoing paths defines a pathlist.</li>
            <li>
              Pathlist lookup/creation. If a pathlist with the same list of
              paths does not already exist, the FIB manager (the software or
              hardware entity responsible for managing the FIB) creates a new
              pathlist; otherwise, the existing pathlist is used (the pathlist
              may already exist because another route is already using the
              same list of paths).</li>
            <li>
              Register prefix dependency. The BGP prefix is added as a dependent of the pathlist.</li>
            <li>
              Resolve pathlist entries. The forwarding chain is completed by resolving the
              paths of the pathlist. A BGP path usually consists of a
              next-hop. The next-hop is resolved by finding a matching prefix reachable
              via IGP or other protocols.</li>
          </ol>
        <t>
   The end result is a hierarchical shared forwarding chain where the
   BGP pathlist is shared by all BGP prefixes that use the same list of
   paths and the IGP prefix is shared by all pathlists that have a
   path resolving via that IGP prefix.</t>
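      <t>
   The construction steps above can be sketched as follows
   (non-normative, hypothetical names): the key point is that pathlist
   creation is keyed by the list of paths, so prefixes with identical
   paths share a single pathlist.</t>

```python
# Non-normative sketch of steps (2) and (3): pathlist lookup/creation
# keyed by the list of paths, plus registration of dependent prefixes.

class FibManager:
    def __init__(self):
        # tuple(paths) -> {"paths": ..., "dependents": set()}
        self.pathlists = {}

    def install_prefix(self, prefix, paths):
        key = tuple(paths)
        # Step 2: reuse an existing pathlist with the same list of
        # paths; otherwise create a new one.
        pl = self.pathlists.setdefault(
            key, {"paths": key, "dependents": set()})
        # Step 3: register the prefix as a dependent of the pathlist.
        pl["dependents"].add(prefix)
        return pl

mgr = FibManager()
a = mgr.install_prefix("VPN-IP1", ["BGP-NH1", "BGP-NH2"])
b = mgr.install_prefix("VPN-IP2", ["BGP-NH1", "BGP-NH2"])
assert a is b                          # both prefixes share one pathlist
assert a["dependents"] == {"VPN-IP1", "VPN-IP2"}
```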

      </section>
      <section anchor="sect-3.2" numbered="true" toc="default">
        <name>Example: Primary-Backup Path Scenario</name>
        <t>
   Consider the egress PE ePE1 in the case of the multi-homed VPN
   prefixes shown in Figure 1. Suppose ePE1 determines that the primary
   path is the external path, while the backup path is the
   IBGP path to the other PE ePE2 with next-hop BGP-NH2. ePE1
   constructs the forwarding chain depicted in Figure 3. The figure
   shows only a single VPN prefix for simplicity. But all prefixes that
   are multihomed to ePE1 and ePE2 share the BGP pathlist.</t>
        <figure anchor="ure-vpn-prefix-forwarding-chain-with-eibgp-pic-paths-on-egress-pe">
          <name>VPN Prefix Forwarding Chain with eiBGP paths on egress PE</name>
          <artwork name="" type="" align="left" alt=""><![CDATA[
                 BGP OutLabel-List
                     +---------+
  VPN-L11            |Unlabeled|
(Label-leaf)---+---->+---------+
               |     | VPN-L21 |
               v     | (swap)  |
               |     +---------+
               |
               |
               |
               |
               |                    BGP pathlist
               |                   +--------------+
               |                   |              |
               |                   |    CE-NH   ------->(to the CE)
               |                   | path-index=0 |
  VPN-IP1 -----+------------------>+--------------+
(IP leaf)                          |   VPN-NH2    |
     |                             |   (backup) ------->IGP Leaf
     |                             | path-index=1 |   (Towards ePE2)
     |                             +--------------+
     |
     |           BGP OutLabel-List
     |              +---------+
     |              |Unlabeled|
     +------------->+---------+
                    | VPN-L21 |
                    | (push)  |
                    +---------+
]]></artwork>
        </figure>
        <t>
   The example depicted in Figure 3 differs from the example in Figure
   2 in two main aspects. First, as long as the primary path
   towards the CE (the external path) can be used for forwarding, it
   will be the only path used for forwarding, while the OutLabel-List
   contains both the unlabeled entry (primary path) and the VPN
   label (backup path) advertised by the backup PE ePE2. The
   second aspect is the presence of the label leaf corresponding to the VPN
   prefix. This label leaf is used to match VPN traffic arriving from
   the core. Note that the label leaf shares the pathlist with the IP
   prefix.</t>
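        <t>
   The primary-backup behavior of Figure 3 can be sketched as follows
   (non-normative; the names are taken from the example above): the
   backup path is preinstalled but used only when the primary path
   becomes unusable.</t>

```python
# Non-normative sketch of Figure 3: a preinstalled backup path carries
# traffic only when the primary (external) path fails.

pathlist = [
    {"name": "CE-NH",   "path_index": 0, "backup": False, "usable": True},
    {"name": "VPN-NH2", "path_index": 1, "backup": True,  "usable": True},
]
outlabels = ["Unlabeled", "VPN-L21"]      # indexed by path_index

def pick(paths):
    """Prefer a usable primary path; fall back to a usable backup."""
    for p in paths:
        if p["usable"] and not p["backup"]:
            return p
    return next(p for p in paths if p["usable"] and p["backup"])

assert pick(pathlist)["name"] == "CE-NH"  # primary carries all traffic

pathlist[0]["usable"] = False             # the CE-facing path fails
p = pick(pathlist)
assert p["name"] == "VPN-NH2"             # backup takes over immediately
assert outlabels[p["path_index"]] == "VPN-L21"
```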
      </section>
    </section>
    <section anchor="sect-4" numbered="true" toc="default">
      <name>Forwarding Behavior</name>
      <t>
   This section explains how the forwarding plane uses the hierarchical
   shared forwarding chain to forward a packet.</t>
      <t>
   When a packet arrives at a router, assume it matches a leaf. If it
   does not, the packet is handled according to the local policy (such
   as silently dropping the packet), which is beyond the scope of this
   document. A labeled packet matches a label leaf while an IP packet
   matches an IP leaf. The forwarding engine walks the forwarding
   chain starting from the leaf until the walk terminates on an
   adjacency. Thus when a packet arrives, the chain is walked as
   follows:</t>
      <ol spacing="normal" type="1"><li>
          <t>Lookup the leaf based on the destination address or the label at
      the top of the packet.</t>
        </li>
        <li>
          <t>Retrieve the parent pathlist of the leaf.</t>
        </li>
        <li>
          <t>Pick an outgoing path "Pi" from the list of resolved
      paths in the pathlist. The method by which the outgoing path
      is picked is beyond the scope of this document (e.g. a flow-
      preserving hash exploiting entropy within the MPLS stack and IP
      header). Let the "path-index" of the outgoing path "Pi" be
      "j". Remember that, as described in the definition of the term
      pathlist in <xref target="sect-1.1" format="default"/>, the path-index of a path may not
      always be identical to the position of the path in the pathlist.</t>
        </li>
        <li>
          <t>If the prefix is labeled, use the "path-index" "j" to retrieve
      the label "Lj" stored at position j in the OutLabel-List and apply
      the label action of that label to the packet (e.g. for the VPN label
      on the ingress PE, the label action is "push"). As mentioned in
      <xref target="sect-1.1" format="default"/>, the value of the "path-index" stored in the
      path may not necessarily be the same as the position of the
      path in the pathlist.</t>
        </li>
        <li>
          <t>If the chosen path "Pi" is recursive, move to its parent
      prefix and go to step 2.</t>
        </li>
        <li>
          <t>If the chosen path is non-recursive move to its parent
      adjacency.</t>
        </li>
        <li>
          <t>Encapsulate the packet in the layer 2 encapsulation specified by
      the adjacency and send the packet out.</t>
        </li>
      </ol>
      <t>
   Let's apply the above forwarding steps to the forwarding chain
   depicted in Figure 2 in <xref target="sect-2" format="default"/>. Suppose a packet arrives at
   ingress PE iPE from an external neighbor. Assume the packet matches
   the VPN prefix VPN-IP1. While walking the forwarding chain, the
   forwarding engine applies a hashing algorithm to choose the path
   and the hashing at the BGP level chooses the first path in the
   BGP pathlist while the hashing at the IGP level yields the second
   path in the IGP pathlist. In that case, the packet will be sent
   out of interface I2 with the label stack "IGP-L12,VPN-L11".</t>
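      <t>
   The walk just described can be sketched as follows (non-normative;
   the names are taken from Figure 2, and fixed path choices stand in
   for the hashing decision of step 3).</t>

```python
# Non-normative sketch of the chain walk in steps 1-7, using the
# Figure 2 objects: VPN-IP1 resolves via a BGP pathlist whose first
# path recurses into IGP-IP1, which resolves to adjacencies.

chains = {
    # leaf: (pathlist of (next-hop, recursive?), OutLabel-List)
    "VPN-IP1": ([("IGP-IP1", True), ("IGP-IP2", True)],
                ["VPN-L11", "VPN-L21"]),
    "IGP-IP1": ([("adjacency1", False), ("adjacency2", False)],
                ["IGP-L11", "IGP-L12"]),
}

def forward(leaf, choose):
    """Walk the forwarding chain; `choose` maps a leaf to a path index."""
    stack = []
    while True:
        paths, labels = chains[leaf]        # step 2: parent pathlist
        j = choose(leaf)                    # step 3: pick a path (hashing)
        stack.append(labels[j])             # step 4: apply label action (push)
        nexthop, recursive = paths[j]
        if not recursive:                   # step 6: non-recursive path
            return nexthop, list(reversed(stack))
        leaf = nexthop                      # step 5: recurse to parent prefix

# BGP hash picks the first path, IGP hash the second, as in the text.
adj, label_stack = forward("VPN-IP1", {"VPN-IP1": 0, "IGP-IP1": 1}.get)
assert adj == "adjacency2"                  # i.e. out of interface I2
assert label_stack == ["IGP-L12", "VPN-L11"]
```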
    </section>
    <section anchor="sect-5" numbered="true" toc="default">
      <name>Handling Platforms with Limited Levels of Hierarchy</name>
      <t>
   This section describes the construction of the forwarding chain if a
   platform does not support the number of recursion levels required to
   resolve the NLRIs. There are two main design objectives.</t>
      <ul spacing="normal">
        <li>
          <t>Being able to reduce the number of hierarchical levels from any
      arbitrary value to a smaller arbitrary value that can be
      supported by the forwarding engine.</t>
        </li>
        <li>
          <t>Minimal modifications to the forwarding algorithm due to such
      reduction.</t>
        </li>
      </ul>
      <t>
   <xref target="sect-a"/> provides details on how to handle limited
   hardware capabilities.</t>
    </section>
    <section anchor="sect-6" numbered="true" toc="default">
      <name>Forwarding Chain Adjustment at a Failure</name>
      <t>
   The hierarchical and shared structure of the forwarding chain
   explained in the previous section allows modifying a small number of
   forwarding chain objects to re-route traffic to a pre-calculated
   equal-cost or backup path without the need to modify the
   possibly very large number of BGP prefixes. This section goes over
   various core and edge failure scenarios to illustrate how the FIB
   manager can utilize the forwarding chain structure to achieve BGP
   prefix independent convergence.</t>
      <section anchor="sect-6.1" numbered="true" toc="default">
        <name>BGP-PIC core</name>
        <t>
   This section describes the adjustments to the forwarding chain when
   a core link or node fails but the BGP next-hop remains reachable.</t>
        <t>
   There are two cases: remote link failure and attached link failure.
   Node failures are treated as link failures.</t>
        <t>
   When a remote link or node fails, the IGP on the ingress PE receives
   an advertisement indicating a topology change, so the IGP re-converges
   to either find a new next-hop and/or outgoing interface or remove the
   path completely from the IGP prefix used to resolve BGP next-hops.
   IGP and/or LDP download the modified IGP leaves with modified
   outgoing labels for the labeled core.</t>
        <t>
   When a local link fails, the FIB manager detects the failure almost
   immediately. The FIB manager marks the impacted path(s) as
   unusable so that only usable paths are used to forward packets.
   Hence only IGP pathlists with paths using the failed local link
   need to be modified; all other pathlists are not impacted. Note that
   in this particular case there is no need to backwalk (walk back the
   forwarding chain) to IGP leaves to adjust the OutLabel-Lists because
   FIB can rely on the path-index stored in the usable paths in
   the pathlist to pick the right label.</t>
        <t>
   Note that because the FIB manager modifies the forwarding chain
   starting from the IGP leaves only, BGP pathlists and leaves are not
   modified. Hence traffic restoration occurs within the time frame of
   IGP convergence, and, for local link failure, assuming a backup path
   has been precomputed, within the timeframe of local detection
   (e.g. 50ms). Examples of solutions that can pre-compute backup paths
   are IP FRR <xref target="RFC5714" format="default"/>, remote LFA
   <xref target="RFC7490" format="default"/>, TI-LFA <xref target="I-D.ietf-rtgwg-segment-routing-ti-lfa" format="default"/>, and MRT
   <xref target="RFC7812" format="default"/>, or an EBGP path having a backup path [bonaventure].</t>
        <t>
   Let's apply the procedure described in this subsection to the
   forwarding chain depicted in Figure 2. Suppose a remote link failure
   occurs and impacts the first ECMP IGP path to the remote BGP
   next-hop. Upon IGP convergence, the IGP pathlist used by the BGP
   next-hop is updated to reflect the new topology (one path
   instead of two) and the new forwarding state is immediately
   available to all dependent BGP prefixes. The same behavior would
   occur if the failure were local, such as an interface going down. As
   soon as the IGP convergence is complete for the IGP route to the BGP
   next-hop, all the BGP routes depending on it benefit from the new
   path. In fact, if LFA protection is enabled for the IGP route to the
   BGP next-hop and a backup path was pre-computed and installed in the
   pathlist, then upon a local interface failure the LFA backup path is
   immediately activated (e.g. sub-50msec), and thus protection benefits
   all the depending BGP traffic through the hierarchical forwarding
   dependency between the routes.</t>
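      <t>The path-index mechanism mentioned above (no backwalk to IGP leaves
   on a local link failure) can be sketched as follows. This is a hedged
   illustration under assumed names; the label values and interfaces are
   hypothetical:</t>

```python
# Each path stores its original index; OutLabel-Lists stay parallel to
# those indices, so they need no rewrite when a path is marked unusable.
igp_pathlist = [
    {"index": 0, "oif": "I1", "usable": True},
    {"index": 1, "oif": "I2", "usable": True},
]
out_labels = ["IGP-L1", "IGP-L2"]   # parallel to the path indices


def pick(pathlist, labels, flow_hash):
    """Hash over usable paths, then select the label by stored index."""
    usable = [p for p in pathlist if p["usable"]]
    path = usable[flow_hash % len(usable)]
    return path["oif"], labels[path["index"]]


# Local link I1 fails: only the IGP pathlist entry is marked unusable.
igp_pathlist[0]["usable"] = False

# All flows now use I2, and the stored path-index still selects the
# right outgoing label.
assert pick(igp_pathlist, out_labels, 0) == ("I2", "IGP-L2")
assert pick(igp_pathlist, out_labels, 7) == ("I2", "IGP-L2")
```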
      </section>
      <section anchor="sect-6.2" numbered="true" toc="default">
        <name>BGP-PIC Edge</name>
        <t>
   This section describes the adjustments to the forwarding chains as a
   result of edge node or edge link failure.</t>
        <section anchor="sect-6.2.1" numbered="true" toc="default">
          <name>Adjusting the Forwarding Chain on Egress Node Failure</name>
          <t>
   When a node fails, the IGPs on the neighboring core nodes send
   updates indicating that the edge node is no longer a direct neighbor.
   If the node that failed is an egress node, such as ePE1 and ePE2 in
   Figure 1, the IGP running on an ingress node, such as iPE in Figure
   1, converges and realizes that the egress node is no longer
   reachable. As such, the IGP on the ingress node instructs FIB to
   remove the IP and label leaves corresponding to the failed edge node.
   The FIB manager on the ingress node then performs the following
   steps:</t>
          <ul spacing="normal">
            <li>
              <t>FIB manager deletes the IGP leaf corresponding to the failed edge
      node</t>
            </li>
            <li>
              <t>FIB manager backwalks to all dependent BGP pathlists and marks
      the paths using the deleted IGP leaf as unresolved</t>
            </li>
            <li>
              <t>Note that there is no need to modify the possibly large number of
      BGP leaves because each path in the pathlist carries its path-
      index and hence the correct outgoing label will be picked.
      Consider for example the forwarding chain depicted in Figure 2.
      If the 1st BGP path becomes unresolved, then the forwarding
      engine will only use the second path for forwarding. Yet the
      path-index of that single resolved path will still be 1 and
      hence the label VPN-L21 will be pushed.</t>
            </li>
          </ul>
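          <t>The last bullet can be made concrete with a short sketch. This is
   an illustrative model of the Figure 2 example, not an implementation;
   the next-hop names and labels follow the figure's walk-through:</t>

```python
# BGP pathlist with per-path indices; the OutLabel-List is keyed by the
# same indices, so marking a path unresolved never touches the labels.
bgp_pathlist = [
    {"index": 0, "next_hop": "ePE1", "resolved": True},
    {"index": 1, "next_hop": "ePE2", "resolved": True},
]
out_label_list = {0: "VPN-L11", 1: "VPN-L21"}


def label_for_forwarding(pathlist, labels):
    """Pick the first resolved path and select its label by index."""
    resolved = [p for p in pathlist if p["resolved"]]
    chosen = resolved[0]
    return labels[chosen["index"]]


# Egress node ePE1 fails: the FIB manager backwalks to the shared
# pathlist only and marks the path through ePE1 as unresolved.
bgp_pathlist[0]["resolved"] = False

# The surviving path keeps its original index (1), so VPN-L21 is pushed
# without modifying any BGP leaf.
assert label_for_forwarding(bgp_pathlist, out_label_list) == "VPN-L21"
```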
        </section>
        <section anchor="sect-6.2.2" numbered="true" toc="default">
          <name>Adjusting Forwarding Chain on PE-CE link Failure</name>
          <t>
   Suppose the link between an edge router and its external peer fails.
   There are two scenarios: (1) the edge node attached to the failed
   link performs next-hop self (where BGP advertises the IP address of
   its own loopback as next-hop), and (2) the edge node attached to the
   failure advertises the IP address of the failed link as the next-hop
   attribute to its IBGP peers.</t>
          <t>
   In the first case, the rest of the IBGP peers will remain unaware of
   the link failure and will continue to forward traffic to the edge
   node until the edge node attached to the failed link withdraws the
   BGP prefixes. If the destination prefixes are multi-homed to another
   IBGP peer, say ePE2, then the FIB manager on the edge router
   detecting the link failure applies the following steps to the
   forwarding chain (see Figure 3):</t>
          <ul spacing="normal">
            <li>
              <t>FIB manager backwalks to the BGP pathlists and marks the path
      through the failed link to the external peer as unresolved.</t>
            </li>
            <li>
              <t>Hence traffic will be forwarded using the backup path towards
      ePE2.</t>
            </li>
            <li>
              <t>Labeled traffic arriving at the egress PE ePE1 matches the BGP
      label leaf.</t>
              <ul spacing="normal">
                <li>
                  <t>The OutLabel-List attached to the BGP label leaf already
          contains an entry corresponding to the backup path.</t>
                </li>
                <li>
                  <t>The label entry in OutLabel-List corresponding to the
          internal path to backup egress PE has a swap action to
          the label advertised by the backup egress PE.</t>
                </li>
                <li>
                  <t>For an arriving labeled packet (e.g. VPN), the top label is
          swapped with the label advertised by the backup egress PE and
          the packet is sent towards the backup egress PE.</t>
                </li>
              </ul>
            </li>
            <li>
              <t>Unlabeled traffic arriving at the egress PE ePE1 matches the BGP
      IP leaf</t>
              <ul spacing="normal">
                <li>
                  <t>The OutLabel-List attached to the BGP IP leaf already
          contains an entry corresponding to the backup path.</t>
                </li>
                <li>
                  <t>The label entry in the OutLabel-List corresponding to the
          internal path to the backup egress PE has a push action
          (instead of the swap action in the labeled traffic case) to
          the label advertised by the backup egress PE.</t>
                </li>
                <li>
                  <t>For an arriving IP packet, the label advertised by the
          backup egress PE is pushed and the packet is sent towards the
          backup egress PE.</t>
                </li>
              </ul>
            </li>
          </ul>
          <t>
   In the second case where the edge router uses the IP address of the
   failed link as the BGP next-hop, the edge router will still perform
   the previous steps. But, unlike the case of next-hop self, the IGP
   on the failed edge node informs the rest of the IBGP peers that the
   IP address of the failed link is no longer reachable. Hence the FIB
   manager on IBGP peers will delete the IGP leaf corresponding to the
   IP prefix of the failed link. The behavior of the IBGP peers will be
   identical to the case of edge node failure outlined in <xref target="sect-6.2.1" format="default"/>.</t>
          <t>
   Note that because the edge link failure is local to the edge router,
   sub-50 msec convergence can be achieved as described in
   [bonaventure].</t>
          <t>
   Let's apply the case of next-hop self to the forwarding chain
   depicted in Figure 3. After the failure of the link between ePE1 and
   CE, the forwarding engine will route traffic arriving from the core
   towards VPN-NH2 with path-index=1. A packet arriving from the core
   will contain the label VPN-L11 on top. The label VPN-L11 is swapped
   with the label VPN-L21 and the packet is forwarded towards ePE2.</t>
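          <t>The two repair actions above (swap for labeled traffic, push for
   unlabeled traffic) can be sketched together. This is an illustrative
   model only; the label values and PE names follow the Figure 3
   walk-through in the text:</t>

```python
def forward_on_backup(packet, backup_label="VPN-L21"):
    """Redirect a packet over the backup path towards ePE2.

    Labeled (e.g. VPN) traffic: swap the top label with the label
    advertised by the backup egress PE. Unlabeled IP traffic: push
    that label instead. The last list element models the top of stack.
    """
    stack = list(packet.get("labels", []))
    if stack:
        stack[-1] = backup_label      # swap action
    else:
        stack.append(backup_label)    # push action
    return {"labels": stack, "out": "ePE2"}


# Labeled packet arriving from the core with VPN-L11 on top:
assert forward_on_backup({"labels": ["VPN-L11"]}) == \
    {"labels": ["VPN-L21"], "out": "ePE2"}

# Unlabeled IP packet arriving at the egress PE:
assert forward_on_backup({"labels": []}) == \
    {"labels": ["VPN-L21"], "out": "ePE2"}
```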
        </section>
      </section>
      <section anchor="sect-6.3" numbered="true" toc="default">
        <name>Handling Failures for Flattened Forwarding Chains</name>
        <t>
   As explained in <xref target="sect-5" format="default"/>, if a
   platform cannot support the native number of hierarchy levels of a
   recursive forwarding chain, the instantiated forwarding chain is
   constructed by flattening two or more levels. Hence the 3-level
   chain in Figure 5 is flattened into the 2-level chain in Figure 6.</t>
        <t>
   While reducing the benefits of BGP-PIC, flattening one hierarchy
   into a shallower hierarchy does not always result in a complete loss
   of the benefits of BGP-PIC. To illustrate this fact, suppose
   ASBR12 is no longer reachable in domain 1. If the platform supports
   the full hierarchy depth, the forwarding chain is the one depicted
   in Figure 5, and hence the FIB manager needs to backwalk one level
   to the pathlist shared by "ePE1" and "ePE2" and adjust it. If the
   platform supports only 2 levels of hierarchy, then a usable
   forwarding chain is the one depicted in Figure 6. In that case, if
   ASBR12 is no longer reachable, the FIB manager has to backwalk to
   the two flattened pathlists and update both of them.</t>
        <t>
   The main observation is that the loss of convergence speed due to
   the loss of hierarchy depth depends on the structure of the
   forwarding chain itself. To illustrate this fact, let's take two
   extremes. Suppose the forwarding objects in level i+1 depend on the
   forwarding objects in level i. If every object on level i+1 depends
   on a separate object in level i, then flattening level i into level
   i+1 will not result in loss of convergence speed. Now let's take the
   other extreme. Suppose "n" objects in level i+1 depend on 1 object
   in level i. Now suppose FIB flattens level i into level i+1. If a
   topology change results in modifying the single object in level i,
   then FIB has to backwalk and modify "n" objects in the flattened
   level, thereby losing all the benefit of BGP-PIC. Experience shows
   that flattening forwarding chains usually results in moderate loss
   of BGP-PIC benefits. Further analysis is needed to corroborate and
   quantify this statement.</t>
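        <t>The two extremes above amount to a simple fan-out count. The
   following sketch is purely illustrative of that arithmetic; the
   function and its parameters are hypothetical:</t>

```python
def updates_needed(fanout, flattened):
    """Objects the FIB must touch when one level-i object changes.

    fanout: number of level-(i+1) objects depending on that one
    level-i object. Without flattening, only the shared level-i
    object is updated; with flattening, every dependent copy is.
    """
    return fanout if flattened else 1


# 1:1 dependency: flattening level i into level i+1 costs nothing extra.
assert updates_needed(fanout=1, flattened=True) == updates_needed(1, False)

# n:1 dependency (n = 1000): flattening turns 1 update into 1000,
# losing the prefix-independence benefit for that object.
assert updates_needed(fanout=1000, flattened=True) == 1000
assert updates_needed(fanout=1000, flattened=False) == 1
```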
      </section>
    </section>
    <section anchor="sect-7" numbered="true" toc="default">
      <name>Operational Properties</name>
      <section anchor="sect-7.1" numbered="true" toc="default">
        <name>Failure Coverage</name>
        <t>
          BGP-PIC provides prefix-independent convergence for failures that affect shared forwarding dependencies, such as the loss of a next hop, an IGP path, or an adjacency used by multiple BGP prefixes. By precomputing and installing alternate forwarding paths and leveraging shared hierarchical forwarding objects, BGP-PIC enables traffic to be rerouted without requiring per-prefix BGP best-path recomputation.</t>
        <t>
          Failures that do not impact shared forwarding objects, or that require BGP policy re-evaluation, may still rely on conventional BGP convergence behavior.</t>
      </section>
      
      <section numbered="true">
        <name>Convergence Characteristics</name>
          <t>
            The primary convergence characteristic of BGP-PIC is that forwarding convergence time is independent of the number of affected BGP prefixes. Upon a failure, only a limited number of shared forwarding objects need to be updated. This contrasts with traditional BGP convergence, where forwarding updates scale with the number of impacted prefixes and may result in prolonged convergence for large routing tables.
          </t>
      </section>

      <section anchor="sect-7.2" numbered="true" toc="default">
        <name>Fast Local Repair</name>
        <t>
          BGP-PIC enables forwarding repair that is independent of BGP control-plane convergence. Backup forwarding paths are computed and installed in advance, allowing the forwarding plane to redirect traffic immediately upon detection of a local failure.</t>
          <t>
   When the failure is local (a local IGP next-hop failure or a local
   EBGP next-hop failure), a pre-computed and pre-installed backup is
   activated by a local-protection mechanism that does not depend on
   the number of BGP destinations impacted by the failure. Sub-50msec
   is thus possible even if millions of BGP prefixes are impacted.</t>
        <t>
   When the failure is remote (a remote IGP failure not impacting the
   BGP next-hop or a remote BGP next-hop failure), an alternate path
   is activated upon IGP convergence. All the impacted BGP
   destinations benefit from a working alternate path as soon as
   the IGP convergence occurs for their impacted BGP next-hop, even if
   millions of BGP routes are impacted.</t>
        <t>
   <xref target="sect-c"/> puts the BGP-PIC benefits in perspective by providing
   some results using actual numbers.</t>
      </section>
      <section anchor="sect-7.3" numbered="true" toc="default">
        <name>Configuration Free</name>
        <t>
          The BGP-PIC solution depends on internal structures and procedures and does not require any configuration or operator involvement.</t>
      </section>
      <section anchor="sect-7.4" numbered="true" toc="default">
        <name>Incremental Deployment</name>
      <t>
   As soon as a router supports the BGP-PIC solution, all of its
   benefits (most notably convergence time that does not depend on the
   number of prefixes) become available without any requirement for
   other routers to support BGP-PIC.</t>
      </section>
    </section>
    <section anchor="sect-8" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>
   The behavior described in this document is functionality internal
   to a router that results in a significant improvement to convergence
   time as well as a reduction in the CPU and memory used by FIB, while
   not changing basic routing and forwarding functionality. As such,
   no additional security risk is introduced by using the mechanisms
   described in this document.</t>
    </section>
    <section anchor="sect-9" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>
   This document has no IANA actions.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.4271.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.3031.xml"/>
      </references>

      <references>
        <name>Informative References</name>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-idr-best-external.xml"/>

        <reference anchor="RFC5565" target="https://www.rfc-editor.org/info/rfc5565" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5565.xml">
          <front>
            <title>Softwire Mesh Framework</title>
            <author fullname="J. Wu" initials="J." surname="Wu"/>
            <author fullname="Y. Cui" initials="Y." surname="Cui"/>
            <author fullname="C. Metz" initials="C." surname="Metz"/>
            <author fullname="E. Rosen" initials="E." surname="Rosen"/>
            <date month="June" year="2009"/>
            <abstract>
              <t>The Internet needs to be able to handle both IPv4 and IPv6 packets. However, it is expected that some constituent networks of the Internet will be "single-protocol" networks. One kind of single-protocol network can parse only IPv4 packets and can process only IPv4 routing information; another kind can parse only IPv6 packets and can process only IPv6 routing information. It is nevertheless required that either kind of single-protocol network be able to provide transit service for the "other" protocol. This is done by passing the "other kind" of routing information from one edge of the single-protocol network to the other, and by tunneling the "other kind" of data packet from one edge to the other. The tunnels are known as "softwires". This framework document explains how the routing information and the data packets of one protocol are passed through a single-protocol network of the other protocol. The document is careful to specify when this can be done with existing technology and when it requires the development of new or modified technology. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="5565"/>
          <seriesInfo name="DOI" value="10.17487/RFC5565"/>
        </reference>
        <reference anchor="RFC4364" target="https://www.rfc-editor.org/info/rfc4364" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4364.xml">
          <front>
            <title>BGP/MPLS IP Virtual Private Networks (VPNs)</title>
            <author fullname="E. Rosen" initials="E." surname="Rosen"/>
            <author fullname="Y. Rekhter" initials="Y." surname="Rekhter"/>
            <date month="February" year="2006"/>
            <abstract>
              <t>This document describes a method by which a Service Provider may use an IP backbone to provide IP Virtual Private Networks (VPNs) for its customers. This method uses a "peer model", in which the customers' edge routers (CE routers) send their routes to the Service Provider's edge routers (PE routers); there is no "overlay" visible to the customer's routing algorithm, and CE routers at different sites do not peer with each other. Data packets are tunneled through the backbone, so that the core routers do not need to know the VPN routes. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4364"/>
          <seriesInfo name="DOI" value="10.17487/RFC4364"/>
        </reference>
        <reference anchor="RFC4798" target="https://www.rfc-editor.org/info/rfc4798" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4798.xml">
          <front>
            <title>Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider Edge Routers (6PE)</title>
            <author fullname="J. De Clercq" initials="J." surname="De Clercq"/>
            <author fullname="D. Ooms" initials="D." surname="Ooms"/>
            <author fullname="S. Prevost" initials="S." surname="Prevost"/>
            <author fullname="F. Le Faucheur" initials="F." surname="Le Faucheur"/>
            <date month="February" year="2007"/>
            <abstract>
              <t>This document explains how to interconnect IPv6 islands over a Multiprotocol Label Switching (MPLS)-enabled IPv4 cloud. This approach relies on IPv6 Provider Edge routers (6PE), which are Dual Stack in order to connect to IPv6 islands and to the MPLS core, which is only required to run IPv4 MPLS. The 6PE routers exchange the IPv6 reachability information transparently over the core using the Multiprotocol Border Gateway Protocol (MP-BGP) over IPv4. In doing so, the BGP Next Hop field is used to convey the IPv4 address of the 6PE router so that dynamically established IPv4-signaled MPLS Label Switched Paths (LSPs) can be used without explicit tunnel configuration. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4798"/>
          <seriesInfo name="DOI" value="10.17487/RFC4798"/>
        </reference>

        <reference anchor="RFC5036" target="https://www.rfc-editor.org/info/rfc5036" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5036.xml">
          <front>
            <title>LDP Specification</title>
            <author fullname="L. Andersson" initials="L." role="editor" surname="Andersson"/>
            <author fullname="I. Minei" initials="I." role="editor" surname="Minei"/>
            <author fullname="B. Thomas" initials="B." role="editor" surname="Thomas"/>
            <date month="October" year="2007"/>
            <abstract>
              <t>The architecture for Multiprotocol Label Switching (MPLS) is described in RFC 3031. A fundamental concept in MPLS is that two Label Switching Routers (LSRs) must agree on the meaning of the labels used to forward traffic between and through them. This common understanding is achieved by using a set of procedures, called a label distribution protocol, by which one LSR informs another of label bindings it has made. This document defines a set of such procedures called LDP (for Label Distribution Protocol) by which LSRs distribute labels to support MPLS forwarding along normally routed paths. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="5036"/>
          <seriesInfo name="DOI" value="10.17487/RFC5036"/>
        </reference>
        <reference anchor="RFC7911" target="https://www.rfc-editor.org/info/rfc7911" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7911.xml">
          <front>
            <title>Advertisement of Multiple Paths in BGP</title>
            <author fullname="D. Walton" initials="D." surname="Walton"/>
            <author fullname="A. Retana" initials="A." surname="Retana"/>
            <author fullname="E. Chen" initials="E." surname="Chen"/>
            <author fullname="J. Scudder" initials="J." surname="Scudder"/>
            <date month="July" year="2016"/>
            <abstract>
              <t>This document defines a BGP extension that allows the advertisement of multiple paths for the same address prefix without the new paths implicitly replacing any previous ones. The essence of the extension is that each path is identified by a Path Identifier in addition to the address prefix.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7911"/>
          <seriesInfo name="DOI" value="10.17487/RFC7911"/>
        </reference>
        <reference anchor="RFC6774" target="https://www.rfc-editor.org/info/rfc6774" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6774.xml">
          <front>
            <title>Distribution of Diverse BGP Paths</title>
            <author fullname="R. Raszuk" initials="R." role="editor" surname="Raszuk"/>
            <author fullname="R. Fernando" initials="R." surname="Fernando"/>
            <author fullname="K. Patel" initials="K." surname="Patel"/>
            <author fullname="D. McPherson" initials="D." surname="McPherson"/>
            <author fullname="K. Kumaki" initials="K." surname="Kumaki"/>
            <date month="November" year="2012"/>
            <abstract>
              <t>The BGP4 protocol specifies the selection and propagation of a single best path for each prefix. As defined and widely deployed today, BGP has no mechanisms to distribute alternate paths that are not considered best path between its speakers. This behavior results in a number of disadvantages for new applications and services.</t>
              <t>The main objective of this document is to observe that by simply adding a new session between a route reflector and its client, the Nth best path can be distributed. This document also compares existing solutions and proposed ideas that enable distribution of more paths than just the best path.</t>
              <t>This proposal does not specify any changes to the BGP protocol definition. It does not require a software upgrade of provider edge (PE) routers acting as route reflector clients. This document is not an Internet Standards Track specification; it is published for informational purposes.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="6774"/>
          <seriesInfo name="DOI" value="10.17487/RFC6774"/>
        </reference>
        <reference anchor="I-D.pmohapat-idr-fast-conn-restore" target="https://datatracker.ietf.org/doc/html/draft-pmohapat-idr-fast-conn-restore-03" xml:base="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.pmohapat-idr-fast-conn-restore.xml">
          <front>
            <title>Fast Connectivity Restoration Using BGP Add-path</title>
            <author fullname="Prodosh Mohapatra" initials="P." surname="Mohapatra">
              <organization>Cisco Systems</organization>
            </author>
            <author fullname="Rex Fernando" initials="R." surname="Fernando">
              <organization>Cisco Systems</organization>
            </author>
            <author fullname="Clarence Filsfils" initials="C." surname="Filsfils">
              <organization>Cisco Systems</organization>
            </author>
            <author fullname="Robert Raszuk" initials="R." surname="Raszuk">
              <organization>NTT MCL Inc.</organization>
            </author>
            <date day="22" month="January" year="2013"/>
            <abstract>
              <t>A BGP route defines an association of an address prefix with an "exit point" from the current Autonomous System (AS). If the exit point becomes unreachable due to a failure, the route becomes invalid. This usually triggers an exchange of BGP control messages after which a new BGP route for the given prefix is installed. However, connectivity can be restored more quickly if the router maintains precomputed BGP backup routes. It can then switch to a backup route immediately upon learning that an exit point is unreachable, without needing to wait for the BGP control messages exchange. This document specifies the procedures to be used by BGP to maintain and distribute the precomputed backup routes. Maintaining these additional routes is also useful in promoting load balancing, performing maintenance without causing traffic loss, and in reducing churn in the BGP control plane.</t>
            </abstract>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-pmohapat-idr-fast-conn-restore-03"/>
        </reference>
        <reference anchor="I-D.ietf-rtgwg-segment-routing-ti-lfa" target="https://datatracker.ietf.org/doc/html/draft-ietf-rtgwg-segment-routing-ti-lfa-21" xml:base="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-rtgwg-segment-routing-ti-lfa.xml">
          <front>
            <title>Topology Independent Fast Reroute using Segment Routing</title>
            <author fullname="Ahmed Bashandy" initials="A." surname="Bashandy">
              <organization>Individual</organization>
            </author>
            <author fullname="Stephane Litkowski" initials="S." surname="Litkowski">
              <organization>Cisco Systems</organization>
            </author>
            <author fullname="Clarence Filsfils" initials="C." surname="Filsfils">
              <organization>Cisco Systems</organization>
            </author>
            <author fullname="Pierre Francois" initials="P." surname="Francois">
              <organization>INSA Lyon</organization>
            </author>
            <author fullname="Bruno Decraene" initials="B." surname="Decraene">
              <organization>Orange</organization>
            </author>
            <author fullname="Daniel Voyer" initials="D." surname="Voyer">
              <organization>Bell Canada</organization>
            </author>
            <date day="12" month="February" year="2025"/>
            <abstract>
              <t>This document presents Topology Independent Loop-free Alternate Fast Reroute (TI-LFA), aimed at providing protection of node and adjacency segments within the Segment Routing (SR) framework. This Fast Reroute (FRR) behavior builds on proven IP Fast Reroute concepts being LFAs, remote LFAs (RLFA), and remote LFAs with directed forwarding (DLFA). It extends these concepts to provide guaranteed coverage in any two-connected networks using a link-state IGP. An important aspect of TI-LFA is the FRR path selection approach establishing protection over the expected post-convergence paths from the point of local repair, reducing the operational need to control the tie-breaks among various FRR options.</t>
            </abstract>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-ietf-rtgwg-segment-routing-ti-lfa-21"/>
        </reference>
        <reference anchor="RFC5714" target="https://www.rfc-editor.org/info/rfc5714" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5714.xml">
          <front>
            <title>IP Fast Reroute Framework</title>
            <author fullname="M. Shand" initials="M." surname="Shand"/>
            <author fullname="S. Bryant" initials="S." surname="Bryant"/>
            <date month="January" year="2010"/>
            <abstract>
              <t>This document provides a framework for the development of IP fast- reroute mechanisms that provide protection against link or router failure by invoking locally determined repair paths. Unlike MPLS fast-reroute, the mechanisms are applicable to a network employing conventional IP routing and forwarding. This document is not an Internet Standards Track specification; it is published for informational purposes.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="5714"/>
          <seriesInfo name="DOI" value="10.17487/RFC5714"/>
        </reference>
        <reference anchor="RFC7490" target="https://www.rfc-editor.org/info/rfc7490" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7490.xml">
          <front>
            <title>Remote Loop-Free Alternate (LFA) Fast Reroute (FRR)</title>
            <author fullname="S. Bryant" initials="S." surname="Bryant"/>
            <author fullname="C. Filsfils" initials="C." surname="Filsfils"/>
            <author fullname="S. Previdi" initials="S." surname="Previdi"/>
            <author fullname="M. Shand" initials="M." surname="Shand"/>
            <author fullname="N. So" initials="N." surname="So"/>
            <date month="April" year="2015"/>
            <abstract>
              <t>This document describes an extension to the basic IP fast reroute mechanism, described in RFC 5286, that provides additional backup connectivity for point-to-point link failures when none can be provided by the basic mechanisms.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7490"/>
          <seriesInfo name="DOI" value="10.17487/RFC7490"/>
        </reference>
        <reference anchor="RFC7812" target="https://www.rfc-editor.org/info/rfc7812" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7812.xml">
          <front>
            <title>An Architecture for IP/LDP Fast Reroute Using Maximally Redundant Trees (MRT-FRR)</title>
            <author fullname="A. Atlas" initials="A." surname="Atlas"/>
            <author fullname="C. Bowers" initials="C." surname="Bowers"/>
            <author fullname="G. Enyedi" initials="G." surname="Enyedi"/>
            <date month="June" year="2016"/>
            <abstract>
              <t>This document defines the architecture for IP and LDP Fast Reroute using Maximally Redundant Trees (MRT-FRR). MRT-FRR is a technology that gives link-protection and node-protection with 100% coverage in any network topology that is still connected after the failure.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7812"/>
          <seriesInfo name="DOI" value="10.17487/RFC7812"/>
        </reference>
        <reference anchor="RFC8277" target="https://www.rfc-editor.org/info/rfc8277" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8277.xml">
          <front>
            <title>Using BGP to Bind MPLS Labels to Address Prefixes</title>
            <author fullname="E. Rosen" initials="E." surname="Rosen"/>
            <date month="October" year="2017"/>
            <abstract>
              <t>This document specifies a set of procedures for using BGP to advertise that a specified router has bound a specified MPLS label (or a specified sequence of MPLS labels organized as a contiguous part of a label stack) to a specified address prefix. This can be done by sending a BGP UPDATE message whose Network Layer Reachability Information field contains both the prefix and the MPLS label(s) and whose Next Hop field identifies the node at which said prefix is bound to said label(s). This document obsoletes RFC 3107.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="8277"/>
          <seriesInfo name="DOI" value="10.17487/RFC8277"/>
        </reference>
        <reference anchor="RFC8660" target="https://www.rfc-editor.org/info/rfc8660" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8660.xml">
          <front>
            <title>Segment Routing with the MPLS Data Plane</title>
            <author fullname="A. Bashandy" initials="A." role="editor" surname="Bashandy"/>
            <author fullname="C. Filsfils" initials="C." role="editor" surname="Filsfils"/>
            <author fullname="S. Previdi" initials="S." surname="Previdi"/>
            <author fullname="B. Decraene" initials="B." surname="Decraene"/>
            <author fullname="S. Litkowski" initials="S." surname="Litkowski"/>
            <author fullname="R. Shakir" initials="R." surname="Shakir"/>
            <date month="December" year="2019"/>
            <abstract>
              <t>Segment Routing (SR) leverages the source-routing paradigm. A node steers a packet through a controlled set of instructions, called segments, by prepending the packet with an SR header. In the MPLS data plane, the SR header is instantiated through a label stack. This document specifies the forwarding behavior to allow instantiating SR over the MPLS data plane (SR-MPLS).</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="8660"/>
          <seriesInfo name="DOI" value="10.17487/RFC8660"/>
        </reference>
        <reference anchor="RFC9107" target="https://www.rfc-editor.org/info/rfc9107" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9107.xml">
          <front>
            <title>BGP Optimal Route Reflection (BGP ORR)</title>
            <author fullname="R. Raszuk" initials="R." role="editor" surname="Raszuk"/>
            <author fullname="B. Decraene" initials="B." role="editor" surname="Decraene"/>
            <author fullname="C. Cassar" initials="C." surname="Cassar"/>
            <author fullname="E. Åman" initials="E." surname="Åman"/>
            <author fullname="K. Wang" initials="K." surname="Wang"/>
            <date month="August" year="2021"/>
            <abstract>
              <t>This document defines an extension to BGP route reflectors. On route reflectors, BGP route selection is modified in order to choose the best route from the standpoint of their clients, rather than from the standpoint of the route reflectors themselves. Depending on the scaling and precision requirements, route selection can be specific for one client, common for a set of clients, or common for all clients of a route reflector. This solution is particularly applicable in deployments using centralized route reflectors, where choosing the best route based on the route reflector's IGP location is suboptimal. This facilitates, for example, a "best exit point" policy ("hot potato routing").</t>
              <t>The solution relies upon all route reflectors learning all paths that are eligible for consideration. BGP route selection is performed in the route reflectors based on the IGP cost from configured locations in the link-state IGP.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="9107"/>
          <seriesInfo name="DOI" value="10.17487/RFC9107"/>
        </reference>
      </references>
    </references>
    <section anchor="sect-11" numbered="true" toc="default">
      <name>Acknowledgments</name>
      <t>
   Special thanks to Neeraj Malhotra and Yuri Tsier for their valuable
   help.</t>
      <t>
   Special thanks to Bruno Decraene, Theresa Enghardt, Ines Robles,
   Luc Andre Burdet, and Alvaro Retana for their valuable comments.</t>
    </section>
    <section anchor="sect-a" numbered="true" toc="default">
      <name>Handling Platforms with Limited Levels of Hierarchy</name>
      <t>
   This section provides additional details on how to handle platforms
   that support only a limited number of hierarchy levels.</t>
      <t>
   Let's consider a pathlist associated with the leaf "R1" consisting
   of the list of paths &lt;P1, P2,..., Pn&gt;. Assume that the leaf "R1"
   has an OutLabel-list &lt;L1, L2,..., Ln&gt;. Suppose the path Pi is a
   recursive path that resolves via a prefix represented by the
   leaf "R2". The leaf "R2" itself is pointing to a pathlist consisting
   of the paths &lt;Q1, Q2,..., Qm&gt;.</t>
      <t>
   If the platform supports the number of hierarchy levels required by
   the forwarding chain, then a packet that uses the path "Pi" will be
   forwarded according to the steps in <xref target="sect-4" format="default"/>.</t>
      <t>
   Suppose the platform cannot support the number of hierarchy levels
   in the forwarding chain. In that case, the FIB manager needs to
   reduce the number of hierarchy levels when programming the
   forwarding chain into the FIB. The idea is to "flatten" two chain
   levels into a single level. The "flattening" steps are as follows:</t>
      <ol spacing="normal" type="1"><li>
          <t>FIB manager walks to the parent of "Pi", which is the leaf "R2".</t>
        </li>
        <li>
          <t>FIB manager extracts the parent pathlist of the leaf "R2", which
      is &lt;Q1, Q2,..., Qm&gt;.</t>
        </li>
        <li>
          <t>FIB manager also extracts the OutLabel-list associated with
      the leaf "R2". Suppose this OutLabel-list is &lt;L1, L2,..., Lm&gt;.</t>
        </li>
        <li>
          <t>FIB manager replaces the path "Pi" with the list of
      paths &lt;Q1, Q2,..., Qm&gt;.</t>
        </li>
        <li>
          <t>Hence the pathlist &lt;P1, P2,..., Pn&gt; now becomes &lt;P1,
      P2,..., Pi-1, Q1, Q2,..., Qm, Pi+1,..., Pn&gt;.</t>
        </li>
        <li>
          <t>The path-index stored inside the locations "Q1", "Q2", ..., "Qm"
      must all be "i" because the index "i" refers to the label "Li"
      associated with leaf "R1".</t>
        </li>
        <li>
          <t>FIB manager attaches an OutLabel-list with the new pathlist as
      follows: &lt;Unlabeled,..., Unlabeled, L1, L2,..., Lm, Unlabeled, ..., Unlabeled&gt;. The size of the label list associated with the
      flattened pathlist equals the size of the pathlist. Thus there is
      a 1-1 mapping between every path in the "flattened" pathlist
      and the OutLabel-list associated with it.</t>
        </li>
      </ol>
      <t>
   It is worth noting that the labels in the OutLabel-list associated
   with the "flattened" pathlist may be stored in the same memory
   location as the path itself to avoid an additional memory access.</t>
      <t>
   The same steps can be applied to all paths in the pathlist &lt;P1,
   P2,..., Pn&gt; so that all paths are "flattened", thereby reducing
   the number of hierarchy levels by one. Note that "flattening" a
   pathlist pulls in all paths of the parent pathlist, a desirable
   property because it utilizes all paths at all levels. A platform
   that has a limit on the number of paths in a pathlist for any given
   leaf may choose to reduce the number of paths using methods that are
   beyond the scope of this document.</t>
      <t>
   The steps can be recursively applied to other paths at the same
   levels or other levels to recursively reduce the number of
   hierarchical levels to an arbitrary value so as to accommodate the
   capability of the forwarding engine.</t>
      <t>
   Because a flattened pathlist may have an associated OutLabel-list,
   the forwarding behavior has to be slightly modified. The
   modification is done by adding the following step right after step 4
   in <xref target="sect-4" format="default"/>.</t>
      <ol spacing="normal" type="1" start="5"><li>
          <t>If there is an OutLabel-list associated with the pathlist, then
      if the path "Pi" is chosen by the hashing algorithm, retrieve
      the label at location "i" in that OutLabel-list and apply the
      label action of that label on the packet.</t>
        </li>
      </ol>
      <t>
   The steps in this section are applied to an example in the next
   section.</t>
    </section>
    <section anchor="sect-b" numbered="true" toc="default">
      <name>Example: Flattening a Forwarding Chain</name>
      <t>
   This example uses a case of inter-AS option C <xref target="RFC4364" format="default"/> where there
   are 3 levels of hierarchy. Figure 4 illustrates the sample topology.
   The Autonomous System Border Routers (ASBRs) on the ingress domain
   (Domain 1) use BGP to advertise the core routers (ASBRs and ePEs) of
   the egress domain (Domain 2) to the iPE. The end result is that the
   ingress PE (iPE) has 2 levels of recursion for the VPN prefixes VPN-
   IP1 and VPN-IP2.</t>
      <figure anchor="ure-sample-3-level-hierarchy-topology">
        <name>Sample 3-level hierarchy topology</name>
        <artwork name="" type="" align="left" alt=""><![CDATA[
    Domain 1                 Domain 2
+-------------+          +-------------+
|             |          |             |
| LDP/SR Core |          | LDP/SR core |
|             |          |             |
|     (192.0.2.4)        |             |
|         ASBR11-------ASBR21........ePE1(192.0.2.1)
|             | \      / |   .      .  |\
|             |  \    /  |    .    .   | \
|             |   \  /   |     .  .    |  \
|             |    \/    |      ..     |   \VPN-IP1(198.51.100.0/24)
|             |    /\    |      . .    |   /VRF "Blue" ASN: 65000
|             |   /  \   |     .   .   |  /
|             |  /    \  |    .     .  | /
|             | /      \ |   .       . |/
iPE        ASBR12-------ASBR22........ePE2 (192.0.2.2)
|     (192.0.2.5)        |             |\
|             |          |             | \
|             |          |             |  \
|             |          |             |   \VRF "Blue" ASN: 65000
|             |          |             |   /VPN-IP2(203.0.113.0/24)
|             |          |             |  /
|             |          |             | /
|             |          |             |/
|         ASBR13-------ASBR23........ePE3(192.0.2.3)
|     (192.0.2.6)        |             |
|             |          |             |
|             |          |             |
+-------------+          +-------------+
 <===========  <=========  <============
Advertise ePEx  Advertise   Redistribute
Using IBGP-LU   ePEx Using  ePEx routes
                 EBGP-LU      into BGP
]]></artwork>
      </figure>
      <t>
   The following assumptions about connectivity are made:</t>
      <ul spacing="normal">
        <li>
          <t>In "Domain 2", both ASBR21 and ASBR22 can reach both ePE1 and
      ePE2 using the same metric.</t>
        </li>
        <li>
          <t>In "Domain 2", only ASBR23 can reach ePE3.</t>
        </li>
        <li>
          <t>In "Domain 1", iPE (the ingress PE) can reach ASBR11, ASBR12, and
      ASBR13 via IGP using the same metric.</t>
        </li>
      </ul>
      <t>
   The following assumptions are made about the labels:</t>
      <ul spacing="normal">
        <li>
          <t>The VPN labels advertised by ePE1 and ePE2 for prefix VPN-IP1 are
      VPN-L11 and VPN-L21, respectively.</t>
        </li>
        <li>
          <t>The VPN labels advertised by ePE2 and ePE3 for prefix VPN-IP2 are
      VPN-L22 and VPN-L32, respectively.</t>
        </li>
        <li>
          <t>The labels advertised by ASBR11 to iPE using BGP-LU for the
      egress PEs ePE1 and ePE2 are LASBR111(ePE1) and LASBR112(ePE2),
      respectively.</t>
        </li>
        <li>
          <t>The labels advertised by ASBR12 to iPE using BGP-LU for the
      egress PEs ePE1 and ePE2 are LASBR121(ePE1) and LASBR122(ePE2),
      respectively.</t>
        </li>
        <li>
          <t>The label advertised by ASBR13 to iPE using BGP-LU for the egress
      PE ePE3 is LASBR13(ePE3).</t>
        </li>
        <li>
          <t>The IGP labels advertised by the next hops directly connected to
      iPE towards ASBR11, ASBR12, and ASBR13 in the core of domain 1
      are IGP-L11, IGP-L12, and IGP-L13, respectively.</t>
        </li>
        <li>
          <t>Both the routers ASBR21 and ASBR22 of Domain 2 advertise the
      same labels, LASBR21 and LASBR22, for the egress PEs ePE1 and ePE2,
      respectively, to the routers ASBR11 and ASBR12 of Domain 1.</t>
        </li>
        <li>
          <t>The router ASBR23 of Domain 2 advertises the label LASBR23 for
      the egress PE ePE3 to the router ASBR13 of Domain 1.</t>
        </li>
      </ul>
      <t>
   Based on these connectivity assumptions and the topology in Figure
   4, the routing table on iPE is</t>
      <artwork name="" type="" align="left" alt=""><![CDATA[
       65000: 198.51.100.0/24
            via ePE1 (192.0.2.1), VPN Label: VPN-L11
            via ePE2 (192.0.2.2), VPN Label: VPN-L21
       65000: 203.0.113.0/24
            via ePE2 (192.0.2.2), VPN Label: VPN-L22
            via ePE3 (192.0.2.3), VPN Label: VPN-L32
]]></artwork>
      <artwork name="" type="" align="left" alt=""><![CDATA[
       192.0.2.1/32 (ePE1)
            via ASBR11, Label: LASBR111(ePE1)
            via ASBR12, Label: LASBR121(ePE1)
       192.0.2.2/32 (ePE2)
            via ASBR11, Label: LASBR112(ePE2)
            via ASBR12, Label: LASBR122(ePE2)
       192.0.2.3/32 (ePE3)
            via ASBR13, Label: LASBR13(ePE3)
]]></artwork>
      <artwork name="" type="" align="left" alt=""><![CDATA[
       192.0.2.4/32 (ASBR11)
            via Core, Label:    IGP-L11
       192.0.2.5/32 (ASBR12)
            via Core, Label:    IGP-L12
       192.0.2.6/32 (ASBR13)
            via Core, Label:    IGP-L13
]]></artwork>
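      <t>
   The three levels of recursion in the routing table above can be
   traced with a short sketch. The Python fragment below is an
   illustrative assumption of this appendix only (the dictionary
   layout and the function "label_stack" are not prescribed by this
   document); the hash1/hash2 parameters stand in for the forwarding
   engine's hashing decision at each pathlist.</t>
      <sourcecode type="python"><![CDATA[
# Level 1: VPN prefix -> [(BGP next hop, VPN label)]
vpn_routes = {
    "198.51.100.0/24": [("ePE1", "VPN-L11"), ("ePE2", "VPN-L21")],
    "203.0.113.0/24":  [("ePE2", "VPN-L22"), ("ePE3", "VPN-L32")],
}
# Level 2: egress PE -> [(ASBR, BGP-LU label)]
bgp_lu = {
    "ePE1": [("ASBR11", "LASBR111(ePE1)"), ("ASBR12", "LASBR121(ePE1)")],
    "ePE2": [("ASBR11", "LASBR112(ePE2)"), ("ASBR12", "LASBR122(ePE2)")],
    "ePE3": [("ASBR13", "LASBR13(ePE3)")],
}
# Level 3: ASBR -> IGP label towards that ASBR
igp = {"ASBR11": "IGP-L11", "ASBR12": "IGP-L12", "ASBR13": "IGP-L13"}

def label_stack(prefix, hash1=0, hash2=0):
    """Resolve a VPN prefix through all 3 levels; top of stack first."""
    epe, vpn_label = vpn_routes[prefix][hash1]
    asbr, lu_label = bgp_lu[epe][hash2]
    return [igp[asbr], lu_label, vpn_label]
]]></sourcecode>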
      <t>
   The diagram in Figure 5 illustrates the forwarding chain in iPE,
   assuming that the forwarding hardware in iPE supports 3 levels of
   hierarchy. The leaves corresponding to the ASBRs in domain 1
   (ASBR11, ASBR12, and ASBR13) are at the bottom of the hierarchy.
   There are a few important points:</t>
      <ul spacing="normal">
        <li>
          <t>Because the hardware supports the required depth of hierarchy,
      the size of a pathlist equals the size of the OutLabel-list
      associated with each leaf using that pathlist.</t>
        </li>
        <li>
          <t>The path-index inside the pathlist entry indicates the label that
      will be picked from the OutLabel-List associated with the child
      leaf if that path is chosen by the forwarding engine hashing
      function.</t>
        </li>
      </ul>
      <figure anchor="ure-forwarding-chain-for-hardware-supporting-3-levels">
        <name>Forwarding Chain for hardware supporting 3 Levels</name>
        <artwork name="" type="" align="left" alt=""><![CDATA[
OutLabel-List                                      OutLabel-List
  For VPN-IP1                                         For VPN-IP2
+------------+    +--------+           +-------+   +------------+
|  VPN-L11   |<---| VPN-IP1|           |VPN-IP2|-->|  VPN-L22   |
+------------+    +---+----+           +---+---+   +------------+
|  VPN-L21   |        |                    |       |  VPN-L32   |
+------------+        |                    |       +------------+
                      |                    |
                      V                    V
                 +---+---+            +---+---+
                 | 0 | 1 |            | 0 | 1 |
                 +-|-+-\-+            +-/-+-\-+
                   |    \              /     \
                   |     \            /       \
                   |      \          /         \
                   |       \        /           \
                   v        \      /             \
              +-----+       +-----+             +-----+
         +----+ ePE1|       |ePE2 +-----+       | ePE3+-----+
         |    +--+--+       +-----+     |       +--+--+     |
         v       |            /         v          |        v
+--------------+ |           /   +--------------+  | +-------------+
|LASBR111(ePE1)| |          /    |LASBR112(ePE2)|  | |LASBR13(ePE3)|
+--------------+ |         /     +--------------+  | +-------------+
|LASBR121(ePE1)| |        /      |LASBR122(ePE2)|  | OutLabel-List
+--------------+ |       /       +--------------+  |    For ePE3
OutLabel-List    |      /        OutLabel-List     |
    For ePE1     |     /           For ePE2        |
                 |    /                            |
                 |   /                             |
                 |  /                              |
                 v v                               v
             +---+---+  Shared pathlist          +---+  pathlist
             | 0 | 1 | For ePE1 and ePE2         | 0 |  For ePE3
             +-|-+-\-+                           +-|-+
               |    \                              |
               |     \                             |
               |      \                            |
               |       \                           |
               v        v                          v
            +------+    +------+               +------+
        +---+ASBR11|    |ASBR12+--+            |ASBR13+---+
        |   +------+    +------+  |            +------+   |
        v                         v                       v
   +-------+                  +-------+              +-------+
   |IGP-L11|                  |IGP-L12|              |IGP-L13|
   +-------+                  +-------+              +-------+
]]></artwork>
      </figure>
      <t>
   Now suppose the hardware on iPE (the ingress PE) supports only 2
   levels of hierarchy. In that case, the 3-level forwarding chain in
   Figure 5 needs to be "flattened" into 2 levels.</t>
      <figure anchor="ure-flattening-3-levels-to-2-levels-of-hierarchy-on-ipe">
        <name>Flattening 3 levels to 2 levels of Hierarchy on iPE</name>
        <artwork name="" type="" align="left" alt=""><![CDATA[
OutLabel-List                                  OutLabel-List
  For VPN-IP1                                    For VPN-IP2
+------------+    +-------+      +-------+     +------------+
|  VPN-L11   |<---|VPN-IP1|      | VPN-IP2|--->|  VPN-L22   |
+------------+    +---+---+      +---+---+     +------------+
|  VPN-L21   |        |              |         |  VPN-L32   |
+------------+        |              |         +------------+
                      |              |
                      |              |
                      |              |
       Flattened      |              |  Flattened
       pathlist       V              V   pathlist
                 +===+===+        +===+===+===+     +==============+
        +--------+ 0 | 1 |        | 0 | 0 | 1 +---->|LASBR112(ePE2)|
        |        +=|=+=\=+        +=/=+=/=+=\=+     +==============+
        v          |    \          /   /     \      |LASBR122(ePE2)|
 +==============+  |     \  +-----+   /       \     +==============+
 |LASBR111(ePE1)|  |      \/         /         \    |LASBR13(ePE3) |
 +==============+  |      /\        /           \   +==============+
 |LASBR121(ePE1)|  |     /  \      /             \
 +==============+  |    /    \    /               \
                   |   /      \  /                 \
                   |  /       +  +                  \
                   |  +       |  |                   \
                   |  |       |  |                    \
                   v  v       v  v                     v
                 +------+    +------+              +------+
            +----|ASBR11|    |ASBR12+---+          |ASBR13+---+
            |    +------+    +------+   |          +------+   |
            v                           v                     v
        +-------+                  +-------+              +-------+
        |IGP-L11|                  |IGP-L12|              |IGP-L13|
        +-------+                  +-------+              +-------+
]]></artwork>
      </figure>
      <t>
   Figure 6 represents one way to "flatten" a 3-level hierarchy into
   two levels. There are a few important points:</t>
      <ul spacing="normal">
        <li>
          <t>As mentioned in <xref target="sect-a"/>, a flattened pathlist may
      have a label list associated with it. The size of the label list
      associated with a flattened pathlist equals the size of the
      pathlist. Hence it is possible for an implementation to include
      this label list in the flattened pathlist itself.</t>
        </li>
        <li>
          <t>Again as mentioned in <xref target="sect-a"/>, the size of a flattened
      pathlist may not be equal to the size of the OutLabel-lists of
      leaves using the flattened pathlist. So the indices inside a
      flattened pathlist still indicate the label index in the
      OutLabel-Lists of the leaves using that pathlist. Because the
      size of the flattened pathlist may be different from the size of
      the OutLabel-lists of the leaves, the indices may be repeated.</t>
        </li>
        <li>
          <t>Let's take a look at the flattened pathlist used by the prefix
      "VPN-IP2". The pathlist associated with the prefix "VPN-IP2" has
      three entries.</t>
          <ul spacing="normal">
            <li>
              <t>The first and second entries have index "0" because both
         entries correspond to ePE2. Thus, when the hashing performed
         by the forwarding engine results in using the first or the
         second entry in the pathlist, the forwarding engine will pick
         the correct VPN label "VPN-L22", which is the label advertised
         by ePE2 for the prefix "VPN-IP2".</t>
            </li>
            <li>
              <t>The third entry has the index "1" because it corresponds
         to ePE3. Thus, when the hashing performed by the forwarding
         engine results in using the third entry in the flattened
         pathlist, the forwarding engine will pick the correct VPN
         label "VPN-L32", which is the label advertised by "ePE3" for
         the prefix "VPN-IP2".</t>
            </li>
          </ul>
        </li>
      </ul>
      <t>
   Now let's apply the forwarding steps in <xref target="sect-4" format="default"/> together
   with the additional step in <xref target="sect-a"/> to the flattened
   forwarding chain illustrated in Figure 6.</t>
      <ul spacing="normal">
        <li>
          <t>Suppose a packet arrives at "iPE" and matches the VPN prefix
      "VPN-IP2".</t>
        </li>
        <li>
          <t>The forwarding engine walks to the parent of the "VPN-IP2", which
      is the flattened pathlist and applies a hashing algorithm to pick
      a path.</t>
        </li>
        <li>
          <t>Suppose the hashing by the forwarding engine picks the second
      path in the flattened pathlist associated with the leaf "VPN-IP2".</t>
        </li>
        <li>
          <t>Because the second path has the index "0", the label "VPN-L22" is pushed on the packet.</t>
        </li>
        <li>
          <t>Next the forwarding engine picks the second label from the
      OutLabel-List associated with the flattened pathlist resulting in
      "LASBR122(ePE2)" being the next pushed label.</t>
        </li>
        <li>
          <t>The forwarding engine now moves to the parent of the flattened
      pathlist corresponding to the second path. The parent is the
      IGP label leaf corresponding to "ASBR12".</t>
        </li>
        <li>
          <t>So the packet is forwarded towards the ASBR "ASBR12" and the IGP
      label at the top will be "IGP-L12".</t>
        </li>
      </ul>
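      <t>
   The walk above can be sketched as follows, using the flattened
   pathlist of Figure 6 for the leaf "VPN-IP2". As before, the data
   layout and names in this Python fragment are illustrative
   assumptions of this sketch only.</t>
      <sourcecode type="python"><![CDATA[
# Flattened pathlist for VPN-IP2: (ASBR, path-index into the leaf's
# OutLabel-list), plus the OutLabel-list attached to the pathlist.
flat_pathlist   = [("ASBR11", 0), ("ASBR12", 0), ("ASBR13", 1)]
pathlist_labels = ["LASBR112(ePE2)", "LASBR122(ePE2)", "LASBR13(ePE3)"]
vpn_ip2_labels  = ["VPN-L22", "VPN-L32"]  # OutLabel-list of leaf VPN-IP2
igp_labels = {"ASBR11": "IGP-L11", "ASBR12": "IGP-L12",
              "ASBR13": "IGP-L13"}

def forward_vpn_ip2(hash_choice):
    """Return (ASBR, label stack with top of stack first)."""
    asbr, idx = flat_pathlist[hash_choice]
    return asbr, [igp_labels[asbr],              # IGP label on top
                  pathlist_labels[hash_choice],  # flattened-pathlist label
                  vpn_ip2_labels[idx]]           # VPN label at the bottom
]]></sourcecode>
      <t>
   Choosing the second path (hash value 1) yields the label stack
   {IGP-L12, LASBR122(ePE2), VPN-L22} towards ASBR12, matching the
   steps above.</t>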
      <t>
   Based on the above steps, a packet arriving at iPE and destined to
   the prefix VPN-IP2 reaches its destination as follows:</t>
      <artwork name="" type="" align="left" alt=""><![CDATA[
o  iPE sends the packet along the shortest path towards ASBR12
   with the following label stack, starting from the top: {IGP-L12,
   LASBR122(ePE2), VPN-L22}.

o  The penultimate hop of ASBR12 pops the top label "IGP-L12". Hence
   the packet arrives at ASBR12 with the remaining label stack
   {LASBR122(ePE2), VPN-L22} where "LASBR122(ePE2)" is the top label.

o  ASBR12 swaps "LASBR122(ePE2)" with the label "LASBR22(ePE2)",
   which is the label advertised by ASBR22 for the ePE2 (the egress
   PE).

o  ASBR22 receives the packet with "LASBR22(ePE2)" at the top.

o  Hence ASBR22 swaps "LASBR22(ePE2)" with the IGP label for ePE2
   advertised by the next-hop towards ePE2 in domain 2, and sends
   the packet along the shortest path towards ePE2.

o  The penultimate hop of ePE2 pops the top label. Hence ePE2
   receives the packet with the label "VPN-L22" at the top.

o  ePE2 pops "VPN-L22" and sends the packet as a pure IP packet
   towards the destination VPN-IP2.
]]></artwork>
    </section>
    <section anchor="sect-c" numbered="true" toc="default">
      <name>Perspective</name>
      <t>
   The following table puts the BGP-PIC benefits in perspective,
   assuming:</t>
      <ul spacing="normal">
        <li>
          <t>1M impacted BGP prefixes</t>
        </li>
        <li>
          <t>IGP convergence ~ 500 msec</t>
        </li>
        <li>
          <t>local protection ~ 50msec</t>
        </li>
        <li>
          <t>FIB update per BGP destination ~ 100usec conservative,
      ~ 10usec optimistic</t>
        </li>
        <li>
          <t>BGP best-route recalculation per BGP destination ~ 200usec
      conservative, ~ 100usec optimistic</t>
        </li>
      </ul>
      <artwork name="" type="" align="left" alt=""><![CDATA[
                         Without PIC        With PIC
    Local IGP Failure    10 to 100sec         50msec
    Local BGP Failure    100 to 200sec        50msec
    Remote IGP Failure   10 to 100sec        500msec
    Remote BGP Failure   100 to 200sec       500msec
]]></artwork>
      <t>
   Upon local or remote IGP next-hop failure, the existing primary BGP
   next-hop is intact and usable; hence, the resiliency depends only on
   the ability of the FIB mechanism to reflect the new path to the BGP
   next-hop to the dependent BGP destinations. Without BGP-PIC, a
   conservative back-of-the-envelope estimation for this FIB update is
   100usec per BGP destination. An optimistic estimation is 10usec per
   entry.</t>
      <t>
   Upon local or remote BGP next-hop failure, without the BGP-PIC
   mechanism, a new BGP best path needs to be recomputed and new
   updates need to be sent to peers. This depends on BGP processing
   time, which is shared between best-path computation, RIB update, and
   peer update. A conservative back-of-the-envelope estimation for this
   is 200usec per BGP destination. An optimistic estimation is 100usec
   per entry.</t>
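      <t>
   As a sanity check, the table entries can be reproduced with simple
   arithmetic. The Python fragment below is illustrative only and
   assumes the per-entry costs and the 1M-prefix count listed above.</t>
      <sourcecode type="python"><![CDATA[
prefixes = 1_000_000  # 1M impacted BGP prefixes

# IGP next-hop failure without PIC: per-prefix FIB update.
fib_conservative = prefixes * 100e-6  # 100 usec/entry -> ~100 sec
fib_optimistic   = prefixes * 10e-6   #  10 usec/entry ->  ~10 sec

# BGP next-hop failure without PIC: per-prefix best-path
# recalculation, RIB update, and peer update.
bgp_conservative = prefixes * 200e-6  # 200 usec/entry -> ~200 sec
bgp_optimistic   = prefixes * 100e-6  # 100 usec/entry -> ~100 sec
]]></sourcecode>
      <t>
   This reproduces the "10 to 100sec" and "100 to 200sec" ranges for
   the without-PIC cases, versus the 50msec (local protection) and
   500msec (IGP convergence) figures with PIC.</t>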
    </section>
  </back>
</rfc>
