<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="exp" docName="draft-ietf-idr-bgp-fwd-rr-05" ipr="trust200902"
     submissionType="IETF" updates="">
  <front>
    <title abbrev="BGP RR NHS">BGP Route Reflector with Next Hop Self</title>

    <author fullname="Kaliraj Vairavakkalai" initials="K." role="editor"
            surname="Vairavakkalai">
      <organization>HPE</organization>

      <address>
        <postal>
          <street>1133 Innovation Way,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>kaliraj.vairavakkalai@hpe.com</email>
      </address>
    </author>

    <author fullname="Natrajan Venkataraman" initials="N." role="editor"
            surname="Venkataraman">
      <organization>HPE</organization>

      <address>
        <postal>
          <street>1133 Innovation Way,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>natrajan.venkataraman@hpe.com</email>
      </address>
    </author>

    <date day="23" month="February" year="2026"/>

    <abstract>
      <t>The procedures in the BGP Route Reflection (RR) specification,
      RFC4456, primarily deal with scenarios where the RR reflects BGP routes
      with the next hop unchanged. In some deployments, such as Inter-AS
      Option C (Section 10, RFC4364), the ABRs may perform RR functionality
      with the next hop set to self. If adequate precautions are not taken,
      the RFC4456 procedures can result in a traffic forwarding loop in such
      deployments.</t>

      <t>This document illustrates one such looping scenario and specifies
      approaches to minimize the possibility of a traffic forwarding loop in
      such deployments. It uses an example Inter-AS Option C (Section 10,
      RFC4364) deployment in which redundant ABRs act as RRs with next hop
      self when they re-advertise BGP transport family routes between
      multiple IGP domains.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      <xref target="RFC2119">RFC 2119</xref> <xref target="RFC8174">RFC
      8174</xref> when, and only when, they appear in all capitals, as shown
      here.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>The procedures in the BGP Route Reflection (RR) specification,
      RFC4456, primarily deal with scenarios where the RR reflects BGP routes
      with the next hop unchanged. In some deployments, such as Inter-AS
      Option C (Section 10, RFC4364), the ABRs may perform RR functionality
      with the next hop set to self. If adequate precautions are not taken,
      the RFC4456 procedures can result in a traffic forwarding loop in such
      deployments.</t>

      <t>This document illustrates one such looping scenario and specifies
      approaches to minimize the possibility of a traffic forwarding loop in
      such deployments. It uses an example Inter-AS Option C (Section 10,
      RFC4364) deployment in which redundant ABRs act as RRs with next hop
      self when they re-advertise BGP transport family routes between
      multiple IGP domains.</t>

    </section>

    <section title="Terminology">
      <t>ABR: Area Border Router</t>

      <t>AS: Autonomous System</t>

      <t>AFI: Address Family Identifier</t>

      <t>BN: Border Node</t>

      <t>EP: Endpoint, e.g. a loopback address in the network</t>

      <t>MPLS: Multiprotocol Label Switching</t>

      <t>PE: Provider Edge</t>

      <t>SAFI: Subsequent Address Family Identifier</t>

      <section title="Definitions and Notations">
        <t>Service Family: A BGP address family used for advertising routes
        for destinations in "data traffic". For example, AFI/SAFIs 1/1 or
        1/128.</t>

        <t>Transport Family: A BGP address family used for advertising
        tunnels, which are in turn used by service routes for resolution. For
        example, BGP LU (AFI/SAFI: 1/4) or BGP CT (AFI/SAFI: 1/76).</t>

        <t>Transport Tunnel: A tunnel over which a service may place traffic.
        Such a tunnel can be provisioned or signaled using a variety of means,
        for example, Generic Routing Encapsulation (GRE), UDP, LDP, RSVP-TE,
        IGP FLEX-ALGO, or SRTE.</t>

        <t>Tunnel Route: A route to a tunnel destination (endpoint) that is
        installed at the headend (ingress) of the tunnel.</t>

        <t>Tunnel Domain: A domain of the network under a single
        administrative control, containing transport tunnels between Service
        Nodes (SNs) and Border Nodes (BNs).</t>
      </section>
    </section>

    <section title="Problem Description">
      <figure anchor="RefTopo" suppress-title="false"
              title="Reference Topology: Inter-domain BGP Transport Network">
        <artwork align="left" xml:space="preserve">
                [RR26]      [RR27]                       
                 |            |                             
                 |            |                             
                 |+-[ABR23]--+|+--[ASBR21]---[ASBR13]-+
                 ||          |||          `  /        |
[CE41]--[PE25]--[P28]       [P29]          `/       [P15]-[PE11]--[CE31]
                 |           | |           /`         |
                 |           | |         /    `       |
                 +--[ABR24]--+ +--[ASBR22]---[ASBR14]-+


       |      AS2       |         AS2      |                   |
   AS4 +    region-1    +      region-2    +       AS1         + AS3
       |                |                  |                   |


203.0.113.41  ------------ Traffic Direction ----------&gt;  203.0.113.31

</artwork>
      </figure>

      <list>
        <t>This topology shows an Inter-AS Option C (Section 10, <xref
        target="RFC4364"/>) provider MPLS network that consists of two ASes,
        AS1 and AS2. They serve customer networks AS3 and AS4, respectively.
        The traffic direction described is from CE41 to CE31.</t>

        <t>AS2 is further divided into two regions. There are three tunnel
        domains in the provider's network: the two regions in AS2 and AS1.
        Each of them uses RSVP-TE intra-domain tunnels. MPLS forwarding is
        used within these domains and on inter-domain links. BGP LU
        (AFI/SAFI: 1/4) is the transport family providing reachability
        between the loopbacks of PE25 and PE11.</t>

        <t>Forwarding of PE25 to PE11 BGP LU traffic in AS2 region-2 is the
        focus of this discussion.</t>

        <t>The following RSVP-TE tunnels exist in region-2.<list>
            <t>ABR23_to_ASBR21 - metric 40</t>

            <t>ABR23_to_ASBR22 - metric 30</t>

            <t>ABR24_to_ASBR21 - metric 40</t>

            <t>ABR24_to_ASBR22 - metric 30</t>

            <t>ABR23_to_ABR24 - metric 30</t>

            <t>ABR24_to_ABR23 - metric 30</t>
          </list>From the perspective of BGP path selection, the Router-ID of
        ASBR21 is better than that of ASBR22.</t>

        <t>The problem is that the pair of redundant ABRs (ABR23 and ABR24 in
        <xref target="RefTopo"/>), each acting as an RR with next hop self,
        may choose each other as the best path towards egress PE11, instead
        of the upstream ASBR (ASBR21 or ASBR22), causing a traffic forwarding
        loop.</t>

        <t>This happens when the ABRs follow the path selection rule
        specified in <xref target="RFC4456">Section 9 of BGP RR</xref>, which
        tie-breaks on ORIGINATOR_ID before CLUSTER_LIST. RFC4456 considers
        only pure RR functionality, which leaves the next hop unchanged.</t>

        <t>This problem is more likely to occur for routes of BGP transport
        address families in Inter-AS Option C (Section 10, <xref
        target="RFC4364"/>) networks, such as BGP LU (AFI/SAFIs: 1/4 or 2/4)
        and BGP CT (AFI/SAFIs: 1/76 or 2/76), because the ABRs perform RR
        with next hop self for these families.</t>

        <t>In summary, the necessary conditions for this problem are: <list>
            <t>The redundant ABRs perform RR with next hop self.</t>

            <t>The redundant ABRs use distinct CLUSTER_IDs.</t>

            <t>Addpath send is enabled in region-1, from RR26 to the
            redundant ABRs ABR23 and ABR24.</t>

            <t>ABR23 and ABR24 use per-prefix label allocation mode for the
            transport layer families.</t>

            <t>The IGP metrics in region-2 are as described above.</t>

            <t>Inter-ABR tunnels exist.</t>

            <t>RFC4456 tie-breaks on ORIGINATOR_ID before CLUSTER_LIST.</t>

            <t>The Router-ID values of the upstream ASBRs are as described
            above.</t>
          </list></t>
      </list>
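
      <t>The selection described above can be sketched as follows. This is a
      minimal Python illustration under stated assumptions, not an
      implementation of any BGP stack. It assumes that all earlier path
      selection steps tie, that each ABR holds one path reflected by the
      redundant ABR carrying the ORIGINATOR_ID of ASBR21, and that the
      router IDs and cluster names shown are hypothetical values chosen so
      that ASBR21 compares better than ASBR22. The tunnel metrics are those
      listed for region-2.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Path:
    next_hop: str         # node name of the BGP next hop
    igp_metric: int       # tunnel metric to the next hop
    originator_id: str    # ORIGINATOR_ID (hypothetical values)
    cluster_list: tuple   # CLUSTER_LIST (hypothetical values)


def rank(path):
    # Tail of the decision process, assuming earlier steps tie:
    # lowest IGP metric, then lowest ORIGINATOR_ID, then shortest
    # CLUSTER_LIST (the order given in RFC 4456 Section 9).
    return (path.igp_metric,
            int(IPv4Address(path.originator_id)),
            len(path.cluster_list))


def best(paths):
    return min(paths, key=rank)


# Assumed router IDs; ASBR21 compares better (lower) than ASBR22.
ASBR21_ID, ASBR22_ID = "192.0.2.21", "192.0.2.22"


def candidates(peer_abr):
    # Paths an ABR is assumed to hold for PE11: two learned in
    # region-2 and one reflected by the redundant ABR with next
    # hop self.
    return [
        Path("ASBR21", 40, ASBR21_ID, ("rr27",)),
        Path("ASBR22", 30, ASBR22_ID, ("rr27",)),
        Path(peer_abr, 30, ASBR21_ID, ("rr26", peer_abr, "rr27")),
    ]


best_at_abr23 = best(candidates("ABR24"))
best_at_abr24 = best(candidates("ABR23"))
print(best_at_abr23.next_hop, best_at_abr24.next_hop)
```
]]></artwork>
      </figure>

      <t>Under these assumptions each ABR selects the other ABR as next hop:
      the path via the redundant ABR ties with the ASBR22 path on IGP metric
      and then wins on ORIGINATOR_ID, before the longer CLUSTER_LIST is ever
      considered.</t>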
    </section>

    <section title="Solution Approaches">
      <t>Using one or more of the following approaches reduces the
      possibility of such loops in an Inter-AS Option C network with
      redundant ABRs. Each approach addresses one of the necessary conditions
      listed above.</t>

      <section anchor="SameClusterID"
               title="Using Same Cluster ID at the ABRs">
        <t>Configure the same CLUSTER_ID on the redundant ABR nodes.</t>

        <t>The CLUSTER_ID loop check then makes routes reflected by one ABR
        unusable at the other redundant ABR.</t>

        <t>This approach provides a stable way to avoid this loop and is not
        affected by network churn.</t>

        <t>However, this approach does not allow the ABR-ABR tunnels to be
        used as a backup path in the event that an ABR loses all tunnels to
        the upstream ASBR.</t>
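
        <t>The effect of the shared CLUSTER_ID can be sketched as follows.
        This is a minimal Python illustration of the CLUSTER_LIST loop check
        described in Section 8 of RFC4456. The function name and all
        cluster ID values are hypothetical.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
def usable_after_cluster_check(local_cluster_id, cluster_list):
    # A route whose CLUSTER_LIST already contains the local
    # CLUSTER_ID is ignored as a reflection loop.
    return local_cluster_id not in cluster_list


SHARED = "198.51.100.1"   # hypothetical CLUSTER_ID shared by ABRs

# CLUSTER_LIST of a route reflected by ABR23 and then by RR26.
# All cluster ID values here are hypothetical.
via_abr23 = ["198.51.100.26", SHARED, "198.51.100.27"]

# ABR24 configured with the shared CLUSTER_ID ignores the route.
shared_result = usable_after_cluster_check(SHARED, via_abr23)

# ABR24 configured with a distinct CLUSTER_ID would accept it.
distinct_result = usable_after_cluster_check("198.51.100.24",
                                             via_abr23)
print(shared_result, distinct_result)
```
]]></artwork>
        </figure>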
      </section>

      <section anchor="IgpMetricMgmt" title="Using IGP Metric Management">
        <t>Assign IGP metrics such that the "ABR to redundant ABR" cost is
        inferior to the "ABR to upstream ASBR" cost.</t>

        <t>The 'IGP metric' based tie-breaker then makes an ABR choose the
        ASBRs as the best path, instead of the redundant ABR.</t>

        <t>Since IGP metrics may change during network churn caused by events
        such as a link going down, this approach needs careful planning to
        handle all possible IGP metric change scenarios. Loops caused by such
        transient situations can be difficult to debug.</t>

        <t>This approach allows the ABR-ABR tunnels to be used as a backup
        path in the event that an ABR loses reachability to the upstream
        ASBR. However, a transient forwarding loop is possible until BGP
        withdrawals are received when the redundant ABRs simultaneously lose
        their tunnels to the upstream ASBR (for example, on an upstream ASBR
        failure). A mechanism like the one described in <xref
        target="MNH"/> Sec A.9 may be needed to handle the transient
        forwarding loop problem.</t>
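
        <t>A metric plan of this kind can be checked mechanically. The
        following Python sketch uses the tunnel metrics from the reference
        topology and flags every ABR-ABR tunnel whose metric is not strictly
        worse than the best ABR-to-ASBR tunnel from the same head end. The
        function name and the replacement metric of 50 are hypothetical.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
# Tunnel metrics from the reference topology (region-2).
TUNNELS = {
    ("ABR23", "ASBR21"): 40, ("ABR23", "ASBR22"): 30,
    ("ABR24", "ASBR21"): 40, ("ABR24", "ASBR22"): 30,
    ("ABR23", "ABR24"): 30, ("ABR24", "ABR23"): 30,
}
ABRS = {"ABR23", "ABR24"}


def plan_violations(tunnels):
    # Return ABR-ABR tunnels whose metric is not strictly worse
    # than the best ABR-to-ASBR tunnel from the same head end.
    bad = []
    for (head, tail), metric in tunnels.items():
        if tail not in ABRS:
            continue
        to_asbrs = [m for (h, t), m in tunnels.items()
                    if h == head and t not in ABRS]
        if to_asbrs and metric <= min(to_asbrs):
            bad.append((head, tail))
    return sorted(bad)


print(plan_violations(TUNNELS))

# Hypothetical fix: raise both ABR-ABR tunnel metrics to 50.
FIXED = dict(TUNNELS)
FIXED[("ABR23", "ABR24")] = 50
FIXED[("ABR24", "ABR23")] = 50
print(plan_violations(FIXED))
```
]]></artwork>
        </figure>

        <t>With the metrics from the reference topology both ABR-ABR tunnels
        are flagged, and with the raised metrics neither is. A stricter plan
        could compare against the worst ABR-to-ASBR metric instead, so that
        the condition still holds after the best tunnel fails.</t>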
      </section>

      <section anchor="UseAIgp" title="Using AIGP Cost Management">
        <t>Using AIGP Cost in the network accumulates the IGP metric at each
        "next hop self" re-advertisement. This gives each node an
        accumulated metric for the whole path, rather than only the metric
        to the BGP next hop.</t>

        <t>The 'AIGP cost' based tie-breaker then makes an ABR choose the
        ASBRs as the best path, instead of the redundant ABR.</t>

        <t>This approach also needs careful IGP metric planning because it
        depends on the underlying IGP metric view of each node.</t>

        <t>This approach allows the ABR-ABR tunnels to be used as a backup
        path in the event that an ABR loses reachability to the upstream
        ASBR. However, a transient forwarding loop is possible until BGP
        withdrawals are received when the redundant ABRs simultaneously lose
        their tunnels to the upstream ASBR (for example, on an upstream ASBR
        failure). A mechanism like the one described in <xref
        target="MNH"/> Sec A.9 may be needed to handle the transient
        forwarding loop problem.</t>
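
        <t>The accumulation can be illustrated with a short worked example.
        The Python sketch below assumes the AIGP behavior specified in RFC
        7311, where the value grows by the cost to the previous next hop on
        each "next hop self" re-advertisement, and the receiver compares the
        AIGP value plus its own cost to the next hop. The starting AIGP
        value of 100 and the function names are hypothetical. The tunnel
        metrics are those of the reference topology.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
def aigp_after_nhs(received_aigp, metric_to_old_next_hop):
    # On re-advertisement with next hop self, the AIGP value
    # grows by the cost from this node to the previous next hop.
    return received_aigp + metric_to_old_next_hop


def total_cost(aigp, metric_to_next_hop):
    # Value compared by the AIGP tie-breaker at the receiver.
    return aigp + metric_to_next_hop


A = 100   # hypothetical AIGP value advertised by ASBR22

# ABR24's direct path: next hop ASBR22, tunnel metric 30.
direct = total_cost(A, 30)

# Path via ABR23: ABR23 adds its metric to ASBR22 (30), and
# ABR24 then adds its metric to ABR23 (30).
via_abr23 = total_cost(aigp_after_nhs(A, 30), 30)

print(direct, via_abr23)
```
]]></artwork>
        </figure>

        <t>Under these assumptions the direct path costs 130 and the path
        via ABR23 costs 160, so ABR24 prefers the upstream ASBR before the
        ORIGINATOR_ID tie-breaker is reached.</t>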
      </section>

      <section anchor="bgpCT" title="Using BGP CT Management">
        <t>In a BGP CT network that uses the procedures described in <xref
        target="BGP-CT"/>, tunnels belonging to a certain Transport Class
        (TC) may be left unprovisioned between the redundant ABRs, or may be
        excluded from the customized Resolution Scheme used to resolve BGP CT
        routes with that TC.</t>

        <t>This ensures that a BGP CT route received with the redundant ABR
        as next hop is unusable at the receiving ABR, because next hop
        resolution fails.</t>

        <t>This approach needs Transport Class and Resolution Scheme planning
        in the BGP CT network. It provides a stable way to avoid this loop
        and is not affected by network churn.</t>

        <t>However, this approach does not allow the ABR-ABR TC tunnels to be
        used as a backup path in the event that an ABR loses all tunnels for
        that TC to the upstream ASBR.</t>
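
        <t>The resolution behavior can be sketched as follows. This is a
        simplified Python model, not the procedure text of <xref
        target="BGP-CT"/>. The Transport Class names "gold" and "bronze",
        the tunnel layout, and the function name are hypothetical.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
# Tunnels per (head end, tail end), tagged with the Transport
# Classes they belong to. Class names and layout are hypothetical.
TUNNELS = {
    ("ABR24", "ASBR21"): {"gold"},
    ("ABR24", "ASBR22"): {"gold"},
    ("ABR24", "ABR23"): {"bronze"},   # no gold ABR-ABR tunnel
}

# Resolution Scheme: Transport Classes whose tunnels may be used
# to resolve the next hop of a route with a given TC.
RESOLUTION_SCHEME = {"gold": {"gold"}}


def resolves(node, next_hop, route_tc):
    # A route is usable only if a tunnel to its next hop exists
    # in one of the classes allowed by the Resolution Scheme.
    allowed = RESOLUTION_SCHEME.get(route_tc, set())
    available = TUNNELS.get((node, next_hop), set())
    return bool(allowed & available)


via_asbr = resolves("ABR24", "ASBR22", "gold")
via_abr = resolves("ABR24", "ABR23", "gold")
print(via_asbr, via_abr)
```
]]></artwork>
        </figure>

        <t>In this model the route with ASBR22 as next hop resolves, while
        the route with ABR23 as next hop does not, so it never enters path
        selection at ABR24.</t>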
      </section>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document makes no new requests of IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>This document does not change the underlying security issues inherent
      in the existing BGP protocol, such as those described in <xref
      target="RFC4271"/>, <xref target="RFC4272"/> and <xref
      target="RFC4456"/>.</t>

      <t>The mechanisms described in this document reduce the possibility of
      loops within an IBGP domain. They do not affect routing across EBGP
      sessions.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4456.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4271.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4272.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"?>
    </references>

    <references title="Informative References">
      <reference anchor="BGP-CT"
                 target="https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-ct-28">
        <front>
          <title abbrev="BGP CT">BGP Classful Transport Planes</title>

          <author fullname="Kaliraj Vairavakkalai" initials="" role="editor"
                  surname="Vairavakkalai"/>

          <author fullname="Natrajan Venkataraman" initials="" role="editor"
                  surname="Venkataraman"/>

          <date day="17" month="03" year="2024"/>
        </front>
      </reference>

      <reference anchor="MNH"
                 target="https://datatracker.ietf.org/doc/html/draft-ietf-idr-multinexthop-attribute-00">
        <front>
          <title abbrev="MNH">BGP MultiNexthop Attribute</title>

          <author fullname="Kaliraj Vairavakkalai" initials="" role="editor"
                  surname="Vairavakkalai"/>

          <date day="17" month="03" year="2024"/>
        </front>
      </reference>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4364.xml"?>
    </references>

    <section anchor="Appendix A" numbered="true" title="Appendix">
      <section title="Document History">
        <t>The content in this document was introduced as part of <xref
        target="BGP-CT"/>. Because the described problem is not specific to
        BGP CT and the content is useful for other BGP families as well, it
        has been extracted into this separate document.</t>
      </section>
    </section>

    <section anchor="Acknowledgements" numbered="false"
             title="Acknowledgements">
      <t>The authors thank Jeff Haas, Jon Hardwick, Keyur Patel, Igor
      Malyushkin, Robert Raszuk, and Susan Hares for their discussions and
      review comments.</t>
    </section>
  </back>
</rfc>
