Inter-Domain Routing | H. Gredler, Ed. |
Internet-Draft | RtBrick Inc. |
Intended status: Informational | K. Vairavakkalai |
Expires: April 9, 2018 | C. Ramachandran |
B. Rajagopalan | |
E. Aries | |
Juniper Networks, Inc. | |
L. Fang | |
eBay | |
October 06, 2017 |
Egress Peer Engineering using BGP-LU
draft-gredler-idr-bgplu-epe-11
The MPLS source routing paradigm provides path control for both intra- and inter-Autonomous System (AS) traffic. RSVP-TE is used for intra-AS path control. This document outlines how MPLS routers may use the BGP labeled unicast protocol (BGP-LU) for traffic engineering on inter-AS links.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 9, 2018.
Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Today, BGP-LU [RFC3107] is used both as an intra-AS [I-D.ietf-mpls-seamless-mpls] and inter-AS routing protocol. BGP-LU may advertise an MPLS transport path between IGP regions and Autonomous Systems. Those paths may span one or more router hops. This document describes the advertisement and use of one-hop MPLS label-switched paths (LSPs) for traffic-engineering the links between Autonomous Systems.
Consider Figure 1: an ASBR (R2) advertises a labeled host route for the remote-end IP address of its link (IP3). The BGP next-hop is set to R2's loopback IP address. For the advertised label <N>, a forwarding action of 'POP and forward' to next-hop (IP3) is installed in R2's MPLS forwarding table. If R2 has several inter-AS links, it advertises one label for each of them. By pushing the corresponding MPLS label <N> onto the label stack, an ingress router R1 can then control the egress peer selection.
         AS1               :    AS2
                           :
+----+   iBGP   +----+     : eBGP +----+
| R1 |----------| R2 |-IP2----IP3-| R3 |
+----+          +----+     :      +----+
                           :
-----------traffic-flow---------->
<------------route-flow-----------
Figure 1: single-hop LSPs
Of course, since R1 and R2 may not be directly connected to each other, if the interior routers within AS1 do not maintain routes to external destinations, carrying traffic to such destinations would require a tunnel from R1 to R2. Such a tunnel could be realized either as an MPLS Label Switched Path (LSP) or by GRE [RFC2784].
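The behavior described for Figure 1 can be sketched in a few lines of Python. This is an illustrative model only, not an implementation: the label values (100 for the egress link, 299776 for the tunnel), the address 203.0.113.3 for IP3, and all function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopAndForward:
    """MPLS action on the ASBR: pop the top label, send out towards next_hop."""
    next_hop: str


# R2's MPLS forwarding table: one entry per inter-AS link.
# Label 100 and the address used for IP3 are hypothetical.
R2_LFIB = {
    100: PopAndForward(next_hop="203.0.113.3"),
}


def ingress_encapsulate(payload, epe_label, tunnel_label=None):
    """R1 builds the label stack (top of stack first).

    The optional tunnel label models an MPLS LSP from R1 to R2; with a
    GRE tunnel there would be no outer MPLS label.
    """
    stack = [epe_label]
    if tunnel_label is not None:
        stack.insert(0, tunnel_label)
    return stack, payload


def asbr_forward(lfib, stack, payload):
    """R2 looks up the top label (tunnel label already removed) and pops it."""
    action = lfib[stack[0]]
    return action.next_hop, stack[1:], payload
```

In this model R1 selects the egress link purely by its choice of the inner label; R2 never consults its IP forwarding table for the packet.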
BGP-LU is often just seen as a 'stitching' protocol for connecting Autonomous Systems. BGP-LU is often not viewed as a viable protocol for solving the Inter-domain traffic-engineering problem.
With this document the authors want to clarify the use of BGP-LU for Egress Peering traffic-engineering purposes and encourage both implementers and network operators to use a widely deployed and operationally well understood protocol, rather than inventing new protocols or new extensions to the existing protocols.
The following topology and IP addresses shall be used throughout the Egress Peering Engineering advertisement examples.
                                 :                  :
            AS 1                 :       AS 2       :    AS 4
                                 :                  :
                                 :      +-----+     :
                           /IP2--:-IP3--|ASBR3|     :
+-----+             +-----+-IP4--:-IP5--+-----+-----------+-----+
| R1  +-------------+ASBR1|      :                        |ASBR6|
+--+--+             +--+--+-IP6--:-IP7--+-----+-----------+-----+
   |                   |    \    :      |ASBR4|     :     /
   |                   |     \   :      +-----+     :    /
   |                   |      IP8-                    ---
   |                   |          \ ................ /
   |                   |           IP9-          ---
   |                   |         :     \       /    :
   |                   |         :       \   /      :
+--+--+             +--+--+      :      +--+--+     :
| R2  +-------------+ASBR2|-IP10-:-IP11-|ASBR5|     :
+-----+             +-----+      :      +-----+     :
                                 :                  :
                                 :       AS3        :
                                 :                  :
Figure 2: Sample Topology
Figure 3 shows a simple network layout. There are two classes of BGP speakers: ingress routers and controllers.
Ingress routers receive BGP-LU routes from the ASBRs. Each BGP-LU route corresponds to an egress link. In addition, ingress routers receive their service routes via BGP. The BGP Add-paths extension [I-D.ietf-idr-add-paths] allows multiple paths to a given service route to be advertised.
As outlined in [I-D.filsfils-spring-segment-routing-central-epe], controllers receive BGP-LU routes from the ASBRs as well. However, the service routes may be received either via BGP plus the BGP Add-paths extension [I-D.ietf-idr-add-paths] or, alternatively, via the BGP Monitoring Protocol [I-D.ietf-grow-bmp] (BMP). BMP supports advertising the RIB-In of a BGP router. As such it might be a suitable protocol for feeding all potential egress paths of a service route from an ASBR into a controller.
An ASBR assigns a distinct label for each of its next-hops facing an eBGP peer and advertises it to its internal BGP mesh. The ASBR programs a forwarding action 'POP and forward' into the MPLS forwarding table. Note that the neighboring AS is not required to support exchanging NLRIs with the local AS using BGP-LU. It is the local ASBR (ASBR{1,2}) which generates the BGP-LU routes into its iBGP mesh or controller facing session(s). The forwarding next-hop for those routes points to the link-IP addresses of the remote ASBRs (ASBR{3,4,5}). Note that the generated BGP-LU routes always match the BGP next-hop that the remote ASBRs set their BGP service routes to, such that the software component doing route-resolution understands the association between the BGP service route and the BGP-LU forwarding route.
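The label assignment and route-resolution association described above might look as follows. This is a minimal sketch under stated assumptions: the loopback address 192.0.2.1 for ASBR1, the starting label 100, the mapping of IP3, IP5 and IP7 to 203.0.113.3, .5 and .7, and the function names are all chosen for illustration.

```python
from itertools import count


def build_epe_routes(asbr_loopback, ebgp_next_hops, first_label=100):
    """Allocate one label per eBGP-facing next-hop.

    Returns the BGP-LU routes to advertise into the iBGP mesh and the
    matching MPLS forwarding entries on the ASBR. Label values are
    hypothetical.
    """
    labels = count(first_label)
    bgp_lu = {}
    lfib = {}
    for next_hop in ebgp_next_hops:
        label = next(labels)
        # Labeled host route for the remote link address, next-hop self.
        bgp_lu[f"{next_hop}/32"] = {"label": label, "bgp_next_hop": asbr_loopback}
        lfib[label] = ("pop-and-forward", next_hop)
    return bgp_lu, lfib


def resolve_service_route(service_route, bgp_lu):
    """Resolve a service route over the BGP-LU route matching its BGP next-hop.

    Returns (ASBR to tunnel to, EPE label to push).
    """
    lu_route = bgp_lu[f"{service_route['bgp_next_hop']}/32"]
    return lu_route["bgp_next_hop"], lu_route["label"]
```

The lookup in resolve_service_route succeeds only because the BGP-LU prefix equals the BGP next-hop carried by the service route, which is the matching property the text relies on.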
Throughout this document we describe how the BGP next-hop of both BGP service routes and BGP-LU routes shall be rewritten. This may clash with existing network deployments and existing network configuration guidelines, which may mandate rewriting the BGP next-hop when a BGP update enters an AS.
The Egress peering use case assumes a central controller as shown in Figure 3. In order to support both existing BGP next-hop guidelines and the suggestion described in this document, an implementation SHOULD support several internal BGP peer-groups:
The first peer group MAY be left unchanged and use any existing BGP nexthop rewrite policy. The second peer group MUST use the BGP rewrite policy described in this document for both service and BGP-LU routes.
Of course a common iBGP peer group and a common rewrite policy may be used if the proposed policy is compatible with existing routing software implementations of BGP next-hop route resolution.
+-----------+ | Ingress | | Router | +-----------+ ^ | +-----------+ | BGP | +------------+ | Route |-------------->| Controller | | Reflector | +------------+ +-----------+ ^ ^ ^ | | | | | +-------------------+ | | | | v v v +-----------+ +-----------+ | BGP | | BGP | | ASBR1 | . . . | ASBR2 | +-----------+ +-----------+
Figure 3: Selective iBGP NH rewrite
In Figure 2 the ASBR{1,5} and ASBR{2,5} links are examples of single-hop eBGP advertisements.
Today's operational practice for load-sharing across parallel links is to configure a single multi-hop eBGP session between a pair of routers. The IP addresses for the multi-hop eBGP session are typically sourced from the loopback interfaces. Note that those IP addresses do not share an IP subnet. Most often those loopback IP addresses are most-specific host routes. Since the BGP next-hops of the received BGP service routes are typically rewritten to the remote router's loopback IP address, they cannot be resolved immediately by the receiving router. To overcome this, the operator configures a static route with next-hops pointing to each of the remote IP addresses of the underlying links.
In Figure 2 both ASBR{1,3} links are examples of a multi-hop eBGP advertisement. In order to advertise a distinct label for a common FEC throughout the iBGP mesh, ASBR1 and all the receiving iBGP routers need to support the BGP Add-paths extension [I-D.ietf-idr-add-paths].
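To illustrate why Add-paths is needed in the multi-hop case, the following sketch generates one path per parallel link for a single FEC. The loopback addresses (192.0.2.1 for ASBR1, 192.0.2.3 for ASBR3), the label values, the path identifiers and the dictionary layout are assumptions for illustration; they are not taken from any implementation.

```python
def multihop_epe_routes(remote_loopback, link_next_hops, first_label, asbr_loopback):
    """Build one BGP-LU path per underlying link for a common FEC.

    All paths share the same prefix (the remote loopback that service
    routes use as BGP next-hop), so they can only coexist in the iBGP
    mesh if they are told apart by an Add-paths path identifier.
    """
    routes = []
    for index, next_hop in enumerate(link_next_hops):
        routes.append({
            "prefix": f"{remote_loopback}/32",   # common FEC
            "path_id": index + 1,                # hypothetical Add-paths path ID
            "label": first_label + index,        # distinct label per link
            "bgp_next_hop": asbr_loopback,
            "forward_to": next_hop,              # 'POP and forward' target
        })
    return routes
```

Without distinct path identifiers, the second advertisement for the same prefix would implicitly replace the first, and ingress routers would see only one of the two links.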
In addition to offering a distinct BGP-LU label for each egress link, an ASBR MAY want to advertise a BGP-LU label which represents a load-balancing forwarding action across a set of peers. The difference here is that the ingress node gives up individual link control and instead delegates the load-balancing decision to a particular egress router, which is free to send the traffic down any link in the Peer Set identified by the BGP-LU label.
Assume that ASBR1 wants to advertise a label identifying the Peer Set {ASBR3, ASBR4, ASBR5}.
Finally, ASBR1 programs an MPLS forwarding state of 'POP and load-balance' to {203.0.113.3, 203.0.113.5, 203.0.113.7, 203.0.113.9} for the advertised label 104.
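A 'POP and load-balance' entry could be modeled as below. The label 104 and the next-hop set come from the example above; the use of a CRC32 hash over a flow tuple is an assumption made for this sketch, since the document does not specify how an ASBR distributes flows across the Peer Set.

```python
import zlib

PEER_SET_NEXT_HOPS = ["203.0.113.3", "203.0.113.5", "203.0.113.7", "203.0.113.9"]

# MPLS forwarding state on ASBR1 for the Peer Set label.
LFIB = {104: ("pop-and-load-balance", PEER_SET_NEXT_HOPS)}


def forward(label, flow):
    """Pop the label and pick one next-hop of the Peer Set for this flow.

    'flow' is any tuple identifying the flow (for example addresses,
    protocol and ports). Hashing keeps all packets of a flow on one link.
    """
    _action, next_hops = LFIB[label]
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]
```

The ingress router only chooses label 104; which of the four links carries a given flow is decided locally by ASBR1.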
A router has one or more forwarding plane units. A forwarding plane unit consists of one or more interfaces. Forwarding a packet to an interface that is a member of the same forwarding plane unit is cheaper than forwarding it across units.
A route entry in the forwarding-table may contain multiple next-hops, each pointing to a network-interface. When forwarding a packet, a forwarding plane unit may optionally provide preference to a subset of these next-hops, whose interfaces are its own members. This behavior is called "Locality forwarding bias".
An ASBR MAY assign a distinct label for the set of eBGP-peers that share a forwarding plane unit and advertise it to its internal BGP mesh. The ASBR programs a forwarding action 'POP and IP-lookup' into the MPLS forwarding table for these labels. While performing the IP-lookup, the ASBR MUST perform "Locality-forwarding bias" to ensure it only selects next-hops towards eBGP peers that are attached to the current forwarding plane unit, where the IP-lookup is happening.
This provides the ingress-peers with the ability to steer traffic towards a "subset of eBGP-peers" attached to an ASBR, while preserving the ability of the ASBR to aggregate the IP prefixes received from those eBGP-peers when re-advertising them to the internal BGP mesh.
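The next-hop selection under "Locality forwarding bias" can be sketched as a simple filter. The NextHop type, the integer numbering of forwarding plane units and the strict flag are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class NextHop(NamedTuple):
    address: str
    unit: int  # forwarding plane unit that owns the egress interface


def locality_bias(next_hops, local_unit, strict=True):
    """Select the next-hops a forwarding plane unit may use for a route.

    strict=True models the 'POP and IP-lookup' label case, where only
    next-hops attached to the local unit may be selected.
    strict=False models the optional preference: local next-hops are
    preferred, but all next-hops remain usable if none is local.
    """
    local = [nh for nh in next_hops if nh.unit == local_unit]
    if local or strict:
        return local
    return list(next_hops)
```

With the strict variant, traffic arriving with a per-unit label never leaves through an eBGP peer attached to a different forwarding plane unit, even when the IP route lists such peers as next-hops.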
It is desirable to provide a local-repair based protection scheme in case a redundant path is available to reach a peer AS. Protection may be applied at multiple levels in the routing stack. Since the ASBR has insight into both BGP-LU and BGP service advertisements, protection can be provided at the BGP-LU level, at the BGP service level, or at both.
Assume the network operator wants to provide a local-repair next-hop for the 172.16/12 BGP service route at ASBR1. The active route resolves over the parallel links towards ASBR3. In case the link #1 between ASBR{1,3} fails there are now several candidate backup paths providing protection against link or node failure.
Assuming that the remaining link #2 between ASBR{1,3} has enough capacity, and link-protection is sufficient, this link MAY serve as temporary backup.
However if node-protection or additional capacity is desired, then the local link between ASBR{1,4} or ASBR{1,5} MAY be used as temporary backup.
ASBR1 is both originator and receiver of BGP routing information. For this protection method it is required that the ASBRs support the [I-D.ietf-idr-best-external] behavior. ASBR1 receives both the BGP-LU and BGP service routes from ASBR2 and therefore can use the ASBR2 advertised label as a backup path given that ASBR1 has a tunnel towards ASBR2.
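The choice among the backup candidates discussed above can be expressed as a small selection function. This is only a sketch of the decision logic: the Candidate fields, the preference order encoded by the list order, and the boolean spare_capacity flag are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    asbr: str             # ASBR that owns the egress link
    peer: str             # remote ASBR at the far end of the link
    spare_capacity: bool  # whether the link can absorb the repaired traffic


def pick_backup(failed, candidates, node_protection=False):
    """Return the first usable backup for a failed egress link, or None.

    Candidates are listed in order of preference. With node_protection,
    links to the same remote ASBR as the failed link are skipped, since
    they would not survive a failure of that node.
    """
    for candidate in candidates:
        if not candidate.spare_capacity:
            continue
        if node_protection and candidate.peer == failed.peer:
            continue
        return candidate
    return None
```

A candidate owned by a different ASBR (for example a link on ASBR2) would additionally require a tunnel from ASBR1 to that ASBR, as noted above; the sketch does not model tunnel availability.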
For protecting plain unicast (Internet) routing information a very simple backup scheme could be to recurse to the relevant IP forwarding table and do an IP lookup to further determine a new egress link.
Typically, Egress Link Protection mechanisms for service routes at the ASBRs are susceptible to micro forwarding-loops if, during the local-repair period, the IP lookup at the backup-path ASBR points back to the primary ASBR for some reason.
By using the mechanisms described in this document, such forwarding-loops can be avoided. Because the backup ASBR receives an MPLS packet with an EPE label, it does not perform an IP lookup and forwards traffic based on the MPLS label lookup only. Thus the repaired traffic is guaranteed to exit the network towards an egress peer at the backup ASBR, and not turn back towards the iBGP core.
For a software component which controls the egress link selection, it may be desirable to know a particular egress link's current utilization, so that it can adjust the traffic that is sent to that interface.
In [I-D.ietf-idr-link-bandwidth] a community for reporting link-bandwidth is specified. Rather than reporting the static bandwidth of the link, the ASBRs shall report the available bandwidth as seen by the data-plane via the link-bandwidth community in their BGP-LU update message.
It is crucial that ingress routers learn quickly about congestion of an egress link and hence it is desired to get timely updates of the advertised per-link BGP-LU routes carrying the available bandwidth information when the available bandwidth crosses a certain (preconfigured) threshold.
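The threshold-triggered update described above could be decided as follows. This is a minimal sketch: the function name, the representation of thresholds as a list of available-bandwidth values, and the numbers in the usage note are assumptions, and the encoding of the link-bandwidth community itself is not modeled.

```python
def should_readvertise(previous_available, new_available, thresholds):
    """Return True if the available bandwidth crossed any configured threshold.

    A crossing is detected when the two samples lie on different sides
    of a threshold, in either direction. Small fluctuations between
    thresholds do not trigger a new BGP-LU update.
    """
    return any(
        (previous_available >= threshold) != (new_available >= threshold)
        for threshold in thresholds
    )
```

For example, with thresholds at 2.5 and 5 units of available bandwidth, a drop from 6 to 4 triggers an update, while a further drop from 4 to 3 does not.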
Controllers may also use the link-bandwidth community alongside other common mechanisms for retrieving data-plane statistics (e.g. SNMP, NETCONF).
Many thanks to Yakov Rekhter, Chris Bowers and Jeffrey (Zhaohui) Zhang for their detailed review and insightful comments.
Special thanks to Richard Steenbergen and Tom Scholl, who brought up the original idea of using MPLS for BGP-based egress load-balancing in their inspiring talk at NANOG 48.
This document does not request any action from IANA.
This document does not introduce any change in terms of BGP security.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC2784] | Farinacci, D., Li, T., Hanks, S., Meyer, D. and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, DOI 10.17487/RFC2784, March 2000. |
[RFC3107] | Rekhter, Y. and E. Rosen, "Carrying Label Information in BGP-4", RFC 3107, DOI 10.17487/RFC3107, May 2001. |
[I-D.filsfils-spring-segment-routing-central-epe] | Filsfils, C., Previdi, S., Patel, K., Shaw, S., Ginsburg, D. and D. Afanasiev, "Segment Routing Centralized Egress Peer Engineering", Internet-Draft draft-filsfils-spring-segment-routing-central-epe-05, August 2015. |
[I-D.ietf-grow-bmp] | Scudder, J., Fernando, R. and S. Stuart, "BGP Monitoring Protocol", Internet-Draft draft-ietf-grow-bmp-17, January 2016. |
[I-D.ietf-idr-add-paths] | Walton, D., Retana, A., Chen, E. and J. Scudder, "Advertisement of Multiple Paths in BGP", Internet-Draft draft-ietf-idr-add-paths-15, May 2016. |
[I-D.ietf-idr-best-external] | Marques, P., Fernando, R., Chen, E., Mohapatra, P. and H. Gredler, "Advertisement of the best external route in BGP", Internet-Draft draft-ietf-idr-best-external-05, January 2012. |
[I-D.ietf-idr-link-bandwidth] | Mohapatra, P. and R. Fernando, "BGP Link Bandwidth Extended Community", Internet-Draft draft-ietf-idr-link-bandwidth-06, January 2013. |
[I-D.ietf-mpls-seamless-mpls] | Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz, M. and D. Steinberg, "Seamless MPLS Architecture", Internet-Draft draft-ietf-mpls-seamless-mpls-07, June 2014. |