Network Working Group | L. Lavallee, Ed. |
Internet-Draft | A.K. Khopkar |
Intended status: Standards Track | Sky Network Services |
Expires: December 06, 2011 | V.J. Joseph |
Juniper Networks | |
C.S. Shah | |
G. Thareja | |
COLT Technology Services | |
June 04, 2011 |
BGP Optimal Route Reflection
draft-vinod-lavallee-bgp-optimal-route-reflection-00
This document describes procedures on providing optimal routing for IPv4 Internet traffic based on the shortest IGP metric. It builds on the ability of advertising multiple paths for a given destination prefix, using BGP Add-Path and provides a solution for selectively indicating preferences towards BGP path(s) via affinities.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 06, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document describes applications and carrier Internet routing requirements that can be fulfilled by using BGP [RFC4271] Add-Path [I-D.ietf-idr-add-paths], and proposes a mechanism to use affinities with the BGP protocol to influence path selection.
Carriers have diverse deployments of peering points, which may not reflect the network hierarchy. For instance major POPs, which form the core of the network hierarchy may not be the only POPs to have Internet peering/transit points. It is common for POPs further down the network hierarchy (minor POPs) to also have Internet peering/ transit arrangements.
A very common requirement is to offer closest exit routing for Internet traffic. In other words traffic originating from POP-A should always use the local ASBR (if there is a viable path available locally). We make the assumption that complete Internet routing information is available at every ASBR. If the local ASBR does not have reachibility to the relevant prefixes (or the ASBR itself is not available), traffic should use the closest (in terms of IGP metric) egress. This is typically known as "hot potato routing".
In order to ensure the required behavior, a design as follows is commonly deployed:
The dotted lines denote both the physical connections between devices, as well as the BGP RR Client to RR peering sessions.
+-----------+ +-----------+ | PE Layer | | ASBR Layer| __|__,-----. | ------- | ----- _|_____ ,---. / | ( PEA ).....P-Core ... ASBR1 ) | \ ,---. / \/ | `----- . |. ------- | . .----- | \/ \ ( CE | | .| | . | ( AS XYZ) \ /\___|__,----- . |. ------- | . ----- | /\ / `---' | ( PEB ).....P-Core ... ASBR2 ) _|_____/ `---' | `-----' | ------- | ----- | +-----------+ +-----------+
In the illustration above, each ASBR advertises its available prefixes to each of the core (Provider) routers. Each of the P routers acts as a route reflector for all of the PE routers in a given POP. This generally ensures that for any given prefix each PE router will pick a path via the local ASBR(s).
We note two problems with this design:
First, core routers carry BGP, and this hinders operators eager to deploy a purely label switched, BGP free core for all traffic types and applications.
Second, some address families (e.g. VPNv4, MVPNv4, etc) may be deployed using dedicated route reflectors located at strategic locations (major POPs for instance). If IPv4 Internet route reflection is distributed to each P router in each POP as described above, the challenge is to provide a transition plan to overcome the first problem (achieve a BGP free core) and also move to a consolidated scheme of route reflector deployment such as is used by other address families.
In the near term not all devices in the network will support BGP Add- Path. This is because relatively smaller/legacy hardware may not have the scale to support multiple paths for a given prefix, or it is just not planned to be supported. Therefore carriers may deploy alternate models - as illustrated in the following sections, to achieve the following:
In order to address the issue of removing BGP from the P routers, the BGP client-RR peering sessions which were originally between the PE and P routers may be shifted from the P routers to the ASBR(s). Therefore an ASBR may now perform the role of Route Reflector for all of its locally connected PE routers (i.e within the same POP). The P routers are relieved of this role.
+-----------+==========+-----------+ | PE Layer | | ASBR Layer| __|__,-----. | ------- | ----- _|_____ ,---. / | ( PEA ).....P-Core ... ASBR1 ) | \ ,---. / \/ | `----- . |. ------- | . .----- | \/ \ ( CE | | .| | . | ( AS XYZ) \ /\___|__,----- . |. ------- | . ----- | /\ / `---' | ( PEB ).....P-Core ... ASBR2 ) _|_____/ `---' | `-----' | ------- | ----- | +-----------+==========+-----------+
The "==" lines indicate that the client to RR sessions that were originally between the PE and P routers are shifted to be between the PE and ASBR(s) layers.
We note that ASBRs will generally have a large RIB/FIB capacity, due to the fact that they may connect to more than one upstream provider/AS. Therefore resource constraints will be less likely to hinder adoption of Add-Path on such systems.
Carriers normally utilize centralized RRs, placed at strategic locations of their choice, for consolidation of iBGP sessions from the ASBRs. Therefore each ASBR would normally peer with one or more dedicated/central RRs for BGP route propagation between each ASBR. It is to be noted that both the RR and ASBRs need to support the adoption of BGP Add-Path.
It is to be noted that both the RR and ASBRs need to support the adoption of BGP Add-Path. This is illustrated below:
+-----------+==========+-----------+ | PE Layer | | ASBR Layer| __|__,-----. | ------- | ----- _|_____ ,---. / | ( PEF ).....P-Core ... ASBR1 ) | \ ,---. / \/ | `----- . |. ------- | . .----- | \/ \ ( CE | | .| | . | ( AS ABC) \ /\___|__,----- . |. ------- | . ----- | /\ / `---' | ( PEG ).....P-Core ... ASBR2 ) _|_____/ `---' | `-----' | ------- | ----- | +-----------+==========+-----------+ | ! | v +-----------+ |CENTRAL RRs| | | | ------ | | ( RR1 ) | | ------ | | | | ------ | | ( RR2 ) | | ----- | +-----------+ ^ | ------------> Each ASBR | is peered | with +-----------+==========+-----------+ Centralized | PE Layer | | ASBR Layer| RRs. __|__,-----. | ------- | ----- _|_____ ,---. / | ( PEC ).....P-Core ... ASBR3 ) | \ ,---. / \/ | `----- . |. ------- | . .----- | \/ \ ( CE | | .| | . | ( AS XYZ) \ /\___|__,----- . |. ------- | . ----- | /\ / `---' | ( PED ).....P-Core ... ASBR4 ) _|_____/ `---' | `-----' | ------- | ----- | +-----------+==========+-----------+
Each ASBR will now receive multiple paths for a given destination prefix, due to the adoption of Add-Path. The ASBRs would now perform the best path selection for a given prefix, which would ideally result in the local path received i.e. the path received from an external peer - to be preferred over paths received from other ASBRs within the carrier network. It is this path, which would be advertised to the PE routers within a local POP or region - that are peered with a given ASBR.
This scheme of deployment provides a means of extending BGP Add-Path to the entire network, in a seamless and phased manner, while at the same time providing carriers the means to achieve closest exit routing, and deploy a BGP free core (which was the only means of achieving such routing requirements without Add-Path).
Eventually a carrier may be able to consolidate all iBGP sessions on a centralized RR, based on the adoption of Add-Path on PEs. This could relieve the ASBR of route reflection responsibilities.
Extending BGP Add-Path in a phased manner to PE devices that support this feature also improves convergence and reduces churn in the event of a preferred path for a given prefix being unavailable, since multiple paths are available on the PE and the decision process can now be re-run on the PE itself.
Carriers need to deploy services (for example, DNS, RADIUS, etc) with N+M resiliency with a view of offering high availability, geo resiliency and load sharing to scale the solution horizontally. Given such requirements an anycast routing approach may seem to be favorable. A virtual address or set of addresses for each application may be advertised via BGP from every service POP. The advantage of using BGP is that PE routers can learn BGP paths to each service POP and have a local policy based on affinity or other design to load-share and provide resilience.
However the downside of this approach is that all PE routers are required to have a mesh of BGP sessions to all service POPs in addition to the existing iBGP sessions with RRs and other eBGP sessions. PEs at a service POP also need to support a large number of BGP sessions (linear in the number of PEs in the network under consideration). Illustrated below:
+-----------+ |SERVICE POP| | ------ | | ( DNS ) | | ------ | | | | ------ | | ( AAA ) | | ----- | +-----------+ +-----------+ |SERVICE POP| +-----------+ | | | | | ------ | | | | ( DNS ) | | PE1 | | ------ | | PE2 | | | | . | | | | . | | | | PE10 | | | | | | | | | | | | | | ------ | | | | ( AAA ) | +-----------+ | ----- | +-----------+ +-----------+ |SERVICE POP| | | | ------ | | ( WWW ) | | ------ | | | | ------ | | ( DNS ) | | ----- | +-----------+
Above we have three service POPs, and 10 PE routers. This would demand that each PE router has an iBGP session to each of the Service POPs, which may be highly undesirable.
Operationally this implies that any POP commissioning or de- commissioning activity involves multiple touch points on the network and additional change control processes need to be invoked as business critical services are located behind PEs in service POPs. This also increases the risk to other services by nature of change enforced at multiple points.
In the event of Add-Path being incrementally supported on PE devices, BGP Add-Path can be effectively used in these solutions as the RR completely decouples itself from the path selection process and PE routers are presented with all available paths to service POPs. No additional BGP sessions are needed other than the existing iBGP sessions with RRs.
A second benefit is that the number of configuration touches required in order to take a service offline is reduced. For example, this can be accomplished by means of signalling a community (or other attribute) by the relevant service PE. Although this can be done today in a full mesh, all PEs receiving the signaled route must have coherent BGP policies, which can present operational challenges. By contrast, in the proposed scheme, only the RRs directly serving the service PE need to be considered. We also observe that current practice of taking an entire service POP off-line by manipulating IGP metrics may not always be feasible if multiple services/applications are offered from this POP.
In sum this approach allows carriers to operate services in a very low touch mode where chances of collateral damage are minimized, policy enforcement is made simpler and demands are eased on operational processes and resources.
One of the concerns in accepting all paths for a given prefix is scalability. For instance if there are 5 given paths for a prefix - Add-Path would ensure that all paths are accepted by the PE routers, and this is likely to pose scalability concerns with regard to whether the PE router can indeed accept that many BGP paths. Further if a given PE router is accepting full internet tables, not every hardware installed in the network may have the capacity to handle this.
Therefore the requirements here can be defined as following:
Having a limited number of paths being announced via Add-Path may well result in the "best-path" not being announced to a given PE. For instance if RR1 receives 5 paths for Prefix "A", and if PE1 is configured to only accept 2 paths for any given prefix - then PE1 will receive 2 paths for prefix "A" which would be based on the best path calculation from the viewpoint of RR1. In other words each PE will receive random paths from the RR, based on the RRs view of each path which it considers the best and not the ones which would ideally be preferred by the PE. So even though we have solved the issue of scalability, the bigger issue of choosing the closest ASBR based on IGP metric on each PE still remains. This is addressed in the following sections.
The objective here is to ensure that the "best path" based on the shortest IGP metric from each PE to an ASBR is announced by the RR. In other words if a PE indicates its preference to receive only 2 paths for every prefix, from the RR - the RR should ensure that one of these two paths announced includes the path announced by an ASBR that is closest to this PE (based on the IGP metric).
In order to achieve this, Route Targets with RT Constrain [RFC4684] for IPv4 prefixes are proposed. In other words a BGP speaker will have the ability to advertise all of its prefixes using an Route Target, while announcing this to a Route Reflector or an External BGP session.
RT usage in an IPv4 environment would be similar to the standard model used in BGP/MPLS VPNs, where each prefix may have one or more of this extended community attached to it. The purpose is to identify prefixes and paths with certain Route Targets and have them advertised. To further illustrate this from a real life deployment standpoint, A single RT may be used on a per POP/Region - depending on the operator's preference. The RT may be used to indicate the group to which all BGP speakers belong to, and can be used as an identifier to refer to a given POP or region. For instance all BGP speakers within a POP located at Frankfurt can use an RT of "X", whilst BGP speakers in New York can use RT "Y".
Route Targets in the context of IPv4 would not demarcate the network into various boundaries, and only intends to serve merely as a mechanism to indicate a preference for certain BGP paths. Let us assume that there are three ASBRs in a given network; ASBR1 is located in Frankfurt, ASBR2 is located in New York, and ASBR3 is located in Tokyo. Now when Add-Path is used, all 3 paths would be announced to each PE in all the locations. Let us assume that each PE is configured to receive only 2 paths instead of all 3. In this case RT Constrain may be used to instruct the RR to announce a set of paths that match the RT Membership NLRI.
For instance, if all PEs and ASBRs in Tokyo are configured to advertise all prefixes with a RT called "X:1", PEs and ASBRs in New York configured as "X:2", PEs in London are configured for "X:3"and so on - Let us say that the PE routers in Tokyo intend to only receive the local ASBR paths, and the same applies to the PE routers in New York and London. Each PE can use the RT Membership NLRI (via RT Constrain) to only signal interest in the respective Route Targets and receive only BGP paths from their local ASBR.
The use of Route Targets within the context of IPv4 does not change the 32 bit IPv4 prefix structure, and also does not mandate the need of any additional information to be contained within the prefix itself. Route Targets would be advertised as IPv4 BGP extended communities.
It is also proposed that the RR may have the ability to restrict the number of paths on a per RT basis towards each BGP speaker. This will ensure that a given speaker may be configured to receive a only a given number of paths per RT - wherein each RT may correspond to one or more ASBRs. This will ensure that not all paths are announced to a PE that only needs "N" paths. If a given path within the context of an RT is withdrawn - then an alternate path (if exists) with the same RT would need to be advertised to the concerned PE.
Let us look at a scenario on how the use of Route Targets impacts route announcements to ASBRs. Referring back to the example in the previous section - using the example of three POPs in New York, London, and Tokyo - we may prefer that each ASBR in the network receives prefixes and paths for each prefix from every ASBR in the network. Therefore it is fair to assume that none of the ASBR's would ever be configured to restrict the maximum number of paths that it can accept. Therefore a "Default RT" may be advertised by each ASBR. The default RT instructs the RR to announce all paths/prefixes to ASBRs. So each of the three ASBRs would receive all the paths announced by the others. However this does not restrict the option of configuring an ASBR to only receive a given number of paths.
It is to be noted that the "Default RT" indicates that all prefixes/paths even without a corresponding RT will be considered as candidates for announcements.
Let us look at a case where a POPs in the network has an ASBR with only limited number of Internet routes, for instance 1000 IPv4 Internet Prefixes. So let us say, this POP located is located in Munich, and its scope is to provide transit to only a given number of prefixes that are announced by a content provider.
In this scenario, all PEs in the Munich POP may be configured to use the same Route Targets used by a POP that has ASBRS with full routes (Let us say Frankfurt, which has an ASBR with full Internet routes) and just indicate interest in the single common RT via an RT membership NLRI. It is also possible to for the BGP speakers in Munich to use an independent RT which is common for the POP, and signal interest in the local prefixes/paths in addition to the Frankfurt Prefixes/Paths using RT Constrain.
Let us consider a scenario where, a BGP speaker is interested in multiple RTs. The proposal suggests that any implementation should have the ability for a given BGP speakers to use a range of RT values and Wildcards to signal affinities. For example a range as follows indicates that the RT Membership NLRI includes RT values of [1-20, 30-35, 50]. This avoids the need for configuring each RT value individually.
A second consideration is for a PE to receive all paths announced by a set of ASBRs - indicated by a list of RTs (let us say "X:Y, X:Z", and have the RR calculated best path for all the remaining prefixes ir-respective of whether they have any RTs attached or not. Going back to the example of the Munich POP which only had a local ASBR with 1000 prefixes, it may be important for the operator to just ensure that all local ASBR paths are available in the local PEs - whilst all other paths/prefixes may follow the best path calculated by the RR.
In this it is proposed that each RR in the network can be configured with a unique RT. For instance RR1 may be configured with RT A:B, while RR2 is configured with with A:C. Each BGP speaker by virtue of signalling the RT configured on each RR (A:B, A:C) via RT Constrain instructs the RR to only announce the best path for all prefixes with or without Route Targets. This way each BGP speaker can choose which RR it intends to receive the best path from, and which RR it may choose to receive paths for specific Route Targets. This provides the needed flexibility for an operator.
In the scenario where a given POP does not have any local ASBRs, all PEs could just use RT Membership NLRI to indicate the preference of paths they would like to receive.
RT advertisement and RT membership NLRI used to signal RT constrain are two different aspects. For instance an ASBR may not attach an RT to any of its announcements, but can still indicate its preference by using a list of RTs that it is interested in (via Route Constrain). Similarly an ASBR can advertise all its prefixes with an RT of "X:Y" for example, and indicate a Default RT Membership NLRI, to receive all paths/prefixes.
The Default RT ensures that all paths/prefixes in the network, ir-respective of whether an RT is attached or not - is advertised to a given BGP speaker by the RR. It is possible to configure the maximum no. of paths that will be advertised to BGP speakers via the Default RT.On the contrary, RT configured on the RR instructs the RR to send only the best path for prefixes (prefixes may have RTs attached or even may not).
This informational document introduces no IANA considerations.
This document discusses deployment practices and has no effect on the security of the underlying routing protocols.
We would like to thank Rahul Aggarwal, Yakov Rekhter, John Scudder and Ronald Bonica for discussions that helped.
[RFC2796] | Bates, T., Chandra, R. and E. Chen, "BGP Route Reflection - An Alternative to Full Mesh IBGP", RFC 2796, April 2000. |
[RFC4271] | Rekhter, Y., Li, T. and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006. |
[RFC4456] | Bates, T., Chen, E. and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April 2006. |
[RFC4684] | Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, R., Patel, K. and J. Guichard, "Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, November 2006. |
[RFC4786] | Abley, J. and K. Lindqvist, "Operation of Anycast Services", BCP 126, RFC 4786, December 2006. |
[I-D.ietf-idr-add-paths] | Walton, D, Chen, E, Retana, A and J Scudder, "Advertisement of Multiple Paths in BGP", Internet-Draft draft-ietf-idr-add-paths-06, September 2011. |