BESS | J. Heitz |
Internet-Draft | A. Sajassi |
Intended status: Standards Track | Cisco |
Expires: May 17, 2018 | J. Drake |
 | Juniper |
 | J. Rabadan |
 | Nokia |
 | November 13, 2017 |
Multi-homing and E-Tree in EVPN with Inter-AS Option B
draft-heitz-bess-evpn-option-b-01
The BGP speaker that originates an EVPN Ethernet A-D per ES route is identified by the next-hop of the route. When the route is propagated by an ASBR as an Inter-AS Option B route, the ASBR overwrites the next-hop. This document describes a method to identify the originator of the route.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 17, 2018.
Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Inter-AS Option B: The inter-AS method described in Section 10.b of [RFC4364].
EAD-per-ES: Ethernet A-D per Ethernet Segment Route.
EAD-per-EVI: Ethernet A-D per EVPN Instance Route.
EAD: EVPN Type 1 route: Ethernet Auto-discovery Route. Either an EAD-per-ES or an EAD-per-EVI route.
Type 2/5: either the EVPN Type 2 route: MAC/IP Advertisement Route or the EVPN Type 5 route: IP Prefix Route described in [I-D.ietf-bess-evpn-prefix-advertisement].
Mass Withdraw: To withdraw a route from the forwarding table only. For example, a MAC route that is mass withdrawn remains in the BGP table, where it is still required for directing packets with the specified MAC destination address to a matching backup or alias route. When a MAC route is completely withdrawn, the matching backup or alias routes can no longer be used for that MAC address. The withdrawal of an EAD-per-ES route causes the mass withdrawal of the associated Type 2/5 routes as well as the associated EAD-per-EVI routes.
       CE3
        |
       PE1
      /   \
   CE1     ASBR1---ASBR2---PE3--CE2
      \   /
       PE2

       Figure 1: Inter-AS Option B
Inter-AS Option B is illustrated in Figure 1.
Traffic flow is from CE2 to CE1 where PE3 is an imposition PE, and PE1 and PE2 are disposition PEs. The following sections describe the issues that EVPN multi-homing and EVPN E-tree services have in these types of scenarios.
In a multi-homing scenario, the router that performs the redundancy switchover or the load balancing (e.g. PE3) must know which router originated the Ethernet A-D routes. These redundancy functions are normally implemented on a PE, but not on an ASBR.
Quote from [RFC7432]:
In the Intra-AS case, the remote PE identifies the "PEs that have advertised reachability" by the next-hops of the Ethernet A-D routes. In the Inter-AS option B case, ASBR1 and ASBR2 rewrite the next-hops to themselves on all EVPN route advertisements, thus losing the identity of the PE that originated an advertisement.
As a result, PE3 is unable to distinguish an EAD-per-ES route that originated at PE1 from one that originated at PE2.
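The loss of originator identity can be illustrated with a minimal sketch. The route record, the addresses, and the function names below are hypothetical and chosen only for illustration; they are not taken from any implementation. Once the ASBR sets the next-hop to itself, a receiver that infers originators from next-hops alone sees PE1 and PE2 as a single router.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EadPerEs:
    esi: str       # Ethernet Segment Identifier (simplified to a label)
    rd: str        # Route Distinguisher
    next_hop: str  # BGP next-hop as seen by the receiver


def asbr_readvertise(route: EadPerEs, asbr_addr: str) -> EadPerEs:
    """Inter-AS Option B: the ASBR sets the next-hop to itself."""
    return replace(route, next_hop=asbr_addr)


def apparent_originators(routes):
    """Originators as a receiver would infer them from next-hops alone."""
    return {r.next_hop for r in routes}


# Hypothetical addresses: PE1 = 192.0.2.1, PE2 = 192.0.2.2,
# ASBR2 = 198.51.100.2.
from_pe1 = EadPerEs(esi="ES-1", rd="192.0.2.1:1", next_hop="192.0.2.1")
from_pe2 = EadPerEs(esi="ES-1", rd="192.0.2.2:1", next_hop="192.0.2.2")

# Intra-AS: two distinct next-hops, so two originators are visible.
intra_as = apparent_originators([from_pe1, from_pe2])

# Inter-AS Option B: both routes now carry the ASBR's address.
inter_as = apparent_originators(
    [asbr_readvertise(r, "198.51.100.2") for r in (from_pe1, from_pe2)]
)
```

In this sketch the two routes remain distinct BGP routes (their RDs differ), but the next-hop no longer says which PE advertised which route.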
As described in [EVPN-Etree], leaf-to-leaf BUM traffic filtering is always performed at the disposition PE and is based on the Leaf Label. The Leaf Label can be downstream allocated (ingress replication) or upstream allocated (p2mp tunnels) and is advertised in an EAD-per-ES route with ESI-0. As in the multi-homing case, the PEs must identify the PE that originated a given EAD-per-ES route, with either ingress replication or p2mp tunnels, for leaf-to-leaf BUM filtering to succeed.
If ingress-replication is used for BUM traffic, the ingress PE must identify the originator of the ESI-0 EAD-per-ES route, program the Leaf Label and push it on the stack when sending BUM Leaf traffic to the egress PE. However, this identification of the originating PE is not possible in Inter-AS option B scenarios where ASBRs rewrite the next-hops. For instance, assuming CE2 and CE3 (Figure 1) are connected to Leaf ACs, PE1 will advertise a Leaf Label in an EAD-per-ES route for ESI-0. When CE2 sends BUM traffic, PE3 will not know what Leaf Label to use for sending traffic to PE1.
Similarly, when PE3 uses non-segmented p2mp tunnels for BUM traffic, PE3 will upstream allocate a Leaf Label and advertise it in an EAD-per-ES route, so that when PE1 receives BUM traffic carrying that Leaf Label, it can identify that the traffic is coming from a Leaf and not forward it to CE3.
In both cases, the current Intra-AS procedures do not allow the originator of the EAD-per-ES routes to be identified, and therefore egress BUM filtering for leaf-to-leaf traffic is not possible when the Leaf ACs are located in different ASes.
The Tunnel Encapsulation Attribute is specified in [I-D.ietf-idr-tunnel-encaps]. A new TLV to identify the Originating PE is specified here. It is called OPE. The tunnel type for the OPE (suggested value 15) is to be assigned by IANA. The OPE MUST contain the Remote Endpoint Sub-TLV. The OPE must be able to uniquely identify the PE of origin within all ASes that participate in an EVPN instance.
If a BGP speaker, such as a route reflector or an ASBR, is about to re-advertise a Type 2/5 or EAD route that does not have an OPE, and will change the next-hop of that route, then it MUST add one by putting the received next-hop into the Remote Endpoint Sub-TLV of the OPE. This ensures that all such EVPN routes carry the originator information that imposition PEs need to perform aliasing and mass withdraw.
Any router that re-advertises a route that contains an OPE may modify some TLVs in the Tunnel Encapsulation Attribute. However, it MUST keep the OPE unchanged. Examples are ASBR1 and ASBR2 in Figure 1.
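The two re-advertisement rules above can be sketched as follows. This is an illustrative model only: `EvpnRoute` and `readvertise` are hypothetical names, and the OPE is reduced to the address that would be carried in its Remote Endpoint Sub-TLV rather than encoded as a real Tunnel Encapsulation TLV.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvpnRoute:
    route_type: int            # 1 = EAD, 2 = MAC/IP, 5 = IP Prefix
    next_hop: str
    ope: Optional[str] = None  # Remote Endpoint of the OPE, if present


def readvertise(route: EvpnRoute, new_next_hop: str) -> EvpnRoute:
    """Re-advertise a route, applying the OPE rules on next-hop change."""
    ope = route.ope
    needs_ope = route.route_type in (1, 2, 5)
    if ope is None and needs_ope and new_next_hop != route.next_hop:
        # No OPE yet and the next-hop is about to change: record the
        # received next-hop as the originating PE.
        ope = route.next_hop
    # An existing OPE is always carried through unchanged.
    return replace(route, next_hop=new_next_hop, ope=ope)


# Hypothetical addresses: PE1 = 192.0.2.1, ASBR1 = 198.51.100.1,
# ASBR2 = 198.51.100.2.
from_pe1 = EvpnRoute(route_type=1, next_hop="192.0.2.1")
via_asbr1 = readvertise(from_pe1, "198.51.100.1")
via_asbr2 = readvertise(via_asbr1, "198.51.100.2")
reflected = readvertise(from_pe1, "192.0.2.1")  # next-hop unchanged
```

In this model ASBR1 adds the OPE, ASBR2 only passes it on, and a route reflector that leaves the next-hop untouched adds nothing.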
For an inter-AS option B scenario, when a PE receives EVPN routes with an OPE from an ASBR, everything works as specified in [RFC7432], including both the aliasing function and mass withdraw. That is, the imposition PE (e.g., PE3) can process mass withdraw messages (Ethernet A-D per ES routes). However, if a PE receives EVPN routes without an OPE from an ASBR, the mass withdraw function operates in a degenerate mode in which only Ethernet A-D per EVI routes can be processed (each for its corresponding MAC-VRF) but not Ethernet A-D per ES routes (which correspond to all the impacted MAC-VRFs). The following sections detail the procedures associated with OPE processing.
When routes are being compared, they must exist in the same MAC-VRF and have the same non-reserved ESI. In addition, when Type 2/5 routes and EAD-per-EVI routes are being compared, they must have the same Ethernet Tag. Type 2/5 routes with ESI==0 do not use mass withdrawal or aliasing.
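The comparison preconditions above can be expressed as a small predicate. The sketch below is an assumption-laden illustration: the `Route` record and the `kind` labels are hypothetical, and "non-reserved ESI" is taken to exclude the two reserved values defined in [RFC7432] (all zeros and all ones).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    kind: str      # "ead-es", "ead-evi", "type2", or "type5"
    mac_vrf: str
    esi: bytes     # 10-octet ESI
    eth_tag: int = 0


# Reserved ESI values in RFC 7432: ESI 0 and MAX-ESI.
RESERVED_ESIS = {bytes(10), b"\xff" * 10}


def comparable(a: Route, b: Route) -> bool:
    """True if the two routes may be matched for aliasing/mass withdraw."""
    if a.mac_vrf != b.mac_vrf or a.esi != b.esi:
        return False
    if a.esi in RESERVED_ESIS:
        return False
    # The Ethernet Tag matters only between Type 2/5 and EAD-per-EVI.
    kinds = {a.kind, b.kind}
    if "ead-evi" in kinds and kinds & {"type2", "type5"}:
        return a.eth_tag == b.eth_tag
    return True


esi = bytes([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
mac = Route("type2", "vrf-a", esi, eth_tag=10)
ead_es = Route("ead-es", "vrf-a", esi)
ead_evi_same = Route("ead-evi", "vrf-a", esi, eth_tag=10)
ead_evi_other = Route("ead-evi", "vrf-a", esi, eth_tag=20)
single_homed = Route("type2", "vrf-a", bytes(10), eth_tag=10)
```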
If all Type 2/5 and EAD routes have an OPE, then "PEs that have advertised reachability" can be identified by the OPE and the procedures of [RFC7432] can be applied without modification.
The routes that have an OPE are handled as per the previous section. The routes that do not have an OPE need the following procedures.
Type 2/5 routes without an OPE and EAD-per-EVI routes without an OPE are valid if at least one EAD-per-ES route without an OPE exists with the same next-hop. In other words: if multiple EAD-per-ES routes with the same next-hop as a Type 2/5 route exist, then the Type 2/5 route will only be mass withdrawn once all of the EAD-per-ES routes are withdrawn. This rule is necessary because a BGP speaker may serve dual roles as ASBR and PE.
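A minimal sketch of this validity rule, assuming the routes have already been restricted to one MAC-VRF and one ESI as required for comparison. The class and method names are hypothetical. The table counts the EAD-per-ES routes that carry no OPE per next-hop, so a Type 2/5 route stays valid until the last of them is withdrawn.

```python
from collections import Counter


class LegacyEadPerEsTable:
    """EAD-per-ES routes without an OPE, counted per next-hop."""

    def __init__(self):
        self._count = Counter()

    def advertise(self, next_hop: str) -> None:
        self._count[next_hop] += 1

    def withdraw(self, next_hop: str) -> None:
        if self._count[next_hop] > 0:
            self._count[next_hop] -= 1

    def validates(self, next_hop: str) -> bool:
        """True while at least one EAD-per-ES route has this next-hop."""
        return self._count[next_hop] > 0


# Two EAD-per-ES routes (e.g., from PE1 and PE2) arrive with the same
# ASBR next-hop; the address is hypothetical.
table = LegacyEadPerEsTable()
table.advertise("198.51.100.2")
table.advertise("198.51.100.2")

table.withdraw("198.51.100.2")
still_valid = table.validates("198.51.100.2")   # one route remains

table.withdraw("198.51.100.2")
now_valid = table.validates("198.51.100.2")     # none remain
```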
[Editorial note: If it is determined that no BGP speakers exist that do not normally follow the procedures in this document (Legacy speakers) then the following sub sections may be omitted]
If an EAD-per-EVI route without an OPE is withdrawn, it will mass withdraw all Type 2/5 routes without an OPE that have the same next-hop and the same RD as the EAD-per-EVI route. This is called mass-withdraw per EVI. Note that it is not the absence of the EAD-per-EVI route that causes mass-withdrawal, but the actual withdrawal itself. If the route was never there to begin with, then no withdrawal took place.
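Mass-withdraw per EVI can be sketched as an event handler that runs only when a withdrawal is actually received. The record and function names are hypothetical; the `mass_withdrawn` flag stands for removal from the forwarding table while the route stays in the BGP table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Type25Route:
    prefix: str
    next_hop: str
    rd: str
    ope: Optional[str] = None
    mass_withdrawn: bool = False


def on_ead_per_evi_withdraw(
    next_hop: str, rd: str, routes: List[Type25Route]
) -> List[Type25Route]:
    """Handle the withdrawal of an EAD-per-EVI route that had no OPE."""
    hit = []
    for r in routes:
        # Only routes without an OPE, with matching next-hop and RD.
        if r.ope is None and r.next_hop == next_hop and r.rd == rd:
            r.mass_withdrawn = True
            hit.append(r)
    return hit


# Hypothetical MAC routes received via ASBR2 (198.51.100.2).
routes = [
    Type25Route("00:00:5e:00:53:01", "198.51.100.2", "192.0.2.1:10"),
    Type25Route("00:00:5e:00:53:02", "198.51.100.2", "192.0.2.2:10"),
    Type25Route("00:00:5e:00:53:03", "198.51.100.2", "192.0.2.1:10",
                ope="192.0.2.1"),
]
affected = on_ead_per_evi_withdraw("198.51.100.2", "192.0.2.1:10", routes)
```

Because the handler is driven by the withdrawal event, a Type 2/5 route whose EAD-per-EVI route never existed is left untouched.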
If any entity in the network rewrites an RD, then all entities must rewrite the RD in a consistent manner, such that routes with the same RD continue to have the same RD and routes with different RDs continue to have different RDs. Note that if this condition is violated, then other network functions would also break.
If a Type 2/5 route exists without an OPE and an EAD-per-EVI route exists with an OPE and it has the same next-hop and the same RD as the Type 2/5 route, then the Type 2/5 route shall inherit the OPE from the EAD-per-EVI route. Thereafter, Section 5.2 applies.
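The inheritance rule can be sketched as below. The `Route` record and `inherit_ope` are hypothetical names used for illustration, and the OPE is again reduced to a single address.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    next_hop: str
    rd: str
    ope: Optional[str] = None


def inherit_ope(type25: Route, ead_per_evi: Iterable[Route]) -> Route:
    """Return the Type 2/5 route, with an inherited OPE where one applies."""
    if type25.ope is not None:
        return type25  # already has an OPE; nothing to inherit
    for ead in ead_per_evi:
        if (ead.ope is not None
                and ead.next_hop == type25.next_hop
                and ead.rd == type25.rd):
            return replace(type25, ope=ead.ope)
    return type25


# Hypothetical values: both routes arrive via ASBR2 with PE1's RD.
mac_route = Route(next_hop="198.51.100.2", rd="192.0.2.1:10")
ead_routes = [
    Route(next_hop="198.51.100.2", rd="192.0.2.2:10", ope="192.0.2.2"),
    Route(next_hop="198.51.100.2", rd="192.0.2.1:10", ope="192.0.2.1"),
]
resolved = inherit_ope(mac_route, ead_routes)
```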
TBD
A Tunnel Encapsulation Attribute Tunnel Type for the OPE is required.
Thanks to Kiran Pillai, Patrice Brissette, Satya Mohanty and Keyur Patel for careful review and suggestions.
[Note to RFC editor: This appendix to be removed before publication]
The Extended Community to use must be transitive and either IPv4 Specific or IPv6 Specific as described in [RFC5701]. Thus, if it is IPv4 Specific, it will be of type 0x01 and if IPv6 Specific, it will be of type 0x00 (types 0x41 and 0x40 are the corresponding non-transitive types).
The Extended Community will hold the IP address of the PE that originates the EVPN routes.
A PE can be uniquely identified by its BGP identifier (also called Router ID) and its AS number (ASN). A Large Community [RFC8092] can be used to carry the BGP identifier and the ASN. A well-known Large Community needs to be allocated for this. This allocation is for the Global Administrator field. The Local Data Part 1 field should carry the ASN and the Local Data Part 2 field should carry the BGP identifier.
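As an illustration of the field layout, the sketch below packs one Large Community as three 4-octet unsigned integers in network byte order, which is the 12-octet format of [RFC8092]. No well-known Global Administrator value has been allocated for this purpose, so it is passed in as a parameter; the function names and the example values are hypothetical.

```python
import ipaddress
import struct


def encode_originator(global_admin: int, asn: int, bgp_id: str) -> bytes:
    """Pack Global Administrator, ASN, and BGP identifier (12 octets)."""
    ident = int(ipaddress.IPv4Address(bgp_id))
    return struct.pack("!III", global_admin, asn, ident)


def decode_originator(data: bytes):
    """Unpack a 12-octet Large Community into its three fields."""
    global_admin, asn, ident = struct.unpack("!III", data)
    return global_admin, asn, str(ipaddress.IPv4Address(ident))


# Placeholder Global Administrator of 0; ASN and identifier taken from
# documentation ranges.
community = encode_originator(0, 64500, "192.0.2.1")
decoded = decode_originator(community)
```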
It may be possible to associate the EAD-per-ES route with the Type 2/5 route by matching the Administrator Subfield of the RD. However, there are too many constraints that need to be met to make this method reliable. Basically, the RD was emphatically designed to distinguish routes, not to identify them. The constraints that need to be met are:
By allowing a single EAD-per-ES route to validate all EAD-per-EVI routes and all Type 2/5 routes, some of those routes may be falsely validated. However, that is the best possible outcome without an OPE. It is transient until the Type 2/5 route can be withdrawn.
The possibility of the address space of PE next-hops in one AS overlapping that of another AS was raised. In such a case, the IP address of a PE in one AS may be the same as the IP address of a different PE in another AS. Because an ASBR overwrites next-hops, this can work. The OPE contains both the ASN and the IP address of the originating PE, so this works too. However, EVPN route types 3 and 4 contain only the originating router's IP address, but not the originating router's ASN. Therefore, EVPN route types 3 and 4 may also need an OPE.
The possibility of making the EAD-per-EVI route mandatory was raised. This would make some of the procedures easier, because the RD of the EAD-per-EVI route can be matched with the RD of the Type 2/5 route.
[I-D.ietf-bess-evpn-prefix-advertisement] | Rabadan, J., Henderickx, W., Palislamovic, S. and A. Isaac, "IP Prefix Advertisement in EVPN", Internet-Draft draft-ietf-bess-evpn-prefix-advertisement-02, September 2015. |
[I-D.ietf-idr-tunnel-encaps] | Rosen, E., Patel, K. and G. Velde, "The BGP Tunnel Encapsulation Attribute", Internet-Draft draft-ietf-idr-tunnel-encaps-02, May 2016. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC4364] | Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006. |
[RFC5701] | Rekhter, Y., "IPv6 Address Specific BGP Extended Community Attribute", RFC 5701, DOI 10.17487/RFC5701, November 2009. |
[RFC7432] | Sajassi, A., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J. and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015. |
[RFC8092] | Heitz, J., Snijders, J., Patel, K., Bagdonas, I. and N. Hilliard, "BGP Large Communities Attribute", RFC 8092, DOI 10.17487/RFC8092, February 2017. |