BESS WG Y. Wang
Internet-Draft ZTE Corporation
Intended status: Standards Track May 17, 2020
Expires: November 18, 2020

Reduction of EVPN C-MAC Overload
draft-wang-bess-evpn-cmac-overload-reduction-00

Abstract

When there are too many customer MACs (C-MACs), the RRs and/or ASBRs are overloaded by the RT-2 routes advertised for these MACs according to [I-D.dawra-bess-srv6-services]. This issue can be solved simply by learning the remote C-MAC entries via data-plane MAC learning (as PBB VPLS has done since [RFC7041]) rather than from RT-2 routes. This simplified solution works as well as PBB VPLS, but it loses many important features that are based on the ESI concept, because the ingress ESI cannot be learnt via data-plane MAC learning at the egress PE. As a result, data packets forwarded according to these MAC entries cannot benefit from the EAD/EVI routes defined in [RFC7432], and the All-Active Redundancy mode for an ES cannot be supported. The simplified solution therefore does not work as well as PBB EVPN ([RFC7623]).

This document proposes a new SRv6 function type and an extension to [I-D.dawra-bess-srv6-services] to achieve all-active mode ES redundancy on TPEs and to reduce the C-MAC load on RRs and ASBRs. With the help of these extensions, the new solution is expected to work better than PBB EVPN.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 18, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

In [I-D.dawra-bess-srv6-services], an extension to [RFC7432] is proposed for the SRv6 EVPN control plane. In this control plane the C-MACs are advertised via RT-2 routes. To solve the C-MAC overload problem for RRs and ASBRs, we have to return to PBB-like data-plane C-MAC learning procedures.

This document introduces an "ESI Indicator" concept to the EVPN data plane. An ESI can be recognized from its ESI-indicator. However, an ESI may have several ESI-indicators, one for each TPE, especially in the single-active mode of ES redundancy.

Then we introduce an SRv6 function named End.ESI to carry the ESI-indicator in the SRv6 data plane. A SID with the End.ESI function is called an "ESI SID" in this document. The ESI-indicator is the locator and function part of its ESI SID. The argument part of the ESI SID is a Global Discriminating Value (GDV) for an EVI. The GDV works like the function part of an End.DT2U/DT2M SID, but the GDV has a global meaning, like a global VNI or a PBB ISID, whereas the function part of an End.DT2U/DT2M SID is typically only a local discriminator on the egress PE. The argument part of the ESI SID is called Arg.GED in this document, where GED is the abbreviation of Global EVI Discriminator.

1.1. Terminology

Most of the terminology used in this document comes from [RFC7432] and [I-D.dawra-bess-srv6-services], except for the following:

C-MAC: Customer MAC. It is the same as the C-MAC of PBB EVPN, but there is no B-MAC in this document.

ISID: A broadcast domain identifier in the PBB I-Component.

LDV: Local Discriminating Value. It is similar to the Local Discriminating Value of a type 3 ESI.

GDV: Global Discriminating Value. An identifier that is globally unique.

GED: Global EVI Discriminator, a GDV for an EVI instance.

ESI Indicator: A global ID for an ESI. Note that different PEs may assign different ESI-indicators to the same ESI, especially when the ES redundancy mode is single-active.

GEI: Global ESI Indicator. It is the same as the "ESI Indicator", except that the name emphasizes its global uniqueness.

EAD/EVI: An Ethernet A-D route per EVI.

GEI/EVI: An EAD/EVI route with a Global ESI Indicator.

Arg.GED: The argument part of a SID of the End.ESI function.

RT-2: MAC/IP Advertisement route.

ESI/IP: An RT-2 route whose IP field in the NLRI is an ESI-indicator.

MAC Entry: An entry in the EVPN MAC table in data-plane.

ESI IP: An End.ESI SID with its Argument part being set to zero.

2. Control Plane

We assign a GED to an EVI instance EVI_1; the GED is a number consisting of N bits. We assign an ESI-indicator I1 to ESI1 on PE1, and an ESI-indicator I2 to ESI1 on PE2. We refer to the relationships between ESI1 and its two ESI-indicators as ESI1_I1 and ESI1_I2 respectively.

                                 +----------+
                   PE1           |          |
              +-------------+    |          |
              | ESI1_I1     |    |          |         PE3
             /|             |----|          |   +-------------+
            / |             |    |   IP     |   |             |
       LAG /  +-------------+    | Backbone |   |     ESI2_I3 |---CE2
   CE1=====                      |   with   |   |             |
           \  +-------------+    |   EVPN   |---|             |
            \ |             |    |   RRs    |   +-------------+
             \|             |----|   and    |
              | ESI1_I2     |    |   ASBRs  |
              +-------------+    |          |
                   PE2           |          |
                                 +----------+

Figure 1: EVPN MAC Reduction Use Case

We use IMET routes to build a broadcast list, which is used to forward BUM traffic. Data-plane MAC learning on BUM traffic produces the first batch of C-MAC entries. Subsequent C-MAC entries can be learnt from unicast traffic and/or BUM traffic. MAC/IP routes are deliberately not used in the usual way, so that the RRs and/or ASBRs are not overloaded by these C-MACs.

The SRv6 SID in the IMET route is an End.DT2M SID with a zero argument length. I1 and I2 are SRv6 SIDs of the End.ESI function, whose format is defined in Figure 2. We use IGP protocols to advertise I1 and I2 to PE3 in the SRv6 underlay, so neither EAD/ES routes nor EAD/EVI routes are used in SRv6 EVPN in this section. If ESI1 is in single-active mode, I1 is different from I2; if ESI1 is in all-active mode, I1 is the same as I2.


    |       ESI-Indicator(128-N bits)     |        N bits           |
    +------------+------------+-----------+-------------------------+
    |    Block   |   Node     | ESI.LDV   |        Arg.GED          |
    +------------+------------+-----------+-------------------------+

Figure 2: End.ESI SID Format

Note that an ESI-indicator is composed of the Locator and Function parts, and an ESI IP is a 128-bit SID with a zero argument. The function part is a Local Discriminating Value for the ESI on that PE. The argument part is the Global EVI Discriminator (GED) for the data packet, and it is also called Arg.GED in this document.
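
As an illustration of the layout in Figure 2, the following Python sketch composes an End.ESI SID from an ESI-indicator and an Arg.GED and splits it again. It is not part of the specification: the function names, the 24-bit Arg.GED length and the 2001:db8::/32 documentation addresses are assumptions made only for this example.

```python
import ipaddress
from typing import Tuple

SID_BITS = 128


def make_esi_sid(esi_indicator: int, ged: int, arg_bits: int) -> ipaddress.IPv6Address:
    """Place the ESI-indicator in the upper 128-N bits and the Arg.GED in the lower N bits."""
    if not 0 < arg_bits < SID_BITS:
        raise ValueError("arg_bits must be between 1 and 127")
    if ged < 0 or ged >> arg_bits:
        raise ValueError("GED does not fit in the argument part")
    if esi_indicator < 0 or esi_indicator >> (SID_BITS - arg_bits):
        raise ValueError("ESI-indicator does not fit in locator + function")
    return ipaddress.IPv6Address((esi_indicator << arg_bits) | ged)


def split_esi_sid(sid: ipaddress.IPv6Address, arg_bits: int) -> Tuple[int, int]:
    """Return (ESI-indicator, Arg.GED) for a given argument length N."""
    value = int(sid)
    return value >> arg_bits, value & ((1 << arg_bits) - 1)


# Hypothetical values: an ESI IP (zero argument) on PE1 and GED 100 for EVI_1.
ARG_BITS = 24
ESI_IP_PE1 = ipaddress.IPv6Address("2001:db8:1:1::")
I1 = int(ESI_IP_PE1) >> ARG_BITS
esi_sid = make_esi_sid(I1, 100, ARG_BITS)
print(esi_sid, split_esi_sid(esi_sid, ARG_BITS) == (I1, 100))
```

With a zero Arg.GED the same function yields the ESI IP itself, which is the form used as the underlay Source IP.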

3. Dataplane

3.1. PE1 Forwards ARP Request to PE2/PE3

When CE1 sends an ARP Request for CE2, PE1 receives the ARP Request from an AC of ESI1. PE1 forwards the ARP Request according to the broadcast list of the AC's EVI instance. The broadcast list is constructed from the IMET routes received from PE2/PE3.

PE1 forwards the ARP Request to PE2/PE3 with the following SRv6 BE encapsulation: its underlay Source IP is the End.ESI SID on PE1 for ESI1, and its underlay Destination IP is the End.DT2M SID on PE2/PE3. The locator and function part of the End.ESI SID is I1. The argument part of the End.ESI SID is 0. The SMAC of the ARP Request is M1, which is CE1's MAC address.

Note that the underlay SIP is the End.DT2U SID for single-homed ingress ACs. Multi-homed ingress ACs with single-active behavior may not be assigned an ESI-indicator either. In such situations, the underlay SIP is the End.DT2U SID too.

3.2. PE2/PE3's Dataplane MAC Learning

When PE2/PE3 receive the ARP Request packet, they perform data-plane MAC learning independently. Each of them learns that M1 is behind I1, as determined by the underlay SIP of the ARP Request packet.

Note that when PE2 learns that M1 is behind I1, it also assumes that M1 is behind its local AC with ESI-indicator I1. The local AC may have a higher priority than the ESI IP.

After the data-plane MAC learning, the ARP Request packet is broadcast to the local ACs, behind one of which is CE2.
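
The learning step described in this section can be sketched as follows. This is only an illustration under stated assumptions: the MacTable class, its field names and the rule that a local AC is always preferred over the remote ESI IP are hypothetical (the text above only says the local AC may have a higher priority).

```python
from typing import Dict, Optional, Tuple

NextHop = Tuple[str, object]  # ("ac", AC name) or ("esi", ESI-indicator)


class MacTable:
    """Per-PE C-MAC table populated by data-plane learning (hypothetical model)."""

    def __init__(self, local_acs: Optional[Dict[int, str]] = None) -> None:
        # Maps an ESI-indicator to the name of the local AC on the same ES, if any.
        self.local_acs: Dict[int, str] = dict(local_acs or {})
        # Maps (EVI, C-MAC) to the next hop learnt for that C-MAC.
        self.entries: Dict[Tuple[str, str], NextHop] = {}

    def learn(self, evi: str, src_mac: str, sip_indicator: int) -> NextHop:
        """Learn src_mac from the ESI-indicator found in the underlay SIP."""
        local_ac = self.local_acs.get(sip_indicator)
        if local_ac is not None:
            # Assumption: prefer the local AC of the same ES over the ESI IP.
            next_hop: NextHop = ("ac", local_ac)
        else:
            next_hop = ("esi", sip_indicator)
        self.entries[(evi, src_mac)] = next_hop
        return next_hop


I1 = 0x1001  # hypothetical ESI-indicator shared by PE1 and PE2 (all-active)
pe3 = MacTable()                  # PE3 has no AC on ESI1
pe2 = MacTable({I1: "ac-to-ce1"})  # PE2 has a local AC on ESI1
print(pe3.learn("EVI_1", "M1", I1), pe2.learn("EVI_1", "M1", I1))
```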

3.3. PE2 Discards ARP Request to CE1

When ESI1 is in all-active mode and PE2 is about to forward the ARP Request to CE1, PE2 finds that the ESI-indicator of the outgoing AC is also I1, so PE2 discards the packet to keep the Ethernet Segment loop-free.

When ESI1 is in single-active mode, the outgoing AC may be in blocking state; otherwise its corresponding sub-interface on CE1 is responsible for dropping the packet. So although the ESI-indicator of the outgoing AC is not the same as I1, no loop will arise in the Ethernet Segment.
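
The filtering described in this section can be sketched as below. The AttachmentCircuit type, its field names and the example ACs are hypothetical; the sketch only shows the comparison between the ingress ESI-indicator (taken from the underlay SIP) and the ESI-indicator of each outgoing AC.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AttachmentCircuit:
    name: str
    esi_indicator: Optional[int] = None  # None for a single-homed AC
    blocking: bool = False               # blocking state in single-active mode


def flood_targets(ingress_indicator: Optional[int],
                  acs: List[AttachmentCircuit]) -> List[str]:
    """Return the local ACs that may receive a BUM frame from the given ingress ES."""
    targets = []
    for ac in acs:
        if ac.blocking:
            continue  # the AC does not forward at all
        if ingress_indicator is not None and ac.esi_indicator == ingress_indicator:
            continue  # same ES as the ingress: drop to avoid a loop
        targets.append(ac.name)
    return targets


I21, I1, I2 = 0x21, 0x1, 0x2  # hypothetical ESI-indicators
all_active_pe2 = [AttachmentCircuit("to-ce1", I21), AttachmentCircuit("to-ce3")]
single_active_pe2 = [AttachmentCircuit("to-ce1", I2, blocking=True),
                     AttachmentCircuit("to-ce3")]
print(flood_targets(I21, all_active_pe2), flood_targets(I1, single_active_pe2))
```

In the single-active example the AC towards CE1 is skipped because of its blocking state, not because of an ESI-indicator match, which mirrors the second paragraph above.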

3.4. PE3 Forwards ARP Reply to PE1/PE2

When CE2 replies to CE1's ARP Request, PE3 forwards the ARP Reply according to the MAC entry for M1 that was learnt previously.

PE3 forwards the ARP Reply to PE1 with the following SRv6 BE encapsulation: its underlay Source IP is the End.ESI SID on PE3 for ESI2, and its underlay Destination IP is the End.ESI SID on PE1 for ESI1 according to the MAC entry for M1. The Arg.GED of the End.ESI SID in the DIP is the Global EVI Discriminator (GED) configured on PE3. Note that the GED for the same EVI is configured with the same value on PE1/PE2/PE3.

When ESI1 is in all-active mode, I1 is the same as I2, so we call both of them I21 instead. The traffic to M1 is load-balanced between PE1 and PE2 by the underlay network on PE3, because I21 is advertised by both PE1 and PE2 in the underlay IGP protocol.
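
The outer addresses chosen in Sections 3.1 and 3.4 can be sketched as follows. The helper names, the 24-bit Arg.GED length and the documentation addresses are assumptions for illustration; in particular, using a zero argument in the Source IP of the unicast case is an assumption carried over from Section 3.1.

```python
import ipaddress
from typing import Dict, List, Tuple

ARG_BITS = 24  # assumed length N of Arg.GED
Addr = ipaddress.IPv6Address


def sid(indicator: int, arg: int = 0) -> Addr:
    """Build an End.ESI SID: ESI-indicator in the upper bits, argument in the lower N bits."""
    return Addr((indicator << ARG_BITS) | arg)


def encap_bum(local_indicator: int, dt2m_sids: List[Addr]) -> List[Tuple[Addr, Addr]]:
    """One (SIP, DIP) pair per End.DT2M SID in the broadcast list built from IMET routes."""
    return [(sid(local_indicator), dip) for dip in dt2m_sids]


def encap_unicast(mac_table: Dict[Tuple[int, str], int], ged: int,
                  dst_mac: str, local_indicator: int) -> Tuple[Addr, Addr]:
    """Use the learnt MAC entry to build the DIP; the Arg.GED selects the EVI at the egress PE."""
    remote_indicator = mac_table[(ged, dst_mac)]
    return sid(local_indicator), sid(remote_indicator, ged)


# Hypothetical indicators for ESI1 on PE1 and ESI2 on PE3, and GED 100 for EVI_1.
I1 = int(Addr("2001:db8:1:1::")) >> ARG_BITS
I3 = int(Addr("2001:db8:3:2::")) >> ARG_BITS
pe3_macs = {(100, "M1"): I1}  # learnt earlier from the ARP Request
sip, dip = encap_unicast(pe3_macs, 100, "M1", I3)
print(sip, dip)
```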

3.5. PE1 Forwards ARP Reply to CE1

When PE1 receives the SRv6-encapsulated ARP Reply packet from PE3, PE1 first matches the packet to the End.ESI SID of ESI1 by its DIP, and then matches the packet to the EVI instance EVI_1 by its Arg.GED. PE1 does not discard the packet, because the egress ESI-indicator I1 is not the same as the ingress ESI-indicator I3 in the SIP of the packet.
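
The receive-side processing above can be sketched as a three-step lookup. The function name, the dictionaries and the 24-bit Arg.GED length are hypothetical; returning None stands for discarding the packet.

```python
import ipaddress
from typing import Dict, Optional, Tuple

ARG_BITS = 24  # assumed length N of Arg.GED
ARG_MASK = (1 << ARG_BITS) - 1
Addr = ipaddress.IPv6Address


def receive(sip: Addr, dip: Addr, local_esis: Dict[int, str],
            evis: Dict[int, str]) -> Optional[Tuple[str, str]]:
    """Return (ES, EVI) for an accepted packet, or None if it must be discarded."""
    egress_indicator = int(dip) >> ARG_BITS
    ged = int(dip) & ARG_MASK
    ingress_indicator = int(sip) >> ARG_BITS

    es = local_esis.get(egress_indicator)  # step 1: match the End.ESI SID by DIP
    if es is None:
        return None
    evi = evis.get(ged)                    # step 2: match the EVI by Arg.GED
    if evi is None:
        return None
    if ingress_indicator == egress_indicator:
        return None                        # step 3: same ES on both sides, drop
    return es, evi


# Hypothetical state on PE1: ESI1 has indicator I1 and EVI_1 has GED 100.
I1 = int(Addr("2001:db8:1:1::")) >> ARG_BITS
pe1_esis = {I1: "ESI1"}
pe1_evis = {100: "EVI_1"}
result = receive(Addr("2001:db8:3:2::"), Addr("2001:db8:1:1::64"), pe1_esis, pe1_evis)
print(result)
```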

4. ESI Indicator Advertisement Optimization

Although the End.ESI SID can be advertised in underlay IGP protocols, it is better to use the SRv6 SID Structure Sub-Sub-TLV to indicate the length of the Arg.GED in the End.ESI SID.

So we can use EAD/EVI routes to advertise the Global ESI Indicator (GEI); such EAD/EVI routes are called GEI/EVI routes in this document. Alternatively, we can use MAC/IP routes to advertise the GEI, as is done by PBB EVPN's B-MAC advertisement procedures as per [RFC7623]. When a MAC/IP route is used to advertise a GEI, only the IP field in its NLRI is used to identify the GEI, so the MAC field in its NLRI can be set to zero. Such a MAC/IP route is called an ESI/IP route in this document. When a GEI/EVI route is used to advertise a GEI, the End.ESI SID is carried in its SRv6 L2 Service TLV, not in its next hop.

Both GEI/EVI routes and ESI/IP routes are advertised for, and imported into, the Global Routing Table (GRT), so their Route Targets (RTs) are configured for the GRT. This is because there is no dedicated B-component as in PBB VPLS and PBB EVPN.

Although GEIs are imported into the GRT, only PE nodes are aware of them; the transit nodes in the underlay network are not aware of GEIs, which reduces FIB consumption. The argument length in the SRv6 SID Structure Sub-Sub-TLV can be used to check whether the GED is too large for the End.ESI SID. This avoids corrupting the function part of the End.ESI SID and allows a flexible GED length.
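
The length check mentioned above can be sketched as follows. The SidStructure class is a hypothetical model that keeps only the four length fields of the SRv6 SID Structure Sub-Sub-TLV (locator block, locator node, function and argument lengths); the concrete lengths in the example are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SidStructure:
    """Length fields (in bits) taken from an SRv6 SID Structure Sub-Sub-TLV."""
    locator_block_len: int
    locator_node_len: int
    function_len: int
    argument_len: int

    def is_valid(self) -> bool:
        """All lengths are non-negative and together fit in a 128-bit SID."""
        fields = (self.locator_block_len, self.locator_node_len,
                  self.function_len, self.argument_len)
        return all(f >= 0 for f in fields) and sum(fields) <= 128

    def ged_fits(self, ged: int) -> bool:
        """True if the locally configured GED can be written into Arg.GED
        without overflowing into the function part."""
        return self.is_valid() and 0 <= ged < (1 << self.argument_len)


# Hypothetical structure advertised with an End.ESI SID: 104-bit indicator, 24-bit Arg.GED.
structure = SidStructure(locator_block_len=48, locator_node_len=24,
                         function_len=32, argument_len=24)
print(structure.ged_fits(100), structure.ged_fits(1 << 24))
```

A PE that receives such an advertisement could refuse to use the End.ESI SID for an EVI whose GED does not fit, instead of truncating the GED.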

5. C-MAC Flush Notification Procedure

The withdrawal of an ESI Indicator advertisement can be used as a C-MAC flush notification, as is done in [RFC8317] and [I-D.snr-bess-pbb-evpn-isid-cmacflush].
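
A minimal sketch of the flush behavior on a remote PE is given below. The table layout (a dictionary from (EVI, C-MAC) to the ESI-indicator the C-MAC was learnt from) and the function name are hypothetical.

```python
from typing import Dict, Tuple


def flush_cmacs(mac_table: Dict[Tuple[str, str], int], withdrawn_indicator: int) -> int:
    """Remove every C-MAC entry learnt behind the withdrawn ESI-indicator.

    Returns the number of entries flushed.
    """
    stale = [key for key, indicator in mac_table.items()
             if indicator == withdrawn_indicator]
    for key in stale:
        del mac_table[key]
    return len(stale)


I1, I3 = 0x1001, 0x3002  # hypothetical ESI-indicators
table = {("EVI_1", "M1"): I1, ("EVI_1", "M2"): I1, ("EVI_1", "M9"): I3}
flushed = flush_cmacs(table, I1)
print(flushed, table)
```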

6. E-Tree Support Considerations

E-Tree support extensions are similar to [RFC8317], Section 5, except for the following notable differences: the B-MAC is replaced by GEIs, the PBB encapsulation is replaced by SRv6 encapsulation, the B-component is replaced by the underlay GRT, and the B-MAC Advertisement route is replaced by the GEI/EVI route or the ESI/IP route.

7. EVPN IRB Support Considerations

PBB-VPLS/PBB-EVPN is not friendly to the IRB use case because of its complicated protocol stack, so it has so far been used in the industry only in pure L2VPN use cases. The data plane in this document, however, is no more complex than that of typical SRv6 EVPN, so it is expected to work as efficiently in the SRv6 EVPN IRB use case.

8. Use End.ESI SID in MAC/IP Advertisement Routes

In [I-D.dawra-bess-srv6-services], the downstream-assigned ESI label is encoded in the Arg.FE2 part of the End.DT2M SID, and the ESI label is present as Arg.FE2 only when the egress PE is adjacent to the ingress ESI. So it is difficult (if not impossible) to do data-plane C-MAC learning via the End.DT2M SID, because the presence of Arg.FE2 is not guaranteed. Although an upstream-assigned ESI label could be used to learn the ingress ESI-indicator on the egress PE node, other issues would then arise.

The End.ESI SID can nevertheless be used in MAC/IP Advertisement routes, provided that C-MAC overload is not a real threat. Doing so unifies the data plane across these use cases. The details of using the End.ESI SID in MAC/IP Advertisement routes will be described in future versions.

9. Security Considerations

This document does not introduce any new security considerations other than those already discussed in [RFC7432] and [RFC7623].

10. IANA Considerations

This document has no IANA considerations.

11. References

11.1. Normative References

[I-D.dawra-bess-srv6-services] Dawra, G., Filsfils, C., Brissette, P., Agrawal, S., Leddy, J., Voyer, D., Bernier, D., Steinberg, D., Raszuk, R., Decraene, B., Matsushima, S., Zhuang, S. and J. Rabadan, "SRv6 BGP based Overlay services", Internet-Draft draft-dawra-bess-srv6-services-02, July 2019.
[RFC7432] Sajassi, A., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J. and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015.
[RFC8317] Sajassi, A., Salam, S., Drake, J., Uttaro, J., Boutros, S. and J. Rabadan, "Ethernet-Tree (E-Tree) Support in Ethernet VPN (EVPN) and Provider Backbone Bridging EVPN (PBB-EVPN)", RFC 8317, DOI 10.17487/RFC8317, January 2018.

11.2. Informative References

[I-D.snr-bess-pbb-evpn-isid-cmacflush] Rabadan, J., Sathappan, S., Nagaraj, K., Miyake, M. and T. Matsuda, "PBB-EVPN ISID-based CMAC-Flush", Internet-Draft draft-snr-bess-pbb-evpn-isid-cmacflush-06, July 2019.
[RFC7041] Balus, F., Sajassi, A. and N. Bitar, "Extensions to the Virtual Private LAN Service (VPLS) Provider Edge (PE) Model for Provider Backbone Bridging", RFC 7041, DOI 10.17487/RFC7041, November 2013.
[RFC7623] Sajassi, A., Salam, S., Bitar, N., Isaac, A. and W. Henderickx, "Provider Backbone Bridging Combined with Ethernet VPN (PBB-EVPN)", RFC 7623, DOI 10.17487/RFC7623, September 2015.

Author's Address

Yubao(Bob) Wang
ZTE Corporation
No. 50 Software Ave, Yuhuatai District
Nanjing
China

EMail: yubao.wang2008@hotmail.com