BESS Working Group | J. Drake |
Internet-Draft | Juniper Networks |
Intended status: Standards Track | A. Farrel |
Expires: November 11, 2019 | Old Dog Consulting |
May 10, 2019 |
BGP-LS Maps : A Framework for Network Slicing and Enhanced VPNs
draft-drake-bess-enhanced-vpn-01
Future networks that support advanced services, such as those enabled by 5G mobile networks, envision a set of overlay networks each with different performance and scaling properties. These overlays are known as network slices and are realized over a common underlay network.
In order to support network slicing, as well as to offer enhanced VPN services in general, it is necessary to define a mechanism by which specific resources (links and/or nodes) of an underlay network can be used by a specific network slice, VPN, or set of VPNs. This document sets out such a mechanism for use in Segment Routing networks.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 11, 2019.
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Network slicing is an approach to network operations that builds on the concept of network abstraction to provide programmability, flexibility, and modularity. Driven largely by needs surfacing from 5G, the concept of network slicing has gained traction, for example in [TS23501] and [TS28530]. Network slicing requires the underlying network to support partitioning the network resources to provide the client with dedicated (private) networking, computing, and storage resources drawn from a shared pool. The slices may be seen as (and operated as) virtual networks.
Advanced services drive a need to create virtual networks with enhanced characteristics. The tenant of such a virtual network can require a degree of isolation and performance that previously could only be satisfied by dedicated networks. Additionally, the tenant may ask for some level of control over their virtual networks, e.g., to customize the service forwarding paths in the underlying network.
The concepts of "enhanced VPNs" and "network slicing" are introduced in [I-D.ietf-teas-enhanced-vpn].
In order to support network slicing, as well as to offer enhanced VPN services in general, it is necessary to define a mechanism by which specific resources (links and/or nodes) of an underlay network can be used by a specific network slice, VPN, or set of VPNs. This document sets out such a mechanism for use in Segment Routing networks [RFC8402] and builds on the ideas introduced in [I-D.ietf-idr-segment-routing-te-policy]. That is, it generalizes that work to support multipoint-to-multipoint (MP2MP), point-to-multipoint (P2MP), and bidirectional point-to-point (P2P) topologies; it integrates BGP-based VPN support ([RFC4364], [RFC7432]); it supports DSCP-based as well as Color-based forwarding; and it uses BGP Link-State (BGP-LS) [RFC7752] to distribute topology information.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The approach is based on the use of DSCP-based forwarding in the underlay network [RFC2474]. For each VPN or set of VPNs that is to use a given underlay network, a central network controller assigns resources per {link, DSCP} pair based upon the {source, destination, DSCP} traffic matrix. That is, each VPN or set of VPNs gets a subset, either dedicated or shared, of the resources in the underlay network.
It should be noted that resources can be assigned at any of the following granularities:
Once the central controller has determined the resource assignments, it distributes this information to the PEs that participate in each VPN using the usual VPN information dissemination tools, e.g., route targets (RT) [RFC4360], route reflectors (RR) [RFC4456], and RT constraints [RFC4684].
One way to distribute this information to those PEs is to give them a customized but limited view of the underlay network. (Note that giving each PE a full view of the underlay network does not help the PEs to manipulate the resources assigned for use by a particular slice or VPN, but providing a customized and limited view of those resources as a "virtual network" allows the PE to direct traffic over the designated resources as necessary to best deliver the end-to-end services.)
This information is distributed to the PEs by giving them a customized and limited view of the underlay network on the basis of a network slice, a VPN, or a set of VPNs. Each PE will have a complete view of the underlay network, and this customized and limited view acts as a filter on the underlay network, telling the PE which underlay network resources it can use to direct the traffic of a given network slice, VPN, or set of VPNs to best deliver end-to-end services.
The resource allocation information is encoded using BGP-LS. This approach is chosen for the following reasons:
It should be noted that this mechanism also follows the scalability model of the existing BGP-based VPN infrastructure, which is that the per-VPN information is restricted to only those PE routers that are supporting that VPN and that the P routers have no per-VPN state.
Standard VPNs do not receive this resource allocation information and continue to use CSPF-based Weighted ECMP (WECMP) in the underlay network. This means that resources used by enhanced VPNs are reserved and are distinct from the resources used by the CSPF-based WECMP topology.
In addition to programming the PEs and computing and assigning resources for use by slices, VPN instances, or groups of VPNs, the central controller also instructs the P routers to make the actual per-DSCP allocation of resources.
We define a BGP-LS Map to be a BGP-LS encoded description of a subset of the links and nodes in the underlay network. A BGP-LS Map defines the topology for a network slice or a set of one or more VPNs. That is, the topology connects the PEs in these VPNs and is used by them to send packets to each other. A given map is tagged with the route targets of the VPNs whose PEs are to import the map. A BGP-LS map is pushed southbound to these PEs by a network controller and may provide more than one path between a given ingress/egress PE pair.
Note that there will be multiple BGP-LS Maps in a given network deployment and that a given underlay network link or node may appear in more than one of them. In order to provide disambiguation, AFI 16388 (BGP-LS) and SAFI 72 (BGP-LS-VPN) are used in BGP-LS UPDATE messages, and the controller SHOULD allocate a different route distinguisher (RD) to each BGP-LS Map.
It is assumed that the underlay network is enabled for segment routing. Within a given VPN, when an ingress PE needs to send a packet to an egress PE it selects a path to that egress PE from the topology defined by the BGP-LS maps it has imported for that VPN and it specifies that path using a segment routing label stack.
To enable this function there is a need for a new attribute that is attached to a BGP-LS map update and that contains a map ID, the version number, a map type (MP2MP, P2MP, or P2P), the total number of fragments in the map, and the fragment number of the piece currently in hand. It is assumed that a PE may import more than one BGP-LS map, that a given BGP-LS map may change over time, and that a given BGP-LS map may span multiple BGP updates. The map ID needs to be unique across the set of VPNs into which the BGP-LS map is to be imported.
A BGP-LS map that is created for a set of VPNs will contain a set of network resources sufficient to connect the PEs in each VPN in the set and each of the BGP-LS updates for the map MUST be tagged with the RT for each VPN in the set.
If a PE imports more than one BGP-LS map it may use the union of the links and nodes specified in each map when selecting a path. A PE should give precedence to BGP-LS maps of type P2MP and P2P when selecting a path. Route targets specific to a given VPN/PE pair are needed for BGP-LS maps of type P2MP and P2P.
A given BGP-LS map may change in response to updates to the PE membership in a VPN to which the BGP-LS map applies or to updates to the underlay network. When this occurs, the network controller should push a new version of each affected BGP-LS map. That is, it increments the version number of each affected BGP-LS map. This implies that the network controller needs to be connected to the route reflectors associated with the VPNs for which it is providing BGP-LS maps.
A BGP-LS map cannot be used by a PE until it is completely assembled. If the BGP-LS map that is being assembled is a newer version of a BGP-LS map that the PE is currently using, the PE should continue to use its current version of the BGP-LS map until the newer version is completely assembled.
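The assembly rule above can be illustrated with a minimal Python sketch. It is not part of the specification: the class and method names are hypothetical, fragments are treated as opaque byte strings, and the sketch assumes that version numbers only increase (no wrap-around) and that a version is complete once as many distinct fragment numbers have arrived as the Number of Fragments field announces.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MapAssembler:
    """Per-map-ID reassembly state on a PE (hypothetical helper)."""
    active_version: Optional[int] = None
    active_fragments: Dict[int, bytes] = field(default_factory=dict)
    # Versions still being assembled: version -> (announced total, fragments).
    pending: Dict[int, Tuple[int, Dict[int, bytes]]] = field(default_factory=dict)

    def add_fragment(self, version: int, total: int, number: int,
                     payload: bytes) -> bool:
        """Store one fragment; return True if it completed a newer version."""
        # ASSUMPTION: versions only increase, so older ones are ignored.
        if self.active_version is not None and version <= self.active_version:
            return False
        expected_total, frags = self.pending.setdefault(version, (total, {}))
        if total != expected_total:
            # Fragments disagree on Number of Fragments: discard what was
            # collected for this version and keep using the current map.
            del self.pending[version]
            return False
        frags[number] = payload
        if len(frags) < expected_total:
            return False  # still incomplete: the active version stays in use
        # Complete: switch over and forget versions that are now obsolete.
        self.active_version, self.active_fragments = version, frags
        self.pending = {v: p for v, p in self.pending.items() if v > version}
        return True
```

In this sketch a PE keeps forwarding on `active_fragments` while fragments of a newer version accumulate in `pending`, and switches only when `add_fragment` returns True.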
When selecting a path using one or more BGP-LS maps, an ingress PE can use a link or node only if it is active in the underlay network. If this precludes connectivity to the egress PE it may use links and nodes in the CSPF-based WECMP underlay network topology nominally allocated to non-enhanced VPN traffic.
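The selection behavior described above might be sketched as follows. This is an illustration only: the function names are hypothetical, links are modelled as unordered node pairs, only link (not node) activity is checked, the fallback topology is modelled simply as the set of active underlay links, and a breadth-first search stands in for whatever constrained path computation a real ingress PE would run.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Link = Tuple[str, str]


def _norm(link: Link) -> Link:
    """Order the endpoints so that (a, b) and (b, a) compare equal."""
    a, b = sorted(link)
    return (a, b)


def shortest_path(links: Iterable[Link], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns a node list, or None if unreachable."""
    adj: Dict[str, Set[str]] = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def select_path(maps: Iterable[Iterable[Link]], active_links: Iterable[Link],
                underlay_links: Iterable[Link], src: str, dst: str) -> Optional[List[str]]:
    """Prefer the union of the imported maps; fall back to the underlay."""
    active = {_norm(l) for l in active_links}
    allowed = {_norm(l) for m in maps for l in m} & active
    path = shortest_path(allowed, src, dst)
    if path is None:
        # Map resources cannot reach the egress PE: use the shared topology.
        path = shortest_path({_norm(l) for l in underlay_links} & active, src, dst)
    return path
```

The returned node list would then be translated into a segment routing label stack; that step is outside the scope of this sketch.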
Additionally, a newly activated PE will not be present in any of the BGP-LS maps used by the other PEs. Until a new BGP-LS map or maps that contain that PE have been distributed, the other PEs will have to use links and nodes in the CSPF-based WECMP underlay network topology to reach the newly activated PE, and it will have to use the same links and nodes to reach the other PEs.
[RFC4271] defines the BGP Path attribute. This document introduces a new Optional Transitive Path attribute called the BGP-LS Map attribute with value TBD1 to be assigned by IANA.
The first BGP-LS Map attribute MUST be processed and subsequent instances MUST be ignored.
The common fields of the BGP-LS Map attribute are set as follows:
The content of the BGP-LS Map attribute is a series of Type-Length-Value (TLV) constructs. Each TLV may include sub-TLVs. All TLVs and sub-TLVs have a common format that is:
The formats of the TLVs defined in this document are shown in the following sections. The presence rules and meanings are as follows.
The BGP-LS Map attribute MUST contain exactly one Map TLV. Its format is shown in Figure 1. Note that a given BGP-LS map may span multiple UPDATE messages and the Topology, Version Number, and the Number of Fragments fields in the BGP-LS Map attribute contained in each UPDATE message MUST be set to the same value or the BGP-LS map is unusable.
+--------------------------------------------+
|  Type = 1 (1 octet)                        |
+--------------------------------------------+
|  Length (2 octets)                         |
+--------------------------------------------+
|  Topology (1 Octet)                        |
+--------------------------------------------+
|  ID (4 Octets)                             |
+--------------------------------------------+
|  Version Number (4 Octets)                 |
+--------------------------------------------+
|  Number of Fragments (4 Octets)            |
+--------------------------------------------+
|  Fragment Number (4 Octets)                |
+--------------------------------------------+
Figure 1: The Map TLV Format
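As an illustration of the layout in Figure 1, the following Python sketch packs and unpacks a Map TLV with the standard `struct` module. The names `MapTlv`, `encode_map_tlv`, and `decode_map_tlv` are hypothetical. Two points are assumptions rather than statements of this document: that fields are carried in network byte order, and that the Length field counts only the octets that follow it. The Topology value is treated as an opaque integer.

```python
import struct
from typing import NamedTuple

MAP_TLV_TYPE = 1
# Type (1 octet), Length (2 octets), Topology (1 octet), then ID,
# Version Number, Number of Fragments, and Fragment Number (4 octets each).
_MAP_TLV = struct.Struct("!BHBIIII")  # "!" = network byte order, no padding
# ASSUMPTION: Length covers only the value octets after the Length field.
_VALUE_LEN = _MAP_TLV.size - 3


class MapTlv(NamedTuple):
    topology: int
    map_id: int
    version: int
    num_fragments: int
    fragment_number: int


def encode_map_tlv(tlv: MapTlv) -> bytes:
    """Serialize a Map TLV following the field order of Figure 1."""
    return _MAP_TLV.pack(MAP_TLV_TYPE, _VALUE_LEN, *tlv)


def decode_map_tlv(data: bytes) -> MapTlv:
    """Parse a Map TLV from the start of data; raise ValueError if malformed."""
    if len(data) < _MAP_TLV.size:
        raise ValueError("Map TLV is truncated")
    tlv_type, length, *fields = _MAP_TLV.unpack_from(data)
    if tlv_type != MAP_TLV_TYPE or length != _VALUE_LEN:
        raise ValueError("not a well-formed Map TLV")
    return MapTlv(*fields)
```

A receiver could compare the `topology`, `version`, and `num_fragments` values of every fragment of a map to detect the inconsistency that makes a map unusable.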
The fields are as follows:
The DSCP List TLV MAY be included in the BGP-LS Map attribute. If included, a packet whose DSCP matches a DSCP in the DSCP list is to be forwarded using the BGP-LS map defined by the containing BGP-LS Map attribute. The first DSCP List TLV MUST be processed and subsequent instances MUST be ignored. The format of the DSCP List TLV is shown in Figure 2.
+--------------------------------------------+
|  Type = 2 (1 octet)                        |
+--------------------------------------------+
|  Length (2 octets)                         |
+--------------------------------------------+
|  DSCP List (variable)                      |
+--------------------------------------------+
Figure 2: The DSCP List TLV Format
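The rule that the first instance of a TLV is processed and later instances are ignored can be sketched as a simple walk over the attribute value. The function name `first_tlvs` is hypothetical, and the sketch assumes a 1-octet Type followed by a 2-octet Length in network byte order that counts only the value octets, as the figures suggest. Raising an exception on truncation is a placeholder only, since the error handling for this attribute is not yet defined.

```python
import struct
from typing import Dict


def first_tlvs(attr_value: bytes) -> Dict[int, bytes]:
    """Return {type: raw value}, keeping only the first TLV of each type."""
    found: Dict[int, bytes] = {}
    offset = 0
    while offset < len(attr_value):
        if offset + 3 > len(attr_value):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BH", attr_value, offset)
        start, end = offset + 3, offset + 3 + length
        if end > len(attr_value):
            raise ValueError("truncated TLV value")
        # setdefault leaves an earlier instance of the same type untouched.
        found.setdefault(tlv_type, attr_value[start:end])
        offset = end
    return found
```

The values are returned undecoded because the internal encoding of each list is given by the field descriptions of the individual TLVs.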
The fields are as follows:
The Color List TLV MAY be included in the BGP-LS Map attribute. If a BGP UPDATE contains a Color extended community with a color (as defined by [RFC5512]) that matches an entry in the Color List, then a packet whose destination is covered by one of the routes in that UPDATE is to be forwarded using the BGP-LS map defined by the containing BGP-LS Map attribute. The first Color List TLV MUST be processed and subsequent instances MUST be ignored. The format of the Color List TLV is shown in Figure 3.
Note that if both a DSCP List and a Color List TLV are included in a BGP-LS Map attribute, packets matching an entry in either list are to be forwarded using the BGP-LS map defined by the containing BGP-LS Map attribute.
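The either-list matching rule can be expressed as a short Python sketch. The names `MapSelector` and `select_map` are hypothetical, the lists are assumed to have been decoded already into sets of integers, and returning the first matching map is an assumption of the sketch (this document does not define an ordering among maps beyond the precedence of P2MP and P2P maps).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class MapSelector:
    """Decoded DSCP List and Color List of one BGP-LS Map attribute."""
    map_id: int
    dscps: FrozenSet[int] = frozenset()
    colors: FrozenSet[int] = frozenset()

    def matches(self, dscp: int, route_colors: Iterable[int]) -> bool:
        # A match in either list is enough to select this map.
        return dscp in self.dscps or any(c in self.colors for c in route_colors)


def select_map(selectors: Iterable[MapSelector], dscp: int,
               route_colors: Iterable[int]) -> Optional[int]:
    """Return the ID of the first matching map, or None if none matches."""
    colors = list(route_colors)
    for selector in selectors:
        if selector.matches(dscp, colors):
            return selector.map_id
    return None
```

Here `dscp` is the DSCP of the packet and `route_colors` holds the colors carried in the Color extended communities of the route that covers the packet's destination.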
+--------------------------------------------+
|  Type = 3 (1 octet)                        |
+--------------------------------------------+
|  Length (2 octets)                         |
+--------------------------------------------+
|  Color List (variable)                     |
+--------------------------------------------+
Figure 3: The Color List TLV Format
The fields are as follows:
The Root TLV MUST be included in the BGP-LS Map attribute if its topology is of type P2MP or P2P unidirectional. It defines the root node for that topology and if it is not present the BGP-LS map is unusable. The TLV, if present, MUST be ignored if the topology is of type MP2MP or P2P bidirectional.
The Root TLV is structured as shown in Figure 4 and MAY contain any of the sub-TLVs defined in section 3.2.1.4 of [RFC7752].
+--------------------------------------------+
|  Type = 4 (1 octet)                        |
+--------------------------------------------+
|  Length (2 octets)                         |
+--------------------------------------------+
|  Sub-TLVs (variable)                       |
+--------------------------------------------+
Figure 4: The Root TLV Format
The fields are as follows:
Section 6 of [RFC4271] describes the handling of malformed BGP attributes, or those that are in error in some way. [RFC7606] revises BGP error handling specifically for the UPDATE message, provides guidelines for the authors of documents defining new attributes, and revises the error handling procedures for a number of existing attributes. This document introduces the BGP-LS Map attribute and so defines error handling as follows:
TBD
Figure 5 shows a sample underlay topology. Six PEs (PE1 through PE6) are connected across a network of twelve P nodes (P1 through P12). Each PE is dual-homed, and the P nodes are variously connected so that there are multiple routes between PEs.
         PE3      PE4
          |\      /|
          | \    / |
          |  \  /  |
          |   \/   |
          |   /\   |
          |  /  \  |
          | /    \ |
          |/      \|
         P1--------P2
        / |\      /| \
      /   | \    / |   \
    /     |  \  /  |     \
  /       |   \/   |       \
P3-------P4--------P5-------P6
|      /  |   /\   |  \      |
|    /    |  /  \  |    \    |
|  /      | /    \ |      \  |
|/        |/      \|        \|
P7---P8--P9--------P10-P11-P12
|\  /|                  |\  /|
| \/ |                  | \/ |
| /\ |                  | /\ |
|/  \|                  |/  \|
PE1 PE2                PE5 PE6
Figure 5: Underlay Network Topology
Figure 6 shows how a multipoint-to-multipoint (MP2MP) service that connects PE1, PE3, and PE6 can be installed over the underlay network. Paths have been computed so that, for example, PE1 is connected to both PE3 and PE6 via a pair of redundant paths. Similarly, PE3 is connected to PE1 and PE6, and PE6 is connected to PE1 and PE3.
         PE3      PE4
          | \
          |  \
          |   \
          |    \
          |     \
          |      \
          |       \
          |        \
         P1        P2
        /  \       /|
      /     \     / |
    /        \   /  |
  /           \ /   |
P3       P4    X   P5       P6
|             / \     \
|            /   \      \
|           /     \       \
|          /       \        \
P7   P8--P9---------P10-P11 P12
|   /                    \   |
|  /                      \  |
| /                        \ |
|/                          \|
PE1 PE2                 PE5 PE6
Figure 6: An MP2MP Service Installed at PE1, PE3, and PE6
Figure 7 shows the provision of a point-to-multipoint (P2MP) service rooted at PE3 and connected to PE1 and PE6. As in the previous example, a redundant pair of paths is established between PE3 and each of PE1 and PE6. Thus, the two paths from PE3 to PE1 are PE3-P1-P4-P7-PE1 and PE3-P2-P9-P8-PE1.
         PE3      PE4
          | \
          |  \
          |   \
          |    \
          |     \
          |      \
          |       \
          |        \
         P1        P2
          |\       / \
          | \     /    \
          |  \   /       \
          |   \ /          \
P3       P4    X   P5       P6
       /      / \            |
     /       /   \           |
   /        /     \          |
 /         /       \         |
P7---P8--P9        P10-P11 P12
|   /                    \   |
|  /                      \  |
| /                        \ |
|/                          \|
PE1 PE2                PE5 PE6
Figure 7: A P2MP Unidirectional Service Installed at PE3
Figure 8 shows a Point-to-Point (P2P) service rooted at PE1 and connected to PE3. This is equivalent to a Segment Routing Traffic Engineering (SR TE) Policy [I-D.ietf-idr-segment-routing-te-policy] installed at PE1.
As in the previous examples, a pair of redundant paths are computed.
         PE3      PE4
          |\
          | \
          |  \
          |   \
          |    \
          |     \
          |      \
          |       \
         P1        P2
          |        |
          |        |
          |        |
          |        |
P3       P4        P5       P6
       /           |
     /             |
   /               |
 /                 |
P7   P8--P9--------P10 P11 P12
|   /
|  /
| /
|/
PE1 PE2                PE5 PE6
Figure 8: A P2P Unidirectional Service (SR TE Policy) Installed at PE1
Figure 9 shows a bidirectional P2P service connecting PE1 and PE6. This is equivalent to a Segment Routing Traffic Engineering (SR TE) Policy [I-D.ietf-idr-segment-routing-te-policy] installed at PE1 and PE6.
         PE3      PE4

         P1        P2

P3       P4--------P5       P6
       /              \
     /                  \
   /                      \
 /                          \
P7   P8--P9--------P10-P11 P12
|   /                    \   |
|  /                      \  |
| /                        \ |
|/                          \|
PE1 PE2                PE5 PE6
Figure 9: A P2P Bidirectional Service Installed at PE1 and PE6
TBD
IANA maintains a registry of "Border Gateway Protocol (BGP) Parameters" with a subregistry of "BGP Path Attributes". IANA is requested to assign a new Path attribute called "BGP-LS Map attribute" (TBD1 in this document) with this document as a reference.
IANA maintains a registry of "Border Gateway Protocol (BGP) Parameters". IANA is requested to create a new subregistry called the "BGP-LS Map attribute TLVs" registry.
Valid values are in the range 0 to 255.
This document should be given as a reference for this registry. The new registry should track:
The registry should initially be populated as follows:
Type  | Name                    | Reference     | Date
------+-------------------------+---------------+---------------
  1   | Map TLV                 | [This.I-D]    | Date-to-be-set
  2   | DSCP List TLV           | [This.I-D]    | Date-to-be-set
  3   | Color List TLV          | [This.I-D]    | Date-to-be-set
  4   | Root TLV                | [This.I-D]    | Date-to-be-set
The authors are grateful to all those who contributed to the discussions that led to this work: Ron Bonica, Stewart Bryant, Jie Dong, and Keyur Patel.
The following people contributed text to this document:
A N Other Email: another@foocorp.doc