BESS Working Group A. Farrel
Internet-Draft Old Dog Consulting
Intended status: Standards Track J. Drake
Expires: February 13, 2021 E. Rosen
Juniper Networks
K. Patel
Arrcus, Inc.
L. Jalil
Verizon
August 12, 2020

Gateway Auto-Discovery and Route Advertisement for Segment Routing Enabled Domain Interconnection
draft-ietf-bess-datacenter-gateway-08

Abstract

Data centers are critical components of the infrastructure used by network operators to provide services to their customers. Data centers are attached to the Internet or a backbone network by gateway routers. One data center typically has more than one gateway for commercial, load balancing, and resiliency reasons.

Segment Routing is a protocol mechanism that can be used within a data center, and also for steering traffic that flows between two data center sites. In order that one data center site may load balance the traffic it sends to another data center site, it needs to know the complete set of gateway routers at the remote data center, the points of connection from those gateways to the backbone network, and the connectivity across the backbone network.

Segment Routing may also be operated in other domains, such as access networks. Those domains also need to be connected across backbone networks through gateways.

This document defines a mechanism using the BGP Tunnel Encapsulation attribute to allow each gateway router to advertise the routes to the prefixes in the Segment Routing domains to which it provides access, and also to advertise on behalf of each other gateway to the same Segment Routing domain.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on February 13, 2021.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

Data centers (DCs) are critical components of the infrastructure used by network operators to provide services to their customers. DCs are attached to the Internet or a backbone network by gateway routers (GWs). One DC typically has more than one GW for various reasons including commercial preferences, load balancing, and resiliency against connection or device failure.

Segment Routing (SR) [RFC8402] is a protocol mechanism that can be used within a DC, and also for steering traffic that flows between two DC sites. In order for a source (ingress) DC that uses SR to load balance the flows it sends to a destination (egress) DC, it needs to know the complete set of entry nodes (i.e., GWs) for that egress DC from the backbone network connecting the two DCs. Note that it is assumed that the connected set of DCs and the backbone network connecting them are part of the same SR BGP Link State (LS) instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe]) so that traffic engineering using SR may be used for these flows.

SR may also be operated in other domains, such as access networks. Those domains also need to be connected across backbone networks through gateways. For illustrative purposes, consider the Ingress and Egress SR Domains shown in Figure 1 as separate ASes. The various ASes that provide connectivity between the Ingress and Egress Domains could each be constructed differently and use different technologies such as IP, MPLS with global table routing native BGP to the edge, MPLS IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN.

Suppose that there are two gateways, GW1 and GW2 as shown in Figure 1, for a given egress SR domain and that they each advertise a route to prefix X which is located within the egress SR domain with each setting itself as next hop. One might think that the GWs for X could be inferred from the routes' next hop fields, but typically it is not the case that both routes get distributed across the backbone: rather only the best route, as selected by BGP, is distributed. This precludes load balancing flows across both GWs.

         
      -----------------                    ---------------------
     | Ingress         |                  | Egress     ------   |
     | SR Domain       |                  | SR Domain |Prefix|  |
     |                 |                  |           |   X  |  |
     |                 |                  |            ------   |
     |       --        |                  |   ---          ---  |
     |      |GW|       |                  |  |GW1|        |GW2| |
      -------++--------                    ----+-----------+-+--
             | \                               |          /  |
             |  \                              |         /   |
             |  -+-------------        --------+--------+--  |
             | ||ASBR|     ----|      |----  |ASBR| |ASBR| | |
             | | ----     |ASBR+------+ASBR|  ----   ----  | |
             | |           ----|      |----                | |
             | |               |      |                    | |
             | |           ----|      |----                | |
             | | AS1      |ASBR+------+ASBR|           AS2 | |
             | |           ----|      |----                | |
             |  ---------------        --------------------  |
           --+-----------------------------------------------+--
          | |ASBR|                                       |ASBR| |
          |  ----               AS3                       ----  |
          |                                                     |
           -----------------------------------------------------
         
       

Figure 1: Example Segment Routing Domain Interconnection

The obvious solution to this problem is to use the BGP feature that allows the advertisement of multiple paths in BGP (known as Add-Paths) [RFC7911] to ensure that all routes to X get advertised by BGP. However, even if this is done, the identity of the GWs will be lost as soon as the routes get distributed through an Autonomous System Border Router (ASBR) that will set itself to be the next hop. And if there are multiple Autonomous Systems (ASes) in the backbone, not only will the next hop change several times, but the Add-Paths technique will experience scaling issues. This all means that the Add-Paths approach is limited to SR domains connected over a single AS.

This document defines a solution that overcomes this limitation and works equally well with a backbone constructed from one or more ASes. The solution uses the Tunnel Encapsulation attribute [I-D.ietf-idr-tunnel-encaps] as follows:

In other words, each route advertised by a GW identifies all of the GWs to the same SR domain (see Section 3 for a discussion of how GWs discover each other). Therefore, even if only one of the routes is distributed to other ASes, it will not matter how many times the next hop changes, as the Tunnel Encapsulation attribute (and its remote endpoint sub-TLVs) will remain unchanged.

To put this in the context of Figure 1, GW1 and GW2 discover each other as gateways for the egress SR domain. Both GW1 and GW2 advertise themselves as having routes to prefix X. Furthermore, GW1 includes a Tunnel Encapsulation attribute with a tunnel instance of type "SR tunnel" for itself and another for GW2. Similarly, GW2 includes a Tunnel Encapsulation attribute with an SR tunnel instance for itself and another for GW1. The gateway in the ingress SR domain can now see all possible paths to the egress SR domain regardless of which route advertisement is propagated to it, and it can choose one, or balance traffic flows as it sees fit.
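The Figure 1 scenario can be sketched in a few lines of Python. This is an illustrative model only, not a wire encoding: the Route class, the asbr_readvertise helper, and the ASBR names are hypothetical, and the choice of GW1's route as the best route is an assumption made for the example. The sketch shows that the next hop is rewritten at each AS boundary while the set of gateway endpoints carried in the Tunnel Encapsulation attribute is unchanged.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical model of a BGP route (not a wire format)."""
    prefix: str
    next_hop: str
    # One entry per SR tunnel instance: the address that would be carried
    # in that instance's remote endpoint sub-TLV.
    tunnel_endpoints: Tuple[str, ...]


def asbr_readvertise(route: Route, asbr: str) -> Route:
    """An ASBR sets itself as next hop and leaves the attribute untouched."""
    return replace(route, next_hop=asbr)


gateways = ("GW1", "GW2")

# Each GW advertises X with itself as next hop and lists both GWs.
from_gw1 = Route(prefix="X", next_hop="GW1", tunnel_endpoints=gateways)
from_gw2 = Route(prefix="X", next_hop="GW2", tunnel_endpoints=gateways)

# Assumption: BGP best-path selection keeps only GW1's route, which then
# crosses two AS boundaries on its way to the ingress SR domain.
received = asbr_readvertise(asbr_readvertise(from_gw1, "ASBR-AS2"), "ASBR-AS1")

print(received.next_hop)          # ASBR-AS1
print(received.tunnel_endpoints)  # ('GW1', 'GW2')
```

Had GW2's route been selected instead, the ingress gateway would learn the same pair of endpoints, which is the property the mechanism relies on.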

The protocol extensions defined in this document are put into the broader context of SR domain interconnection by [I-D.farrel-spring-sr-domain-interconnect]. That document shows how other existing protocol elements may be combined with the extensions defined in this document to provide a full system.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. SR Domain Gateway Auto-Discovery

To allow a given SR domain's GWs to auto-discover each other and to coordinate their operations, the following procedures are implemented:

The auto-discovery route that each GW advertises consists of the following:

To avoid the side effect of applying the Tunnel Encapsulation attribute to any packet that is addressed to the GW itself, the GW SHOULD use a different loopback address for the two cases.

As described in Section 1, each GW will include a Tunnel Encapsulation attribute for each GW that is active for the SR domain (including itself), and will include these in every route advertised externally to the SR domain by each GW. As the current set of active GWs changes (due to the addition of a new GW or the failure/removal of an existing GW) each externally advertised route will be re-advertised with the set of SR tunnel instances reflecting the current set of active GWs.

If a gateway becomes disconnected from the backbone network, or if the SR domain operator decides to terminate the gateway's activity, it withdraws the advertisements described above. This means that remote gateways at other sites will stop seeing advertisements from this gateway. It also means that other local gateways at this site will "unlearn" the removed gateway and stop including a Tunnel Encapsulation attribute for the removed gateway in their advertisements.
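The re-advertisement behaviour described above can be sketched as follows. The function names and the dictionary layout are hypothetical and chosen for illustration; the sketch assumes that the set of active GWs has already been learned through auto-discovery and does not model the auto-discovery routes themselves.

```python
from typing import Dict, Iterable, List, Set


def build_tunnel_instances(active_gws: Set[str]) -> List[Dict[str, str]]:
    """One SR tunnel instance per currently active GW (including the sender)."""
    return [
        {"tunnel_type": "SR tunnel", "remote_endpoint": gw}
        for gw in sorted(active_gws)
    ]


def readvertise(prefixes: Iterable[str],
                active_gws: Set[str]) -> Dict[str, List[Dict[str, str]]]:
    """Rebuild the attribute for every externally advertised prefix."""
    return {prefix: build_tunnel_instances(active_gws) for prefix in prefixes}


active = {"GW1", "GW2"}
before = readvertise(["X"], active)

# GW2 withdraws its advertisements; GW1 "unlearns" it and re-advertises.
active.discard("GW2")
after = readvertise(["X"], active)

print([t["remote_endpoint"] for t in before["X"]])  # ['GW1', 'GW2']
print([t["remote_endpoint"] for t in after["X"]])   # ['GW1']
```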

Note that if a GW is (mis)configured with a different SR domain identifier from the other GWs to the same domain then it will not be auto-discovered by the other GWs (and will not auto-discover the other GWs). This would result in a receiver just getting the best route with only the advertising node's tunnel encapsulation information.
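The effect of a mismatched SR domain identifier can be illustrated with a small sketch. It assumes that each auto-discovery advertisement can be reduced to a (GW address, SR domain identifier) pair; how the identifier is actually carried is not modelled, and the discover_peers helper and the identifier values 100 and 200 are hypothetical.

```python
from typing import Iterable, Set, Tuple


def discover_peers(local_gw: str,
                   local_domain_id: int,
                   adverts: Iterable[Tuple[str, int]]) -> Set[str]:
    """Return the GWs (including the local one) seen for the local domain."""
    peers = {gw for gw, domain_id in adverts if domain_id == local_domain_id}
    peers.add(local_gw)
    return peers


# GW1 and GW2 serve the same domain, but GW2 is misconfigured with 200.
adverts = [("GW1", 100), ("GW2", 200)]

print(sorted(discover_peers("GW1", 100, adverts)))  # ['GW1']
print(sorted(discover_peers("GW2", 200, adverts)))  # ['GW2']

# With consistent configuration both GWs discover each other.
fixed = [("GW1", 100), ("GW2", 100)]
print(sorted(discover_peers("GW1", 100, fixed)))    # ['GW1', 'GW2']
```

In the misconfigured case each GW would advertise only its own tunnel instance, so a remote receiver sees a single gateway for the domain.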

4. Relationship to BGP Link State and Egress Peer Engineering

When a remote GW receives a route to a prefix X it can use the SR tunnel instances within the contained Tunnel Encapsulation attribute to identify the GWs through which X can be reached. It uses this information to compute SR Traffic Engineering (SR TE) paths across the backbone network looking at the information advertised to it in SR BGP Link State (BGP-LS) [I-D.ietf-idr-bgp-ls-segment-routing-ext] and correlated using the SR domain identity. SR Egress Peer Engineering (EPE) [I-D.ietf-idr-bgpls-segment-routing-epe] can be used to supplement the information advertised in BGP-LS.
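As an illustration of the first step, the sketch below extracts the set of egress GWs from a received route and spreads flows across them. The route layout, the function names, and the hash-based selection are assumptions made for this example; this document does not mandate any particular load-balancing policy, and the SR TE path computation over BGP-LS information is not modelled.

```python
import hashlib
from typing import Any, Dict, List


def egress_gateways(route: Dict[str, Any]) -> List[str]:
    """Collect the remote endpoints of all SR tunnel instances in a route."""
    return sorted({
        t["remote_endpoint"]
        for t in route["tunnel_instances"]
        if t["tunnel_type"] == "SR tunnel"
    })


def pick_gateway(flow_id: str, gws: List[str]) -> str:
    """Map a flow to one GW with a stable hash (one possible policy)."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return gws[int.from_bytes(digest[:4], "big") % len(gws)]


route_to_x = {
    "prefix": "X",
    "next_hop": "ASBR-AS1",  # rewritten along the way; not used below
    "tunnel_instances": [
        {"tunnel_type": "SR tunnel", "remote_endpoint": "GW1"},
        {"tunnel_type": "SR tunnel", "remote_endpoint": "GW2"},
    ],
}

gws = egress_gateways(route_to_x)
print(gws)  # ['GW1', 'GW2']
for flow in ("flow-a", "flow-b", "flow-c"):
    print(flow, "->", pick_gateway(flow, gws))
```

A stable hash keeps all packets of a flow on the same egress GW; an implementation could equally use weights or path metrics derived from the BGP-LS information.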

5. Advertising an SR Domain Route Externally

When a packet destined for prefix X is sent on an SR TE path to a GW for the SR domain containing X (that is, the packet is sent in the Ingress Domain on an SR TE path that describes the whole path, including the part within the Egress Domain), it needs to carry the receiving GW's label for X such that this label rises to the top of the stack before the GW completes its processing of the packet. To achieve this we place a Prefix SID sub-TLV [I-D.ietf-idr-tunnel-encaps] for X in each SR tunnel instance in the Tunnel Encapsulation attribute in the externally advertised route for X.

Alternatively, if the GWs for a given SR domain are configured to allow remote GWs to perform SR TE through that SR domain for a prefix X, then each GW computes an SR TE path through that SR domain to X from each of the currently active GWs, and places each in an MPLS label stack sub-TLV [I-D.ietf-idr-tunnel-encaps] in the SR tunnel instance for that GW.
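The two options described in this section can be contrasted in a short sketch. The data layout, the function name, and all label values are hypothetical; the sketch only shows which piece of information is attached to each SR tunnel instance, not how the sub-TLVs are encoded.

```python
from typing import Dict, List, Optional


def tunnel_instances_for_prefix(
    prefix_labels: Dict[str, int],
    te_paths: Optional[Dict[str, List[int]]] = None,
) -> List[dict]:
    """Build one SR tunnel instance per GW for a single prefix.

    prefix_labels: each GW's own label for the prefix (Prefix SID case).
    te_paths: optional label stack from each GW to the prefix
              (MPLS label stack case).
    """
    instances = []
    for gw in sorted(prefix_labels):
        instance = {"tunnel_type": "SR tunnel", "remote_endpoint": gw}
        if te_paths is not None:
            instance["mpls_label_stack"] = list(te_paths[gw])
        else:
            instance["prefix_sid"] = prefix_labels[gw]
        instances.append(instance)
    return instances


# Hypothetical labels that GW1 and GW2 use for prefix X.
labels_for_x = {"GW1": 16001, "GW2": 16002}
# Hypothetical SR TE paths through the egress SR domain to X.
paths_to_x = {"GW1": [17010, 17020], "GW2": [17030]}

print(tunnel_instances_for_prefix(labels_for_x))
print(tunnel_instances_for_prefix(labels_for_x, paths_to_x))
```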

Please refer to Section 7 of [I-D.farrel-spring-sr-domain-interconnect] for worked examples of how the label stack is constructed in this case, and how the advertisements would work.

6. Encapsulation

If the GWs for a given SR domain are configured to allow remote GWs to send them a packet in that SR domain's native encapsulation, then each GW will also include multiple instances of a tunnel TLV for that native encapsulation in externally advertised routes: one for each GW and each containing a remote endpoint sub-TLV with that GW's address. A remote GW may then encapsulate a packet according to the rules defined via the sub-TLVs included in each of the tunnel TLV instances.

7. IANA Considerations

7.1. Tunnel Encapsulation Tunnel Type

IANA maintains a registry called "Border Gateway Protocol (BGP) Parameters" with a sub-registry called "BGP Tunnel Encapsulation Attribute Tunnel Types." The registration policy for this registry is First-Come First-Served [RFC8126].

IANA has assigned the value 17 from this sub-registry for "SR Tunnel".

7.2. Tunnel Encapsulation Sub-TLVs

IANA maintains a registry called "Border Gateway Protocol (BGP) Parameters" with a sub-registry called "BGP Tunnel Encapsulation Attribute Sub-TLVs." The registration policy for this registry is Standards Action [RFC8126].

IANA is requested to assign a codepoint from this sub-registry for "SR Tunnel TLV" (TBD1). The next available value may be used and reference should be made to this document.

8. Security Considerations

From a protocol point of view, the mechanisms described in this document can leverage the security mechanisms already defined for BGP. Further discussion of security considerations for BGP may be found in the BGP specification itself [RFC4271] and in the security analysis for BGP [RFC4272]. The TCP Authentication Option, which obsoletes the TCP MD5 signature option originally used to protect BGP sessions, is specified in [RFC5925], while [RFC6952] includes an analysis of BGP keying and authentication issues.

The mechanisms described in this document involve sharing routing or reachability information between domains: that may mean disclosing information that is normally contained within a domain. So it needs to be understood that normal security paradigms based on the boundaries of domains are weakened. Discussion of these issues with respect to VPNs can be found in [RFC4364], while [RFC7926] describes many of the issues associated with the exchange of topology or TE information between domains.

Particular exposures resulting from this work include:

All of the issues in the list above could cause disruption to domain interconnection, but are not new protocol vulnerabilities so much as new exposures of information that SHOULD be protected against using existing protocol mechanisms. Furthermore, it is a general observation that if these attacks are possible then it is highly likely that far more significant attacks can be made on the routing system. It should be noted that BGP peerings are not discovered, but always arise from explicit configuration.

9. Manageability Considerations

The principal configuration item added by this solution is the allocation of an SR domain identifier. The same identifier MUST be assigned to every GW to the same domain, and each domain MUST have a different identifier. This requires coordination, probably through a central management agent.
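A central management agent could check the two MUST-level rules above before pushing configuration. The following is a minimal sketch of such a check; the check_domain_ids function, the assignment table, and the identifier values are hypothetical and are not part of this specification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def check_domain_ids(assignments: Dict[str, Tuple[str, int]]) -> List[str]:
    """Validate a GW -> (domain name, SR domain identifier) table.

    Returns a list of human-readable problems; empty means consistent.
    """
    ids_by_domain = defaultdict(set)
    domains_by_id = defaultdict(set)
    for domain, domain_id in assignments.values():
        ids_by_domain[domain].add(domain_id)
        domains_by_id[domain_id].add(domain)

    problems = []
    # Every GW to the same domain must carry the same identifier.
    for domain, ids in sorted(ids_by_domain.items()):
        if len(ids) > 1:
            problems.append(
                f"domain {domain} has multiple identifiers: {sorted(ids)}")
    # Each domain must have a different identifier.
    for domain_id, domains in sorted(domains_by_id.items()):
        if len(domains) > 1:
            problems.append(
                f"identifier {domain_id} is shared by: {sorted(domains)}")
    return problems


good = {"GW": ("ingress", 1), "GW1": ("egress", 2), "GW2": ("egress", 2)}
split = {"GW1": ("egress", 2), "GW2": ("egress", 3)}
clash = {"GW": ("ingress", 2), "GW1": ("egress", 2)}

print(check_domain_ids(good))   # []
print(check_domain_ids(split))
print(check_domain_ids(clash))
```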

It should be noted that BGP peerings are not discovered, but always arise from explicit configuration. This is no different from any other BGP operation.

9.1. Relationship to Route Target Constraint

In order to limit the VPN routing information that is maintained at a given route reflector, [RFC4364] suggests the use of "Cooperative Route Filtering" [RFC5291] between route reflectors. [RFC4684] defines an extension to that mechanism to include support for multiple autonomous systems and asymmetric VPN topologies such as hub-and-spoke. The mechanism in RFC 4684 is known as Route Target Constraint (RTC).

An operator would not normally configure RTC by default for any AFI/SAFI combination, and would only enable it after careful consideration. When using the mechanisms defined in this document, the operator should consider carefully the effects of filtering routes. In some cases this may be desirable, and in others it could limit the effectiveness of the procedures.

10. Acknowledgements

Thanks to Bruno Rijsman, Stephane Litkowski, Boris Hassanov, Linda Dunbar, Ravi Singh, and Gyan Mishra for review comments, and to Robert Raszuk for useful discussions.

11. References

11.1. Normative References

[I-D.ietf-idr-bgpls-segment-routing-epe] Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray, S. and J. Dong, "BGP-LS extensions for Segment Routing BGP Egress Peer Engineering", Internet-Draft draft-ietf-idr-bgpls-segment-routing-epe-19, May 2019.
[I-D.ietf-idr-tunnel-encaps] Patel, K., Velde, G., Sangli, S. and J. Scudder, "The BGP Tunnel Encapsulation Attribute", Internet-Draft draft-ietf-idr-tunnel-encaps-17, July 2020.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC4271] Rekhter, Y., Li, T. and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006.
[RFC4360] Sangli, S., Tappan, D. and Y. Rekhter, "BGP Extended Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February 2006.
[RFC5925] Touch, J., Mankin, A. and R. Bonica, "The TCP Authentication Option", RFC 5925, DOI 10.17487/RFC5925, June 2010.
[RFC7752] Gredler, H., Medved, J., Previdi, S., Farrel, A. and S. Ray, "North-Bound Distribution of Link-State and Traffic Engineering (TE) Information Using BGP", RFC 7752, DOI 10.17487/RFC7752, March 2016.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

11.2. Informative References

[I-D.farrel-spring-sr-domain-interconnect] Farrel, A. and J. Drake, "Interconnection of Segment Routing Domains - Problem Statement and Solution Landscape", Internet-Draft draft-farrel-spring-sr-domain-interconnect-05, October 2018.
[I-D.ietf-idr-bgp-ls-segment-routing-ext] Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H. and M. Chen, "BGP Link-State extensions for Segment Routing", Internet-Draft draft-ietf-idr-bgp-ls-segment-routing-ext-16, June 2019.
[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", RFC 4272, DOI 10.17487/RFC4272, January 2006.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006.
[RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, R., Patel, K. and J. Guichard, "Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684, November 2006.
[RFC5291] Chen, E. and Y. Rekhter, "Outbound Route Filtering Capability for BGP-4", RFC 5291, DOI 10.17487/RFC5291, August 2008.
[RFC6952] Jethanandani, M., Patel, K. and L. Zheng, "Analysis of BGP, LDP, PCEP, and MSDP Issues According to the Keying and Authentication for Routing Protocols (KARP) Design Guide", RFC 6952, DOI 10.17487/RFC6952, May 2013.
[RFC7911] Walton, D., Retana, A., Chen, E. and J. Scudder, "Advertisement of Multiple Paths in BGP", RFC 7911, DOI 10.17487/RFC7911, July 2016.
[RFC7926] Farrel, A., Drake, J., Bitar, N., Swallow, G., Ceccarelli, D. and X. Zhang, "Problem Statement and Architecture for Information Exchange between Interconnected Traffic-Engineered Networks", BCP 206, RFC 7926, DOI 10.17487/RFC7926, July 2016.
[RFC8126] Cotton, M., Leiba, B. and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017.
[RFC8402] Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B., Litkowski, S. and R. Shakir, "Segment Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, July 2018.

Authors' Addresses

Adrian Farrel Old Dog Consulting EMail: adrian@olddog.co.uk
John Drake Juniper Networks EMail: jdrake@juniper.net
Eric Rosen Juniper Networks EMail: erosen52@gmail.com
Keyur Patel Arrcus, Inc. EMail: keyur@arrcus.com
Luay Jalil Verizon EMail: luay.jalil@verizon.com