Inter-Domain Routing H. Gredler
Internet-Draft Juniper Networks, Inc.
Intended status: Standards Track March 9, 2015
Expires: September 10, 2015
Prefix-SID extensions for BGP-LU
draft-gredler-idr-bgplu-prefix-sid-00
Abstract
The MPLS source routing paradigm provides path control for both
intra- and inter-Autonomous System (AS) traffic. In most MPLS
deployments the ingress of an MPLS tunnel is an IP router.
Availability of MPLS forwarding stacks for host operating systems is
extending the MPLS perimeter to Hypervisors and Servers. Recent Data
Center designs are using an IGP-less routing paradigm based on
massive ECMP multipath using external BGP. This document outlines
how Hypervisors and Servers may interact with the MPLS control and
data planes using extensions to the BGP labeled unicast protocol
(BGP-LU).
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 10, 2015.
Gredler Expires September 10, 2015 [Page 1]
Internet-Draft Prefix-SID extensions for BGP-LU March 2015
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Motivation, Rationale and Applicability
3. Deployment Considerations
  3.1. Control plane restart
  3.2. BGP-LU as Server Control Plane
  3.3. Labeled-ARP as Server Control Plane
  3.4. Static Labels and Controller as Server Control Plane
4. BGP Prefix-SID Attribute
  4.1. Label Index TLV
  4.2. Label Base TLV
  4.3. Label Range TLV
5. Acknowledgements
6. IANA Considerations
7. Security Considerations
8. References
  8.1. Normative References
  8.2. Informative References
Author's Address
1. Introduction
Recent Data Center routing designs are modeled as shown in
Figure 1. Rather than using an IGP plus internal BGP (iBGP),
an IGP-less design is favored for disseminating routing information.
See [I-D.ietf-rtgwg-bgp-routing-large-dc] for the rationale and
detailed information on why and how to do so. Today BGP-LU [RFC3107]
is used both as an intra-AS [I-D.ietf-mpls-seamless-mpls] and inter-AS
routing protocol. Because of the IGP-less routing paradigm, topology
information gets lost. Of particular interest is the ability to
direct traffic to a specific node, and hence the ability to construct
explicit paths, denoted by a set of nodes, for traffic engineering.
BGP-LU today may advertise an MPLS transport path between Autonomous
Systems. This document describes extensions to the BGP-LU protocol
such that, in addition to the advertised MPLS label-switched paths
(LSPs), all potential MPLS label-switched paths of any given node in
the Data Center are exposed to ingress nodes.
The protocol extensions in this document are in full compliance with
the MPLS Architecture documented in [RFC3031].
+------+ +------+
| | | |
| |--| | Tier-1 / AS 651xx
| | | |
+------+ +------+
| | | |
+---------+ | | +----------+
| +-------+--+------+--+-------+ |
| | | | | | | |
+----+ +----+ +----+ +----+
| | | | | | | |
| |-----| | | |-----| | Tier-2 / AS 652xx
| | | | | | | |
+----+ +----+ +----+ +----+
| | | |
| | | |
| +-----+ | | +-----+ |
+-| |-+ +-| |-+ Tier-3 / AS 653xx
+-----+ +-----+
| | | | | |
<- Servers -> <- Servers -> Servers / AS 65534
Figure 1: eBGP-centric Data Center routing
2. Motivation, Rationale and Applicability
The specifications for Segment Routing
([I-D.ietf-isis-segment-routing-extensions] and
[I-D.ietf-ospf-segment-routing-extensions]) provide extensions for
setting up hop-by-hop, shortest-path-routed MPLS LSPs. The protocol
semantics used are:
o Domain-wide Index
o Local Label-Base
o Local Label Range
advertised by any router in an IGP domain. This not only sets up
MPLS sink-trees to each egress router in a domain, but also allows
traffic to be steered using stacks of node labels. The chosen protocol
semantics are essentially a compression scheme to advertise all MPLS
shortest-path tree (SPT) paths in a domain.
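As an illustration of this compression, the following minimal sketch
derives local labels from a domain-wide index and a node's local label
block, and then builds a stack of node labels. It assumes the usual
Segment Routing convention that a node's local label equals its Label
Base plus the Index, and that the Label Range gives the number of
labels in the block; this document does not spell out that mapping.
The names LabelBlock, local_label and label_stack, as well as the
second node's base and range, are hypothetical. The stack builder
further simplifies by assuming that all nodes use the same block.

```python
from typing import List, NamedTuple, Optional


class LabelBlock(NamedTuple):
    """Local label block of one node (hypothetical container)."""
    base: int  # Local Label-Base
    size: int  # Local Label Range, taken here as the number of labels


def local_label(block: LabelBlock, index: int) -> Optional[int]:
    """Assumed mapping: label = base + index, valid only inside the block."""
    if 0 <= index < block.size:
        return block.base + index
    return None


def label_stack(block: LabelBlock, node_indexes: List[int]) -> List[int]:
    """Node labels, top of stack first, for a path through the given nodes.

    Simplification: every node is assumed to use the same label block.
    """
    stack = []
    for index in node_indexes:
        label = local_label(block, index)
        if label is None:
            raise ValueError(f"index {index} does not fit in the label block")
        stack.append(label)
    return stack


# Two nodes advertise different blocks; one domain-wide index (7) is enough
# for each of them to derive its own local label for the same egress.
node_a = LabelBlock(base=800000, size=16000)
node_b = LabelBlock(base=100000, size=8000)
labels_for_index_7 = [local_label(node_a, 7), local_label(node_b, 7)]
```

Under these assumptions a single index per egress node, plus one
base/range pair per advertising node, replaces a per-node, per-egress
label advertisement.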
The ability to do explicit path routing based on stacked labels,
constructed at the Hypervisors/Servers, without running conventional
TE protocols such as RSVP-TE, is a lightweight way to scale
the Data Center Fabric.
In order to support deployments of Segment Routing across routing
protocol boundaries it is required to keep a common set of semantics
across all routing protocols. This document specifies BGP-LU
extensions to be able to address Node-SIDs across routing-protocol
boundaries.
3. Deployment Considerations
Depending on the sophistication of the MPLS stack at the Hypervisor /
Server, there are various levels of considerations for deployment.
3.1. Control plane restart
If the first-hop router needs to be restarted, there may be some
forwarding state churn at the Hypervisor / Server. It would be
desirable that upon control-plane restart the network node uses the
same label allocations as in its previous incarnation.
Unfortunately none of the BGP graceful restart extensions allows a
node to re-acquire the label-mapping state of its previous
incarnation from the network. Therefore a restarting node will
allocate labels to FECs in the order in which the FECs arrive. This
degrades to pseudo-random, unpredictable label allocations. It is
desirable that a BGP-LU implementation allocates the labels in a
deterministic way, such that temporary control-plane loss does not
impact forwarding between the Hypervisor / Server and the network.
A BGP-LU Prefix SID speaking networking node MUST therefore implement
an MPLS label-allocation strategy which produces a deterministic,
locally allocated label block for all of its Prefix SIDs.
For example, an implementation MAY statically allocate a Label Base of
800000 and a block size of 16000 labels and delegate that label block
exclusively to BGP-LU Prefix SID allocations, such that the same
Label Base is used across control-plane restarts.
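The difference between arrival-order and deterministic allocation can
be sketched as follows. This is an illustration only, not a prescribed
algorithm: the function names, the example prefixes, the indexes and
the starting label 300000 are hypothetical, and the deterministic
variant assumes that each Prefix SID carries a configured index that
is added to the statically reserved Label Base from the example above.

```python
from typing import Dict, Iterable

LABEL_BASE = 800000  # example Label Base from the text
BLOCK_SIZE = 16000   # example block size from the text


def arrival_order_allocation(fecs: Iterable[str], first_free: int) -> Dict[str, int]:
    """Assign the next free label to each FEC in the order it is learned."""
    return {fec: first_free + i for i, fec in enumerate(fecs)}


def deterministic_allocation(indexes: Dict[str, int]) -> Dict[str, int]:
    """Assign LABEL_BASE + index, independent of the order FECs are learned."""
    labels = {}
    for fec, index in indexes.items():
        if not 0 <= index < BLOCK_SIZE:
            raise ValueError(f"index {index} for {fec} is outside the label block")
        labels[fec] = LABEL_BASE + index
    return labels


# The same two FECs learned in a different order before and after a restart.
before = arrival_order_allocation(["10.0.0.1/32", "10.0.0.2/32"], 300000)
after = arrival_order_allocation(["10.0.0.2/32", "10.0.0.1/32"], 300000)
labels_changed = before != after

# With configured indexes the result does not depend on learning order.
stable_before = deterministic_allocation({"10.0.0.1/32": 1, "10.0.0.2/32": 2})
stable_after = deterministic_allocation({"10.0.0.2/32": 2, "10.0.0.1/32": 1})
labels_stable = stable_before == stable_after
```

In the arrival-order case the Hypervisor / Server would have to relearn
every label after the restart; in the deterministic case its forwarding
state remains valid.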
3.2. BGP-LU as Server Control Plane
In this case the Hypervisor / Server has a "client-only" BGP-LU stack
in order to interface to the network. This is the most distributed
way of building label-switched paths across the network. As soon as
there is a reachability change, all of the Hypervisors / Servers
are notified immediately. There is almost no time lag for updating
servers due to the inherent push model of the BGP protocol.
Most of the implementation complexity of a BGP implementation comes
from the BGP Update generation subsystem. A client-only BGP
implementation typically requires only one or two BGP sessions (two
for redundancy), so the BGP Update generation complexity stays
limited.
3.3. Labeled-ARP as Server Control Plane
The Labeled ARP Protocol [I-D.kompella-mpls-larp] may be used as a
lightweight alternative to the BGP-LU protocol. Labeled ARP is a
soft-state protocol, and therefore special consideration needs to be
given to, e.g., refresh timers and labels in the network. Yet
it is a distributed variant of LSP state propagation and hence
reacts immediately to network topology changes and label-to-FEC
changes.
3.4. Static Labels and Controller as Server Control Plane
Static labels do not need control-plane sessions between
Hypervisors / Servers and the network. The assumption is that an
external controller transfers the routing/label information into the
Hypervisor / Server. The main disadvantage of that model is that the
update process is not distributed, and hence a controller needs to
have excellent horizontal scaling abilities in order to push on the
order of 100K routes/labels to on the order of 100K servers.
4. BGP Prefix-SID Attribute
In order to facilitate dense packing of network nodes and node labels
into a deterministic label range as described in Section 3.1, a new
protocol extension called the "BGP Prefix SID Attribute" is proposed.
The BGP Prefix SID is a new optional, transitive BGP path attribute.
The attribute type code for the BGP Prefix SID attribute is to be
assigned by IANA.
The value field of the BGP Prefix SID attribute is defined here to be
a set of elements encoded as "Type/Length/Value" (i.e., a set of
TLVs). Each such TLV is encoded as shown in Figure 2.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |            Length             |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
~                                                               ~
|                       Value (variable)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: TLV format
o Type: A single octet encoding the TLV Type. Unrecognized Types
are preserved and propagated. In order to compare NLRIs with
unknown TLVs, all TLVs MUST be ordered in ascending order by TLV
Type. If there are multiple TLVs of the same type, then those TLVs
MUST be ordered in ascending order of the TLV value. All TLVs that
are not specified as mandatory are considered optional.
o Length: Two octets encoding the length of the value portion in
octets (thus a TLV with no value portion would have a length of
zero). The TLV is not padded to four-octet alignment.
o Value: A field containing zero or more octets.
The following TLV types are defined in this document:
+------+-------------+
| Type | Name |
+------+-------------+
| 1 | Label Index |
| 2 | Label Base |
| 3 | Label Range |
+------+-------------+
Table 1: Prefix SID TLVs
Use of other TLV types is outside the scope of this document.
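The TLV layout of Figure 2 can be sketched as the following encoder
and decoder for the attribute's value field. This is a minimal
illustration under stated assumptions: the Length field is taken to be
in network byte order, as is customary in BGP, and TLVs of the same
type are ordered by bytewise comparison of their values. The function
names are hypothetical.

```python
import struct
from typing import List, Tuple

# 1-octet Type followed by 2-octet Length, network byte order (assumed).
TLV_HEADER = struct.Struct("!BH")


def encode_tlvs(tlvs: List[Tuple[int, bytes]]) -> bytes:
    """Encode (type, value) pairs, ordered by type and then by value."""
    out = bytearray()
    for tlv_type, value in sorted(tlvs):
        out += TLV_HEADER.pack(tlv_type, len(value)) + value
    return bytes(out)


def decode_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a value field into (type, value) pairs, keeping unknown types."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < TLV_HEADER.size:
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(data, offset)
        offset += TLV_HEADER.size
        if len(data) - offset < length:
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    return tlvs


# A Label Range TLV (type 3) and a Label Index TLV (type 1), given out of
# order on purpose; the encoder emits them in ascending type order.
encoded = encode_tlvs([(3, b"\x00\x3e\x80"), (1, b"\x00\x00\x00\x05")])
decoded = decode_tlvs(encoded)
```

Because the decoder returns unrecognized types unchanged, a speaker
built this way can preserve and propagate them as required above.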
4.1. Label Index TLV
o Type: 1
o Length: 4
o Value: Label Index
Only one Label Index TLV per Prefix SID Attribute is allowed.
4.2. Label Base TLV
o Type: 2
o Length: 3
o Value: Label Base
One or more occurrences of the Label Base TLV are allowed. A Label
Base TLV MUST be followed by a Label Range TLV.
4.3. Label Range TLV
o Type: 3
o Length: 3
o Value: Label Range
One or more occurrences of the Label Range TLV are allowed. A Label
Range TLV MUST be preceded by a Label Base TLV.
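The three TLVs of Sections 4.1 to 4.3 can be combined as in the
following sketch, which builds the value field for one Label Index and
a single Label Base / Label Range pair and checks a sequence of TLV
types against the occurrence rules. Several points are assumptions,
since the text does not fix them: values are encoded as unsigned
integers in network byte order, "followed by" is read as "immediately
followed by", and the Label Range rule is read as the counterpart of
the Label Base rule (each Label Range TLV immediately follows a Label
Base TLV). The function names are hypothetical.

```python
import struct
from typing import List

LABEL_INDEX, LABEL_BASE, LABEL_RANGE = 1, 2, 3  # TLV types from Table 1


def tlv(tlv_type: int, value: bytes) -> bytes:
    """One TLV: 1-octet Type, 2-octet Length, then the value."""
    return struct.pack("!BH", tlv_type, len(value)) + value


def prefix_sid_value(index: int, base: int, size: int) -> bytes:
    """Value field with one Label Index and one Label Base / Range pair."""
    return (
        tlv(LABEL_INDEX, index.to_bytes(4, "big"))   # Length 4
        + tlv(LABEL_BASE, base.to_bytes(3, "big"))   # Length 3
        + tlv(LABEL_RANGE, size.to_bytes(3, "big"))  # Length 3
    )


def check_tlv_sequence(types: List[int]) -> None:
    """Raise ValueError if the TLV type sequence breaks the assumed rules."""
    if types.count(LABEL_INDEX) > 1:
        raise ValueError("only one Label Index TLV is allowed")
    for pos, tlv_type in enumerate(types):
        nxt = types[pos + 1] if pos + 1 < len(types) else None
        prev = types[pos - 1] if pos > 0 else None
        if tlv_type == LABEL_BASE and nxt != LABEL_RANGE:
            raise ValueError("Label Base TLV is not followed by a Label Range TLV")
        if tlv_type == LABEL_RANGE and prev != LABEL_BASE:
            raise ValueError("Label Range TLV is not preceded by a Label Base TLV")


# Index 5 in the example block of Section 3.1 (base 800000, 16000 labels).
value = prefix_sid_value(5, 800000, 16000)
check_tlv_sequence([LABEL_INDEX, LABEL_BASE, LABEL_RANGE])
```

With a single Label Base / Label Range pair, the ascending type order
required in Section 4 and the adjacency rule yield the same sequence.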
5. Acknowledgements
Many thanks to TBD for their detailed review and insightful comments.
6. IANA Considerations
This document requests a code point from the BGP Path Attributes
registry named 'Prefix SID'.
This document requests creation of a new registry for BGP Prefix SID
TLVs. Value 0 is reserved. The maximum value is 255. The registry
will be initialized as shown in Table 1. Allocations within the
registry will require documentation of the proposed use of the
allocated value (Specification Required) and approval by the
Designated Expert assigned by the IESG (see [RFC5226]).
7. Security Considerations
This document does not introduce any change in terms of BGP security.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
Label Switching Architecture", RFC 3031, January 2001.
[RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in
BGP-4", RFC 3107, May 2001.
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs", BCP 26, RFC 5226,
May 2008.
8.2. Informative References
[I-D.ietf-isis-segment-routing-extensions]
Previdi, S., Filsfils, C., Bashandy, A., Gredler, H.,
Litkowski, S., Decraene, B., and J. Tantsura, "IS-IS
Extensions for Segment Routing", draft-ietf-isis-segment-
routing-extensions-03 (work in progress), October 2014.
[I-D.ietf-mpls-seamless-mpls]
Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz,
M., and D. Steinberg, "Seamless MPLS Architecture", draft-
ietf-mpls-seamless-mpls-07 (work in progress), June 2014.
[I-D.ietf-ospf-segment-routing-extensions]
Psenak, P., Previdi, S., Filsfils, C., Gredler, H.,
Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
Extensions for Segment Routing", draft-ietf-ospf-segment-
routing-extensions-04 (work in progress), February 2015.
[I-D.ietf-rtgwg-bgp-routing-large-dc]
Lapukhov, P., Premji, A., and J. Mitchell, "Use of BGP for
routing in large-scale data centers", draft-ietf-rtgwg-
bgp-routing-large-dc-01 (work in progress), February 2015.
[I-D.kompella-mpls-larp]
Kompella, K., Rajagopalan, B., and G. Swallow, "Label
Distribution Using ARP", draft-kompella-mpls-larp-02 (work
in progress), October 2014.
Author's Address
Hannes Gredler
Juniper Networks, Inc.
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
US
Email: hannes@juniper.net