Internet DRAFT - draft-bashandy-idr-bgp-repair-label
Network Working Group A. Bashandy
Internet Draft B. Pithawala
Intended status: Standards Track Cisco Systems
Expires: October 2012 Jakob Heitz
Ericsson
May 1, 2012
Scalable, Loop-Free BGP FRR using Repair Label
draft-bashandy-idr-bgp-repair-label-04.txt
Abstract
Consider a BGP-free core scenario. Suppose the provider edge BGP
speakers PE1, PE2,..., PEn know about a prefix P/m via the external
routers CE1, CE2,..., CEm. If the PE router PEi loses connectivity to
the primary path, it is desirable to immediately restore traffic by
rerouting packets arriving from the core at PEi and destined to the
prefix P/m to one of the other PE routers that advertised P/m, say PEj,
until BGP re-converges. However, if the loss of connectivity of PEi to
the primary path also resulted in the loss of connectivity between PEj
and CEj, rerouting a packet before the control plane converges may
result in a loop. In this document, we propose using a repair label for
traffic restoration while avoiding loops. We propose advertising the
repair label through BGP.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Bashandy Expires April 30, 2012 [Page 1]
Internet-Draft BGP FRR using Repair Label October 2011
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on April 1, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction
1.1. Conventions used in this document
1.2. Terminology
2. Protocol Operation
2.1. Control plane Operation
2.1.1. Additional Rules for allocating and advertising a Repair label
2.2. Forwarding Plane Operation
2.3. Example
3. How to Disseminate Repair Label Information
3.1.1. Structure of the Repair Label Path Attribute
3.1.2. Semantics of the Repair Label Attribute
3.1.3. Additional Rule when Forwarding Advertisements Containing the
Repair Path Attribute
4. Security Considerations
5. IANA Considerations
6. Conclusions
7. References
7.1. Normative References
7.2. Informative References
8. Acknowledgments
1. Introduction
In a BGP-free core, where traffic is tunneled between edge routers
and edge routers assign labels to prefixes, as in L3VPN [9], 6PE
[10], and Softwire [8], BGP speakers advertise reachability
information about prefixes and associate a local label with each
prefix.
Suppose that a given edge router is chosen as the best next-hop for
a prefix P/m. An ingress router that receives a packet from an
external router and destined for the prefix P/m pushes the label
advertised by the egress edge router and then "tunnels" the packet
across the core to that egress router. Upon receiving the labeled
packet from the core, the egress router uses the label on the packet
to take the appropriate forwarding decision.
In modern networks, it is not uncommon to have a prefix reachable
via multiple edge routers. One example is the best external path
[7]. Another more common and widely deployed scenario is L3VPN [9]
with multi-homed VPN sites. As an example, consider the L3VPN
topology depicted in Figure 1.
+--------------------------+
| |
| BGP free Core |
| |
| +------------------PE1----+
| / | \
| / | \
| / | \
| / | \
| / | *
PE3 | CE....... VPN prefix
| \ | * (P/m)
| \ | /
| \ | /
| \ | /
| \ | /
| +------------------PE2----+
| |
| |
+--------------------------+
Figure 1 VPN prefix reachable via multiple PEs
PE3 is the ingress PE. PE1 and PE2 are both egress PEs connected to
CE. CE advertises one or more VPN prefixes, denoted by P/m. PE1 and
PE2 advertise P/m as VPNv4 or VPNv6 routes to all ingress PEs,
including PE3, and associate a label with each route.
Suppose that the ingress PE, PE3, chooses PE1 as the next-hop for
the prefix P/m. In order to minimize traffic loss, it is highly
desirable for PE1 to reroute all traffic destined to P/m to PE2 as
soon as the connectivity to CE is lost without waiting for the
control plane (whether it is IGP or BGP) to re-converge and compute
the new best path. In doing so, PE1 pushes the label advertised by
PE2 for the prefix P/m, and then "tunnels" the packet to PE2.
However, if the loss of PE1-CE connectivity was due to a CE crash,
then PE2 has also lost its path to CE and will reroute the traffic
back to PE1, resulting in a loop.
Because of extreme scalability requirements, where there is a need to
support thousands of peers and hundreds of thousands of prefixes,
traffic must be restored quickly without waiting for the control
plane to converge and without risking loops.
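The loop risk described above can be illustrated with a small sketch.
It models the naive behavior only, not the mechanism proposed in this
document. The function name and the dictionary layout are assumptions
made for illustration: every PE that cannot reach CE blindly reroutes
the packet to its pre-computed backup PE.

```python
def naive_reroute_path(start, ce_reachable, backup):
    """Trace a packet when every PE blindly reroutes to its backup PE.

    ce_reachable: dict mapping a PE name to True if it can still reach CE.
    backup: dict mapping a PE name to the PE it reroutes to on failure.
    Returns (hops, outcome); outcome is "delivered" or "loop".
    """
    hops = [start]
    current = start
    while True:
        if ce_reachable[current]:
            return hops, "delivered"
        current = backup[current]
        revisited = current in hops
        hops.append(current)
        if revisited:
            # The packet came back to a PE it already visited.
            return hops, "loop"


backup = {"PE1": "PE2", "PE2": "PE1"}
# Only the PE1-CE link failed: PE2 still delivers the packet.
print(naive_reroute_path("PE1", {"PE1": False, "PE2": True}, backup))
# CE crashed: both PEs lost their external path and the packet loops.
print(naive_reroute_path("PE1", {"PE1": False, "PE2": False}, backup))
```

The second call is the case that the repair label is meant to
eliminate.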
1.1. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [1].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying RFC-2119 significance.
1.2. Terminology
This section outlines the terms used in this document. For ease of
use, we will use terms similar to those used by L3VPN [9].
o Protected prefix: a prefix P/m (of any AFI) that a BGP speaker
has an external path to. The BGP speaker may learn about the
prefix from an external peer through BGP, some other protocol, or
manual configuration. The protected prefix is advertised to some
or all the internal peers.
o Primary egress PE: an IBGP peer that can reach the protected
prefix P/m through an external path and advertised the prefix to
the other IBGP peers. The primary egress PE was chosen as the
best path by one or more internal peers. In other words, the
primary egress PE is an egress PE that will normally be used when
there is no failure. Referring to Figure 1, PE1 is a primary
egress PE.
o CE: an external router through which an egress PE can reach a
prefix P/m. The router "CE" in Figure 1 is an example of such a
CE.
o Ingress PE: a BGP speaker that learns about a prefix through
another IBGP peer and chooses that IBGP peer as the next-hop for
the prefix. PE3 in Figure 1 is an example of an ingress PE.
o Repairing PE: the egress PE that attempts to restore traffic when
the primary path is no longer reachable without waiting for BGP
to re-converge. The repairing PE restores the traffic by
rerouting the traffic (through a tunnel) towards the pre-
calculated repair PE when it detects that the primary path is no
longer reachable. Referring to Figure 1, if PE3 chooses PE1 as
the primary egress PE and PE1 decides to reroute traffic to PE2
on losing reachability with CE, then PE1 is a repairing PE.
o Primary label: the label advertised by the primary egress PE to
be used for normal traffic forwarding.
o Repair egress PE (or repair PE): an egress PE other than the
primary egress PE that can reach the protected prefix P/m through
an external neighbor. The repair PE is pre-calculated by the
repairing PEs prior to any failure.
o Repair label: the label that will be pushed on the packet when
the repairing PE reroutes the traffic (through a tunnel) towards
the repair egress PE. Section 2 discusses how the repair label
is used. Section 3 discusses the semantics of and the method for
disseminating repair label information.
o Repair path: the repair egress PE and the repair label.
o internal and external: internal or external to the core.
2. Protocol Operation
This section explains the operation of the control and forwarding
planes of routers participating in BGP-free core traffic
restoration.
2.1. Control plane Operation
1. As usual, each PE allocates a local label for each prefix it can
reach through an external neighbor CE. This is the primary label
used for normal traffic forwarding.
2. To provide repair path information to all PEs, the PE also
allocates a repair label to the prefix if it can reach that
prefix via an external neighbor. Different repair label
allocation schemes are discussed in Section 3.
3. The PE advertises both the primary and repair labels to all IBGP
peers.
4. When a PE receives the label advertisement from egress PEs, it
calculates a primary egress PE and a repair egress PE based on
its internal path selection criteria. Note that the method of
choosing the repair path is beyond the scope of this document.
5. In the end, for some of the prefixes advertised by more than one
PE, a PE will have
o a primary path
o a repair path consisting of a repair PE and a repair label
advertised by the chosen repair PE.
6. A PE "never" protects a repair label. Hence, on any PE, a repair
label only has paths towards the CE. However, a primary label may
have a repair path towards a chosen repair PE.
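The control plane steps above can be sketched as follows. This is a
minimal illustration, not a definitive implementation. The class and
function names, the separate label ranges, and per-prefix repair label
allocation are assumptions. The rule of picking the advertiser with
the lowest name is a placeholder only, since the method of choosing
the repair path is beyond the scope of this document.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Advertisement:
    pe: str
    prefix: str
    primary_label: int
    repair_label: int


class EgressPE:
    """Hypothetical egress PE that allocates labels from two ranges."""

    def __init__(self, name, primary_start, repair_start):
        self.name = name
        self._primary = count(primary_start)
        self._repair = count(repair_start)

    def advertise(self, prefix):
        # Steps 1 to 3: allocate a primary and a repair label for the
        # externally reachable prefix and advertise both to IBGP peers.
        return Advertisement(self.name, prefix,
                             next(self._primary), next(self._repair))


def select_repair_path(local_pe, adverts):
    """Steps 4 and 5: return (repair PE, repair label) or None.

    Placeholder policy (assumption): lowest PE name among the other
    advertisers of the prefix.
    """
    others = sorted((a for a in adverts if a.pe != local_pe),
                    key=lambda a: a.pe)
    if not others:
        return None
    return others[0].pe, others[0].repair_label


pe1 = EgressPE("PE1", primary_start=4000, repair_start=5000)
pe2 = EgressPE("PE2", primary_start=2000, repair_start=3000)
adverts = [pe1.advertise("10.0.0.0/8"), pe2.advertise("10.0.0.0/8")]
print(select_repair_path("PE1", adverts))
```

With these label ranges, PE1 selects PE2 as the repair PE together
with the repair label that PE2 advertised.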
2.1.1. Additional Rules for allocating and advertising a Repair label
o A repair PE MUST NOT advertise a repair label for a prefix if it
does not have an external path to the prefix.
o The forwarding entry for the repair label on the repair PE MUST
NOT point to an internal path.
o Repair labels SHOULD be advertised with labeled address families
only, that is, AFI/SAFI 1/4, 2/4, 1/128, and 2/128.
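The advertisement rules above could be checked with a small predicate
such as the one below. The function name is hypothetical, and treating
the SHOULD on address families as a hard check is a simplification
made for this sketch.

```python
# Labeled address families listed in the rules above, as (AFI, SAFI).
LABELED_AFI_SAFI = {(1, 4), (2, 4), (1, 128), (2, 128)}


def may_advertise_repair_label(afi, safi, has_external_path):
    """Return True if a repair label may accompany the route."""
    if not has_external_path:
        # MUST NOT advertise a repair label without an external path.
        return False
    # SHOULD be advertised with labeled address families only.
    return (afi, safi) in LABELED_AFI_SAFI


print(may_advertise_repair_label(1, 128, has_external_path=True))
print(may_advertise_repair_label(1, 1, has_external_path=True))
```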
2.2. Forwarding Plane Operation
This section specifies the forwarding plane operation when a PE
receives a packet and either of the following two conditions is true:
o The PE lost the primary path and has not yet calculated another
primary path and programmed it in the forwarding plane.
o The packet arrived from the core and the PE does not have an
external path. This condition should be temporary, lasting only
until all ingress PEs converge and stop sending traffic to that
PE.
The forwarding plane processes arriving traffic as follows:
1. If the repairing PE is an egress PE, the packet arrives at the
repairing PE with the primary label at the top because the packet
is "tunneled" from the ingress PE(s). In that case, the repairing
PE swaps the incoming label stack with the "repair label stack"
advertised by the repair egress PE. Section 3.1.2 specifies the
details.
2. The repairing PE tunnels the packet to the repair PE.
3. At the repair PE, the packet arrives with the repair label at the
top. The repair PE uses the incoming label stack to take
forwarding decisions.
4. If the repair PE can reach the CE, the repair PE forwards the
packet towards the CE.
5. If the repair PE cannot reach the CE, the traffic is dropped
because a PE never protects a repair label.
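The per-packet decision described by the steps above can be
summarized in a short sketch. The enum and the function signature are
assumptions made for illustration. The point it shows is that only a
primary label may be rerouted, so a packet that already carries a
repair label is either delivered or dropped.

```python
from enum import Enum


class Action(Enum):
    FORWARD_TO_CE = "forward to CE"
    REROUTE_TO_REPAIR_PE = "reroute to repair PE"
    DROP = "drop"


def forwarding_action(label_kind, ce_reachable, has_repair_path):
    """Decide what a PE does with a packet arriving from the core.

    label_kind: "primary" or "repair", the kind of the top label.
    """
    if ce_reachable:
        return Action.FORWARD_TO_CE
    if label_kind == "primary" and has_repair_path:
        # Swap in the repair label and tunnel to the repair PE.
        return Action.REROUTE_TO_REPAIR_PE
    # A repair label is never protected, so the packet cannot loop.
    return Action.DROP


print(forwarding_action("primary", ce_reachable=False, has_repair_path=True))
print(forwarding_action("repair", ce_reachable=False, has_repair_path=True))
```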
2.3. Example
Consider the L3VPN [9] topology depicted in Figure 2 where two PEs
are connected to the same CE. Assume that the core runs LDP. We will
be using an advertised repair label.
               PE1
                  \
                   \
                    \
                     \
   LDP core           CE....... VPN prefix
                     /          (10.0.0.0/8)
                    /
                   /
                  /
               PE2
Figure 2 : L3VPN Example
PE1: Repairing egress PE
PE2: repair PE
Primary VPN label advertised by PE1 to all PEs: 4000
Repair VPN label advertised by PE1 to all PEs: 5000
Primary VPN label advertised by PE2 to all PEs: 2000
Repair VPN label advertised by PE2 to all PEs: 3000
LDP label for PE2 on PE1 is 1234
LDP label for PE1 on PE2 is 4567
Before failure
'''''''''''''''
PE1 has the following FIB entries
4000 -----> CE (unlabeled)
-----> PE2, swap 4000 with 3000 and then push 1234
5000 -----> CE (unlabeled)
PE2 has the following
2000 -----> CE (unlabeled)
-----> PE1, swap 2000 with 5000 and then push 4567
3000 ------> CE (unlabeled)
After the CE crashes
''''''''''''''''''''
PE1 has the following entry:
4000 -----> PE2, swap 4000 with 3000 and then push 1234
5000 -----> Drop
PE2 has the following
2000 -----> PE1, swap 2000 with 5000 and then push 4567
3000 ------> Drop
Because of the above routing entries, any traffic arriving from the
core at PE1 and destined for 10.0.0.0/8 is rerouted towards PE2
using the repair VPN label 3000. PE2 will just drop it instead of
looping it back towards PE1.
After the link between PE1 and CE fails (CE did not crash)
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
PE1 has the following entry:
4000 -----> PE2, swap 4000 with 3000 and then push 1234
5000 -----> Drop
PE2 has the following
2000 -----> CE (unlabeled)
-----> PE1, swap 2000 with 5000 and then push 4567
3000 ------> CE
Because of the above routing entries, any traffic arriving from the
core at PE1 and destined for 10.0.0.0/8 is rerouted towards PE2 using
the repair VPN label 3000. PE2 will forward the traffic towards CE.
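The FIB tables of this example can be replayed with the following
sketch. The data layout and the function names are assumptions, and
the LDP transport labels (1234 and 4567) are left out because they
only carry the packet between the PEs. Each label maps to an ordered
list of usable paths; a path towards CE is removed when the
corresponding PE loses its external path.

```python
def build_fib(pe1_ce_up, pe2_ce_up):
    """Build the VPN label entries of the example for PE1 and PE2.

    Each path is (next hop, outgoing VPN label); None means unlabeled.
    """
    to_ce_1 = [("CE", None)] if pe1_ce_up else []
    to_ce_2 = [("CE", None)] if pe2_ce_up else []
    return {
        "PE1": {4000: to_ce_1 + [("PE2", 3000)],  # primary, protected
                5000: list(to_ce_1)},             # repair, never protected
        "PE2": {2000: to_ce_2 + [("PE1", 5000)],
                3000: list(to_ce_2)},
    }


def trace(fib, pe, label):
    """Follow a packet that arrives at `pe` carrying VPN label `label`."""
    hops = []
    while True:
        paths = fib[pe][label]
        if not paths:
            hops.append(f"{pe}: drop")
            return hops
        next_hop, out_label = paths[0]
        if next_hop == "CE":
            hops.append(f"{pe}: forward to CE")
            return hops
        hops.append(f"{pe}: swap {label} with {out_label}, tunnel to {next_hop}")
        pe, label = next_hop, out_label


# CE crashed: neither PE has an external path.
print(trace(build_fib(False, False), "PE1", 4000))
# Only the PE1-CE link failed.
print(trace(build_fib(False, True), "PE1", 4000))
```

The trace always terminates because a rerouted packet carries a
repair label, and repair label entries never point to another PE.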
3. How to Disseminate Repair Label Information
We propose to advertise the repair label as an optional path
attribute. Advertising the repair label as an optional path
attribute has some advantages:
o An egress PE can benefit from scalable repair label allocation
schemes such as per-CE repair label allocation.
o The repairing PE can share the same repair path among multiple
protected prefixes. Since the repair path is shared by all labels
sharing the path attribute, the repairing PE can optimize its RIB
and FIB by sharing the same repair path data structure among a
large number of protected prefixes.
o The BGP update message size is reduced. Instead of having to send
additional labels per prefix, multiple prefixes can share the
same repair label.
o The number of labels used for traffic restoration does not depend
on the number of protected prefixes.
o Incremental deployment is possible because the attribute is
optional.
The main disadvantage of sharing the same repair path among multiple
primary paths is loss of fine-grained control. It is not possible to
manage, control, or provide differentiated handling to traffic on a
per-prefix basis until the network re-converges. The loss of
fine-grained control is limited to the BGP re-convergence period.
Note that per-CE and/or per-next-hop repair label allocation has
some advantages over per-prefix repair label allocation. First, it
results in using fewer labels. Second, it allows for better packing
in BGP messages. Third, it does not require special handling in the
forwarding plane at the repair PE. Fourth, it simplifies the
forwarding plane while maximizing the packet switching performance
because the egress PE can take a forwarding decision with a single
FIB lookup.
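The difference in label consumption between per-CE and per-prefix
allocation can be seen in a small sketch. The function name, the
starting label value, and the sample prefixes are assumptions made
for illustration.

```python
from itertools import count


def allocate_repair_labels(prefix_to_ce, per_ce=True, first_label=3000):
    """Map each prefix to a repair label.

    per_ce=True: prefixes reached through the same CE share one label.
    per_ce=False: every prefix gets its own label.
    """
    labels = count(first_label)
    by_key = {}
    result = {}
    for prefix, ce in prefix_to_ce.items():
        key = ce if per_ce else prefix
        if key not in by_key:
            by_key[key] = next(labels)
        result[prefix] = by_key[key]
    return result


prefixes = {"10.1.0.0/16": "CE1", "10.2.0.0/16": "CE1", "10.3.0.0/16": "CE2"}
per_ce = allocate_repair_labels(prefixes, per_ce=True)
per_prefix = allocate_repair_labels(prefixes, per_ce=False)
print(len(set(per_ce.values())), "labels with per-CE allocation")
print(len(set(per_prefix.values())), "labels with per-prefix allocation")
```

With per-CE allocation the number of repair labels follows the
number of CEs rather than the number of protected prefixes.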
3.1.1. Structure of the Repair Label Path Attribute
This document defines the repair label attribute as an optional non-
transitive path attribute [2] as follows:
Attribute name: REPAIR_LABEL
Type code: TBD
Attribute Flags:
Optional bit: 1
Transitive bit: 0
Partial bit: 0
Extended Length bit: 0
Length of the attribute: length in octets of the attribute
Attribute Value: The attribute value contains a stack of one or
more labels. The encoding of the labels is identical to the encoding
of the "label" field in [4]. The value of the bottom of stack
(BOS) bit is determined at traffic restoration time as specified
in Section 3.1.2.
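A possible wire encoding of the attribute described above is sketched
below. It assumes the usual path attribute layout of [2] (flags
octet, type code octet, one-octet length, value) and the 3-octet
label encoding of [4] (20-bit label value with the BOS bit in the
low-order bit). The type code is TBD in this document, so the value
255 used here is only a placeholder, and leaving the BOS bit at 0 in
the advertised value is likewise an assumption.

```python
OPTIONAL_FLAG = 0x80          # Optional=1, Transitive=0, Partial=0, Ext. Length=0
PLACEHOLDER_TYPE_CODE = 255   # hypothetical: the real type code is TBD


def encode_label(label, bos=0):
    """Encode one label in 3 octets: 20-bit label value, BOS in the low bit."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    return ((label << 4) | (bos & 1)).to_bytes(3, "big")


def encode_repair_label_attribute(labels, type_code=PLACEHOLDER_TYPE_CODE):
    """Return flags, type code, length, and label stack as bytes."""
    if not labels:
        raise ValueError("the attribute carries one or more labels")
    value = b"".join(encode_label(label) for label in labels)
    if len(value) > 255:
        raise ValueError("value does not fit a one-octet length field")
    return bytes([OPTIONAL_FLAG, type_code, len(value)]) + value


print(encode_repair_label_attribute([3000]).hex())
```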
3.1.2. Semantics of the Repair Label Attribute
This document specifies the semantics of the repair label attribute
when the attribute carries one repair label only. The semantics of
more than one repair label is beyond the scope of this document.
Suppose a BGP speaker PE1 receives an update message with a repair
label attribute containing the label "Lr2" from the IBGP peer PE2.
Suppose the NLRI in the MP_REACH_NLRI attribute [3] contains the
prefixes R1, R2, ..., Rn, each bound to a label L21, L22, ...,
L2n, respectively. This means the following:
1. PE2 will never attempt to repair a packet arriving with the label
"Lr2". Hence PE2 will either forward the packet to an external CE
or drop the packet.
2. PE2 expects the following from PE1:
a. Case a: The route Ri on PE1 is bound to a local label "L1i".
Suppose PE1 receives a packet from the core with the label
"L1i" at the top of the stack. If PE1 loses the primary
path for a prefix Ri or PE1 receives a packet from the core
while not having an external path, and PE1 decides that PE2
is the repair PE for the prefix Ri, then PE1 MUST swap the
label "L1i" on the packet with the repair label "Lr2" and
then tunnel the packet to PE2. The bottom of stack (BOS) bit
MUST be copied from the label arriving on the packet to the
label "Lr2".
b. Case b: The route Ri on PE1 is bound to an aggregate label
(e.g. a per-VRF label). In that case, if PE1 receives a packet
from the core, PE1 has to perform more than one route lookup
to determine the primary path. Eventually, there will either
be an IP lookup or a label lookup that points to the primary
path:
i. A label lookup points to the primary path: In that case,
PE1 handles the packet as described in item 2.a above.
ii. An IP lookup points to the primary path: In that case,
if PE1 loses the primary path for a prefix Ri or PE1
receives a packet from the core while not having an
external path, PE1 handles the packet as follows:
1. PE1 pops all labels on the packet.
2. PE1 MUST push the label "Lr2".
3. PE1 tunnels the packet to PE2. The bottom of stack
(BOS) bit in "Lr2" MUST be set as specified in [5].
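The label stack handling of cases a and b can be sketched as follows.
The function name and the representation of a stack as a list of
(label, BOS) tuples are assumptions. For the IP lookup case the sketch
sets the BOS bit to 1, on the reading that "Lr2" is the only label
left after all labels are popped and is therefore the bottom of the
stack under [5]. The tunnel encapsulation towards PE2 is not modeled.

```python
def repair_stack(stack, repair_label, lookup):
    """Return the label stack that PE1 sends towards PE2.

    stack: list of (label, bos) tuples, top of stack first.
    lookup: "label" for cases 2.a and 2.b.i, "ip" for case 2.b.ii.
    """
    if lookup == "label":
        # Swap the top label with Lr2 and copy the incoming BOS bit.
        (_, bos), rest = stack[0], stack[1:]
        return [(repair_label, bos)] + rest
    if lookup == "ip":
        # Pop all labels, then push Lr2 as the only label (BOS = 1).
        return [(repair_label, 1)]
    raise ValueError("lookup must be 'label' or 'ip'")


# Case a: primary label 4000 is at the bottom of the stack.
print(repair_stack([(4000, 1)], 3000, "label"))
# Case b.ii: an aggregate label led to an IP lookup.
print(repair_stack([(16, 1)], 3000, "ip"))
```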
3.1.3. Additional Rule when Forwarding Advertisements Containing the
Repair Path Attribute
As specified in Section 3.1.1, the repair label attribute is a
non-transitive attribute. However, there may be cases, such as
inter-AS option (b) [9], route reflectors [11], or confederations
[12], where a router may replace the advertised next-hop with its
own before forwarding an advertisement. If a BGP speaker replaces the
next-hop attribute with its own and the advertisement contains a
repair label attribute with label stack "Sr", there are two options:
o Option 1: The BGP speaker MUST NOT advertise the repair label
attribute.
o Option 2: The BGP speaker MUST replace the repair label stack
"Sr" with a locally allocated label stack "Sr1" before
advertising the route and then advertise the stack "Sr1" in the
repair label attribute. For the forwarding plane, the BGP speaker
MUST install a swap forwarding entry such that if the BGP speaker
receives a packet with the label stack "Sr1", it swaps "Sr1" with
the stack "Sr".
Note that advertising the repair label attribute by the router
depends on whether the router understands the semantics of and
supports the repair label attribute at the time of receiving an
advertisement containing the repair label attribute.
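The two options for a BGP speaker that rewrites the next-hop can be
sketched as follows. The class name, the flag that selects the
option, and the single-label stacks are assumptions made for
illustration. A new local label is allocated on every call, which a
real implementation would likely avoid by reusing labels.

```python
from itertools import count


class NextHopRewriter:
    """Hypothetical speaker that sets itself as next-hop on re-advertisement."""

    def __init__(self, first_local_label, relay_repair_label=True):
        self._labels = count(first_local_label)
        self.relay_repair_label = relay_repair_label
        self.swap_table = {}  # local label "Sr1" -> received label "Sr"

    def readvertise(self, received_repair_label):
        """Return the repair label to advertise, or None to omit it."""
        if not self.relay_repair_label:
            # Option 1: do not advertise the repair label attribute.
            return None
        # Option 2: advertise a local label and install a swap entry.
        local_label = next(self._labels)
        self.swap_table[local_label] = received_repair_label
        return local_label


asbr = NextHopRewriter(first_local_label=9000)
print(asbr.readvertise(3000), asbr.swap_table)
```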
4. Security Considerations
No additional security risk is introduced by using the mechanisms
proposed in this document.
5. IANA Considerations
This document defines a new BGP path attribute. IANA maintains a
list of the current BGP attribute type codes in [6]. This document
proposes defining a new type code value of "TBD" for the
REPAIR_LABEL path attribute.
6. Conclusions
This document proposes using a repair label to allow restoring
traffic prior to BGP convergence while avoiding loops.
7. References
7.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[2] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol
4 (BGP-4)", RFC 4271, January 2006.
[3] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
"Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.
[4] Rosen, E. and Y. Rekhter, "Carrying Label Information in
BGP-4", RFC 3107, May 2001.
[5] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci,
D., Li, T., and A. Conta, "MPLS Label Stack Encoding", RFC
3032, January 2001.
7.2. Informative References
[6] BGP Parameters,
http://www.iana.org/assignments/bgp-parameters/bgp-parameters.xhtml
[7] Marques, P., Fernando, R., Chen, E., and P. Mohapatra,
"Advertisement of the best external route in BGP",
draft-ietf-idr-best-external-02.txt, April 2004.
[8] Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
Framework", RFC 5565, June 2009.
[9] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006.
[10] De Clercq, J., Ooms, D., Prevost, S., and F. Le Faucheur,
"Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
Edge Routers (6PE)", RFC 4798, February 2007.
[11] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection:
An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456,
April 2006.
[12] Traina, P., McPherson, D., and J. Scudder, "Autonomous System
Confederations for BGP", RFC 5065, August 2007.
8. Acknowledgments
Special thanks to Keyur Patel, Robert Raszuk, and Eric Rosen for the
valuable comments.
This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses
Ahmed Bashandy
Cisco Systems
170 West Tasman Dr, San Jose, CA 95134
Email: bashandy@cisco.com
Burjiz Pithawala
Cisco Systems
170 West Tasman Dr, San Jose, CA 95134
Email: bpithaw@cisco.com
Jakob Heitz
Ericsson
100 Headquarters Drive, San Jose, CA, 95134
Email: jakob.heitz@ericsson.com