Network Working Group R. Raszuk
Internet-Draft K. Patel
Expires: January 3, 2006 R. Fernando
Cisco Systems
July 2, 2005
External failure propagation.
draft-raszuk-ext-failure-propagation-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 3, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
The current BGP specification calls for sending prefix-based routing
information to all other peers when a BGP peer fails, so that they
can converge using the new information.
Certain network events could be communicated to BGP speakers in an
aggregated fashion. This not only minimizes control plane traffic
but more importantly reduces the time to react to these events by the
Raszuk, et al. Expires January 3, 2006 [Page 1]
Internet-Draft EFP-BGP July 2005
network and consequently reduces the time to converge.
This draft suggests extensions to the protocol to react to such
events in a concise manner. In this version of the document the
scope of the propagation is limited to a single domain.
Table of Contents
1. Terminology
2. Introduction
3. Specification of Requirements
4. Applicability
5. VLID allocation
6. VLID association to routes and their propagation
7. VLID signaling
7.1 Analysis of propagation options
7.2 Encoding
7.3 Message ordering protection
8. Operation
8.1 BGP paths redundancy requirement
8.2 VLID propagation via route reflectors
8.3 Sequence of events
9. Security Considerations
10. IANA Consideration
11. Acknowledgements
12. References
12.1 Normative References
12.2 Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Terminology
The following list describes acronyms and definitions for terms used
throughout this document:
o 2547 - RFC2547 - MPLS based Virtual Private Networks [4]
o ADD-PATH - Advertisement of Multiple Paths in BGP [10]
o AS - Autonomous System
o ASBR - Autonomous System Border Router
o BFD - Bidirectional Forwarding Detection [8]
o BGP - Border Gateway Protocol [5]
o CE - Customer Edge. A customer-owned device that has as its next
hop a service provider device.
o IGP - Interior Gateway Protocol
o IPv4 - Internet Protocol version 4
o IPv6 - Internet Protocol version 6
o EFP - External failure propagation: The out-of-band failure
signaling which is the subject of this specification.
o MPLS - Multiprotocol Label Switching
o PE - Provider Edge. A service provider device that has as its
next hop one or more customer devices.
o VLID - Virtual Link Id: A value assigned by the border router (PE/
ASBR) indicating the state of the peering device or a state of a
link to such peer.
o VPNv4 - Virtual Private Network for IPv4
o VPNv6 - Virtual Private Network for IPv6
2. Introduction
In most of today's BGP deployments, the failure of an external peer
results (on the ASBR or PE node) in a best path calculation followed
by native per-prefix BGP signaling of a new path, or of a withdraw
message if no other path is available.
In this proposal we recommend an enhancement to this traditional
paradigm. In parallel with per-prefix signaling, the proposal
introduces a level of hierarchy. While permitting the traditional
BGP mechanism of per-prefix signaling to proceed at its own pace, we
define an abstraction value (VLID) which is assigned on a per-peer
basis and signaled immediately following a peer's link failure or
node failure event.
The most important benefit of this behavior is that it triggers the
alternate path computation on all BGP speakers domain wide. This
reduces the time taken to flood the event to a single message
propagation delay in the network, and makes both the protocol
messaging and the convergence independent of the number of prefixes
involved.
3. Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].
4. Applicability
EFP provides a toolset to speed up detection of remote critical
failures impacting current data paths without the necessity to modify
the forwarding plane. It also eliminates even temporary occurrences
of the suboptimal routing which is unavoidable in any tunnel-based
protection solution. That is achieved by triggering a switchover to
alternative exit points on all BGP speakers, including the ingress
nodes to the domain, thus assuring optimal BGP path selection.
This solution should be able to address all of the below operational
and deployment scenarios:
o IPv4/IPv6 forwarding without next hop self on ASBR
o IPv4/IPv6 forwarding with next hop self on ASBRs
o IPv4/IPv6 forwarding with native or tunneled core
o VPNv4/VPNv6 remote CE failure (directly connected p2p to PE)
o VPNv4/VPNv6 remote CE failure (multihop connected p2p to PE)
o VPNv4/VPNv6 remote CE failure (connected via multi-access to PE)
o VPNv4/VPNv6 PE-CE link failure (Point to point or multi-access)
o VPNv4/VPNv6 PE failure
The solution should work equally well for routes learned via BGP and
for routes redistributed locally by the border router.
5. VLID allocation
In order to be able to later detect various failure scenarios and
map them to BGP routing information, routes must be properly marked
ahead of failure time, either when they are received via EBGP or when
they are redistributed from any other routing protocol.
We first examine the following failure types:
a. CE node failure
b. PE-CE link failure
c. PE/ASBR failure
The link/node liveness detection of the peer can be done using IGP
hellos, BFD, physical link failure detection, or even a low BGP
keepalive interval (highly discouraged, but still used in practice).
This document
does not mandate the use of any of them, leaving the trigger itself
to the implementation or customer choice. All of the above triggers
should be supported with the proposed extension.
In today's networks next hop handling for external routes can be
divided into two operational scenarios:
o Setting next hop self on the PE/ASBR
o Not setting next hop self on the PE/ASBR and redistributing the
peer's next hop into the local IGP
Note that some applications may force the user to set next hop self
on the edge of the autonomous system (for example 2547). In the case
of a PE/ASBR failure, when next hop self has been set on ingress to
the AS, the failure of the entire node can be propagated by IGP
flooding. In some IGP topologies this requires next hop leaking
between IGP areas/levels. The same IGP-based event propagation can
also be used to signal external next hop liveness when next hop self
was not set at the AS boundary.
In all other failure cases a functional component is needed which is
responsible for associating, on a per BGP process basis (that is, per
independent BGP instance with its own BGP router ID), a unique value,
here called the VLID, with received routes. Such a value represents
remote links and peering devices. The term link here refers to a
virtual link and not a physical one.
For point-to-point interface types a virtual link may map directly to
a physical/logical link, but on multi-access interfaces an
abstraction layer is required which maps each CE node to its own
virtual link value, even if the physical medium is shared between a
number of CEs connected to the PE.
Such an abstraction will also address all flavors of multihop access
techniques, as long as proper detection is in place to notice a
failure in a timely fashion.
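The allocation step described above can be pictured with a minimal
sketch in Python. It is an illustration only and not a normative
procedure: the class name VlidAllocator, the (interface, peer) key
and the sequential numbering are assumptions made for this example,
since the assignment scheme is left to the implementation.

```python
from typing import Dict, Tuple

VLID_OCTETS = 6  # VL ID width used by the EFP NLRI (Section 7.2)


class VlidAllocator:
    """Hypothetical allocator, one instance per BGP process/router ID."""

    def __init__(self, bgp_router_id: str) -> None:
        self.bgp_router_id = bgp_router_id
        self._next = 1
        self._by_peer: Dict[Tuple[str, str], bytes] = {}

    def allocate(self, interface: str, peer: str) -> bytes:
        """Return the VLID for a peer, allocating one on first use.

        Keying on (interface, peer) gives each CE its own virtual link
        even when several CEs share one multi-access interface.
        """
        key = (interface, peer)
        if key not in self._by_peer:
            self._by_peer[key] = self._next.to_bytes(VLID_OCTETS, "big")
            self._next += 1
        return self._by_peer[key]


# Two CEs on the same shared segment receive distinct virtual links.
alloc = VlidAllocator("192.0.2.1")
ce1 = alloc.allocate("Ethernet0", "198.51.100.1")
ce2 = alloc.allocate("Ethernet0", "198.51.100.2")
```

Because the allocator is tied to one BGP router ID, the same VLID
value may be reused by another border router without ambiguity.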
6. VLID association to routes and their propagation
A Virtual Link ID is meaningful only in conjunction with the BGP
router ID of the VL's originator within a given Autonomous System.
BGP Virtual Link IDs should not be propagated via EBGP sessions,
unless operator allows propagation of unchanged next hops for a given
EBGP peering.
BGP Virtual Link IDs shall not be propagated to those BGP speakers
that have not indicated the EFP BGP capability.
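The two propagation rules above can be summarized as a single
predicate. The following Python sketch is only an illustration; the
function and parameter names are hypothetical and do not come from
any existing implementation.

```python
def may_send_vlid(session_type: str,
                  peer_has_efp_capability: bool,
                  next_hop_unchanged_allowed: bool = False) -> bool:
    """Decide whether Virtual Link IDs may be sent on a session.

    session_type is "ibgp" or "ebgp" (labels assumed for this sketch).
    next_hop_unchanged_allowed models the operator permitting
    propagation of unchanged next hops on an EBGP peering.
    """
    if not peer_has_efp_capability:
        return False  # never sent to speakers without EFP capability
    if session_type == "ebgp":
        return next_hop_unchanged_allowed
    return True
```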
A new BGP Virtual Link Attribute is defined to carry the Virtual Link
ID allocated by a BGP speaker for all NLRIs present in the
corresponding MP_REACH_NLRI attribute.
The format of the new BGP Virtual Link Attribute is defined in
Figure 1:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Attr flags   | Attr Type code|  Attr Length  |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          BGP Rtr_ID                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Virtual Link ID ...                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      ... Virtual Link ID      |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1 BGP Virtual Link Attribute
Attribute Flags & Type code fields:
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|0|0|0|0|0|0|0|      TBD      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2 BGP Virtual Link Attribute Flags
o Bit 0 - Optional attribute (value 1)
o Bit 1 - Non-transitive attribute (value 0)
o Bit 2 - Partial bit (0 for optional and non-transitive)
o Bit 3 - Attribute length of one octet (value 0)
o Bit 4-7 - Unused (value all zeros)
o Type code - Attribute type code (TBD)
o Length - 16 octets
The VL ID assignment scheme can be as flexible as an implementation
allows. In particular, an implementation may choose to define its
own internal format for the 6-octet VL ID value such that the octets
represent various failure scenarios of the node. Since VLIDs have
only local significance, specifying the many possible flavors of
their definition is not necessary for proper protocol operation.
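As an illustration of the layout in Figure 1, the following Python
sketch packs the attribute into octets. Two points are assumptions
and not part of this specification: the type code 0xFF is a
placeholder because the real value is TBD, and the length octet is
set to the 13 octets that follow it. The length of 16 octets listed
above matches the total size of the layout in Figure 1, so an
implementation would need to confirm which of the two readings is
intended.

```python
import socket
import struct

ATTR_FLAGS_OPTIONAL = 0x80    # bit 0 set, remaining flag bits zero
ATTR_TYPE_PLACEHOLDER = 0xFF  # hypothetical; the real type code is TBD


def encode_virtual_link_attr(router_id: str, vlid: bytes,
                             type_code: int = ATTR_TYPE_PLACEHOLDER) -> bytes:
    """Pack the Virtual Link Attribute as drawn in Figure 1."""
    if len(vlid) != 6:
        raise ValueError("VL ID must be exactly 6 octets")
    # Reserved (1) + BGP Rtr_ID (4) + Virtual Link ID (6) + Reserved (2)
    value = b"\x00" + socket.inet_aton(router_id) + vlid + b"\x00\x00"
    # Assumption: the length octet counts only the octets after it.
    header = struct.pack("!BBB", ATTR_FLAGS_OPTIONAL, type_code, len(value))
    return header + value


attr = encode_virtual_link_attr("192.0.2.1", bytes.fromhex("000000000001"))
```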
7. VLID signaling
7.1 Analysis of propagation options
There are multiple means by which external events encoded in VLIDs
could be propagated through an AS to other BGP speakers. We
considered using IGPs for flooding, forms of reliable multicast
flooding, and a new BGP sub-address family. The last one has been
chosen for the following reasons:
o Selective reception of external failure state by only those BGP
speakers who require this type of information
o Easy propagation via entire domain including transit via different
IGP areas or levels
o Elimination of unnecessary transit points to avoid increased
propagation delays
o Containment of the solution within a single protocol thereby
eliminating the need for multiple protocols (and hence multiple
components within a router) to interact with each other to
implement this scheme. It keeps the solution and its
implementation simple.
7.2 Encoding
Virtual link state information is propagated across a given domain
with a new SAFI. New BGP peering sessions, created manually or
automatically, need to be established to carry it.
The type code for the new EFP SAFI will be assigned (TBD).
The NLRI format for the new EFP SAFI is represented as [BGP_Rtr_ID:
VL_ID], where BGP Rtr ID is a 4-octet value indicating the BGP router
ID of the BGP speaker that originated the VL, and VL ID is a 6-octet
value representing the allocated identifier of an external link or
peering node.
The shortest EFP NLRI contains just the BGP Rtr ID value (4 octets),
indicating that all prefixes originated by this node need to be
invalidated regardless of the VL_ID value they carry (application
example: controlled reload of one of the BGP processes during planned
maintenance without impacting the IGP).
The maximum NLRI length is 10 octets.
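The two NLRI forms described above (4 octets for a whole node, 10
octets for a single virtual link) can be sketched in Python as
follows. The function names are hypothetical, and the sketch omits
any length prefix or other framing, since none is defined in this
document.

```python
import socket
from typing import Optional, Tuple


def encode_efp_nlri(router_id: str, vlid: Optional[bytes] = None) -> bytes:
    """Build [BGP_Rtr_ID] (4 octets) or [BGP_Rtr_ID:VL_ID] (10 octets)."""
    rid = socket.inet_aton(router_id)
    if vlid is None:
        return rid  # node-wide form: applies to every VL of this router
    if len(vlid) != 6:
        raise ValueError("VL ID must be exactly 6 octets")
    return rid + vlid


def decode_efp_nlri(data: bytes) -> Tuple[str, Optional[bytes]]:
    """Return (router_id, vlid); vlid is None for the 4-octet form."""
    if len(data) == 4:
        return socket.inet_ntoa(data), None
    if len(data) == 10:
        return socket.inet_ntoa(data[:4]), data[4:]
    raise ValueError("EFP NLRI must be 4 or 10 octets long")


link_nlri = encode_efp_nlri("192.0.2.1", bytes.fromhex("00000000002a"))
node_nlri = encode_efp_nlri("192.0.2.1")
```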
A new BGP capability is used to signal EFP support between BGP
speakers. Each BGP speaker that wishes to participate in the new EFP
address family must use the Multiprotocol Extensions Capability Code
as defined in [3] to advertise the new EFP (AFI,
SAFI) pair.
A BGP speaker participating in the distribution of EFP information
and configured as a Route Reflector should prioritize distribution of
the VL information over its other BGP data processing, so that remote
peers receive the convergence-critical information without delay.
When an implementation supports prioritizing processing on a per BGP
address family basis, the EFP address family should have the highest
priority. This is recommended for two main reasons:
o Any other AF may depend on its information
o The amount of information to be sent should be much smaller than
the number of corresponding prefixes to be processed and
propagated.
Support of the new EFP address family shall automatically indicate
support for handling the BGP optional non-transitive EFP Virtual Link
Attribute.
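The per address family prioritization recommended above can be
pictured with a small priority queue. This Python sketch is only one
possible arrangement: the class UpdateQueue, the address family
labels and the two-level priority scheme are assumptions made for the
example.

```python
import heapq
import itertools
from typing import List, Tuple

EFP = "efp"
AF_PRIORITY = {EFP: 0}  # lower number is served first
DEFAULT_PRIORITY = 1    # every other address family


class UpdateQueue:
    """Hypothetical outbound queue that always drains EFP updates first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # keeps FIFO order within a priority

    def push(self, address_family: str, update: str) -> None:
        prio = AF_PRIORITY.get(address_family, DEFAULT_PRIORITY)
        heapq.heappush(self._heap,
                       (prio, next(self._seq), address_family, update))

    def pop(self) -> Tuple[str, str]:
        _, _, address_family, update = heapq.heappop(self._heap)
        return address_family, update


queue = UpdateQueue()
queue.push("ipv4-unicast", "update-1")
queue.push("vpnv4", "update-2")
queue.push(EFP, "vl-withdraw")
first = queue.pop()
```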
7.3 Message ordering protection
To support the deployment model of propagating the new EFP AFI/SAFI
via existing route reflectors, and to accommodate possible BGP
message propagation delays and update reordering, a 2-octet counter
has been defined. Its main role is to assure that BGP reacts only to
the latest event and not to a delayed (out of sequence) one caused by
propagation problems in the network.
When the maximum value has been reached, the counter shall restart
from the value of 1. The counter should only be consulted while
processing updates for VL NLRIs already existing in the BGP RIB.
When the value is lower than the one already stored for a given VL
NLRI, the incoming update should be dropped and no new BGP action
taken. The exception to this rule is when the incoming message
contains the value of 1 and the value in the previous message had not
reached the two-octet maximum. Such a situation occurs when the
originator restarts, and in this case already advertised prefixes
should not be impacted before normal BGP route propagation has taken
place.
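The counter rules above can be read as a three-way classification of
an incoming update against the stored value. The Python sketch below
is one interpretation and not a definitive procedure. It assumes that
an equal value is processed (the text only calls for dropping lower
values) and that a value of 1 arriving after the stored value reached
the maximum is an ordinary wrap and is processed. The function name
and the result labels are hypothetical.

```python
COUNTER_MAX = 0xFFFF  # largest value of the 2-octet counter

ACCEPT = "accept"               # process the update normally
DROP = "drop"                   # stale update, take no BGP action
RESTART = "originator-restart"  # originator restarted its counter


def classify_counter(stored: int, incoming: int) -> str:
    """Classify an update for a VL NLRI already present in the RIB."""
    if incoming >= stored:
        return ACCEPT
    if incoming == 1:
        if stored == COUNTER_MAX:
            return ACCEPT  # assumption: wrap from the maximum back to 1
        return RESTART     # 1 seen before the maximum was reached
    return DROP
```

What a receiver does on RESTART is left to the caller; per the text
above, already advertised prefixes should not be impacted in that
case.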
We define a new BGP community type called the Virtual Links Counter
community to carry the associated virtual link counter value (see
Figure 3). The Virtual Links Counter Community is of an extended
type.
Virtual Links Counter is a new type of BGP Extended Community
Attribute. It is non-transitive in the AS scope and at the same time
transitive in a per BGP speaker scope within the domain. In
analogy to the non-transitive EFP Attribute, it is allowed to be
propagated across EBGP sessions only when next-hop preservation has
been configured on such a session by the operator.
It carries a two-octet counter for the associated VL NLRIs in the
corresponding MP_REACH_NLRI attribute and is only allocated on the
ASBR/PE nodes.
The value of the high-order octet of the Type Field is 0x40. The
value of the low-order octet of the Type field for this community is
(TBD). The value of the Global Administrator sub-field (2 octets) is
used to carry the VL counter.
The Local Administrator sub-field is reserved for further use.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0x40      |      TBD      |     Virtual Link Counter      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3 Virtual Links Counter BGP Extended Community
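The 8-octet layout of Figure 3 can be sketched in Python as follows.
The low-order type octet 0xFF is a placeholder chosen for this
example, since the real value is TBD, and the function names are
hypothetical.

```python
import struct

TYPE_HIGH = 0x40             # high-order octet of the Type field
TYPE_LOW_PLACEHOLDER = 0xFF  # hypothetical; the real value is TBD


def encode_vl_counter_community(counter: int,
                                type_low: int = TYPE_LOW_PLACEHOLDER) -> bytes:
    """Pack type (2 octets), VL counter (2 octets), reserved (4 octets)."""
    if not 0 <= counter <= 0xFFFF:
        raise ValueError("counter must fit in two octets")
    return struct.pack("!BBHI", TYPE_HIGH, type_low, counter, 0)


def decode_vl_counter(community: bytes) -> int:
    """Extract the VL counter from an 8-octet extended community."""
    _high, _low, counter, _reserved = struct.unpack("!BBHI", community)
    return counter


community = encode_vl_counter_community(258)
```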
8. Operation
8.1 BGP paths redundancy requirement
In order to allow an instant switching decision at the egress nodes,
the ingress node has to propagate its best external path to all of
its IBGP peers along with the associated VL IDs.
In the case where there is more than one path in the BGP VRF table
(including VRF zero) received from different peers, it is advised
that a second best path be propagated along with its own VL ID. In
such a case, even with different RDs per VRF, RRs would need to
support ADD-PATH and must not suppress path distribution by
propagating only the best path to their clients. Not propagating the
second best path may result in an unnecessary loss of connectivity,
no greater than occurs today, but one that is easy to minimize by
combining second best path propagation with EFP.
8.2 VLID propagation via route reflectors
In traditional IP-based switching, route reflectors, if deployed,
need to propagate more than a single best path. That can be
accomplished
with the use of the ADD-PATH scheme [10].
For some address families, in particular 2547, the same can be
accomplished by configuring a different RD per VRF on all PEs. No
additional changes will be required on RRs.
8.3 Sequence of events
The following events are expected to happen in a failure scenario:
a. Each PE allocates a VLID to each path received from remote
CEs/ASBRs.
b. BGP marks the routes received from external peers by adding the
new Virtual Link attribute to the update messages.
c. Each PE advertises via IBGP the best external path for a given VRF
(including VRF0).
d. For non-2547 networks, route reflectors, if used, need to be able
to forward more than only the best path received from the ASBRs
(use of ADD-PATH [10] is recommended). For the 2547 application
that extension to route reflectors is not required when the
recommendation of a different RD allocation per VRF is in place.
That extension is also not required when route reflectors are not
used for a given AFI/SAFI.
e. On the event of any PE-CE link failure or CE node failure, the
PE/ASBR signals the state of the transitioned VL IDs in
MP_UNREACH_NLRI and propagates them to its peers via IBGP using
the new EFP AFI/SAFI.
f. BGP Rtr ID + VL ID length pair uniquely identifies invalid paths
and triggers local switchover to other paths for a given prefix.
g. During the transition period the sender will also follow up with
the traditional per-prefix withdraw or update message. Once
network-wide deployment of routers supporting EFP is assured, the
origination and propagation of per-prefix withdraws following EFP
signaling could be eliminated.
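The receiving side of the sequence above can be sketched as a small
simulation in Python. It is only an illustration under stated
assumptions: the Path and Rib types are hypothetical, "best path" is
reduced to the first valid path in insertion order instead of the
full BGP decision process, and a single EFP withdraw is modeled as a
method call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Path:
    next_hop: str
    origin_router_id: str  # BGP Rtr ID of the PE/ASBR that set the VLID
    vlid: bytes
    valid: bool = True


class Rib:
    """Hypothetical RIB holding several paths per prefix."""

    def __init__(self) -> None:
        self.paths: Dict[str, List[Path]] = {}

    def add(self, prefix: str, path: Path) -> None:
        self.paths.setdefault(prefix, []).append(path)

    def best(self, prefix: str) -> Optional[Path]:
        # Simplification: the first valid path stands in for best path.
        for path in self.paths.get(prefix, []):
            if path.valid:
                return path
        return None

    def efp_withdraw(self, router_id: str, vlid: Optional[bytes]) -> int:
        """Invalidate all paths tagged with (router_id, vlid).

        vlid=None models the 4-octet NLRI covering the whole node.
        Returns the number of paths invalidated.
        """
        count = 0
        for plist in self.paths.values():
            for path in plist:
                if (path.valid and path.origin_router_id == router_id
                        and (vlid is None or path.vlid == vlid)):
                    path.valid = False
                    count += 1
        return count


rib = Rib()
vl = bytes.fromhex("000000000001")
for prefix in ("198.51.100.0/24", "203.0.113.0/24"):
    rib.add(prefix, Path("PE1", "192.0.2.1", vl))
    rib.add(prefix, Path("PE2", "192.0.2.2", vl))

# One EFP withdraw from PE1 moves both prefixes to the path via PE2.
invalidated = rib.efp_withdraw("192.0.2.1", vl)
```

Both PEs use the same VLID value in this example; the paths remain
distinguishable because a VLID is only meaningful together with the
originator's BGP router ID. A real implementation would likely index
paths by that pair instead of scanning every prefix.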
9. Security Considerations
This extension to BGP does not change the underlying security issues
inherent in the existing IBGP [2].
10. IANA Consideration
The following type codes have to be allocated according to the
current allocation rules:
o New attribute type code for BGP Virtual Link Attribute
o New SAFI value for the new EFP SAFI
o New type code for the Virtual Link Counter BGP Extended community
11. Acknowledgements
The authors would like to express a special thanks to the following
individuals for contributing their ideas and support for writing this
specification: Tony Li, Yakov Rekhter, David Ward, Russ White, Enke
Chen.
12. References
12.1 Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[2] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
Signature Option", RFC 2385, August 1998.
[3] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, "Multiprotocol
Extensions for BGP-4", RFC 2858, June 2000.
[4] Rosen, E., "BGP/MPLS IP VPNs", draft-ietf-l3vpn-rfc2547bis-03
(work in progress), October 2004.
[5] Rekhter, Y., "A Border Gateway Protocol 4 (BGP-4)",
draft-ietf-idr-bgp4-26 (work in progress), October 2004.
[6] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
Communities Attribute", draft-ietf-idr-bgp-ext-communities-08
(work in progress), February 2005.
[7] Sangli, S., Rekhter, Y., Fernando, R., Scudder, J., and E. Chen,
"Graceful Restart Mechanism for BGP", draft-ietf-idr-restart-10
(work in progress), June 2004.
[8] Katz, D. and D. Ward, "Bidirectional Forwarding Detection",
draft-ietf-bfd-base-02 (work in progress), March 2005.
12.2 Informative References
[9] Rosen, E. and Y. Rekhter, "BGP/MPLS VPNs", RFC 2547,
March 1999.
[10] Walton, D., Cook, D., Retana, A., and J. Scudder,
"Advertisement of Multiple Paths in BGP",
draft-walton-bgp-add-paths-00 (work in progress), May 2002.
Authors' Addresses
Robert Raszuk
Cisco Systems Inc.
170 West Tasman Dr
San Jose, CA 95134
US
Phone: (408)525-7588
Email: raszuk@cisco.com
Keyur Patel
Cisco Systems Inc.
170 West Tasman Dr
San Jose, CA 95134
US
Phone: (408)526-7183
Email: keyupate@cisco.com
Rex Fernando
Cisco Systems Inc.
170 West Tasman Dr
San Jose, CA 95134
US
Phone: (408)525-1253
Email: rex@cisco.com
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.