MPLS Working Group | C. Ramachandran |
Internet-Draft | T. Saad |
Updates: 4090 (if approved) | Juniper Networks, Inc. |
Intended status: Standards Track | I. Minei |
Expires: December 22, 2019 | Google, Inc. |
| D. Pacella |
| Verizon, Inc. |
| June 20, 2019 |
Refresh-interval Independent FRR Facility Protection
draft-ietf-mpls-ri-rsvp-frr-06
RSVP-TE relies on periodic refresh of RSVP messages to synchronize and maintain the Label Switched Path (LSP) related states along the reserved path. In the absence of refresh messages, the LSP-related states are automatically deleted. Reliance on periodic refreshes and refresh timeouts is problematic from the scalability point of view. The number of RSVP-TE LSPs that a router needs to maintain has been growing in service provider networks, and implementations should be capable of handling the increase in LSP scale.
RFC 2961 specifies mechanisms to eliminate the reliance on periodic refresh and refresh timeout of RSVP messages, and enables a router to increase the message refresh interval to values much longer than the default 30 seconds defined in RFC 2205. However, the protocol extensions defined in RFC 4090 for supporting Fast ReRoute (FRR) using bypass tunnels implicitly rely on short refresh timeouts to clean up stale states.
In order to eliminate the reliance on refresh timeouts, routers should unambiguously determine when a particular LSP state should be deleted. Coupling LSP state with the corresponding RSVP-TE signaling adjacencies as recommended in RFC 8370 will apply in scenarios other than RFC 4090 FRR using bypass tunnels. In scenarios involving RFC 4090 FRR using bypass tunnels, additional explicit tear down messages are necessary. The Refresh-interval Independent RSVP FRR (RI-RSVP-FRR) extensions specified in this document consist of procedures to enable LSP state cleanup that are essential in scenarios not covered by the procedures defined in RSVP-TE Scaling Recommendations. Hence, this document updates the procedures defined in RFC 4090 to support Refresh-interval Independent RSVP (RI-RSVP) capability specified in RFC 8370.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 22, 2019.
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
RSVP-TE Fast ReRoute [RFC4090] defines two local repair techniques to reroute Label Switched Path (LSP) traffic over pre-established backup tunnels. The facility backup method allows one or more LSPs traversing a connected link or node to be protected using a bypass tunnel. The many-to-one nature of this local repair technique is attractive from a scalability point of view. This document enumerates the facility backup procedures in [RFC4090] that rely on refresh timeout and hence make the facility backup method refresh-interval dependent. The RSVP-TE extensions defined in this document will enhance the facility backup protection mechanism by making the corresponding procedures refresh-interval independent.
Base RSVP [RFC2205] maintains state via the generation of RSVP Path/Resv refresh messages. Refresh messages are used to both synchronize state between RSVP neighbors and to recover from lost RSVP messages. The use of Refresh messages to cover many possible failures has resulted in a number of operational problems.
The problems listed above adversely affect RSVP control plane scalability and RSVP-TE [RFC3209] inherited these problems from standard RSVP. Procedures specified in [RFC2961] address the above mentioned problems by eliminating dependency on refreshes for state synchronization and for recovering from lost RSVP messages, and by eliminating dependency on refresh timeout for stale state cleanup. Implementing these procedures allows implementations to improve RSVP-TE control plane scalability. For more details on eliminating dependency on refresh timeout for stale state cleanup, refer to "Refresh-interval Independent RSVP" section 3 of RSVP-TE Scaling Techniques [RFC8370].
However, the procedures specified in RSVP-TE Scaling Techniques [RFC8370] do not fully address stale state cleanup for facility backup protection [RFC4090], as facility backup protection still depends on refresh timeouts for stale state cleanup.
The procedures specified in this document, in combination with RSVP-TE Scaling Techniques [RFC8370], eliminate facility backup protection dependency on refresh timeouts for stale state cleanup. The document hence updates the semantics of Refresh-interval Independent RSVP (RI-RSVP) capability specified in Section 3 of RSVP-TE Scaling Techniques [RFC8370].
The procedures specified in this document assume reliable delivery of RSVP messages, as specified in [RFC2961]. Therefore this document makes support for [RFC2961] a pre-requisite.
The reader is expected to be familiar with the terminology in [RFC2205], [RFC3209], [RFC4090] and [RFC4558].
Phop node: Previous-hop router along the label switched path
PPhop node: Previous-Previous-hop router along the label switched path
Nhop node: Next-hop router along the label switched path
NNhop node: Next-Next-hop router along the label switched path
PLR: Point of Local Repair router as defined in [RFC4090]
MP: Merge Point router as defined in [RFC4090]
LP-MP node: Merge Point router at the tail of Link-Protecting bypass tunnel
NP-MP node: Merge Point router at the tail of Node-Protecting bypass tunnel
TED: Traffic Engineering Database
LSP state: The combination of "path state" maintained as Path State Block (PSB) and "reservation state" maintained as Reservation State Block (RSB) forms an individual LSP state on an RSVP-TE speaker
B-SFRR-Ready: Bypass Summary FRR Ready Extended Association object, defined in Summary FRR extensions [I-D.ietf-mpls-summary-frr-rsvpte], which is added by the PLR for each protected LSP.
Conditional PathTear: A PathTear message containing a suggestion to a receiving downstream router to retain the path state if the receiving router is an NP-MP
Remote PathTear: A PathTear message sent from a Point of Local Repair (PLR) to the MP to delete LSP state on the MP if PLR had not reliably sent the backup Path state before
                E
              /   \
             /     \
            /       \
           /         \
          /           \
         /             \
        A ----- B ----- C ----- D
                \             /
                 \           /
                  \         /
                   \       /
                    \     /
                     \   /
                       F
Figure 1: Example Topology
In the topology in Figure 1, let us consider a large number of LSPs from A to D transiting B and C. Assume that the refresh interval has been configured to be long, on the order of minutes, and that refresh reduction extensions are enabled on all routers.
Also let us assume that node protection has been configured for the LSPs and the LSPs are protected by each router in the following way
In the above condition, assume that B-C link fails. The following is the sequence of events that is expected to occur for all protected LSPs under normal conditions.
While the above sequence of events has been described in [RFC4090], there are a few problems for which no mechanism has been specified explicitly.
The purpose of this document is to provide solutions to the above problems which will then make it practical to scale up to a large number of protected LSPs in the network.
The solution consists of five parts.
A node supporting [RFC4090] facility protection FRR MAY set the RI-RSVP capability (I bit) defined in Section 3 of RSVP-TE Scaling Techniques [RFC8370] only if it supports all the extensions specified in the rest of this document. A node supporting [RFC4090] facility bypass FRR but not supporting the extensions specified in this document MUST reset the RI-RSVP capability (I bit) in the outgoing Node-ID based Hello messages. Hence, this document updates [RFC4090] by defining extensions and additional procedures over facility protection FRR defined in [RFC4090] in order to advertise RI-RSVP capability [RFC8370].
As per the procedures specified in [RFC4090], when a protected LSP comes up and if the "local protection desired" flag is set in the SESSION_ATTRIBUTE object, each node along the LSP path attempts to make local protection available for the LSP.
With regard to the PLR procedures described above and that are specified in [RFC4090], this document specifies the following additional procedures to support RI-RSVP defined in [RFC8370].
A Node-ID based RSVP-TE Hello session is one in which Node-ID is used in the source and the destination address fields of RSVP Hello messages [RFC4558]. This document extends Node-ID based RSVP Hello session to track the state of any RSVP-TE neighbor that is not directly connected by at least one interface. In order to apply Node-ID based RSVP-TE Hello session between any two routers that are not immediate neighbors, the router that supports the extensions defined in the document MUST set TTL to 255 in all outgoing Node-ID based Hello messages exchanged between the PLR and the MP. The default hello interval for this Node-ID hello session SHOULD be set to the default specified in RSVP-TE Scaling Techniques [RFC8370].
In the rest of the document the term "signaling adjacency", or "remote signaling adjacency" refers specifically to the RSVP-TE signaling adjacency.
With regard to the MP procedures that are defined in [RFC4090], this document specifies the following additional procedures to support RI-RSVP defined in [RFC8370].
Each node along an LSP path supporting the extensions defined in this document MUST also include its router ID in the Node-ID sub-object of the RRO object carried in the Resv message of the LSPs. If the PLR has not included a Node-ID sub-object in the RRO object carried in the Path message and if the PLR is in a different IGP area, then the router MUST NOT execute the MP procedures specified in this document for those LSPs. Instead, the node MUST execute backward compatibility procedures defined in Section 4.6.2.2 as if the upstream nodes along the LSP do not support the extensions defined in this document.
A node receiving Path messages should determine whether they contain a B-SFRR-Ready Extended Association object with the Node-ID address of the PLR as the source and its own Node-ID as the destination. In addition the node should determine whether it has an operational remote Node-ID signaling adjacency with the PLR. If either the PLR has not included the B-SFRR-Ready Extended Association object or if there is no operational Node-ID signaling adjacency with the PLR or if the PLR has not advertised RI-RSVP capability in its Node-ID based Hello messages, then the node MUST execute backward compatibility procedures defined in Section 4.6.2.2.
If a matching B-SFRR-Ready Extended Association object is found in the Path message and if there is an operational remote signaling adjacency with the PLR that has advertised RI-RSVP capability (I-bit) [RFC8370] in its Node-ID based Hello messages, then the node SHOULD consider itself as the MP for the corresponding PLR. The matching and ordering rules for Bypass Summary FRR Extended Association specified in RSVP-TE Summary FRR [I-D.ietf-mpls-summary-frr-rsvpte] MUST be followed by the implementations supporting this document.
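The MP determination described in the two paragraphs above can be sketched as a single predicate. This is a minimal illustration only: the class names, field names, and the function `is_mp_for_plr` are hypothetical and do not come from this document or from any RSVP-TE implementation, and the matching and ordering rules of RSVP-TE Summary FRR are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BSfrrReady:
    """Hypothetical view of a B-SFRR-Ready Extended Association object."""
    source: str       # Node-ID address of the PLR
    destination: str  # Node-ID address of the intended MP


@dataclass(frozen=True)
class RemoteAdjacency:
    """Hypothetical state of a Node-ID Hello session with a remote neighbor."""
    operational: bool
    ri_rsvp_capable: bool  # I-bit seen in the neighbor's Node-ID Hello messages


def is_mp_for_plr(local_node_id: str,
                  plr_node_id: str,
                  association: Optional[BSfrrReady],
                  adjacency: Optional[RemoteAdjacency]) -> bool:
    """Return True if this node may consider itself the MP for the PLR.

    A False result corresponds to falling back to the backward
    compatibility procedures.
    """
    # The Path message must carry a B-SFRR-Ready object from the PLR to us.
    if association is None:
        return False
    if association.source != plr_node_id:
        return False
    if association.destination != local_node_id:
        return False
    # There must be an operational remote Node-ID signaling adjacency.
    if adjacency is None or not adjacency.operational:
        return False
    # The PLR must have advertised RI-RSVP capability (I-bit).
    return adjacency.ri_rsvp_capable
```

All three conditions must hold together; failing any one of them selects the backward compatibility path rather than raising an error.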
Once a router concludes it is the MP for a PLR running refresh-interval independent FRR procedures, it SHOULD create a remote path state for the LSP. The only difference between the "remote" path state and the LSP state is the RSVP_HOP object. The RSVP_HOP object in a "remote" path state contains the address that the PLR uses to send Node-ID hello messages to the MP.
The MP SHOULD consider the "remote" path state automatically deleted if:
Unlike the normal path state that is either locally generated on the ingress or created by a Path message from the Phop node, the "remote" path state is not signaled explicitly from the PLR. The purpose of "remote" path state is to enable the PLR to explicitly tear down the path and reservation states corresponding to the LSP by sending a tear message for the "remote" path state. Such a message tearing down "remote" path state is called "Remote" PathTear.
The scenarios in which a "Remote" PathTear is applied are described in Section 4.5.
This section describes the procedures for routers on the LSP path for different kinds of failures. The procedures described on detecting RSVP control plane adjacency failures do not impact the RSVP-TE graceful restart mechanisms ([RFC3473], [RFC5063]). If the router executing these procedures acts as a helper for a neighboring router, then the control plane adjacency will be declared as having failed after taking into account the grace period extended to the neighbor by the helper.
Node failures are detected from the state of Node-ID hello sessions established with immediate neighbors. RSVP-TE Scaling Techniques [RFC8370] recommends each router to establish Node-ID hello sessions with all its immediate neighbors. PLR or MP node failure is detected from the state of remote signaling adjacency established according to Section 4.2.2 of this document.
When a router detects Phop link or Phop node failure and the router is not an MP for the LSP, then it SHOULD send a Conditional PathTear (refer to Section 4.4 "Conditional PathTear" below) and delete the PSB and RSB states corresponding to the LSP.
When the Phop link for an LSP fails on a router that is an LP-MP for the LSP, the LP-MP MUST retain the PSB and RSB states corresponding to the LSP till the occurrence of any of the following events.
When a router that is an LP-MP for an LSP detects Phop node failure from the Node-ID signaling adjacency state, the LP-MP SHOULD send a normal PathTear and delete the PSB and RSB states corresponding to the LSP.
When a router that is an NP-MP for an LSP detects Phop link failure, or Phop node failure from the Node-ID signaling adjacency, the router MUST retain the PSB and RSB states corresponding to the LSP till the occurrence of any of the following events.
When a router that is an NP-MP does not detect Phop link or node failure, but receives a Conditional PathTear from the Phop node, then the router MUST retain the PSB and RSB states corresponding to the LSP till the occurrence of any of the following events.
Receiving a Conditional PathTear from the Phop node will not impact the "remote" state from the PPhop PLR. Note that Phop node would send a Conditional PathTear if it was not an MP.
In the example topology in Figure 1, we assume C & D are the NP-MPs for the PLRs A & B respectively. Now when A-B link fails, as B is not an MP and its Phop link has failed, B will delete LSP state (this behavior is required for unprotected LSPs - Section 4.3.1). In the data plane, that would require B to delete the label forwarding entry corresponding to the LSP. So if B's downstream nodes C and D continue to retain state, it would not be correct for D to continue to assume itself as the NP-MP for the PLR B.
The mechanism that enables D to stop considering itself as the NP-MP for B and delete the corresponding "remote" path state is given below.
A router may be simultaneously the LP-MP as well as the NP-MP for the Phop and the PPhop nodes respectively of an LSP. If Phop link fails on such node, the node MUST retain the PSB and RSB states corresponding to the LSP till the occurrence of any of the following events.
If a router that is both LP-MP and NP-MP detects Phop node failure, then the node MUST retain the PSB and RSB states corresponding to the LSP till the occurrence of any of the following events.
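As a rough summary, the initial reaction of a router to a Phop link or Phop node failure depends only on its MP role for the LSP. The sketch below encodes that decision; the enum and function names are hypothetical, the later events that end a retention period are not modeled, and neither are the additional conditions that govern whether a Conditional PathTear is actually sent.

```python
from enum import Enum, auto


class Failure(Enum):
    PHOP_LINK = auto()
    PHOP_NODE = auto()


class Action(Enum):
    DELETE_AND_SEND_CONDITIONAL_PATHTEAR = auto()
    DELETE_AND_SEND_PATHTEAR = auto()
    RETAIN_STATE = auto()


def on_phop_failure(is_lp_mp: bool, is_np_mp: bool, failure: Failure) -> Action:
    """Initial handling of a Phop failure for one LSP (illustrative only)."""
    if is_np_mp:
        # An NP-MP, including a node that is both LP-MP and NP-MP,
        # retains the PSB and RSB for either kind of failure.
        return Action.RETAIN_STATE
    if is_lp_mp:
        if failure is Failure.PHOP_LINK:
            # The PLR is still alive and may signal the backup LSP.
            return Action.RETAIN_STATE
        # The Phop node was the PLR, so link protection cannot help.
        return Action.DELETE_AND_SEND_PATHTEAR
    # Not an MP at all: delete state and let downstream nodes decide.
    return Action.DELETE_AND_SEND_CONDITIONAL_PATHTEAR
```

Checking the NP-MP role first keeps the combined LP-MP and NP-MP case from being mistaken for the LP-MP-only case on a Phop node failure.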
In the example provided in Section 4.3.3, B deletes the PSB and RSB states corresponding to the LSP once B detects that its link to the Phop went down, as B is not an MP. If B were to send a PathTear normally, then C would delete LSP state immediately. In order to avoid this, there should be some mechanism by which B can indicate to C that B does not require the receiving node to unconditionally delete the LSP state immediately. For this, B SHOULD add a new optional CONDITIONS object in the PathTear. The CONDITIONS object is defined in Section 4.4.3. If node C also understands the new object, then C SHOULD delete LSP state only if it is not an NP-MP - in other words C SHOULD delete LSP state if there is no "remote" PLR path state on C.
A router that is not an MP for an LSP SHOULD delete the PSB and RSB states corresponding to the LSP if the Phop link or the Phop Node-ID signaling adjacency goes down (Section 4.3.1). The router SHOULD send a Conditional PathTear if the following are also true.
When a router that is not an NP-MP receives a Conditional PathTear, the node SHOULD delete the PSB and RSB states corresponding to the LSP, and process the Conditional PathTear by considering it as a normal PathTear. Specifically, the node MUST NOT propagate the Conditional PathTear downstream but remove the optional object and send a normal PathTear downstream.
When a node that is an NP-MP receives a Conditional PathTear, it MUST NOT delete LSP state. The node SHOULD check whether the Phop node had previously included the B-SFRR-Ready Extended Association object in the Path. If the object had been included previously by the Phop, then the node processing the Conditional PathTear from the Phop SHOULD remove the corresponding object and trigger a Path downstream.
If a Conditional PathTear is received from a neighbor that has not advertised support (refer to Section 4.6) for the new procedures defined in this document, then the node SHOULD consider the message as a normal PathTear. The node SHOULD propagate the normal PathTear downstream and delete the LSP state.
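The receive-side handling of a Conditional PathTear described above can be sketched as follows. The function name, parameter names, and the action strings are hypothetical labels chosen for illustration; a real implementation would act on PSB and RSB structures rather than return strings.

```python
from typing import List


def on_conditional_pathtear(is_np_mp: bool,
                            sender_supports_ri_rsvp_frr: bool,
                            phop_had_b_sfrr_ready: bool) -> List[str]:
    """Return the ordered actions a node takes on a Conditional PathTear."""
    normal_teardown = ["delete_lsp_state", "send_normal_pathtear_downstream"]

    # A sender that has not advertised support is treated as having sent
    # a normal PathTear, regardless of the local MP role.
    if not sender_supports_ri_rsvp_frr:
        return normal_teardown

    # A node that is not an NP-MP strips the CONDITIONS object and
    # propagates a normal PathTear; the conditional form is never relayed.
    if not is_np_mp:
        return normal_teardown

    # An NP-MP keeps the LSP state.
    actions = ["retain_lsp_state"]
    if phop_had_b_sfrr_ready:
        # The Phop is no longer a usable PLR, so its association object
        # is removed and the change is signaled downstream.
        actions += ["remove_phop_b_sfrr_ready", "send_path_downstream"]
    return actions
```

Note that in no branch is the Conditional PathTear itself forwarded: it is either converted to a normal PathTear or absorbed by the NP-MP.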
As any implementation that does not support Conditional PathTear SHOULD ignore the new object but process the message as a normal PathTear without generating any error, the Class-Num of the new object MUST be 10bbbbbb where 'b' represents a bit (from Section 3.10 of [RFC2205]).
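The reason for the 10bbbbbb requirement follows from how the two high-order bits of an unknown Class-Num select the handling in Section 3.10 of [RFC2205]. A minimal sketch of that classification is shown below; the function name and the returned labels are hypothetical.

```python
def unknown_class_handling(class_num: int) -> str:
    """Classify how a node treats an object whose Class-Num it does not know."""
    if not 0 <= class_num <= 255:
        raise ValueError("Class-Num is an 8-bit value")
    if class_num & 0x80 == 0:
        # 0bbbbbbb: reject the whole message with an error.
        return "reject_message"
    if class_num & 0xC0 == 0x80:
        # 10bbbbbb: silently ignore the object, do not forward it.
        return "ignore_object"
    # 11bbbbbb: ignore the object but forward it unmodified.
    return "ignore_and_forward_object"
```

With a Class-Num in the 10bbbbbb range, a node that does not understand the CONDITIONS object drops it and processes the rest of the message as a normal PathTear, which is the behavior the text above relies on.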
The new object is called the "CONDITIONS" object. It specifies the conditions under which the default processing rules of the RSVP-TE message MUST be invoked.
The object has the following format:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |     Class     |     C-type    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Reserved                           |M|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: CONDITIONS Object
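As an illustration of the layout in Figure 2, the object can be packed as a 4-byte RSVP object header followed by one 32-bit word whose least significant bit is M. The sketch below makes several assumptions that are not stated in this section: the Class-Num value 0x80 is only a placeholder (the real value is to be allocated by IANA), the C-type value 1 is assumed, the Length of 8 is derived from the two 32-bit words shown in the figure, and the M bit is taken to request the merge-point-conditional processing described earlier.

```python
import struct

# Placeholder values, NOT assigned code points (assumptions for illustration).
CONDITIONS_CLASS_NUM = 0x80  # some value of the form 10bbbbbb
CONDITIONS_C_TYPE = 1
CONDITIONS_LENGTH = 8        # 4-byte header + 4-byte body, per Figure 2
M_BIT = 0x00000001           # least significant bit of the body word


def encode_conditions(m_bit: bool) -> bytes:
    """Pack a CONDITIONS object in network byte order."""
    body = M_BIT if m_bit else 0
    return struct.pack("!HBBI", CONDITIONS_LENGTH, CONDITIONS_CLASS_NUM,
                       CONDITIONS_C_TYPE, body)


def decode_conditions(data: bytes) -> bool:
    """Unpack a CONDITIONS object and return the state of the M bit."""
    if len(data) < CONDITIONS_LENGTH:
        raise ValueError("truncated CONDITIONS object")
    length, class_num, c_type, body = struct.unpack("!HBBI", data[:8])
    if length != CONDITIONS_LENGTH:
        raise ValueError("unexpected object length")
    if class_num != CONDITIONS_CLASS_NUM or c_type != CONDITIONS_C_TYPE:
        raise ValueError("not a CONDITIONS object")
    # Reserved bits are ignored on receipt in this sketch.
    return bool(body & M_BIT)
```

Ignoring the Reserved bits on receipt, rather than rejecting non-zero values, is a design choice of this sketch and not a requirement taken from the text.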
If the ingress wants to tear down the LSP because of a management event while the LSP is being locally repaired at a transit PLR, it would not be desirable to wait till the completion of backup LSP signaling to perform state cleanup. To enable LSP state cleanup when the LSP is being locally repaired, the PLR SHOULD send a "Remote" PathTear message instructing the MP to delete the PSB and RSB states corresponding to the LSP. The TTL in the "Remote" PathTear message SHOULD be set to 255.
Let us consider that node C, in example topology (Figure 1), has gone down and B locally repairs the LSP.
If local repair fails on the PLR after a failure, then this should be considered as a case for cleaning up LSP state from the PLR to the Egress. The PLR would achieve this using "Remote" PathTear to clean up the state from the MP. If the MP has retained the LSP state, then it would propagate the PathTear downstream thereby achieving state cleanup. Note that in the case of link protection, the PathTear would be directed to the LP-MP node's IP address rather than the Nhop interface address.
When a PLR router that has already made NP available detects a change in the RRO carried in the Resv message indicating that the router's former NP-MP is no longer present in the LSP path, then the router SHOULD send a "Remote" PathTear directly to its former NP-MP.
In the example topology in Figure 1, let us assume A has made node protection available and C has concluded it is the NP-MP for PLR A. When the B-C link fails then C, implementing the procedure specified in Section 4.3.4 of this document, will retain state till: the remote Node-ID signaling adjacency with A goes down, or a PathTear or a ResvTear is received for its PSB or RSB respectively. If B also has made node protection available, B will eventually complete backup LSP signaling with its NP-MP D and trigger a Resv to A with RRO changed. The new RRO of the LSP carried in the Resv will not contain C. When A processes the Resv with a new RRO not containing C - its former NP-MP, A SHOULD send a "Remote" PathTear to C. When C receives the "Remote" PathTear for its PSB state, C will send a normal PathTear downstream to D and delete both the PSB and RSB states corresponding to the LSP. As D has already received backup LSP signaling from B, D will retain control plane and forwarding states corresponding to the LSP.
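The PLR-side check in this scenario amounts to comparing the node identities recorded in the previous and the new RRO. A minimal sketch follows, assuming the RRO has already been reduced to a list of Node-ID strings; the function name and that representation are hypothetical.

```python
from typing import Optional, Sequence


def former_np_mp_to_tear(old_rro_node_ids: Sequence[str],
                         new_rro_node_ids: Sequence[str],
                         np_mp_node_id: Optional[str]) -> Optional[str]:
    """Return the Node-ID to which a "Remote" PathTear should be sent.

    Returns None when no node protection was available or when the
    NP-MP is still present in the new RRO.
    """
    if np_mp_node_id is None:
        return None
    was_present = np_mp_node_id in old_rro_node_ids
    still_present = np_mp_node_id in new_rro_node_ids
    if was_present and not still_present:
        return np_mp_node_id
    return None
```

In the example above, A would call this with an old RRO containing C and a new RRO that no longer contains C, and would then direct the "Remote" PathTear to C.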
If an LSP is preempted on an LP-MP after its Phop or incoming link has already failed but the backup LSP has not been signaled yet, then the node SHOULD send a normal PathTear and delete both the PSB and RSB states corresponding to the LSP. As the LP-MP has retained LSP state expecting the PLR to perform backup LSP signaling, preemption would bring down the LSP and the node would not be LP-MP any more requiring the node to clean up LSP state.
If an LSP is preempted on an NP-MP after its Phop link has already failed but the backup LSP has not been signaled yet, then the node SHOULD send a normal PathTear and delete the PSB and RSB states corresponding to the LSP. As the NP-MP has retained LSP state expecting the PLR to perform backup LSP signaling, preemption would bring down the LSP and the node would not be NP-MP any more requiring the node to clean up LSP state.
Let us consider that B-C link goes down on the same example topology (Figure 1). As C is the NP-MP for the PLR A, C will retain LSP state.
The term "Refresh interval Independent FRR", or RI-RSVP-FRR, used below in this section refers to the changes that have been defined in previous sections. Any implementation that does not support them is termed a "non-RI-RSVP-FRR implementation". The extensions proposed in RSVP-TE Summary FRR [I-D.ietf-mpls-summary-frr-rsvpte] are applicable to implementations that do not support RI-RSVP-FRR. On the other hand, the changes proposed relating to LSP state cleanup, namely Conditional and "Remote" PathTear, require support from one-hop and two-hop neighboring nodes along the LSP path. So procedures that fall under the LSP state cleanup category SHOULD be turned on only if all nodes involved in the node protection FRR, i.e. the PLR, the MP and the intermediate node in the case of NP, support the extensions. Note that for LSPs requesting only link protection, the PLR and the LP-MP need to support the extensions.
An implementation supporting the extensions specified in previous sections (called RI-RSVP-FRR hereafter) SHOULD set the flag "Refresh interval Independent RSVP" or RI-RSVP flag in the CAPABILITY object carried in Hello messages. The RI-RSVP flag is specified in RSVP-TE Scaling Techniques [RFC8370].
The procedures defined hereafter are performed on a subset of LSPs that traverse a node, rather than on all LSPs that traverse a node. This behavior is required to support backward compatibility for a subset of LSPs traversing nodes running non-RI-RSVP-FRR implementations.
The procedures on the downstream direction are as follows.
If the node reduces the refresh time from the above procedures, it MUST NOT send any "Remote" PathTear or Conditional PathTear messages.
Consider the example topology in Figure 1. If C does not support the RI-RSVP-FRR extensions, then:
The procedures on the upstream direction are as follows.
The backward compatibility procedures described in the previous sub-sections imply that a router supporting the RI-RSVP-FRR extensions specified in this document can apply the procedures specified in the document either in the downstream or upstream direction of an LSP, depending on the capability of the routers downstream or upstream in the LSP path.
For example, if an implementation supporting the RI-RSVP-FRR extensions specified in this document is deployed on all routers in a particular region of the network and if all the LSPs in the network request node protection, then the FRR extensions will only be applied for the LSP segments that traverse that region. This will aid incremental deployment of these extensions and also allow reaping the benefits of the extensions in the portions of the network where they are supported.
The security considerations pertaining to the original RSVP protocol [RFC2205], [RFC3209] and [RFC5920] remain relevant.
This document extends the applicability of Node-ID based Hello session between immediate neighbors. The Node-ID based Hello session between the PLR and the NP-MP may require the two routers to exchange Hello messages with a non-immediate neighbor. So, implementations SHOULD provide the option to configure a Node-ID neighbor-specific or global authentication key to authenticate messages received from Node-ID neighbors. The network administrator MAY utilize this option to enable RSVP-TE routers to authenticate Node-ID Hello messages received with TTL greater than 1. Implementations SHOULD also provide the option to specify a limit on the number of Node-ID based Hello sessions that can be established on a router supporting the extensions defined in this document.
RSVP Change Guidelines [RFC3936] defines the Class-Number name space for RSVP objects. The name space is managed by IANA.
IANA registry: RSVP Parameters
Subsection: Class Names, Class Numbers, and Class Types
A new RSVP object using a Class-Number from 128-183 range called the "CONDITIONS" object is defined in Section 4.4 of this document. The Class-Number from 128-183 range will be allocated by IANA.
We are very grateful to Yakov Rekhter for his contributions to the development of the idea and thorough review of content of the draft. Thanks to Raveendra Torvi and Yimin Shen for their comments and inputs.
Markus Jork
128 Technology
Email: mjork@128technology.net
Harish Sitaraman
Individual Contributor
Email: harish.ietf@gmail.com
Vishnu Pavan Beeram
Juniper Networks, Inc.
Email: vbeeram@juniper.net
Ebben Aries
Arrcus, Inc.
Email: exa@arrcus.com
Mike Taillon
Cisco Systems, Inc.
Email: mtaillon@cisco.com
[RFC5920] Fang, L., "Security Framework for MPLS and GMPLS Networks", RFC 5920, DOI 10.17487/RFC5920, July 2010.