Routing Area Working Group                             S. Litkowski, Ed.
Internet-Draft                                               B. Decraene
Intended status: Standards Track                                  Orange
Expires: December 21, 2015                                   C. Filsfils
                                                                 K. Raza
                                                           Cisco Systems
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                               P. Sarkar
                                                        Juniper Networks
                                                           June 19, 2015
Operational management of Loop Free Alternates
draft-ietf-rtgwg-lfa-manageability-09
Loop Free Alternates (LFA), as defined in RFC 5286, is an IP Fast ReRoute (IP FRR) mechanism enabling traffic protection for IP traffic (and MPLS LDP traffic by extension). Following first deployment experiences, this document provides operational feedback on LFA, highlights some limitations, and proposes a set of refinements to address those limitations. It also proposes required management specifications.
This proposal is also applicable to the remote LFA solution.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 21, 2015.
Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Following the first deployments of Loop Free Alternates (LFA), this document provides feedback to the community about the management of LFA.
[RFC5286] introduces the notion of tie breakers when selecting the LFA among multiple candidate alternate next-hops. When multiple LFAs exist, RFC 5286 favors the selection of the LFA providing the best coverage of the failure cases. While this is indeed a goal, it is only one among several, and in some deployments it leads to the selection of a suboptimal LFA. The following sections detail real use cases of such limitations.
Note that the use case of LFA computation per destination (per-prefix LFA) is assumed throughout this analysis. We also assume in the network figures that all IP prefixes are advertised with zero cost.
P1 --------- P2 ---------- P3 --------- P4 | 1 100 1 | | | | 100 | 100 | | | 1 100 1 | 1 5k P5 --------- P6 ---------- P7 --------- P8 --- P9 -- PE1 | | | | | | 5k| |5k 5k| |5k | 5k | 5k | | | | | | | +-- PE4 --+ | +---- PE2 ----+ | | | +---- PE5 ----+ | 5k | PE3 Figure 1
Px routers are P routers using n*10G links. PEs are connected using links with lower bandwidth.
In figure 1, let us consider the traffic flowing from PE1 to PE4. The nominal path is P9-P8-P7-P6-PE4. Let us consider the failure of link P7-P8. For P8, P4 is not an LFA and the only available LFA is PE2.
When the core link P8-P7 fails, P8 switches all traffic destined to PE4/PE5 towards the node PE2. Hence a PE node and PE links are used to protect the failure of a core link. Typically, PE links have less capacity than core links and congestion may occur on PE2 links. Note that although PE2 was not directly affected by the failure, its links become congested and its traffic will suffer from the congestion.
In summary, in case of P8-P7 link failure, the impact on customer traffic is:
Besides the congestion caused by using an edge router as an alternate to protect a core failure, a service provider may consider this a bad routing design and may want to prevent it.
P1 --------- P2 ------------ P3 -------- P4 | 1 100 | 1 | | | | | 100 | 30 | 30 | | | | 1 50 50 | 10 | 1 5k P5 --------- P6 --- P10 ---- P7 -------- P8 --- P9 -- PE1 | | | | \ | 5k| |5k 5k| |5k \ 5k | 5k | | | | \ | | +-- PE4 --+ | +---- PE2 ----+ | | | +---- PE5 ----+ | 5k | PE3 Figure 2
Px routers are P routers meshed with n*10G links. PEs are meshed using links with lower bandwidth.
In figure 2, let us consider the traffic flowing from PE1 to PE4. The nominal path is P9-P8-P7-P10-P6-PE4. Let us consider the failure of the link P7-P8. For P8, P4 is a link-protecting LFA and PE2 is a node-protecting LFA. PE2 is chosen as the best LFA due to its better protection type. Just like in case 1, this may lead to congestion on PE2 links upon LFA activation.
             +--- PE3 --+
            /            \
      1000 /              \ 1000
          /                \
   +----- P1 ---------------- P2 ----+
   |      |        500        |      |
   | 10   |                   |      | 10
   |      |                   |      |
   R5     | 10                | 10   R7
   |      |                   |      |
   | 10   |                   |      | 10
   |      |        500        |      |
   +---- P3 ---------------- P4 -----+
          \                 /
      1000 \               / 1000
            \             /
             +--- PE1 ---+

                 Figure 3

Px routers are P routers. P1-P2 and P3-P4 links are 1G links. All other inter-Px links are 10G links.
In the figure above, let us consider the failure of link P1-P3. For destination PE3, P3 has two possible alternates:
P4 is chosen as the best LFA due to its better protection type. However, it may not be desirable to use P4 for bandwidth capacity reasons. A service provider may prefer to use high-bandwidth links as the preferred LFA. In this example, preferring shortest path over protection type may achieve the expected behavior, but in cases where metrics do not reflect bandwidth, it would not work and some other criteria would need to be involved when selecting the best LFA.
          P1       P2
          |  \   /  |
       50 | 50 \/ 50 | 50
          |    /\   |
          PE1-+  +-- PE2
            \       /
          45 \     / 45
              -PE3-+
             (OL set)

              Figure 4
In the figure above, PE3 has its overload bit set (permanently, for design reasons) and wants to protect traffic using LFA for destination PE2.
On PE3, the loop-free condition is not satisfied: 100 !< 45 + 45. PE1 is thus not considered as an LFA. However, thanks to the overload bit set on PE3, we know that PE1 is loop-free, so PE1 is an LFA to reach PE2.
In case of overload condition set on a node, LFA behavior must be clarified.
As per [RFC6571], LFA coverage depends heavily on the network topology in use. Even if remote LFA ([RFC7490]) significantly extends the coverage of the basic LFA specification, there are still some cases where protection would not be available. As network topologies are constantly evolving (network extension, capacity additions, latency optimization ...), the protection coverage may change. Because fast reroute functionality may be critical for some services supported by the network, a service provider must constantly know what protection coverage is currently available on the network. Moreover, predicting the protection coverage in case of network topology change is mandatory.
Today, network simulation tools with what-if scenario functionality are often used by service providers for the overall network design (capacity, path optimization ...). Section 7.5, Section 7.4 and Section 7.3 of this document propose to add LFA information into such tools and within routers, so that a service provider may be able:
Implementers SHOULD document their LFA selection algorithms (default and tuning options) in order to make it possible for third-party modules to model these policy-LFA expressions.
Like all FRR mechanisms, LFA installs backup paths in the Forwarding Information Base (FIB). Depending on the hardware used by a service provider, FIB resources may be critical. Activating LFA by default on all available components (IGP topologies, interfaces, address families ...) may waste FIB resources, as generally only a few destinations in a network should be protected (e.g. loopback addresses supporting MPLS services) compared to the number of destinations in the RIB.
Moreover, a service provider may implement multiple different FRR mechanisms in its networks for different usages (MRT, TE FRR). In this scenario, an implementation MAY permit computing alternates for a specific destination even if the destination is already protected by another mechanism. This brings redundancy and lets the operator select the best option for FRR using a policy language.
Section 6 of this document proposes some implementation guidelines.
Controlling best alternate and LFA activation granularity is a requirement for Service Providers. This section defines configuration requirements for LFA.
The granularity of LFA activation should be controlled (as alternate next hops consume memory in the forwarding plane).
An implementation of LFA SHOULD allow its activation with the following criteria:
An implementation of LFA MAY allow its activation with the following criteria:
When multiple alternates exist, the LFA selection algorithm is based on tie breakers. Current tie breakers do not provide sufficient control over how the best alternate is chosen. This document proposes an enhanced tie breaker allowing service providers to manage all specific cases:
In addition to connected LFAs, tunnels (e.g. IP, LDP, RSVP-TE or Segment Routing) to distant routers may be used to complement LFA coverage (tunnel tail used as virtual neighbor). When a router has multiple alternate candidates for a specific destination, it may have connected alternates and remote alternates (reachable via a tunnel). Connected alternates may not always provide an optimal routing path and it may be preferable to select a remote alternate over a connected alternate. Some uses of tunnels to extend LFA ([RFC5286]) coverage are described in [RFC7490] and [I-D.francois-segment-routing-ti-lfa]. These documents present some use cases of LDP tunnels ([RFC7490]) or Segment Routing tunnels ([I-D.francois-segment-routing-ti-lfa]). This document considers any type of tunneling technique to reach remote alternates (IP, GRE, LDP, RSVP-TE, L2TP, Segment Routing ...) and does not restrict the remote alternates to the usage presented in the referenced documents.
In figure 1, there is no P router alternate for P8 to reach PE4 or PE5, so P8 is using PE2 as alternate, which may generate congestion when FRR is activated. Instead, we could have a remote alternate for P8 to protect traffic to PE4 and PE5. For example, a tunnel from P8 to P3 (following the shortest path) can be set up, and P8 would be able to use P3 as a remote alternate to protect traffic to PE4 and PE5. In this scenario, traffic will not use a PE link during FRR activation.
When selecting the best alternate, the selection algorithm MUST consider all available alternates (connected or tunnel). For example, with Remote LFA, computation of the PQ set ([RFC7490]) SHOULD be performed before best alternate selection.
An implementation of LFA MUST support the following criteria:
An implementation of LFA SHOULD support the following enhanced criteria:
Section 3 of [RFC5286] proposes to reuse GMPLS IGP extensions to encode SRLGs ([RFC4205] and [RFC4203]). That section also describes the algorithm to compute SRLG protection.
When SRLG protection is computed, an implementation SHOULD permit to:
When applying SRLG criteria, the SRLG violation check SHOULD be performed on source to alternate as well as alternate to destination paths based on the SRLG set of the primary path. In the case of remote LFA, PQ to destination path attributes would be retrieved from SPT rooted at PQ.
Link coloring is a powerful system to control the choice of alternates. Protecting interfaces are tagged with colors. Protected interfaces are configured to include some colors with a preference level, and exclude others.
Link color information SHOULD be signalled in the IGP. How signalling is done is out of scope of the document but it may be useful to reuse existing admin-groups from traffic-engineering extensions or link attributes extensions like in [I-D.ietf-ospf-prefix-link-attr].
                PE2
                 |   +---- P4
                 |  /
       PE1 ---- P1 --------- P2
                 |    10Gb
            1Gb  |
                 |
                 P3

              Figure 8

P1 is configured to protect the P1-P4 link. We assume that given the topology, all neighbors are candidate LFA. We would like to enforce a policy in the network where only a core router may protect against the failure of a core link, and where high capacity links are preferred.
In this example, we can use the proposed link coloring by:
Using this, PE links will never be used to protect against P1-P4 link failure and the 10Gb link will be preferred.
The main advantage of this solution is that it can easily be duplicated on other interfaces and other nodes without change. A Service Provider has only to define the color system (associate color with a significance), as it is done already for TE affinities or BGP communities.
An implementation of link coloring:
As mentioned in previous sections, not taking into account the bandwidth of an alternate could lead to congestion during FRR activation. We propose to base the bandwidth criteria on the link speed information for the following reasons:
Based on this, it is not useful to gather available bandwidth on alternate paths, as the router does not know how much bandwidth it requires for protection. The proposed link speed approach provides a good approximation with a small cost as information is easily available.
The bandwidth criteria of the policy framework SHOULD work in at least two ways:
Rather than tagging interfaces on each node (using link color) to identify the alternate node type (for example), it would be helpful if routers could be identified in the IGP. This would permit grouped processing on multiple nodes. As an implementation needs to exclude some specific alternates (see Section 6.2.3), an implementation:
A specific alternate may be identified by its interface, IP address or router ID, and a group of alternates may be identified by a marker (tag). For example, these IGP extensions can be used: [I-D.ietf-isis-node-admin-tag], [I-D.ietf-ospf-node-admin-tag], [I-D.ietf-isis-prefix-attributes], [I-D.ietf-ospf-prefix-link-attr]. Using a tag is referred to as node coloring, in comparison to the link coloring option presented in Section 6.2.4.2.
Consider the following network:

                PE3
                 |
                 |
                PE2
                 |   +---- P4
                 |  /
       PE1 ---- P1 -------- P2
                 |    10Gb
            1Gb  |
                 |
                 P3

              Figure 9
A simple policy could be configured on P1 to choose the best alternate for P1->P4 based on router function/role as follows:
The alternate path is composed of two distinct parts: PLR to alternate and alternate to destination.
         N1 -- R1 ---- R2
        /50     \        \
       /         R3 --- R4
      /                   \
     S -------- E ------- D
      \\                //
       \\              //
        N2 ---- PQ ---- R5

              Figure 5
In the figure above, we consider a primary path from S to D, S using E as primary nexthop. All metrics are 1 except {S,N1}=50. Two alternate paths are available:
As displayed in the figure, some parts of the alternate path may fan out into multiple paths due to ECMP.
Some criteria listed in the previous sections require retrieving some characteristics of the alternate path (SRLG, bandwidth, color, tag ...). We call these characteristics "path attributes". A path attribute can record a list of node properties (e.g. node tag) or link properties (e.g. link color).
This document defines two types of path attributes:
        N1 -- R1 ---- R2
       /                \
      / 50               R4
     /                    \
    S -------- E ------- D

In the figure above, N1 is a connected alternate to reach D from S. We consider that all links have a RED color except {R1,R2} which is BLUE. We consider all links to be 10Gbps, except {N1,R1} which is 2.5Gbps. The bandwidth attribute collected for the alternate path will be 10Gbps. As the attribute is unitary, only the link speed of the first link {S,N1} is recorded. The link color attribute collected for the alternate path will be {RED,RED,BLUE,RED,RED}. As the attribute is cumulative, the value of the attribute on each link along the path is recorded.
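The difference between the two attribute types can be illustrated with a minimal Python sketch. It assumes the alternate path S, N1, R1, R2, R4, D from the figure above; the names Link, unitary_bandwidth and cumulative_colors are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """One hop of an alternate path (hypothetical model, for illustration)."""
    src: str
    dst: str
    color: str
    speed_gbps: float


# Alternate path S -> N1 -> R1 -> R2 -> R4 -> D, with the colors and
# link speeds given in the example text.
ALTERNATE_PATH: List[Link] = [
    Link("S", "N1", "RED", 10.0),
    Link("N1", "R1", "RED", 2.5),
    Link("R1", "R2", "BLUE", 10.0),
    Link("R2", "R4", "RED", 10.0),
    Link("R4", "D", "RED", 10.0),
]


def unitary_bandwidth(path: List[Link]) -> float:
    """Unitary attribute: only the first link of the path is recorded."""
    return path[0].speed_gbps


def cumulative_colors(path: List[Link]) -> List[str]:
    """Cumulative attribute: the value on every link of the path is recorded."""
    return [link.color for link in path]


if __name__ == "__main__":
    print(unitary_bandwidth(ALTERNATE_PATH))
    print(cumulative_colors(ALTERNATE_PATH))
```

Note that the 2.5Gbps link {N1,R1} does not influence the unitary bandwidth attribute, since it is not the first link of the alternate path.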
For alternate path using a connected alternate:
       N1 -- R1 ---- R2
    50//50             \
     //                 \
  i1//i2                 \
   S -------- E -------- D

              Figure 6

In the figure above, we consider a primary path from S to D, S using E as primary nexthop. All metrics are considered as 1 except the {S,N1} links, which use a metric of 50. We consider the following SRLG groups on links:
S is connected to the alternate using two interfaces i1 and i2.
If i1 and i2 are not part of an ECMP group, the evaluation of attributes is done once per interface, and each interface is considered as a separate alternate path. Two alternate paths will be available with the associated SRLG attributes:
Alternate path #1 shares risks with the primary path and may be depreferred or pruned by user-defined policy.
If i1 and i2 are part of an ECMP group, the evaluation of attributes is done once per ECMP group, and the implementation considers a single alternate path {S,N1 using i1|i2,R1,R2,D} with the following SRLG attributes: SRLG1,SRLG10,SRLG2,SRLG20,SRLG3,SRLG4,SRLG5. The alternate path shares risks with the primary path and may be depreferred or pruned by user-defined policy.
For alternate path using a remote alternate (tunnel) :
The number of remote alternates may be very high. In the case of remote LFA, simulations of real-world network topologies have shown that on the order of hundreds of PQ nodes may be possible. The computational overhead to collect all path attributes of all PQ to destination paths may grow beyond what is practical.
To handle this situation, the number of remote alternates to be evaluated needs to be limited to a finite number before collecting alternate path attributes and running the policy evaluation. [I-D.ietf-rtgwg-rlfa-node-protection] Section 2.3.3 provides a way to reduce the number of PQ nodes to be evaluated.
Some other remote alternate techniques using static or dynamic tunnels may not require this pruning.
                   Link             Remote              Remote
                alternate          alternate           alternate
               -------------     ------------------   -------------
  Alternates   |  LFA      |     |  rLFA (PQs)    |   | Static/   |
               |           |     |                |   | Dynamic   |
    sources    |           |     |                |   | tunnels   |
               -------------     ------------------   -------------
                     |                   |                  |
                     |                   |                  |
                     |      --------------------------      |
                     |      | Prune some alternates  |      |
                     |      | (sorting strategy)     |      |
                     |      --------------------------      |
                     |                   |                  |
                     |                   |                  |
               ------------------------------------------------
               |         Collect alternate attributes         |
               ------------------------------------------------
                                     |
                                     |
                          -------------------------
                          |    Evaluate policy    |
                          -------------------------
                                     |
                                     |
                               Best alternates
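The processing order shown above (prune, then collect attributes, then evaluate policy) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the Alternate class, the select_best function, the use of a cost value as the sorting strategy, and the sample candidates are all assumptions made for the example. This document does not mandate any particular sorting strategy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alternate:
    """Candidate alternate (hypothetical model, for illustration only)."""
    name: str
    source: str  # "lfa", "rlfa" (PQ node) or "tunnel"
    cost: int    # assumed sort key used by the pruning step
    attributes: Dict[str, object] = field(default_factory=dict)


def select_best(candidates: List[Alternate],
                max_rlfa: int,
                collect: Callable[[Alternate], Dict[str, object]],
                policy: Callable[[List[Alternate]], List[str]]) -> List[str]:
    # Step 1: prune only the rLFA (PQ) candidates, keeping the max_rlfa
    # best ones according to the assumed sorting strategy (lowest cost).
    pq = sorted((c for c in candidates if c.source == "rlfa"),
                key=lambda c: c.cost)
    kept = [c for c in candidates if c.source != "rlfa"] + pq[:max_rlfa]
    # Step 2: collect path attributes only for the surviving candidates.
    for alt in kept:
        alt.attributes = collect(alt)
    # Step 3: evaluate the policy and return the best alternates.
    return policy(kept)


def collect_cost(alt: Alternate) -> Dict[str, object]:
    """Placeholder for attribute collection (SRLG, color, bandwidth ...)."""
    return {"cost": alt.cost}


def lowest_cost_policy(alternates: List[Alternate]) -> List[str]:
    """Placeholder policy: keep the alternates with the lowest cost."""
    best = min(a.attributes["cost"] for a in alternates)
    return [a.name for a in alternates if a.attributes["cost"] == best]


# Made-up candidates: one connected LFA, three PQ nodes, one static tunnel.
CANDIDATES = [
    Alternate("N1", "lfa", 50),
    Alternate("PQ1", "rlfa", 30),
    Alternate("PQ2", "rlfa", 10),
    Alternate("PQ3", "rlfa", 20),
    Alternate("T1", "tunnel", 40),
]

BEST = select_best(CANDIDATES, 2, collect_cost, lowest_cost_policy)
```

With a limit of two PQ nodes, attributes are collected for N1, T1, PQ2 and PQ3 only, so the cost of attribute collection is bounded by the limit rather than by the size of the PQ set.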
As described in Section 6.2.5, there may be some situations where an alternate path or part of an alternate path fans out to multiple paths (e.g. ECMP). When collecting path attributes in such a case, an implementation SHOULD consider the union of attributes of each sub-path.
In figure 5 (in Section 6.2.5), S has two alternate paths to reach D. Each alternate path fans out into multipath due to ECMP. Consider the following link color attributes: all links are RED except {R1,R3} which is BLUE. The user wants to use an alternate path with only RED links. The first alternate path {S,N1,R1,R2|R3,R4,D} does not fit the constraint, as {R1,R3} is BLUE. The second alternate path {S,N2,PQ,R5,D} fits the constraint and will be preferred as it uses only RED links.
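The union rule can be illustrated with a short Python sketch based on the figure 5 example. The function and variable names are hypothetical, and each alternate path is modeled simply as a list of node sequences, one per ECMP sub-path; parallel links between the same pair of nodes are assumed to share the same color.

```python
from typing import Dict, List, Set


def link(a: str, b: str) -> frozenset:
    """Undirected link identifier."""
    return frozenset((a, b))


# All links are RED except {R1,R3}, which is BLUE (as in the example).
BLUE_LINKS = {link("R1", "R3")}


def color_of(a: str, b: str) -> str:
    return "BLUE" if link(a, b) in BLUE_LINKS else "RED"


def path_colors(sub_paths: List[List[str]]) -> Set[str]:
    """Union of the link colors over every ECMP sub-path."""
    colors: Set[str] = set()
    for nodes in sub_paths:
        for a, b in zip(nodes, nodes[1:]):
            colors.add(color_of(a, b))
    return colors


# Each alternate path is expanded into its ECMP sub-paths.
ALTERNATES: Dict[str, List[List[str]]] = {
    "via N1": [
        ["S", "N1", "R1", "R2", "R4", "D"],
        ["S", "N1", "R1", "R3", "R4", "D"],
    ],
    "via N2/PQ": [
        ["S", "N2", "PQ", "R5", "D"],
    ],
}

# Policy: keep only alternate paths whose color set contains nothing but RED.
ELIGIBLE = [name for name, sub_paths in ALTERNATES.items()
            if path_colors(sub_paths) <= {"RED"}]

if __name__ == "__main__":
    print(ELIGIBLE)
```

Because the union is taken over all sub-paths, a single BLUE link on one ECMP branch is enough to exclude the whole alternate path via N1.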
             10
        PE2 - PE3
         |     |
      50 |  5  | 50
         P1----P2
         \\    //
      50  \\  // 50
           PE1

         Figure 7
Links between P1 and PE1 are L1 and L2, links between P2 and PE1 are L3 and L4
In the figure above, the primary path from PE1 to PE2 is through P1, using ECMP on two parallel links L1 and L2. With standard ECMP behavior, if L1 fails, the postconvergence next hop would become L2 and there would no longer be ECMP. If LFA is activated, as stated in Section 3.4 of [RFC5286], "alternate next-hops may themselves also be primary next-hops, but need not be" and "alternate next-hops should maximize the coverage of the failure cases". In this scenario there is no alternate providing node protection, so LFA will prefer L2 as the alternate to protect L1, which makes sense compared to the postconvergence behavior.
Consider a different scenario using figure 7, where L1 and L2 are configured as a layer 3 bundle using a local feature, and L3 and L4 form a second layer 3 bundle. Layer 3 bundles are configured so that if a link in the bundle fails, the traffic must be rerouted out of the bundle. Layer 3 bundles are generally introduced to increase bandwidth between nodes. In the nominal situation, ECMP is still available from PE1 to PE2, but if L1 fails, the postconvergence next hop would become ECMP on L3 and L4. In this case, LFA behavior SHOULD be adapted in order to reflect the bandwidth requirement.
We would expect the following FIB entry on PE1:
On PE1: PE2 +--> ECMP -> L1
            |       |
            |       +----> L2
            |
            +--> LFA(ECMP) -> L3
                       |
                       +---------> L4
If L1 or L2 fails, traffic must be switched to the LFA ECMP bundle rather than using the other primary next hop.
As mentioned in Section 3.4 of [RFC5286], protecting a link within an ECMP by another primary next hop is not a MUST. Moreover, as already presented in this document, maximizing the coverage of the failure case may not be the right approach, and a policy-based choice of alternate may be preferred.
An implementation SHOULD permit to prefer to protect a primary next hop by another primary next hop. An implementation SHOULD permit to prefer to protect a primary next hop by a non-primary next hop. An implementation SHOULD permit to use an ECMP bundle as an LFA.
In [RFC5286], Section 3.5, the setting of the overload bit condition in LFA computation is only taken into account for the case where a neighbor has the overload bit set.
In addition to RFC 5286 inequality 1 Loop-Free Criterion (Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)), the IS-IS overload bit of the LFA calculating neighbor (S) SHOULD be taken into account. Indeed, if S has the overload bit set, no neighbor will loop traffic back to it.
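As an illustration, the check can be sketched in Python using the distances of figure 4 (S = PE3, N = PE1, D = PE2). The function name is_loop_free and its parameters are hypothetical; the sketch only combines inequality 1 with the overload condition described above and is not a complete LFA computation.

```python
def is_loop_free(dist_n_d: int, dist_n_s: int, dist_s_d: int,
                 s_overloaded: bool = False) -> bool:
    """Return True if neighbor N can be used as a loop-free alternate by S.

    dist_n_d: Distance_opt(N, D)
    dist_n_s: Distance_opt(N, S)
    dist_s_d: Distance_opt(S, D)
    s_overloaded: True if the computing node S has the overload bit set.
    """
    if s_overloaded:
        # No neighbor sends transit traffic through an overloaded S,
        # so traffic handed to N cannot loop back to S.
        return True
    # RFC 5286 inequality 1 (Loop-Free Criterion).
    return dist_n_d < dist_n_s + dist_s_d


if __name__ == "__main__":
    # Figure 4 values: 100 is not less than 45 + 45.
    print(is_loop_free(100, 45, 45))
    print(is_loop_free(100, 45, 45, s_overloaded=True))
```

With the figure 4 values, inequality 1 alone rejects PE1, while taking the overload bit of PE3 into account accepts it.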
Service providers often perform manual link shutdowns (using the router CLI) to carry out network changes or tests. A manual link shutdown may be done at multiple levels: physical interface, logical interface, IGP interface, BFD session ... Testing or troubleshooting FRR in particular requires performing the manual shutdown on the remote end of the link, as a local shutdown generally would not trigger FRR.
To improve this situation, an implementation SHOULD support triggering/activating LFA Fast Reroute for a given link when a manual shutdown is done on a component that currently supports FRR activation.
An implementation MAY also support FRR activation for a specific interface or a specific prefix on a primary next-hop interface, and reverting, without any action on any running component of the node (links or protocols). In this use case, the FRR activation time needs to be controlled by a timer in case the operator forgets to revert traffic to the primary path. When the timer expires, the traffic is automatically reverted to the primary path. This makes it easier to test the fast-reroute path and then revert to the primary path without causing a global network convergence.
For example :
The introduction of LFA requires some enhancements to the standard routing information provided by implementations. Moreover, because coverage is not 100%, coverage information is also required.
Hence an implementation :
It is fairly easy to evaluate the coverage of a network in a nominal situation, but topology changes may change the coverage. In some situations, the network may no longer be able to provide the required level of protection. Hence, it becomes very important for service providers to be alerted about changes of coverage.
An implementation SHOULD :
An implementation MAY :
Although the procedures for providing alerts are beyond the scope of this document, we recommend that implementations consider standard and well used mechanisms like syslog or SNMP traps.
The operator may choose to run simulations in order to ensure full coverage of a certain type for the whole network or a given subset of the network. This is particularly likely if the operator runs the network in the sense of the third backbone profile described in [RFC6571], that is, seeks to design and engineer the network topology so that a certain coverage is always achieved. Obviously, a complete and exact simulation of the IP FRR coverage can only be achieved if the behavior is deterministic and if the algorithm used is available to the simulation tool. Thus, an implementation SHOULD:
This document does not introduce any change in security considerations compared to [RFC5286].
The authors would like to acknowledge significant contributions made by Pierre Francois, Hannes Gredler, Chris Bowers, Jeff Tantsura, Uma Chunduri and Mustapha Aissaoui.
This document has no action for IANA.
[I-D.francois-segment-routing-ti-lfa] Francois, P., Filsfils, C., Bashandy, A. and B. Decraene, "Topology Independent Fast Reroute using Segment Routing", Internet-Draft draft-francois-segment-routing-ti-lfa-00, November 2013.
[I-D.ietf-isis-node-admin-tag] Sarkar, P., Gredler, H., Hegde, S., Litkowski, S., Decraene, B., Li, Z., Aries, E., Rodriguez, R. and H. Raghuveer, "Advertising Per-node Admin Tags in IS-IS", Internet-Draft draft-ietf-isis-node-admin-tag-02, June 2015.
[I-D.ietf-isis-prefix-attributes] Ginsberg, L., Decraene, B., Filsfils, C., Litkowski, S., Previdi, S., Xu, X. and U. Chunduri, "IS-IS Prefix Attributes for Extended IP and IPv6 Reachability", Internet-Draft draft-ietf-isis-prefix-attributes-00, May 2015.
[I-D.ietf-ospf-node-admin-tag] Hegde, S., Raghuveer, H., Gredler, H., Shakir, R., Smirnov, A., Li, Z. and B. Decraene, "Advertising per-node administrative tags in OSPF", Internet-Draft draft-ietf-ospf-node-admin-tag-02, June 2015.
[I-D.ietf-ospf-prefix-link-attr] Psenak, P., Gredler, H., Shakir, R., Henderickx, W., Tantsura, J. and A. Lindem, "OSPFv2 Prefix/Link Attribute Advertisement", Internet-Draft draft-ietf-ospf-prefix-link-attr-06, June 2015.
[I-D.ietf-rtgwg-rlfa-node-protection] Sarkar, P., Gredler, H., Hegde, S., Bowers, C., Litkowski, S. and H. Raghuveer, "Remote-LFA Node Protection and Manageability", Internet-Draft draft-ietf-rtgwg-rlfa-node-protection-02, June 2015.