Routing Area Working Group                                  S. Litkowski
Internet-Draft                                   Orange Business Service
Intended status: Informational                               B. Decraene
Expires: June 24, 2019                                            Orange
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                       December 21, 2018
Link State protocols SPF trigger and delay algorithm impact on IGP micro-loops
draft-ietf-rtgwg-spf-uloop-pb-statement-09
A micro-loop is a packet forwarding loop that may occur transiently among two or more routers in a hop-by-hop packet forwarding paradigm.
In this document, we analyze the impact, with respect to micro-loops, of using different Link State IGP (Interior Gateway Protocol) implementations in a single network. The analysis is focused on the SPF (Shortest Path First) delay algorithm.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on June 24, 2019.
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Link State IGP protocols are based on a topology database on which the SPF algorithm is run to find a consistent set of non-looping routing paths.
Specifications like IS-IS ([RFC1195]) propose some optimizations of the route computation (see Appendix C.1 of [RFC1195]), but not all implementations follow these non-mandatory optimizations.
We call "SPF triggers" the events that lead to a new SPF computation based on the topology.
Link State IGP protocols, like OSPF ([RFC2328]) and IS-IS ([RFC1195]), use multiple timers to control router behavior during churn: SPF delay, PRC (Partial Route Computation) delay, LSP (Link State Packet) generation delay, LSP flooding delay, LSP retransmission interval, etc.
Some of those timers (values and behavior) are standardized in protocol specifications, while some are not. The SPF computation related timers have generally remained unspecified.
Implementations are free to implement non-standardized timers in any way. For some standardized timers, we also observe that, rather than using static configurable values, implementations may offer dynamically adjusted timers to help control churn.
We call "SPF delay" the timer, present in most implementations, that specifies the delay before running an SPF computation after an SPF trigger is received.
A micro-loop is a packet forwarding loop that may occur transiently among two or more routers in a hop-by-hop packet forwarding paradigm. Micro-loops form when two routers do not update their Forwarding Information Base (FIB) for a given prefix at the same time. The micro-loop phenomenon is described in [I-D.ietf-rtgwg-microloop-analysis].
Two micro-loop mitigation techniques have been defined by the IETF. [RFC6976] has not been widely implemented, presumably due to the complexity of the technique. [RFC8333] has been implemented; however, it does not prevent all the micro-loops that can occur for a given topology and failure scenario.
In multi-vendor networks, using different implementations of a link state protocol may favor the creation of micro-loops during the convergence process because of timer discrepancies. Service providers already know that using similar timers (values and behavior) across the whole network is a best practice, but this is sometimes impossible due to implementation limitations.
This document explains why it is important for service providers to have consistent implementations of Link State protocols across vendors. In particular, we analyze the impact, with respect to micro-loops, of using different Link State IGP implementations in a single network. The analysis is focused on the SPF delay algorithm.
[RFC8405] defines a solution that addresses this problem statement; this document captures the reasoning behind that solution.
                 S ---- E
                 |      |
              10 |      | 10
                 |      |
                 D ---- A
                 |   2
                 Px
Figure 1 - Network topology suffering from micro-loops
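The following minimal Python sketch runs a plain Dijkstra SPF over the Figure 1 topology to show why S and E can loop. It is an illustration under stated assumptions, not part of any implementation: the S-E link has no metric in the figure and is ASSUMED to be 1 here, and Px is treated as attached to D, so only paths toward D are computed. With these metrics, E reaches D through S before the S-D failure, while S reaches D through E after it; if S installs its new route before E does, packets toward D bounce between S and E.

```python
import heapq

# Figure 1 links and metrics. ASSUMPTION: the S-E link carries no metric in the
# figure; 1 is used here. Px is treated as attached to D, so only D is computed.
LINKS = {
    ("S", "E"): 1,
    ("S", "D"): 10,
    ("E", "A"): 10,
    ("D", "A"): 2,
}


def build_graph(links):
    """Return an undirected adjacency map {node: {neighbor: metric}}."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def next_hop(graph, src, dst):
    """Run Dijkstra from src and return (first hop toward dst, path cost)."""
    dist = {src: 0}
    first_hop = {src: None}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for nbr, metric in graph[node].items():
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Direct neighbors of src are their own first hop; others inherit it.
                first_hop[nbr] = nbr if node == src else first_hop[node]
                heapq.heappush(heap, (nd, nbr))
    return first_hop[dst], dist[dst]


before = build_graph(LINKS)
after = build_graph({link: m for link, m in LINKS.items() if link != ("S", "D")})

for router in ("S", "E"):
    print(router, "before:", next_hop(before, router, "D"),
          "after:", next_hop(after, router, "D"))
# S before: ('D', 10) after: ('E', 13)
# E before: ('S', 11) after: ('A', 12)
```

With a larger S-E metric (for example 10), E would already prefer the path through A (cost 12 versus 20) and this particular loop could not form, which is why the assumed metric matters for the illustration.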
Micro-loops appear because of the asynchronous convergence of nodes in a network when an event occurs.
Multiple factors, alone or in combination, may increase the probability of a micro-loop appearing:
While all of these factors may increase the probability of a micro-loop appearing, the SPF delay plays a significant role, especially during churn. As the number of IGP events increases, the difference between the SPF delay values used by different routers grows and becomes the major contributor, especially when one router increases its timer exponentially while another increases it more smoothly. Another important factor is the time needed to update the FIB. Today, the total FIB update time is the major factor in IGP convergence. For micro-loops, however, what matters is not the total time but the difference between nodes in the time at which the same prefix is installed. The FIB update time may be the main part of that difference for the first iteration, but not for subsequent IGP events. In addition, the FIB update time is highly implementation-specific and difficult, if not impossible, to standardize, while the SPF delay algorithm may be standardized.
Consequently, this document focuses on the analysis of the SPF delay behavior and the associated triggers.
Depending on the change advertised in an LSPDU (Link State Protocol Data Unit) or an LSA (Link State Advertisement), the topology may or may not be affected. An implementation may avoid running the SPF computation (and run only the IP reachability computation instead) if the advertised change does not affect the topology.
Different strategies exist to trigger the SPF computation:
As noted in Section 1, SPF optimizations are not mandatory in specifications. This has led to the implementation of different strategies.
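As a hypothetical illustration of the two triggering strategies, the sketch below compares the old and new content of a single LSP. The `Lsp` structure and the `computation_needed` function are assumptions made for this example only: a change in adjacencies is treated as a topology change requiring a full SPF, while a change limited to prefixes only requires a reachability computation (PRC) when the implementation is optimized. Real implementations may take more LSP/LSA content into account than this sketch does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lsp:
    """Hypothetical, simplified content of one router's LSP/LSA."""
    neighbors: frozenset = field(default_factory=frozenset)  # {(neighbor, metric)}
    prefixes: frozenset = field(default_factory=frozenset)   # {(prefix, metric)}


def computation_needed(old: Lsp, new: Lsp, optimized: bool) -> str:
    """Return "NONE", "PRC" or "SPF" for a change from `old` to `new`.

    optimized=False: any change triggers a full SPF.
    optimized=True: a full SPF only when adjacencies change, otherwise only a
    reachability computation (PRC).
    """
    if old == new:
        return "NONE"
    if not optimized or old.neighbors != new.neighbors:
        return "SPF"
    return "PRC"


# Hypothetical LSP of router D from Figure 1 (the Px metric is chosen arbitrarily).
d_lsp = Lsp(frozenset({("S", 10), ("A", 2)}), frozenset({("Px", 0)}))
prefix_down = Lsp(d_lsp.neighbors, frozenset())        # Px withdrawn
link_down = Lsp(frozenset({("A", 2)}), d_lsp.prefixes)  # S-D adjacency removed

print(computation_needed(d_lsp, prefix_down, optimized=True))   # PRC
print(computation_needed(d_lsp, prefix_down, optimized=False))  # SPF
print(computation_needed(d_lsp, link_down, optimized=True))     # SPF
```

The `optimized` flag corresponds to the difference between router S (optimized triggering) and router E (full SPF for any change) in the examples of this document.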
Implementations of link state routing protocols use different strategies to delay the SPF computation. The two most common SPF delay behaviors are the following:
These behaviors will be explained in the next sections.
The SPF delay is managed by four parameters:
Example: Rapid delay (RD) = 50msec, Rapid runs = 3, Slow delay (SD) = 1sec, Wait time = 2sec
      SPF delay time
       ^
       |
       |
    SD-|           x  xx  x
       |
       |
       |
    RD-|  x  x  x                     x
       |
       +---------------------------------> Events
          |  |  |  |  ||  |           |
                          < wait time >
Figure 2 - Two phase delay algorithm
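The two-step behavior can be sketched as follows. This is a minimal model under assumptions inferred from Figure 2, not a description of any vendor's implementation: the first `rapid_runs` SPF runs use the rapid delay, later runs use the slow delay, the counter is reset once no trigger has been received for the wait time, and triggers arriving while an SPF is already scheduled are not modeled. The class and method names are hypothetical.

```python
class TwoStepSpfDelay:
    """Sketch of the two-step SPF delay of Figure 2 (all times in ms).

    ASSUMPTIONS: the first `rapid_runs` runs use `rapid_delay`, later runs use
    `slow_delay`, and the algorithm returns to rapid mode once no trigger has
    been received for `wait_time`. Coalescing of triggers is not modeled.
    """

    def __init__(self, rapid_delay, rapid_runs, slow_delay, wait_time):
        self.rapid_delay = rapid_delay
        self.rapid_runs = rapid_runs
        self.slow_delay = slow_delay
        self.wait_time = wait_time
        self.runs = 0              # runs since the last quiet period
        self.last_trigger = None   # time of the previous trigger

    def on_trigger(self, now):
        """Return the delay to apply before running SPF for a trigger at `now`."""
        if self.last_trigger is not None and now - self.last_trigger >= self.wait_time:
            self.runs = 0  # quiet long enough: back to rapid mode
        self.last_trigger = now
        self.runs += 1
        return self.rapid_delay if self.runs <= self.rapid_runs else self.slow_delay


# Router S parameters used in the examples (rapid=150ms, rapid-runs=3, slow=1s).
# The wait time is not given for S; 2s is borrowed from the Figure 2 example.
s_delay = TwoStepSpfDelay(rapid_delay=150, rapid_runs=3, slow_delay=1000, wait_time=2000)
trigger_times = [10, 212, 410, 1010]  # ms, router S triggers in the Link DOWN example
print([s_delay.on_trigger(t) for t in trigger_times])  # [150, 150, 150, 1000]
print(s_delay.on_trigger(5000))  # quiet for more than 2s: back to rapid, 150
```

Under these assumptions the sketch reproduces the delays scheduled by router S in the single-event-type example (150ms three times, then 1s).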
The algorithm has two modes: the fast mode and the backoff mode. In the fast mode, the SPF computation is usually delayed by a very small amount of time (fast reaction). Once an SPF computation has run in the fast mode, the algorithm automatically moves to the backoff mode (only a single SPF run is allowed in the fast mode). In the backoff mode, the SPF delay increases exponentially with each run. When the network becomes stable, the algorithm moves back to the fast mode. The SPF delay is managed by four parameters:
Example: First delay (FD) = 50msec, Incremental delay (ID) = 50msec, Maximum delay (MD) = 1sec, Wait time = 2sec
      SPF delay time
       ^
    MD-|              xx  x
       |
       |
       |
       |
       |
       |           x
       |
       |
       |
       |        x
       |
    FD-|  x  x                        x
    ID |
       +---------------------------------> Events
          |  |  |  |  ||  |           |
                          < wait time >
          FM->BM -------------------->FM
Figure 3 - Exponential delay algorithm
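A minimal sketch of the exponential backoff behavior follows. The exact growth rule is an ASSUMPTION: here the single fast-mode run uses the first delay, and each backoff run doubles the delay starting from the incremental delay, capped at the maximum delay; the algorithm returns to fast mode after the wait time without any trigger. This interpretation happens to reproduce the delays used for router E in the examples of this document (start=150ms, inc=150ms, max=1s), but it is not taken from any particular implementation, and the names are hypothetical.

```python
class ExponentialSpfDelay:
    """Sketch of the exponential backoff SPF delay of Figure 3 (times in ms).

    ASSUMPTIONS: one fast-mode run uses `first_delay`; the n-th backoff run
    (n = 0, 1, ...) uses incremental_delay * 2**n, capped at `max_delay`; the
    algorithm returns to fast mode after `wait_time` without any trigger.
    """

    def __init__(self, first_delay, incremental_delay, max_delay, wait_time):
        self.first_delay = first_delay
        self.incremental_delay = incremental_delay
        self.max_delay = max_delay
        self.wait_time = wait_time
        self.backoff_runs = None   # None means fast mode (FM)
        self.last_trigger = None

    def on_trigger(self, now):
        """Return the delay to apply before running SPF for a trigger at `now`."""
        if self.last_trigger is not None and now - self.last_trigger >= self.wait_time:
            self.backoff_runs = None  # network stable: back to fast mode
        self.last_trigger = now
        if self.backoff_runs is None:
            self.backoff_runs = 0     # FM -> BM after this single fast run
            return self.first_delay
        delay = min(self.incremental_delay * 2 ** self.backoff_runs, self.max_delay)
        self.backoff_runs += 1
        return delay


# Router E parameters used in the examples; its wait time is not given,
# so 2s is borrowed from the Figure 3 example.
e_delay = ExponentialSpfDelay(first_delay=150, incremental_delay=150,
                              max_delay=1000, wait_time=2000)
print([e_delay.on_trigger(t) for t in [10, 214, 410, 1010]])  # [150, 150, 300, 600]
print(e_delay.on_trigger(1210))  # 1200 capped to 1000
```

Comparing this output with the two-step sketch for the same trigger sequence shows how the delays of S and E drift apart after a few events.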
In Figure 1, we consider a flow of packets from S to D. We assume that S uses optimized SPF triggering (a full SPF is triggered only when necessary) and the two-step SPF delay (rapid=150ms, rapid-runs=3, slow=1s). As the implementation of S is optimized, PRC is available, and PRC is delayed using the same timer values as SPF. We assume that E uses an SPF trigger strategy that always computes a full SPF for any change, and uses the exponential backoff strategy for the SPF delay (start=150ms, inc=150ms, max=1s).
We also consider the following sequence of events:
Time | Network Event | Router S events | Router E events |
---|---|---|---|
t0=0 | Prefix DOWN | | |
10ms | | Schedule PRC (in 150ms) | Schedule SPF (in 150ms) |
160ms | | PRC starts | SPF starts |
161ms | | PRC ends | |
162ms | | RIB/FIB starts | |
163ms | | | SPF ends |
164ms | | | RIB/FIB starts |
175ms | | RIB/FIB ends | |
178ms | | | RIB/FIB ends |
200ms | Prefix UP | | |
212ms | | Schedule PRC (in 150ms) | |
214ms | | | Schedule SPF (in 150ms) |
370ms | | PRC starts | |
372ms | | PRC ends | |
373ms | | | SPF starts |
373ms | | RIB/FIB starts | |
375ms | | | SPF ends |
376ms | | | RIB/FIB starts |
383ms | | RIB/FIB ends | |
385ms | | | RIB/FIB ends |
400ms | Prefix DOWN | | |
410ms | | Schedule PRC (in 300ms) | Schedule SPF (in 300ms) |
710ms | | PRC starts | SPF starts |
711ms | | PRC ends | |
712ms | | RIB/FIB starts | |
713ms | | | SPF ends |
714ms | | | RIB/FIB starts |
716ms | | RIB/FIB ends | RIB/FIB ends |
1000ms | S-D link DOWN | | |
1010ms | | Schedule SPF (in 150ms) | Schedule SPF (in 600ms) |
1160ms | | SPF starts | |
1161ms | | SPF ends | |
1162ms | Micro-loop may start from here | RIB/FIB starts | |
1175ms | | RIB/FIB ends | |
1612ms | | | SPF starts |
1615ms | | | SPF ends |
1616ms | | | RIB/FIB starts |
1626ms | Micro-loop ends | | RIB/FIB ends |
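Table 1 shows that, due to discrepancies in SPF management, after multiple events of different types the SPF delay values of node S and node E become completely misaligned, leading to the creation of micro-loops. When the S-D link fails at 1000ms, S schedules its SPF in 150ms while E schedules it in 600ms, so S starts installing its new path about 450ms before E does.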
The same issue can also appear with only a single type of event as shown below:
Time | Network Event | Router S events | Router E events |
---|---|---|---|
t0=0 | Link DOWN | | |
10ms | | Schedule SPF (in 150ms) | Schedule SPF (in 150ms) |
160ms | | SPF starts | SPF starts |
161ms | | SPF ends | |
162ms | | RIB/FIB starts | |
163ms | | | SPF ends |
164ms | | | RIB/FIB starts |
175ms | | RIB/FIB ends | |
178ms | | | RIB/FIB ends |
200ms | Link DOWN | | |
212ms | | Schedule SPF (in 150ms) | |
214ms | | | Schedule SPF (in 150ms) |
370ms | | SPF starts | |
372ms | | SPF ends | |
373ms | | | SPF starts |
373ms | | RIB/FIB starts | |
375ms | | | SPF ends |
376ms | | | RIB/FIB starts |
383ms | | RIB/FIB ends | |
385ms | | | RIB/FIB ends |
400ms | Link DOWN | | |
410ms | | Schedule SPF (in 150ms) | Schedule SPF (in 300ms) |
560ms | | SPF starts | |
561ms | | SPF ends | |
562ms | Micro-loop may start from here | RIB/FIB starts | |
568ms | | RIB/FIB ends | |
710ms | | | SPF starts |
713ms | | | SPF ends |
714ms | | | RIB/FIB starts |
716ms | Micro-loop ends | | RIB/FIB ends |
1000ms | Link DOWN | | |
1010ms | | Schedule SPF (in 1s) | Schedule SPF (in 600ms) |
1612ms | | | SPF starts |
1615ms | | | SPF ends |
1616ms | Micro-loop may start from here | | RIB/FIB starts |
1626ms | | | RIB/FIB ends |
2012ms | | SPF starts | |
2014ms | | SPF ends | |
2015ms | | RIB/FIB starts | |
2025ms | Micro-loop ends | RIB/FIB ends | |
With the same event sequence as in Table 1, fewer and/or shorter micro-loops may be expected when a standardized SPF delay is used.
Time | Network Event | Router S events | Router E events |
---|---|---|---|
t0=0 | Prefix DOWN | | |
10ms | | Schedule PRC (in 150ms) | Schedule PRC (in 150ms) |
160ms | | PRC starts | PRC starts |
161ms | | PRC ends | |
162ms | | RIB/FIB starts | PRC ends |
163ms | | | RIB/FIB starts |
175ms | | RIB/FIB ends | |
176ms | | | RIB/FIB ends |
200ms | Prefix UP | | |
212ms | | Schedule PRC (in 150ms) | |
213ms | | | Schedule PRC (in 150ms) |
370ms | | PRC starts | PRC starts |
372ms | | PRC ends | |
373ms | | RIB/FIB starts | PRC ends |
374ms | | | RIB/FIB starts |
383ms | | RIB/FIB ends | |
384ms | | | RIB/FIB ends |
400ms | Prefix DOWN | | |
410ms | | Schedule PRC (in 300ms) | Schedule PRC (in 300ms) |
710ms | | PRC starts | PRC starts |
711ms | | PRC ends | PRC ends |
712ms | | RIB/FIB starts | |
713ms | | | RIB/FIB starts |
716ms | | RIB/FIB ends | RIB/FIB ends |
1000ms | S-D link DOWN | | |
1010ms | | Schedule SPF (in 150ms) | Schedule SPF (in 150ms) |
1160ms | | SPF starts | |
1161ms | | SPF ends | SPF starts |
1162ms | Micro-loop may start from here | RIB/FIB starts | SPF ends |
1163ms | | | RIB/FIB starts |
1175ms | | RIB/FIB ends | |
1177ms | Micro-loop ends | | RIB/FIB ends |
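To make the comparison concrete, the short computation below derives an upper bound on the micro-loop duration for each failure in the examples, as the time between the "Micro-loop may start from here" and "Micro-loop ends" rows. The timestamps are copied from the tables above; the dictionary keys are descriptive labels chosen for this sketch, and the bound is only an upper bound because the loop can start only once the first router has actually installed its new FIB entry.

```python
# (micro-loop may start, micro-loop ends) in ms, copied from the example tables.
WINDOWS = {
    "Table 1 (mixed events), S-D link DOWN": (1162, 1626),
    "single event type, 3rd Link DOWN": (562, 716),
    "single event type, 4th Link DOWN": (1616, 2025),
    "standardized SPF delay, S-D link DOWN": (1162, 1177),
}


def loop_window_ms(start_ms: int, end_ms: int) -> int:
    """Upper bound on the micro-loop duration: from the moment the first router
    starts installing the new path until the other router has finished its
    own FIB update."""
    return end_ms - start_ms


for name, (start, end) in WINDOWS.items():
    print(f"{name}: up to {loop_window_ms(start, end)} ms")
# Table 1 (mixed events), S-D link DOWN: up to 464 ms
# single event type, 3rd Link DOWN: up to 154 ms
# single event type, 4th Link DOWN: up to 409 ms
# standardized SPF delay, S-D link DOWN: up to 15 ms
```

In these examples the remaining window with aligned SPF delays comes only from the difference in computation and FIB update speed between S and E.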
As shown above, other parameters, such as router computation power and flooding timers, may also influence micro-loops. In all the examples in this document comparing the SPF timer behavior of router S and router E, we have made router E slightly slower than router S. This can lead to micro-loops even when both S and E use a common standardized SPF behavior. However, we expect that by aligning implementations of the SPF delay, service providers may reduce the number and the duration of micro-loops.
This document does not introduce any security considerations.
The authors would like to thank Mike Shand and Chris Bowers for their useful comments.
This document has no action for IANA.
[RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual environments", RFC 1195, DOI 10.17487/RFC1195, December 1990.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, DOI 10.17487/RFC2328, April 1998.
[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.
[RFC8405]  Decraene, B., Litkowski, S., Gredler, H., Lindem, A., Francois, P. and C. Bowers, "Shortest Path First (SPF) Back-Off Delay Algorithm for Link-State IGPs", RFC 8405, DOI 10.17487/RFC8405, June 2018.
[I-D.ietf-rtgwg-microloop-analysis]  Zinin, A., "Analysis and Minimization of Microloops in Link-state Routing Protocols", Internet-Draft draft-ietf-rtgwg-microloop-analysis-01, October 2005.
[RFC6976]  Shand, M., Bryant, S., Previdi, S., Filsfils, C., Francois, P. and O. Bonaventure, "Framework for Loop-Free Convergence Using the Ordered Forwarding Information Base (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July 2013.
[RFC8333]  Litkowski, S., Decraene, B., Filsfils, C. and P. Francois, "Micro-loop Prevention by Introducing a Local Convergence Delay", RFC 8333, DOI 10.17487/RFC8333, March 2018.