Network Working Group                                        B. Decraene
Internet-Draft                                                    Orange
Intended status: Standards Track                               C. Bowers
Expires: September 10, 2020                                    Jayesh. J
                                                  Juniper Networks, Inc.
                                                                   T. Li
                                                         Arista Networks
                                                         G. Van de Velde
                                                                   Nokia
                                                           March 9, 2020
IS-IS Flooding Parameters advertisement
draft-decraene-lsr-isis-flooding-speed-03
This document proposes a mechanism that can be used to increase the speed at which link state information is exchanged between two routers when multiple LSPs need to be flooded, such as in the case of a node failure. It also reduces the likelihood of overloading the router receiving the LSPs. This document defines a new TLV to be advertised in SNP and/or Hello messages. This TLV may carry a set of parameters indicating the performance capacity to receive LSPs: the number of LSPs that can be received back to back, the minimum delay between two further consecutive LSPs, and the minimum delay before retransmission of an LSP.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 10, 2020.
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
IGP flooding is paramount for link state IGPs, as routing computations assume that the Link State DataBases (LSDBs) are always in sync across all nodes in the flooding domain.
Slow flooding directly translates to delayed network reaction to failure and LSDB inconsistencies across nodes. The former increases packet loss. The latter translates to routing inconsistencies and possibly micro-loops leading to packet loss, link overload, and jitter for all classes of service. Note that across the network, multiple links may be affected by these forwarding issues, even in the case of a single link failure.
In addition, a single event in the network can require the flooding of multiple LSPs. The typical case is a node failure, which requires the flooding of at least one LSP per neighbor of the failed node. Hence, if a node has N IGP neighbors, the failure of this node requires the advertisement and flooding of at least N LSPs. The network won't be able to converge to the new topology until all N LSPs are received by all nodes. Hence, there is a need to be able to quickly exchange N LSPs. This document addresses this requirement by allowing the fast flooding of some number of consecutive LSPs.
IGP flooding is hard. One would want fast flooding when the network is stable, and flooding slow enough not to overload the neighbor(s) when the network is less stable. Since flooding is performed hop by hop, not overloading the adjacent receiver is sufficient. This document addresses these requirements by having the receiving node advertise the rate at which it can receive LSPs, using a TLV in SNP and/or IS-IS Hello (IIH) messages. This allows the LSP transmitter to adapt to the receiver capability and to send LSPs quickly, but not too quickly. This avoids both unnecessary transmission delays and overloading the receiving IS. Multiple flooding parameters may be advertised through the use of sub-TLVs.
One parameter in the advertisement is the LSP receive window. This is the number of un-acknowledged LSPs that the IS transmitter may send at any rate, including back to back.
Another parameter in the advertisement is the shaping delay between two consecutive LSPs, once the receive window is full.
Note that this parameterization of flooding behavior is aligned with existing implementations: with an LSP receive window of 1, most implementations already implement shaping between LSPs, and some implementations allow the fast sending of N LSPs with no shaping delay. Existing implementations rely on parameters statically configured on the transmitter to control the transmission rate. However, the need is to prevent overloading the receiver. In theory, the transmission rate parameter could be configured on each IS transmitter using knowledge of each of its neighbors in the topology and of the receiving capabilities of those neighbors. However, in practice, this configuration is difficult to maintain over time as the network topology changes. In addition, as things currently stand, each network operator needs to evaluate the receiving capacity of each type of platform, depending on its hardware, software version, and number of IS-IS adjacencies. Such platform performance is better known by its designer (the vendor). Even if validation tests are required, a single validation test by the vendor is more effective than N validations by N network providers. Finally, the reasoning behind the original choices of default values is not clear. Default values have largely remained unchanged over many years, despite very large increases in interface speeds and processing speeds. This has resulted in default values that are very sub-optimal. For example, typical default values are one LSP per 33 ms or 100 ms, resulting in the ability to send only 30 or 10 LSPs per second. In contrast, the same vendors recommend setting a BGP DDoS policer to 10,000 packets per second, which is two or three orders of magnitude higher.
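The cost of these defaults can be made concrete with simple arithmetic. The following Python sketch is illustrative only: it computes the sustained rate implied by a fixed inter-LSP interval and the time a transmitter needs to flood N LSPs to one neighbor, ignoring propagation, processing, and acknowledgement delays. The function names are hypothetical.

```python
def lsps_per_second(interval_ms: float) -> float:
    """Sustained rate when exactly one LSP is sent every interval_ms."""
    return 1000.0 / interval_ms


def flooding_time_s(n_lsps: int, interval_ms: float) -> float:
    """Seconds needed to send n_lsps to one neighbor with fixed pacing.

    The first LSP is sent immediately; every following LSP waits interval_ms.
    Propagation, processing and acknowledgement delays are ignored.
    """
    if n_lsps <= 0:
        return 0.0
    return (n_lsps - 1) * interval_ms / 1000.0


if __name__ == "__main__":
    for interval in (33, 100):
        print(f"{interval} ms pacing: {lsps_per_second(interval):.1f} LSPs/s, "
              f"1000 LSPs in {flooding_time_s(1000, interval):.1f} s")
```

With 33 ms pacing this works out to about 30 LSPs per second, so flooding 1000 LSPs (an arbitrary example count) takes roughly 33 seconds; with 100 ms pacing it takes roughly 100 seconds.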
A third parameter in the advertisement is the minimum delay before re-transmitting a lost LSP.
Improving the communication speed and efficiency between IS-IS neighbors improves IS-IS scaling. These extensions do not compete with proposed extensions to reduce LSP flooding traffic by reducing the flooding topology, such as [I-D.ietf-lsr-dynamic-flooding]. Instead, the extensions complement those proposals. Indeed, reducing the flooding topology does not reduce the size of the LSDB or the total number of LSPs to exchange between two nodes. So increasing the overall flooding speed can be beneficial for nodes implementing dynamic flooding. The reverse is also true: as dynamic flooding reduces the number of neighbors with flooding enabled, it allows nodes implementing the flooding parameter extensions to focus their flooding resources on those neighbors, by sending better parameters to the selected flooding nodes and worse parameters to non-selected flooding nodes.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This document defines a new TLV called "Flooding Parameters TLV" that may be included in SNP and/or IIH PDUs.
Type: TBD1.
Length: variable, the size in octets of the Value field.
Value: a list of sub-TLVs.
Three sub-TLVs are defined in this document.
The sub-TLV InterfaceLSPReceiveWindow advertises the maximum number of un-acknowledged LSPs that the node can receive/process with no separation interval between LSPs.
Type: 1.
Length: 4 octets.
Value: number of un-acknowledged LSPs which can be sent back to back.
The sub-TLV minimumInterfaceLSPTransmissionInterval advertises the minimum interval, in micro-seconds, between LSP arrivals which can be processed/received on this interface, once the maximum number of un-acknowledged LSPs has been sent.
Type: 2.
Length: 4 octets.
Value: minimum interval, in micro-seconds, between two consecutive LSPs sent after the receive window has been used.
The sub-TLV minimumLSPTransmissionInterval advertises the ISO minimumLSPTransmissionInterval, in micro-seconds, that the LSP transmitter may use.
Type: 3.
Length: 4 octets.
Value: minimum interval, in micro-seconds, before further propagating another Link State PDU from the same source system.
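The following Python sketch shows one way the TLV above could be encoded and decoded. It is not normative and rests on several assumptions: the TLV type is still TBD1, so a hypothetical placeholder value is used; the TLV and its sub-TLVs are assumed to use the conventional IS-IS 1-octet type and 1-octet length fields; the 4-octet values are assumed to be unsigned integers in network byte order; and unknown sub-TLVs are assumed to be skipped.

```python
import struct
from typing import Dict

# Hypothetical placeholder: the real TLV type (TBD1) is not yet assigned.
FLOODING_PARAMS_TLV_TYPE = 250

# Sub-TLV types defined in this document.
LSP_RECEIVE_WINDOW = 1             # InterfaceLSPReceiveWindow
MIN_INTERFACE_LSP_TX_INTERVAL = 2  # minimumInterfaceLSPTransmissionInterval (us)
MIN_LSP_TX_INTERVAL = 3            # minimumLSPTransmissionInterval (us)
KNOWN_SUB_TLVS = {LSP_RECEIVE_WINDOW, MIN_INTERFACE_LSP_TX_INTERVAL,
                  MIN_LSP_TX_INTERVAL}


def encode_flooding_params(params: Dict[int, int]) -> bytes:
    """Encode {sub_tlv_type: value} as a Flooding Parameters TLV."""
    body = b"".join(
        struct.pack("!BBI", sub_type, 4, value)  # type, length=4, 32-bit value
        for sub_type, value in sorted(params.items())
    )
    if len(body) > 255:
        raise ValueError("TLV value exceeds 255 octets")
    return struct.pack("!BB", FLOODING_PARAMS_TLV_TYPE, len(body)) + body


def decode_flooding_params(tlv: bytes) -> Dict[int, int]:
    """Decode a Flooding Parameters TLV; unknown sub-TLVs are skipped."""
    tlv_type, length = struct.unpack_from("!BB", tlv, 0)
    if tlv_type != FLOODING_PARAMS_TLV_TYPE or len(tlv) < 2 + length:
        raise ValueError("not a well-formed Flooding Parameters TLV")
    params: Dict[int, int] = {}
    offset, end = 2, 2 + length
    while offset + 2 <= end:
        sub_type, sub_len = struct.unpack_from("!BB", tlv, offset)
        offset += 2
        if offset + sub_len > end:
            raise ValueError("truncated sub-TLV")
        if sub_type in KNOWN_SUB_TLVS and sub_len == 4:
            (params[sub_type],) = struct.unpack_from("!I", tlv, offset)
        offset += sub_len
    return params
```

Because every defined sub-TLV occupies 6 octets under these assumptions, a TLV carrying all three parameters would have a Length of 18.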
By sending the InterfaceLSPReceiveWindow sub-TLV with a value N1, the node advertises to its IS-IS neighbor its ability to receive, over that interface, a maximum of N1 un-acknowledged LSPs with no separation interval. This is akin to a reception window or sliding window in flow control.
By sending the minimumInterfaceLSPTransmissionInterval sub-TLV with a value N2, the node advertises to its IS-IS neighbor its ability to receive, over that interface, after the receive window is full, LSPs separated by at least N2 micro-seconds.
The IS transmitter MUST NOT exceed these parameters. After having sent N1 un-acknowledged LSPs, it MUST send the following LSPs with an interval of at least N2 micro-seconds between each LSP.
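The transmitter rule above can be sketched as a small pacing object kept per receiver. This Python sketch is illustrative, not a reference implementation: the class and method names are hypothetical, times are integer micro-seconds, and acknowledgements are assumed to be reported to it by the PSNP processing code.

```python
from typing import List, Optional


class LspPacer:
    """Transmitter-side pacing toward one receiver (sketch).

    window: InterfaceLSPReceiveWindow (N1) advertised by the receiver.
    interval_us: minimumInterfaceLSPTransmissionInterval (N2), micro-seconds.
    """

    def __init__(self, window: int, interval_us: int) -> None:
        self.window = window
        self.interval_us = interval_us
        self.unacked = 0
        self.last_send_us: Optional[int] = None

    def earliest_send_time(self, now_us: int) -> int:
        """Earliest time at which the next LSP may be sent."""
        if self.unacked < self.window or self.last_send_us is None:
            return now_us  # window not yet full: back to back is allowed
        # Window full: keep at least N2 between consecutive LSPs.
        return max(now_us, self.last_send_us + self.interval_us)

    def on_send(self, t_us: int) -> None:
        self.unacked += 1
        self.last_send_us = t_us

    def on_ack(self, count: int = 1) -> None:
        """A PSNP acknowledged `count` outstanding LSPs."""
        self.unacked = max(0, self.unacked - count)


def schedule(pacer: LspPacer, count: int, start_us: int = 0) -> List[int]:
    """Send times for `count` LSPs, assuming no acknowledgement arrives."""
    times: List[int] = []
    now = start_us
    for _ in range(count):
        now = pacer.earliest_send_time(now)
        pacer.on_send(now)
        times.append(now)
    return times
```

For example, with N1 = 3 and N2 = 1000 micro-seconds and no acknowledgements, five LSPs are scheduled at 0, 0, 0, 1000 and 2000 micro-seconds. Each acknowledgement reopens the window, which is what lets a receiver that acknowledges quickly keep the transmitter sending back to back.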
Note, however, that if either the LSP transmitter or receiver does not adhere to these parameters, for example because of transient conditions, this causes no fatal condition to the operation of IS-IS. The worst case, the loss of LSPs on the IS receiver, is already accounted for in [ISO10589]. As per [ISO10589], after a few seconds (by default, 2 seconds for PSNPs and 10 seconds for CSNPs), neighbors will exchange PSNPs (on point to point interfaces) or CSNPs (on broadcast interfaces) and recover from the lost LSPs. This worst case, overrunning the receiver, should nevertheless be avoided, as those additional seconds impact the network and the traffic while the LSDB is not fully synchronized. Hence, it is better to err on the conservative side and to underrun the receiver rather than overrun it.
For a given IS-IS adjacency, the Flooding Parameters TLV does not need to be advertised in each SNP and IIH. The IS transmitter uses the latest value received for each parameter (sub-TLV) until a new value is advertised by the IS receiver. Note, however, that CSNPs and IIHs are not reliably exchanged, hence some PDUs may never be received. For a parameter which has never been advertised, the IS transmitter uses its local default value. That value SHOULD be configurable on a per node basis and MAY be configurable on a per interface basis.
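The bookkeeping described above amounts to a per-adjacency store with a default fallback. In this Python sketch, the class name is hypothetical and the example defaults (a window of 1, 33 ms pacing, 5 second retransmission) are simply the figures quoted elsewhere in this document, used for illustration.

```python
from typing import Dict, Optional

# Sub-TLV types: 1 = InterfaceLSPReceiveWindow,
# 2 = minimumInterfaceLSPTransmissionInterval, 3 = minimumLSPTransmissionInterval


class AdjacencyFloodingParams:
    """Flooding parameters a transmitter applies toward one neighbor (sketch)."""

    def __init__(self, node_defaults: Dict[int, int],
                 interface_defaults: Optional[Dict[int, int]] = None) -> None:
        # Local defaults: per-node values, optionally overridden per interface.
        self.defaults = dict(node_defaults)
        if interface_defaults:
            self.defaults.update(interface_defaults)
        self.advertised: Dict[int, int] = {}

    def on_tlv_received(self, params: Dict[int, int]) -> None:
        """Apply a decoded Flooding Parameters TLV from an SNP or IIH.

        Sub-TLVs absent from this PDU keep their previously advertised value.
        """
        self.advertised.update(params)

    def get(self, sub_tlv_type: int) -> int:
        """Latest advertised value, else the local default."""
        return self.advertised.get(sub_tlv_type, self.defaults[sub_tlv_type])
```

Because a PDU carrying the TLV may be lost, nothing here expires an advertised value; it is simply replaced by the next advertisement that carries the same sub-TLV.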
As per [ISO10589], on point to point interfaces, the LSP receiver dynamically acknowledges the received LSPs by sending PSNP messages. By acknowledging the LSPs before the InterfaceLSPReceiveWindow is exhausted, the receiver can achieve dynamic flow control and increase the flooding speed without risking overloading any IS-IS router. If the InterfaceLSPReceiveWindow is large enough, the receiver can acknowledge a set of multiple LSPs, up to the maximum number that fits in a PSNP (90 LSPs), which allows dynamic flow control with a limited increase, or even no increase, in the number of PSNPs.
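As a purely illustrative example of such a receiver-side choice, the Python sketch below acknowledges as soon as half of the advertised receive window is pending (capped at one full PSNP of 90 LSPs), and otherwise when the oldest pending LSP has waited for a configured PSNP interval. The half-window threshold, the class name, and the method names are all hypothetical; this document does not mandate any particular policy.

```python
from typing import List, Optional

MAX_LSPS_PER_PSNP = 90  # figure quoted in the text above


class AckPolicy:
    """Receiver-side policy deciding when to send a PSNP (sketch)."""

    def __init__(self, receive_window: int, psnp_interval_us: int) -> None:
        # Hypothetical rule: acknowledge once half the window is pending,
        # well before the transmitter's window is exhausted.
        self.threshold = max(1, min(MAX_LSPS_PER_PSNP, receive_window // 2))
        self.psnp_interval_us = psnp_interval_us
        self.pending: List[str] = []
        self.oldest_us: Optional[int] = None

    def on_lsp(self, lsp_id: str, now_us: int) -> Optional[List[str]]:
        """Record a received LSP; return the LSP IDs to acknowledge, if any."""
        if not self.pending:
            self.oldest_us = now_us
        self.pending.append(lsp_id)
        if len(self.pending) >= self.threshold:
            return self._flush()
        return None

    def on_timer(self, now_us: int) -> Optional[List[str]]:
        """Periodic check so that a partial batch is still acknowledged."""
        if self.pending and now_us - self.oldest_us >= self.psnp_interval_us:
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        acked = self.pending
        self.pending, self.oldest_us = [], None
        return acked
```

With a receive window of 10, a PSNP would be generated after every 5 LSPs; with a window of 1000, batches are capped at 90 LSPs per PSNP.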
The way LSPs are acknowledged faster is a local decision on the receiving IS. Without limiting the possibilities, there are at least two options:
As per [ISO10589], an IS transmitter resends an un-acknowledged LSP no sooner than minimumLSPTransmissionInterval, which is 5 seconds by default. As this document allows faster transmission of LSP acknowledgements, the transmitter should be able to retransmit faster, with a delay compatible with (i.e., higher than) the partialSNPInterval or the delay needed to acknowledge the received LSPs.
The reception of the parameter minimumLSPTransmissionInterval means that the IS transmitter MAY set its minimumLSPTransmissionInterval to this value or higher.
The interval advertised in minimumLSPTransmissionInterval MUST be higher than the effective partialSNPInterval of the receiver plus the Round Trip Time (RTT) of the interface. The effective partialSNPInterval of the receiver is the maximum amount of time that the receiver is expected to take to acknowledge an LSP. This would be the partialSNPInterval on a receiver following only [ISO10589], or an effective value if the receiver has implemented a faster method to acknowledge LSPs, as discussed in Section 4. The goal is that the receiver should not tell the transmitter to resend un-acknowledged LSPs after waiting for a shorter time than the receiver plans to take to acknowledge LSPs it has actually received.
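The constraint above can be expressed as a simple check on the receiver side. In this Python sketch, the function names and the default safety margin are hypothetical; the margin only serves to keep the inequality strict and to absorb jitter.

```python
UINT32_MAX = 2**32 - 1  # sub-TLV values are 4 octets


def retransmit_interval_to_advertise(effective_psnp_interval_us: int,
                                     rtt_us: int,
                                     margin_us: int = 1_000) -> int:
    """Value for the minimumLSPTransmissionInterval sub-TLV, in micro-seconds.

    The rule requires a value strictly higher than the receiver's effective
    partialSNPInterval plus the RTT; margin_us (hypothetical) keeps the bound
    strict.
    """
    if margin_us < 1:
        raise ValueError("margin_us must be positive to keep the bound strict")
    value = effective_psnp_interval_us + rtt_us + margin_us
    if value > UINT32_MAX:
        raise ValueError("value does not fit in the 4-octet sub-TLV")
    return value


def satisfies_rule(advertised_us: int, effective_psnp_interval_us: int,
                   rtt_us: int) -> bool:
    """Check the MUST: advertised > effective partialSNPInterval + RTT."""
    return advertised_us > effective_psnp_interval_us + rtt_us
```

For a receiver following only [ISO10589], with the 2 second default partialSNPInterval and a 1 ms RTT, this sketch yields 2,002,000 micro-seconds, compared with the 5 second default; a receiver that acknowledges within 50 ms could advertise 52,000 micro-seconds.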
An IS receiver MAY update this value depending on certain conditions. For example, it can advertise a higher minimumLSPTransmissionInterval value when a large number of LSPs are being received and hence it is busy. Or it can advertise a lower value when an LSP storm has passed, especially if there is reason to believe that some LSPs may have been lost.
On a LAN interface, an IS receiver will generally receive LSPs from many IS transmitters, and the LSPs sent by a given IS transmitter will be received by all of the IS receivers. In this section, we clarify how the flooding parameters should be interpreted in the context of a LAN.
An IS receiver on a LAN will communicate its desired flooding parameters using a single Flooding Parameters TLV, copies of which will be received by all N transmitters. The flooding parameters sent by the IS receiver MUST be understood as instructions from the receiver to each transmitter about the desired maximum transmit characteristics of each transmitter. For example, the receiver will be aware that there are N transmitters that can send LSPs to its LAN interface. The receiver might want to take that into account by advertising a higher value of minimumInterfaceLSPTransmissionInterval on this LAN interface than what it would advertise on a point to point interface. When the transmitters receive the minimumInterfaceLSPTransmissionInterval value advertised by the receiver, they should rate limit LSPs according to the advertised flooding parameters. They should not apply any further interpretation to the flooding parameters advertised by the receiver.
On the other hand, a given IS transmitter will receive flooding parameter advertisements in N different Flooding Parameters TLVs, which could carry different flooding parameter values. A given transmitter SHOULD adjust its flooding behavior on this LAN interface such that none of the receivers receives more un-acknowledged LSPs, or LSPs at a higher rate, than indicated by its individual flooding parameter advertisement.
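One straightforward way for a transmitter to satisfy every receiver at once is to apply the most restrictive values: the smallest advertised receive window and the largest advertised inter-LSP interval. The Python sketch below assumes the per-receiver parameters are kept in dictionaries keyed by a receiver identifier, with local defaults used for receivers that never advertised a given parameter; the names are hypothetical.

```python
from typing import Dict, Tuple

RECEIVE_WINDOW = 1  # InterfaceLSPReceiveWindow sub-TLV type
TX_INTERVAL = 2     # minimumInterfaceLSPTransmissionInterval sub-TLV type


def effective_lan_params(per_receiver: Dict[str, Dict[int, int]],
                         defaults: Dict[int, int]) -> Tuple[int, int]:
    """Most restrictive (window, interval_us) across all receivers on the LAN.

    per_receiver maps a receiver's system ID to the latest parameters it
    advertised; parameters it never advertised fall back to local defaults.
    """
    windows = [p.get(RECEIVE_WINDOW, defaults[RECEIVE_WINDOW])
               for p in per_receiver.values()]
    intervals = [p.get(TX_INTERVAL, defaults[TX_INTERVAL])
                 for p in per_receiver.values()]
    # Smallest window and largest interval satisfy every receiver at once.
    return (min(windows, default=defaults[RECEIVE_WINDOW]),
            max(intervals, default=defaults[TX_INTERVAL]))
```

For example, if receiver A advertises a window of 100 and an interval of 500 micro-seconds while receiver B advertises 20 and 2000, the transmitter would use a window of 20 and an interval of 2000 micro-seconds on that LAN.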
In order for the InterfaceLSPReceiveWindow to be a useful parameter, an IS transmitter needs to be able to keep track of the number of un-acknowledged LSPs it has sent to a given IS receiver. On a LAN, there is no explicit acknowledgement of the receipt of LSPs between a given IS transmitter and a given IS receiver. However, an IS transmitter on a LAN can infer whether or not any IS receivers on the LAN have requested retransmission of LSPs from the DIS by monitoring PSNPs generated on the LAN. If no PSNPs have been generated on the LAN for a suitable period of time, then an IS transmitter can safely set the number of un-acknowledged LSPs to zero.
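The inference described above can be sketched as a counter that is cleared after a quiet period. In this Python sketch, the quiet period value is a hypothetical local choice standing in for the "suitable period of time", and measuring it from the later of the last LSP sent and the last PSNP seen is an assumption made here so that receivers have time to detect a loss before the counter is cleared.

```python
class LanUnackedTracker:
    """Inferred count of un-acknowledged LSPs on a LAN interface (sketch)."""

    def __init__(self, quiet_period_us: int) -> None:
        self.quiet_period_us = quiet_period_us  # hypothetical local choice
        self.unacked = 0
        self.last_activity_us = 0

    def on_lsp_sent(self, now_us: int) -> None:
        self.unacked += 1
        # Receivers need time to notice a loss, so the quiet time restarts here.
        self.last_activity_us = now_us

    def on_psnp_seen(self, now_us: int) -> None:
        # Some receiver is requesting LSPs: keep the count, restart quiet time.
        self.last_activity_us = now_us

    def poll(self, now_us: int) -> int:
        """Reset the count once the LAN has been free of PSNPs long enough."""
        if self.unacked and now_us - self.last_activity_us >= self.quiet_period_us:
            self.unacked = 0
        return self.unacked
```

The transmitter would compare the value returned by poll() against the effective receive window for the LAN before sending further LSPs back to back.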
[ISO10589] describes a mechanism that limits the rate at which LSPs from the same source system are sent out on interfaces. (See the description of the parameter minimumBroadcastLSPTransmissionInterval in section 7.3.15.6 of [ISO10589].) In practice, however, router vendors have implemented mechanisms that limit the rate of LSPs sent on a given interface. This is often configurable on a per-interface basis using 'lsp-interval' or 'lsp-pacing-interval' CLI configuration. The mechanism described in the current document extends the practice of limiting the rate of LSPs sent on a given interface, by using parameters advertised by the LSP receiver. When the mechanism described in the current document is used, the mechanism described in section 7.3.15.6 of [ISO10589] is not used.
The values that a receiving IS advertises do not need to be close to perfection. It is OK for them to be too low and hence not to use the full bandwidth or CPU resources. It is OK for them to be too high in some situations and hence have the receiver drop some LSPs, as the IS-IS protocol has mechanisms to recover. What is not OK is to flood multiple orders of magnitude slower than both nodes can achieve, or to consistently overload the receiver.
The values may not need to be dynamic, as a form of dynamicity is provided by the dynamic acknowledgment of LSPs in SNP messages, which provides a feedback loop on how fast or slow the LSPs are processed by the receiver. By advertising relatively static parameters, we expect to produce overall flooding behavior similar to what might be achieved by manually configuring per-interface LSP rate limiting on all interfaces in the network. The advertised values may be based, for example, on offline tests of the overall LSP processing speed for a particular set of hardware and the number of interfaces configured for IS-IS. With such a formula, the values advertised in the Flooding Parameters TLV would only change when additional IS-IS interfaces are configured.
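A static formula of this kind might look like the Python sketch below. The formula itself is hypothetical: it shares a measured overall LSP processing rate, and an assumed input queue size, equally across the configured IS-IS interfaces. Only the inputs named in the text (offline-measured processing speed and number of IS-IS interfaces) come from this document; the queue size input and the equal split are assumptions.

```python
from typing import Tuple


def static_flooding_params(measured_lsps_per_s: int, input_queue_lsps: int,
                           isis_interfaces: int) -> Tuple[int, int]:
    """Return (InterfaceLSPReceiveWindow, interval in micro-seconds).

    measured_lsps_per_s: overall LSP processing rate from an offline test.
    input_queue_lsps: LSPs the input queue can hold (hypothetical input).
    isis_interfaces: number of interfaces configured for IS-IS.
    """
    if measured_lsps_per_s <= 0 or isis_interfaces <= 0:
        raise ValueError("rate and interface count must be positive")
    # Split the input queue evenly so that bursts from all neighbors at once
    # roughly fit.
    window = max(1, input_queue_lsps // isis_interfaces)
    # Split the measured rate evenly; ceiling division rounds the interval up
    # so the sustained aggregate rate stays at or below the measured rate.
    interval_us = ((1_000_000 * isis_interfaces + measured_lsps_per_s - 1)
                   // measured_lsps_per_s)
    return window, interval_us
```

For example, a platform measured at 10,000 LSPs per second with room for 2000 queued LSPs and 40 IS-IS interfaces would advertise a window of 50 and an interval of 4000 micro-seconds (250 LSPs per second per interface). The result changes only when the interface count changes, matching the behavior described above.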
Nevertheless, the values may also be changed dynamically. In this case, care must be taken when choosing the parameters influencing the values, in order to avoid undesirable feedback loops. It would be undesirable to use a formula that depends, for example, on an active measurement of the instantaneous CPU load to modify the values advertised in the Flooding Parameters TLV. This could introduce feedback into the IGP flooding process that could produce unexpected behavior. The value may also be based on average measured flooding statistics: if LSPs are regularly dropped, or the queue regularly comes close to being filled, then values may be too high. On the other hand, if the queue is barely used (by IS-IS), then values may be too low.
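The Python sketch below illustrates the averaged-statistics approach under explicit assumptions: queue occupancy is sampled periodically and smoothed with an exponentially weighted moving average, a drop counts as a full queue, and the advertised receive window is adjusted only on a slow timer, halving when the average is high and growing by a quarter when it is low (loosely modeled on additive-increase/multiplicative-decrease). The smoothing factor, thresholds, and bounds are all hypothetical. Using long-term averages rather than instantaneous load is intended to avoid the feedback described above.

```python
class WindowTuner:
    """Slowly adapts the advertised InterfaceLSPReceiveWindow (sketch).

    All thresholds and the smoothing factor are hypothetical.
    """

    def __init__(self, window: int, min_window: int = 1, max_window: int = 1000,
                 alpha: float = 0.05, high: float = 0.8, low: float = 0.1) -> None:
        self.window = window
        self.min_window = min_window
        self.max_window = max_window
        self.alpha = alpha  # EWMA smoothing factor
        self.high = high    # average occupancy considered "close to full"
        self.low = low      # average occupancy considered "barely used"
        self.avg_occupancy = 0.0

    def sample(self, queue_len: int, queue_capacity: int, dropped: bool) -> None:
        """Fold one periodic measurement of the IS-IS input queue into the average."""
        occupancy = 1.0 if dropped else queue_len / queue_capacity
        self.avg_occupancy += self.alpha * (occupancy - self.avg_occupancy)

    def recompute(self) -> int:
        """Called on a slow timer; returns the window to advertise next."""
        if self.avg_occupancy > self.high:
            self.window = max(self.min_window, self.window // 2)
        elif self.avg_occupancy < self.low:
            self.window = min(self.max_window,
                              self.window + max(1, self.window // 4))
        return self.window
```

Between the two thresholds the window is left unchanged, which provides some hysteresis and limits how often a new Flooding Parameters TLV needs to be advertised.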
IANA is requested to allocate one TLV from the IS-IS TLV codepoint registry.
Type  Description                  IIH  LSP  SNP  Purge
----  ---------------------------  ---  ---  ---  -----
TBD   Flooding Parameters TLV      y    n    y    n
Figure 1
TBD: registry for sub-TLVs of the Flooding Parameters TLV.
Any new security issues raised by the procedures in this document depend upon the ability of an attacker to inject a false but apparently valid SNP, the ease/difficulty of which has not been altered.
As with other TLV advertisements, the use of cryptographic authentication as defined in [RFC5304] or [RFC5310] allows the authentication of the peer and the integrity of the message. As this document defines a TLV for SNP messages, the relevant cryptographic authentication is for SNP messages.
In the absence of cryptographic authentication, as IS-IS does not run over IP but directly over the link layer, it's considered difficult to inject false SNP without having access to the link layer.
If a false SNP is sent with a Flooding Parameters TLV set to low values, the attacker can reduce the flooding speed between the two adjacent neighbors, which can result in LSDB inconsistencies and transient forwarding loops. However, this is not significantly different from filtering or altering LSPDUs, which would also be possible with access to the link layer. In addition, if the downstream flooding neighbor has multiple IGP neighbors, which is typically the case for reliability or topological reasons, it would receive LSPs at a regular speed from its other neighbors and hence would maintain LSDB consistency.
If a false SNP is sent with a Flooding Parameters TLV set to high values, the attacker can increase the flooding speed which can either overload a node or more likely generate loss of LSPs. However, it is not significantly different than sending many LSPs which would also be possible with access to the link layer, even with cryptographic authentication enabled. In addition, IS-IS has procedures to detect the loss of LSPs and recover.
This TLV advertisement is not flooded across the network but only sent between adjacent IS-IS neighbors. This would limit the consequences in case of forged messages, and also limits the dissemination of such information.
The authors would like to thank Henk Smit for his review and comments.
[ISO10589]  International Organization for Standardization, "Intermediate system to Intermediate system intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473)", ISO/IEC 10589:2002, Second Edition, Nov 2002.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC5304]  Li, T. and R. Atkinson, "IS-IS Cryptographic Authentication", RFC 5304, DOI 10.17487/RFC5304, October 2008.
[RFC5310]  Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R. and M. Fanto, "IS-IS Generic Cryptographic Authentication", RFC 5310, DOI 10.17487/RFC5310, February 2009.
[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.
[I-D.ietf-lsr-dynamic-flooding]  Li, T., Psenak, P., Ginsberg, L., Chen, H., Przygienda, T., Cooper, D., Jalil, L. and S. Dontula, "Dynamic Flooding on Dense Graphs", Internet-Draft draft-ietf-lsr-dynamic-flooding-04, November 2019.
[RFC Editor: Please remove this section before publication]
00: Initial version.
01: Two notes added in section 3 "Operation".
02: Refresh, no technical change.
03: Flooding Parameters TLV: name changed, moved to SNP rather than Hello, contains sub-TLVs, parameters encoded in 4 octets.
Terminology: upstream/downstream terms removed, in favor of terms from ISO specification (transmitter, receiver); burst-size rename to receive-window.
Significant editorial changes.
New section on the faster acknowledgment of LSPs.
New section on the faster retransmission of lost LSPs.