Network Working Group Z. Chen
Internet-Draft Huawei
Intended status: Standards Track X. Xu
Expires: March 9, 2019 Alibaba
D. Cheng
Huawei
September 5, 2018
Avoiding Traffic Black-Holes for Route Aggregation in IS-IS
draft-chen-isis-black-hole-avoid-03
Abstract
When the Intermediate System to Intermediate System (IS-IS) routing
protocol is used in a highly symmetric network such as a Leaf-Spine
or Fat-Tree network, it is recommended that the Leaf nodes (e.g., Top
of Rack switches in datacenters) not receive other nodes' explicit
routes, in order to achieve scalability. However, such a setup can
cause traffic black-holes or suboptimal routing when a link failure
occurs in the network. This document introduces an INFINITE cost for
prefixes advertised in IS-IS LSPs to solve this problem.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 9, 2019.
Chen, et al. Expires March 9, 2019 [Page 1]
Internet-Draft IS-IS Black-Hole Avoiding September 2018
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Problem Description . . . . . . . . . . . . . . . . . . . . . 3
3. Solution . . . . . . . . . . . . . . . . . . . . . . . . . . 4
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
5. Security Considerations . . . . . . . . . . . . . . . . . . . 5
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 5
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 6
1. Introduction
When the Intermediate System to Intermediate System (IS-IS) routing
protocol runs in a highly symmetric network such as a Leaf-Spine or
Fat-Tree network, it is recommended that the Leaf nodes (e.g., Top of
Rack switches in datacenters) not receive other nodes' explicit
routes, in order to achieve scalability, as proposed in
[IS-IS-SL-Extension], [IS-IS-Overhead-Reduction], [RIFT], and
[OpenFabric]. In particular, each Leaf node SHOULD simply maintain a
default (or aggregated) route (e.g., 0.0.0.0/0) in its routing table,
whose next hop SHOULD be an Equal Cost Multi Path (ECMP) group
including all Spine nodes that the Leaf node connects to. However,
such a setup can cause traffic black-holes or suboptimal routing when
a link failure occurs in the network, since the Leaf nodes have no
topology information.
To solve this problem, this document introduces an INFINITE cost for
prefixes advertised in IS-IS LSPs. When the link between a Spine node
and a Leaf node fails, the Spine node SHOULD advertise all prefixes
attached to that Leaf node, with their cost set to INFINITE, to every
other Leaf node it connects to. On receiving these prefixes (with
INFINITE cost), each Leaf node SHOULD add them to its routing table
with next hops that exclude the advertising Spine node, thus avoiding
traffic black-holes and suboptimal routing.
2. Problem Description
This section illustrates why a link failure can cause traffic
black-holes or suboptimal routing when Leaf nodes only maintain
default (or aggregated) routes.
+--------+ +--------+ +--------+
| Spine1 | | Spine2 | | Spine3 |
+-+-+-+-++ +-+-+-+-++ +-+-+-+-++
+------+ | | | | | | | | | | |
| +------|-|-|-------------+ | | | | | | X
| | +----|-|-|---------------|-|-|-------------+ | | X
| | | | | | +------+ | | | | X
| | | | | | | +------|-|---------------+ | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | +-------+ +-----+
| | | | | | | | | +---------|-------------+ |
| | | | | | | | +---------+ | | |
| | | | | +--------|-|----------------|-|-----------+ | |
| | | | +----------|-|--------------+ | | | | |
| | | +----------+ | | | | | | | |
+-+-+-+-+ +-+-+-+-+ +-+-+-+-+ +-+-+-+-+
| Leaf1 | | Leaf2 | | Leaf3 | | Leaf4 |
+-------+ +-------+ +-------+ +-------+
| |
--- ---
prefixA prefixB
Figure 1: Topology Example
Figure 1 shows an example Spine-Leaf topology in which Leaf1 to Leaf4
are connected to Spine1 to Spine3, and prefixA and prefixB are
attached to Leaf4. To achieve scalability, as proposed in
[IS-IS-SL-Extension], [IS-IS-Overhead-Reduction], [RIFT], and
[OpenFabric], Leaf1 to Leaf4 SHOULD NOT receive explicit routes from
each other or from the Spine nodes. Instead, each of them maintains a
default (or aggregated) route (e.g., 0.0.0.0/0) in its routing table,
whose next hop is an ECMP group including Spine1, Spine2, and Spine3.
Flows from one Leaf node to another are distributed among Spine1,
Spine2, and Spine3 by the well-known hashing on the 5-tuple (source
and destination addresses, protocol, and source and destination
ports).
However, such a setup can cause traffic black-holes or suboptimal
routing when a link failure occurs in the network. For example, if
the link between Spine3 and Leaf4 fails, Leaf1, Leaf2, and Leaf3 are
not aware of the failure. As a result, these Leaf nodes still send a
portion of the traffic destined for prefixA or prefixB toward Spine3,
which, having no other path to Leaf4, discards it, causing a traffic
black-hole. If, instead, a set of links or a higher tier of switches
interconnects Spine1, Spine2, and Spine3, Spine3 steers that traffic
to the other Spine nodes or to the higher-tier switches, causing
suboptimal routing.
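The failure mode can be illustrated with a short Python sketch that
models only Leaf1's default route from Figure 1. The CRC32 hash over
a string form of the 5-tuple, the host addresses, and the address
used for a host in prefixA are hypothetical choices for illustration;
actual switches use their own hash functions.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


# Next hops of Leaf1's default route 0.0.0.0/0 in Figure 1.
DEFAULT_ECMP = ["Spine1", "Spine2", "Spine3"]


def pick_next_hop(flow: FiveTuple, group: list[str]) -> str:
    """Map a flow onto one ECMP member by hashing its 5-tuple.

    CRC32 is an illustrative stand-in for a switch's hash function.
    """
    key = "|".join(str(field) for field in flow)
    return group[zlib.crc32(key.encode()) % len(group)]


def flows_via(flows: list[FiveTuple], spine: str) -> list[FiveTuple]:
    """Flows that Leaf1 keeps hashing toward `spine` via the default route."""
    return [f for f in flows if pick_next_hop(f, DEFAULT_ECMP) == spine]


# Hypothetical TCP flows from a host behind Leaf1 to a host in prefixA.
flows = [FiveTuple("10.0.0.5", "10.1.0.7", 6, 40000 + i, 80) for i in range(300)]
# After the Spine3-Leaf4 link fails, Leaf1 cannot tell that these flows
# are now dropped (or detoured) at Spine3.
stranded = flows_via(flows, "Spine3")
print(f"{len(stranded)} of {len(flows)} flows still hashed to Spine3")
```

With a reasonably uniform hash, roughly one third of the flows land on
Spine3, and the default route alone gives Leaf1 no way to exclude it.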
Therefore, this document introduces an INFINITE cost for prefixes
advertised in IS-IS LSPs to solve this problem.
3. Solution
This document introduces an INFINITE cost for prefixes advertised in
IS-IS LSPs; its value is to be determined. When the link between a
Spine node and a Leaf node fails, the Spine node SHOULD 1) encode all
prefixes attached to that Leaf node into an IP Reachability TLV
(e.g., as defined in [RFC1195] or [RFC5305]), 2) set the cost of each
of these prefixes to INFINITE, 3) append the IP Reachability TLV to
its IS-IS LSP, and 4) send the LSP to every other Leaf node it
connects to.
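Steps 1) to 3) can be sketched in Python for the Spine side. The
sketch assumes that the IP Reachability TLV is the Extended IP
Reachability TLV (type 135) of [RFC5305], whose entries carry a
4-octet metric, a control octet (up/down bit, sub-TLV-present bit,
6-bit prefix length), and only the significant octets of the prefix.
Because this document leaves the INFINITE value to be determined,
INFINITE_COST = 0xFFFFFFFF is a hypothetical placeholder, and the
addresses used for prefixA and prefixB are hypothetical as well. Only
the TLV is built, not the surrounding LSP.

```python
import ipaddress
import struct

# Extended IP Reachability TLV type from [RFC5305]; assumed here to be the
# "IP Reachability TLV" referred to in this section.
EXT_IP_REACH_TLV = 135

# HYPOTHETICAL placeholder: this document leaves the INFINITE value TBD.
INFINITE_COST = 0xFFFFFFFF


def encode_prefix(prefix: str, metric: int) -> bytes:
    """Encode one prefix entry: 4-octet metric, control octet, packed prefix."""
    net = ipaddress.IPv4Network(prefix)
    # Control octet: up/down bit = 0, sub-TLV-present bit = 0, 6-bit length.
    control = net.prefixlen & 0x3F
    # Only the significant octets of the prefix are carried.
    prefix_octets = (net.prefixlen + 7) // 8
    return struct.pack("!IB", metric, control) + net.network_address.packed[:prefix_octets]


def build_infinite_tlv(prefixes: list[str]) -> bytes:
    """Build one TLV advertising every given prefix with INFINITE cost."""
    value = b"".join(encode_prefix(p, INFINITE_COST) for p in prefixes)
    if len(value) > 255:  # the TLV length field is a single octet
        raise ValueError("prefixes do not fit in one TLV; split across TLVs")
    return struct.pack("!BB", EXT_IP_REACH_TLV, len(value)) + value


# Spine3 lost its link to Leaf4 and re-advertises Leaf4's prefixes.
tlv = build_infinite_tlv(["10.1.0.0/16", "10.2.0.0/16"])
print(tlv.hex())  # 870effffffff100a01ffffffff100a02
```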
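When a Leaf node receives prefixes with INFINITE cost advertised by a
Spine node, it SHOULD install each of these prefixes into its routing
table, with the next hop set to an ECMP group including all Spine
nodes it connects to except the one that advertised the prefix. Since
these prefixes are more specific than the default (or aggregated)
route, longest-prefix matching then steers traffic destined for them
away from the advertising Spine node.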
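The Leaf-side rule can be sketched as follows. The LeafRib class is a
hypothetical model of a Leaf's IS-IS routes, keyed by the symbolic
prefix names used in this document. As an assumption beyond the text
above, it remembers every Spine node that has advertised a prefix with
INFINITE cost, so that a second failure (e.g., Spine2 also losing its
link to Leaf4) removes that Spine node from the ECMP group as well.

```python
from collections import defaultdict

# Spine nodes that Leaf1 connects to in Figure 1.
CONNECTED_SPINES = ("Spine1", "Spine2", "Spine3")


class LeafRib:
    """Hypothetical model of a Leaf's IS-IS routes: prefix -> ECMP next hops."""

    def __init__(self, spines=CONNECTED_SPINES):
        self.spines = tuple(spines)
        self.routes = {"0.0.0.0/0": list(self.spines)}
        # Spines that have advertised each prefix with INFINITE cost.
        self._infinite_from = defaultdict(set)

    def on_infinite_prefix(self, prefix: str, advertiser: str) -> None:
        """Install `prefix` via every connected Spine except the advertisers.

        This document does not say what to do if every Spine advertises
        INFINITE for the same prefix; this sketch then installs an empty group.
        """
        self._infinite_from[prefix].add(advertiser)
        self.routes[prefix] = [
            s for s in self.spines if s not in self._infinite_from[prefix]
        ]


# Spine3 advertises prefixA and prefixB with INFINITE cost (Figure 1).
rib = LeafRib()
for prefix in ("prefixA", "prefixB"):
    rib.on_infinite_prefix(prefix, "Spine3")
print(rib.routes)
# {'0.0.0.0/0': ['Spine1', 'Spine2', 'Spine3'], 'prefixA': ['Spine1', 'Spine2'], 'prefixB': ['Spine1', 'Spine2']}
```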
For example, if the link between Spine3 and Leaf4 in Figure 1 fails,
Spine3 SHOULD advertise prefixA and prefixB to Leaf1, Leaf2, and
Leaf3 by sending them an IS-IS LSP containing an IP Reachability TLV,
with the cost of prefixA and prefixB set to INFINITE. On receiving
the LSP, Leaf1, Leaf2, and Leaf3 SHOULD install prefixA and prefixB
into their routing tables, with the next hop of each set to an ECMP
group including Spine1 and Spine2. For instance, the routing tables
of Leaf1 before and after the link failure are shown in Figure 2 and
Figure 3, respectively.
Note that, because the Spine node adjacent to the failed link
notifies the other Leaf nodes it connects to directly, this mechanism
incurs minimal signaling latency, which helps to avoid black-holes or
suboptimal routing quickly after a link failure.
+-----------+-----+---+----+-----+-------+--------------+
|Destination|Proto|Pre|Cost|Flags|NextHop|Interface |
+-----------+-----+---+----+-----+-------+--------------+
|0.0.0.0/0 |ISIS |15 |20 |D |Spine1 |Ethernet0/0/0 |
| |ISIS |15 |20 |D |Spine2 |Ethernet0/0/1 |
| |ISIS |15 |20 |D |Spine3 |Ethernet0/0/2 |
+-----------+-----+---+----+-----+-------+--------------+
Figure 2: Routing Table of Leaf1 before link failure
+-----------+-----+---+----+-----+-------+--------------+
|Destination|Proto|Pre|Cost|Flags|NextHop|Interface |
+-----------+-----+---+----+-----+-------+--------------+
|0.0.0.0/0 |ISIS |15 |20 |D |Spine1 |Ethernet0/0/0 |
| |ISIS |15 |20 |D |Spine2 |Ethernet0/0/1 |
| |ISIS |15 |20 |D |Spine3 |Ethernet0/0/2 |
+-----------+-----+---+----+-----+-------+--------------+
|prefixA |ISIS |15 |20 |D |Spine1 |Ethernet0/0/0 |
| |ISIS |15 |20 |D |Spine2 |Ethernet0/0/1 |
+-----------+-----+---+----+-----+-------+--------------+
|prefixB |ISIS |15 |20 |D |Spine1 |Ethernet0/0/0 |
| |ISIS |15 |20 |D |Spine2 |Ethernet0/0/1 |
+-----------+-----+---+----+-----+-------+--------------+
Figure 3: Routing Table of Leaf1 after link failure
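Forwarding with the table in Figure 3 relies on longest-prefix
matching, which the following Python sketch illustrates. The values
10.1.0.0/16 for prefixA and 10.2.0.0/16 for prefixB, as well as the
destination addresses, are hypothetical; the sketch models only the
Destination and NextHop columns.

```python
import ipaddress

# Figure 3 with hypothetical values: prefixA = 10.1.0.0/16, prefixB = 10.2.0.0/16.
LEAF1_ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): ["Spine1", "Spine2", "Spine3"],
    ipaddress.ip_network("10.1.0.0/16"): ["Spine1", "Spine2"],
    ipaddress.ip_network("10.2.0.0/16"): ["Spine1", "Spine2"],
}


def lookup(dst: str) -> list[str]:
    """Return the ECMP next hops of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    # 0.0.0.0/0 matches every IPv4 address, so `matches` is never empty.
    matches = [net for net in LEAF1_ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return LEAF1_ROUTES[best]


print(lookup("10.1.0.7"))  # ['Spine1', 'Spine2']  (prefixA avoids Spine3)
print(lookup("10.9.0.1"))  # ['Spine1', 'Spine2', 'Spine3']  (default route)
```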
4. IANA Considerations
TBD.
5. Security Considerations
TBD.
6. Acknowledgements
TBD.
7. References
[IS-IS-Overhead-Reduction]
Chen, Z., Xu, X., and D. Cheng, "Overheads Reduction for
IS-IS Enabled Spine-Leaf Networks",
draft-chen-isis-sl-overheads-reduction-03 (work in progress),
March 2018.
[IS-IS-SL-Extension]
Shen, N., Ginsberg, L., and S. Thyamagundalu, "IS-IS
Routing for Spine-Leaf Topology",
draft-shen-isis-spine-leaf-ext-06 (work in progress),
June 2018.
[OpenFabric]
White, R. and S. Zandi, "IS-IS Support for Openfabric",
draft-white-openfabric-06 (work in progress), June 2018.
[RFC1195] Callon, R., "Use of OSI IS-IS for Routing in TCP/IP and
Dual Environments", RFC 1195, December 1990.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic
Engineering", RFC 5305, October 2008.
[RIFT] Przygienda, T., Sharma, A., Drake, J., and A. Atlas,
"RIFT: Routing in Fat Trees", draft-ietf-rift-rift-02
(work in progress), June 2018.
Authors' Addresses
Zhe Chen
Huawei
No. 156 Beiqing Rd
Beijing 100095
China
Email: chenzhe17@huawei.com
Xiaohu Xu
Alibaba
Email: xiaohu.xxh@alibaba-inc.com
Dean Cheng
Huawei
Email: dean.cheng@huawei.com