RTGWG | P. Thubert, Ed. |
Internet-Draft | Cisco |
Intended status: Standards Track | P. Bellagamba |
Expires: January 3, 2015 | Cisco Systems |
July 2, 2014 |
Available Routing Constructs
draft-thubert-rtgwg-arc-02
This draft introduces the concept of ARC, a two-edged routing construct that forms its own fault isolation and recovery domain. The new paradigm can be leveraged to improve the network utilization and resiliency for unicast and multicast traffic in multiple environments.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 3, 2015.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Traditional routing and forwarding uses the concept of path as the basic routing paradigm to get a packet from a source to a destination by following an ordered sequence of arrows between intermediate nodes. In this serial design, a path is broken as soon as a single arrow is, and getting around a breakage can require path recomputation, network reconvergence, and incur delays to till service is restored.
Multiple paths can be bound together for instance to form a Directed Acyclic Graph (DAG) to a destination, but that technique can be difficult to balance and cannot provide a full path redundancy even in the case of a biconnected graph. For instance, if the node that is closest to the DAG destination has only one link to that destination, then it does not have a alternate path to get to that destination.
It is also possible to compute an alternate routing topology for fast rerouting to a given destination, in which case some signalling, tagging or labelling can be put in place to indicate whether a packet follows the normal path or was rerouted over an alternate topology. Once a packet is rerouted, it is bound to the alternate topology so only one breakage can be handled with looplessness guarantees in most practical situations.
This draft introduces the concept of an Available Routing Construct (ARC) as a routing construct made of a bidirectional sequence of nodes and links with 2 outgoing edges, so that, upon a single breakage, each lively node in along ARC can still reach one of the outgoing edges.
The routing graph to reach a certain destination is expressed as a cascade of ARCs, each ARC providing its own independent domain of fault isolation and recovery. Unicast traffic may enter an ARC via any node but it may only leave the ARC through one of its two edges. One node along the ARC is designated as the Cursor. In normal unicast operations, the traffic inside an ARC flows away from the Cursor towards an edge. Upon a failure, packets may bounce on the breakage point and flow the other way along the ARC to take the other exit.
Aa a result an ARC is resilient to any single failure, and the recovery can be driven either from the data plane or the control plane. A second failure occurring within a same ARC will isolate an ARC segment. This can be further corrected from the control plane by reversing all the incoming Edges in a process that might recurse till an exit is found. When ARC reversal is applied, an ARC topology is resilient to some cases of Shared Risk Link Group (SRLG) failures.
This draft presents the concept and provides an intuition of how ARCs can simplify the operation and improve the network utilization and resiliency for all sorts of traffic in multiple environments, but defers to further documents to elaborate on the algorithms and optimizations in the different application domains.
For instance, ARCs can also be used in datacenters for the purpose of fast-reroute, or within a service provider network to simplify load balancing operations or leverage optimally the ring topologies [RFC5921]. An ARC topology can be flooded over itself and serve as a backbone for bicasted [I-D.thubert-rtgwg-arc-bicast] and reliable multicasted operations.
The definition of the constituent parts of the "OAM" term is found in [RFC6291].
| | V C+EAN /|\ / | \ | V | V V E: Edge ARC Node -| collapsed in a single node C: Cursor -|
Figure 1: Collapsed ARC
| \ | | | \ | | | V V V | _->IAN<---->IAN<---->IAN<---->IAN<-_ | / + C \ | / \| V V EAN EAN | /|\ | / | \ | | V | V V V IAN: Intermediate ARC Node -| EAN: Edge ARC Node |- All are Safe Nodes C: Cursor -|
Figure 2: Infrastructure ARC
| \ | | | \ | | | V V V | _->IAN<---->IAN<---->IAN<---->IAN<-_----->IAN<-_ / + C \ | \ / \| \ V V V EAN EAN EAN | /|\ | | / | \ | | | V | | V V V V
Figure 3: Comb with Buttressing ARC
The draft uses the following terminology:
H2<==>H<==>H1<---I--->J1<==>J--->K1<===>K | | | | | | | | | | V V V V V D1<==>D<==>D3 E1<==>E F1<==>F<==>F2 G | | | | | | / \ | | | | | | / \ V V V V V V V V B1<==>B2<==>B3<==>B--->A<==>A1<------C2<==>C<==>C4 | | | | | | | | | V V | +--------------------> Omega <-------------------+
Figure 4: Routing Graph View
+========I========+ | | | +====J====+ | | | +====H====+ | +=====K=====+ | | | | | +====D====+ +====E====+ +====F====+ +====G====+ | | | | | | | | +=========B=========+ | | +=========C=========+ | | | | | | | +======A=======+ | | | | | ------------------------------------------------------------------Omega
Figure 5: ARC Representation
+====+ +====+ +====+ +====+ | H | <--- | I | ---> | J | ---> | K | | \__ | ___/ | | \ | / | V _| V |_ V +====+ +====+ +====+ +====+ | D | | E | | F | <--- | G | \ \ __/ \__ __/ \__ / / \ \ / \ / \ / / _| _| |_ _| |_ _| |_ |_ +====+ +====+ +====+ | B | ---> | A | <--- | C | | | | | V V V V ------------------------------------------------------------------Omega
Figure 6: ARC DAG
An ARC Set can be represented in a number of fashions:
This section has to be refined. ARCs probably apply to both unicast and multicast and the authors expect further documents to explain how that is done. The examples below are provided as an indication but is not limiting the field of applications.
In normal conditions, only the Cursor may distribute its traffic between the two Edge Nodes. If an Edge Node is still congested after the Cursor forwards all its traffic towards the other Edge Node, then the Cursor can be moved towards the congested Edge in order to derive even more traffic towards the other Edge. If both Edges are congested, then a back-pressure can be applied on the incoming ARCs so that they move their own traffic towards their own alternate Edge. The process may recurse.
It is expected that control frames similar to those defined for MPLS Fault Management Operations, Administration, and Maintenance (OAM) [RFC6291] will echo from Edge Node to Edge Node provide information such as liveliness and load. In order to establish a control loop between the Edge Nodes and the Cursor, the OAM frame would carry at least a logical information whether:
If the load should be decreased towards of congested Edge Node and the load may be increased towards the other then the Cursor may adjust its balancing of the load, or move Cursor ownership towards the congested Edge if it is already redirecting all the traffic towards the non-congested Edge.
If the Cursor is balancing traffic away from the default position due to a past congestion notification and the Edge that was congested now reports that the load may be increased, then the reverse operation can happen and the Cursor may rebalance the load back to the original position taking the reverse steps as above.
If the OAM can not be forwarded due to a link or a node failure, then the last node towards the broken Edge becomes Cursor and echoes the OAM frames advertising that it is an Edge node that is blocked, not capable of forwarding data down to the next ARC.
If both Edges are experiencing a congestion then the condition should be reported to the Edge Nodes of all incoming ARCs. Same goes when both Edges are blocked.
The ARC methods may be used to build and/or leverage routing hierarchies, allowing high availability at multiple hierarchical levels. In one hand, the view of an ARC Set can be simplified by abstracting an ARC as a node in a DAG. The view of the routing topology is thus simplified, as illustrated in Figure 6. In the other hand, ARCs may be used inside a sub-topology, such as a ring, to enable forwarding inside a ring towards a next ring. Then, abstracting a full ring as a node, ARCs can be applied to a graph of rings, providing another level of redundancy and an abstract end to end path computation that is represented as a cascade of ARCs of rings.
The open Lowest ARC First(oLAF) algorithm is presented below in such a way as to help the reader figure how an ARC Set can be obtained but not in a computer-optimized fashion that is left to be determined. oLAF is based on Dijkstra's algorithm for Shortest Path First (SPF) computation, and is designed in such a fashion that the reverse SPF tree towards a destination is conserved and preferred for forwarding along the resulting ARC Set.
We make the computation on behalf of Omega, that is an abstraction, but could represent the node or the set of nodes that we want to reach with an ARC Set. If Omega is instantiated as an actual destination node, then that node may be a fine location for an ARC Computing Engine.
So we start with an proverbial Initial Set of Nodes that are interconnected by Links, and Omega that is the destination that we want to reach with an ARC Set.
If there is no Heir, we're done. If there is a single Heir then the graph is monoconnected, so we restart the computation taking that Heir off the Set of Nodes and making it Omega.
Else, if Omega is a single Node, or if Omega is composed of multiple nodes but we are willing to accept that both ends of an ARC terminate in a same node in Omega, then we create virtual Omega nodes, a minimum of two and at most one per Heir, and we make them the new Omega. Note: we need at least two destinations because both ends of an ARC cannot terminate in a same node.
Now we can start building an ARC Set towards the resulting Omega.
In this process, we create so-called Dependent Sets of nodes, each owned by a Safe Node S, DSet(S). DSet(S) contains nodes that are not determined to be Safe at the current stage of the computation and for which S, the owner Safe Node, is the last single point of failure on the shortest path tree to Omega. It results that a given node can be at most in one DSet, and that a Safe Node belongs to its own DSet.
For each node S in Omega we create a DSet(S) in which we place S.
And then the process goes like this:
We select the node in the Set of Nodes that is closest to Omega using the cost towards Omega as if we were building a traditional reverse SPF tree and we place the selected node in the same Dependent Set as its parent in the reverse SPF tree. Note that for a Heir, the parent might be a real node in Omega, or a virtual Omega node.
If we kept it at that, we would be building subtrees that are hanging off a Safe Node and together would represent the reverse shortest path tree towards Omega, each subtree being grown separately inside DSet(S) where S is the (virtual) Safe node that is the root of the subtree.
But once we have placed the selected node in a DSet, we consider its neighbors one by one. If at least one of the neighbors is already in a different DSet than this node, we select the neighbor that provides the shortest alternate path to Omega for the selected node.
Doing so, we have isolated two paths:
Because the two sets are different and have no intersection, these paths are non-congruent. And because the two non-congruent paths lead to two different Safe Nodes, this node is Safe.
It might happen that:
If both conditions are met for a same AN, then that AN forms a collapsed ARC by itself.
Now we form an ARC as follows:
For each ARC Node along the ARC:
The nodes left in the Dependent Sets but the owner Safe Node are still not Safe. They are moved back to the original Set of Nodes to enable forming additional ARCs which might depend on this ARC in the ARC Set.
We are done processing the particular node we had picked in the original Set of Nodes. If the Set of Nodes as it stands now is not empty, we continue from Section 5.2.
If the Set of Nodes went empty, we are done with this pass and we consider the Dependent Sets that we have put together. In a biconnected graph, there should be one set per node and one node per set, denoting that every node is a Safe Node.
If some portion of the graph is monoconnected, then each monoconnected portion forms the Dependent Set of the Safe Node that is its single point of failure. In order to be maximally redundant, we need to form the ARCs again, within the Dependent Set.
To do so, we remove the Safe Node from the Dependent set and make it Omega. We make the resulting DSet our Set of Nodes and run the algorithm again.
This may recurse a number of times if the graph has monoconnected zones within others.
Under normal conditions, the traffic flows away from the Cursor of the current ARC and cascades into the next ARC on that side of the Cursor, with the Height of the current ARC decreasing monotonically from ARC to ARC till Omega is reached.
The same goes for a generic Comb construct. When Buttressing ARCs are applied on a main ARC or other Buttressing ARCs, the final construct assumes the shape of a tree. The tree may be walked in different manners but the shortest path requires to start going down the current ARC or Buttressing ARC to its Edge.
In case of Label forwarding, the same recursivity is applied and a multiple ARC label path is constructed. Each ARC has is own set of label path per Omega, each ARC Set label path being merged into the lower ARC label set, thus at the interconnection point. At minimum, ARC label path should be built from the Cursor toward each edge, but this would require label path recompilation upon Cursor move, the proposed approach is then to build for the normal flow to an Omega one pair of label path from edge to edge.
As this label construct maps the ARC topology with local significant label, the Label Distribution Protocol (LDP) could be reused to announce label association to neighbors on the ARC.
Upon a breakage inside an ARC, until a corrective action takes place, some traffic will be lost. The corrective action might be either operated at the control plane or the data plane, if immediate action and near-zero packet loss is required.
Upon a first breakage in an ARC, the Cursor is moved to the breakage point, either a node or a link, so that traffic flows away from the Cursor again.
Upon a second breakage within a same ARC, a segment of the ARC is now isolated. Both breakage points become sinks till an additional corrective action, such as modifying the ARC Set, takes place. All incoming links in the isolated segment are blocked , causing the traffic to exit at the other end of the incoming ARCs.
Blocking an Edge Link in the incoming ARC may create an isolated segment in the incoming ARC as well if it is a second breakage there too, or if both edges of the incoming ARC tterminate in the broken segment. In that case the process recurses and the broken zone can be determined as the collection of the isolated segments.
If a segment of an ARC is getting isolated by a dual failure but that ARC segment has incoming Edges then the ARC can be reversed. This reversal is done by reversing of all the incoming Edges, which become outgoing. The segment that was isolated now benefits from multiple exits in a loop free fashion. This process might in turn isolate a segment of an ARC that was incoming and the process recurses and some links flap. If a real exit exits the process will stabilize, but a count to infinity must be put in place to avoid a permanent flapping when a whole ARC Subset is physically isolated. One may consider that this process is in fact the classical link reversal technique, as applied to the DAG of ARCs.
Upon a breakage inside an ARC, it is possible in the data plane to reverse the direction of -to turn- a given packet once along the ARC so the packets exits over the other Edge Link. But in order to avoid loops, it is undesirable to reverse the direction of a given packet a second time.
Note that once a given packet leaves an ARC to enter the next, it is free to bounce again in the next ARC. In other Words, the domain that is impacted by a turn is limited to the current ARC itself; the ARC forms the event horizon wherein the notion that a turn happened may cause a loop.
So a local strategy must be put in place inside an ARC to allow a given packet to bounce once upon a breakage, and get dropped upon a second breakage.
In the case of IP packet forwarding, a packet can be tagged when it bounces inside an ARC, or when it passes the Cursor, for instance by reserving a TOS bit for that purpose. When the packet bounces, the bit is set and when the packet leaves the ARC, the bit is reset and may be used again in the next ARC. In the generic case of a Comb, a strategy must be put in place to walk the structure and drop a packet that tries all the Edges. it attempts to pass the Cursor twice in a same direction, meaning that more than a full walk was already accomplished.
In the case of MPLS forwarding, the same result can be achieved with either 3 or 4 Labels Switched Paths (LSPs) along the ARC.
This specification describes a generic model. Protocols and management will come later
This specification does not require IANA action.
This specification is not found to introduce new security threat.
The authors wishes to thank Dirk Anteunis, Stewart Bryant, IJsbrand Wijnands, George Swallow, Eric Osborne, Clarence Filsfils and Eric Levy-Abegnoli for their participation and continuous support to the work presented here.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC6291] | Andersson, L., van Helvoort, H., Bonica, R., Romascanu, D. and S. Mansfield, "Guidelines for the Use of the "OAM" Acronym in the IETF", BCP 161, RFC 6291, June 2011. |
[I-D.thubert-rtgwg-arc-bicast] | Thubert, P. and I. Wijnands, "Applying Available Routing Constructs to bicasting", Internet-Draft draft-thubert-rtgwg-arc-bicast-01, October 2013. |
[RFC5921] | Bocci, M., Bryant, S., Frost, D., Levrau, L. and L. Berger, "A Framework for MPLS in Transport Networks", RFC 5921, July 2010. |