Network Working Group                                            H. Wang
Internet-Draft                                                  H. Huang
Intended status: Standards Track                                  Huawei
Expires: 25 April 2024                                   23 October 2023


                   Notification for Adaptive Routing
                 draft-wh-rtgwg-adaptive-routing-arn-00

Abstract

   Large-scale supercomputing and AI data centers use multipath
   forwarding to balance load and improve link reliability.  Adaptive
   routing (AR), which is widely used in direct topologies such as
   Dragonfly, dynamically adjusts routing policies based on path
   congestion and failures.  When congestion or a failure occurs, in
   addition to the local node applying AR, the congestion or failure
   information also needs to be sent to other nodes in a timely and
   accurate manner, so that those nodes can also apply AR and avoid
   exacerbating congestion on the path.  This document specifies
   Adaptive Routing Notification (ARN) for proactively disseminating
   notifications of congestion detection and congestion elimination.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 April 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.



Wang & Huang              Expires 25 April 2024                 [Page 1]

Internet-Draft                     ARN                      October 2023


   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Requirements Language
   2.  ARN Mechanism
     2.1.  Triggering ARN
     2.2.  ARN for Congestion Detection
     2.3.  ARN for Congestion Elimination
   3.  Security Considerations
   4.  IANA Considerations
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   Large-scale supercomputing centers require the interconnection of a
   large number of computing nodes.  However, scaling out clusters
   increases network latency and deployment cost, making it difficult
   to meet computing-power and deployment requirements.  Directly
   connected network topologies, such as Dragonfly
   [I-D.draft-agt-rtgwg-dragonfly-routing], scale well while keeping
   the network diameter small, and are widely adopted in HPC and
   supercomputing networks.

   In a network that uses a directly connected topology, there are
   multiple paths of unequal cost to a destination node.  In most
   cases, the shortest path is preferred for forwarding traffic.
   However, traffic congestion or link failures may occur on the
   shortest path.  For this reason, adaptive routing is widely used to
   let nodes make dynamic routing decisions based on changes in network
   topology (e.g., link failures) as well as traffic variations (e.g.,
   link congestion).

   By proactively detecting link congestion, a network node can forward
   packets along the shortest available non-congested path, improving
   overall throughput and resilience while reducing latency.  When the
   link is not congested, packets are forwarded over
   the shortest path.  When congestion occurs on the shortest path, the
   local node that detects it immediately applies adaptive routing and,
   at the same time, explicitly advertises congestion signals to remote
   nodes.  In this way, the network temporarily forwards packets over
   another path that is non-congested but not the shortest, until a
   congestion elimination signal is received.  Adaptive routing enables
   the network to mitigate traffic collisions and make use of idle
   links to improve bandwidth utilization.

   This document proposes a proactive congestion notification mechanism
   for adaptive routing, and describes the conditions that trigger
   dissemination as well as the information carried in a notification.
   Adaptive Routing Notifications (ARNs) apply not only to directly
   connected topologies such as Dragonfly, but also to any topology
   that aims to apply dynamic multipath optimization.  ARN is also
   useful for advertising link or interface failures, in which case
   traffic needs to bypass the failed path.

1.1.  Terminology

   AR: Adaptive Routing

   ARN: Adaptive Routing Notification

   BPT: Best Path Table

1.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  ARN Mechanism

   An ARN can be triggered whenever a node detects that local
   congestion has appeared or disappeared.  The detecting node sends
   the congestion signal to other nodes of interest.

               +----------------+            +----------------+
               |                |            |                |
               |     Group 2    | -----------|     Group 3    |
               |                |            |                |
               +----------------+            +----------------+
                        |                             |
                        |                             |
                        |                             |
     +------------------|-------------------+         |
     |                  *                   |         |
     |      @@     +----*---+     @@        |         |
     |     +-------+  Node1 +--------+      |         |
     |     |       +----+---+        |      |         |
     |     |            |            |      |         |
     | +---v----+       |       +----v---+  |         |
     | | Node2  |       |@      |  Node4 +------------+
     | +--------+       |@      +--------+  |
     |                  |                   |
     |             +----v---+               |
     |             |  Node3 |               |
     |             +--------+               |   **: congestion
     |  Group 1                             |   @@: ARN
     +--------------------------------------+

                         Figure 1: Topology Example

   Figure 1 depicts a simplified Dragonfly topology (only relevant links
   are drawn).  The nodes in each group are directly connected to each
   other, and the groups are also interconnected by direct links.  As
   shown in Figure 1, Node1 has a direct link connecting Group1 and
   Group2.  When this direct link (Node1 <-> Group2) is congested, all
   nodes of Group1 should be notified and immediately update their path
   selection policy.  For example, some or all flows from Group1 to
   Group2 may transit via Group3 instead of using the direct link
   (Node1 <-> Group2) until the congestion is eliminated.

2.1.  Triggering ARN

   A local node can determine whether congestion has occurred by
   monitoring interface status, such as the bandwidth utilization and
   queue depth of the interface.

   When the monitored value exceeds a preset threshold, the interface is
   considered congested and a congestion notification is triggered.
   When the monitored value falls back below the threshold, the
   interface is considered no longer congested and a congestion
   elimination notification is triggered.
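
   The threshold rule above can be sketched as a small per-interface
   state machine.  This is a non-normative Python sketch under stated
   assumptions: the class and event names and the single numeric
   threshold are illustrative and are not defined by this document.
   Because an ARN is sent when the congestion status changes, the
   monitor reports an event only on a state transition.

```python
from enum import Enum
from typing import Optional


class ArnEvent(Enum):
    """Kinds of notification a node may trigger (illustrative names)."""
    CONGESTION_DETECTED = "congestion-detected"
    CONGESTION_ELIMINATED = "congestion-eliminated"


class InterfaceCongestionMonitor:
    """Tracks one interface against a single preset threshold.

    Assumption: each sample is a queue-depth or bandwidth-utilization
    value, and the threshold is a hypothetical configuration knob.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.congested = False

    def observe(self, value: float) -> Optional[ArnEvent]:
        """Return an event only when the congestion state changes."""
        if not self.congested and value > self.threshold:
            self.congested = True
            return ArnEvent.CONGESTION_DETECTED
        if self.congested and value < self.threshold:
            self.congested = False
            return ArnEvent.CONGESTION_ELIMINATED
        # No state change (including value == threshold): no ARN.
        return None
```

   With a single threshold, a value oscillating around it triggers a
   notification on every crossing; how to damp such oscillation is not
   specified here.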

   When the local node detects any change in congestion status, it can
   continuously send the corresponding ARN to the other network nodes
   in the same group.  The notifications can be sent to multiple nodes
   using the multicast capabilities provided by the network.  ARN
   packets SHOULD be assigned high priority so that they are processed
   in a timely manner.  It is RECOMMENDED that an ARN carry the
   congestion level to allow fine-grained control of adaptive routing.

2.2.  ARN for Congestion Detection

   An ARN packet for congestion detection SHOULD include Severity
   information, which indicates the level of congestion or the type of
   failure.

   When a network node receives an ARN packet indicating congestion
   detection, and the optimal forwarding path in its local Best Path
   Table (BPT) passes through the relevant interface, the node deletes
   that path from the BPT and selects another, sub-optimal path.  How
   to organize and maintain the BPT is out of scope for this document.
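
   A minimal sketch of this receive-side behavior follows, assuming a
   BPT that maps a destination group to the single path in use, plus a
   separate table of all known candidate paths.  Both structures, the
   Path type, and link identifiers such as "Node1-Group2" are
   hypothetical, since BPT organization is out of scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Path:
    links: Tuple[str, ...]  # hypothetical link IDs traversed, in order
    cost: int


def on_congestion_detected(
    bpt: Dict[str, Path],
    candidates: Dict[str, List[Path]],
    affected_link: str,
) -> None:
    """Delete BPT entries whose path traverses affected_link and fall
    back to the cheapest known candidate that avoids it (if any)."""
    for dest, best in list(bpt.items()):
        if affected_link not in best.links:
            continue  # optimal path is unaffected; keep it
        del bpt[dest]
        alternatives = [
            p for p in candidates.get(dest, []) if affected_link not in p.links
        ]
        if alternatives:
            bpt[dest] = min(alternatives, key=lambda p: p.cost)
```

   For Node2 in Figure 1, for example, a BPT entry for Group2 that uses
   the Node1 <-> Group2 link would be replaced by a candidate path via
   Node4 and Group3.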

   An ARN packet for congestion detection MUST include the information
   necessary (e.g., the ID of the peer group connected by the congested
   or failed link) to locate the affected paths in the BPT.
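
   The information an ARN carries could be encoded as sketched below.
   This document does not define a wire format, so the field layout,
   field sizes, and type codes here are purely hypothetical; the sketch
   only illustrates a message carrying a notification type, the
   Severity, and a peer group ID used to locate affected BPT paths.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout in network byte order, NOT defined by this draft:
#   type (1 octet) | severity (1 octet) | reserved (2 octets) |
#   peer group ID (4 octets)
_ARN_FORMAT = "!BBHI"
TYPE_CONGESTION_DETECTED = 1    # hypothetical type code
TYPE_CONGESTION_ELIMINATED = 2  # hypothetical type code


@dataclass(frozen=True)
class Arn:
    msg_type: int
    severity: int       # congestion level or failure type
    peer_group_id: int  # locates the affected paths in the receiver's BPT

    def encode(self) -> bytes:
        # Reserved field is always sent as zero.
        return struct.pack(
            _ARN_FORMAT, self.msg_type, self.severity, 0, self.peer_group_id
        )

    @classmethod
    def decode(cls, data: bytes) -> "Arn":
        msg_type, severity, _reserved, peer_group_id = struct.unpack(
            _ARN_FORMAT, data
        )
        if msg_type not in (TYPE_CONGESTION_DETECTED, TYPE_CONGESTION_ELIMINATED):
            raise ValueError(f"unknown ARN type {msg_type}")
        return cls(msg_type, severity, peer_group_id)
```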

2.3.  ARN for Congestion Elimination

   When a network node receives an ARN indicating congestion
   elimination, it checks whether the cost of the forwarding path
   through the relevant interface (P1) is less than the cost of the
   forwarding path currently stored in the BPT (P2).  If it is, P1 is
   stored in the BPT and replaces P2.  How to organize and maintain the
   BPT is out of scope for this document.
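
   The P1/P2 comparison above can be sketched as follows, reusing the
   same hypothetical BPT model: a map from destination group to the
   path in use, plus a table of known candidate paths.  Picking the
   cheapest path through the recovered link as P1, and filling an empty
   BPT entry with P1, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Path:
    links: Tuple[str, ...]  # hypothetical link IDs traversed, in order
    cost: int


def on_congestion_eliminated(
    bpt: Dict[str, Path],
    candidates: Dict[str, List[Path]],
    recovered_link: str,
) -> None:
    """For each destination, compare the cheapest path through
    recovered_link (P1) with the path in the BPT (P2); keep the cheaper."""
    for dest, paths in candidates.items():
        through = [p for p in paths if recovered_link in p.links]
        if not through:
            continue  # no known path uses the recovered link
        p1 = min(through, key=lambda p: p.cost)  # assumption: cheapest
        p2 = bpt.get(dest)
        # Assumption: a destination with no BPT entry simply takes P1.
        if p2 is None or p1.cost < p2.cost:
            bpt[dest] = p1
```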

   An ARN packet for congestion elimination MUST include the information
   necessary (e.g., the ID of the peer group connected by the recovered
   link) to locate the affected paths in the BPT.

3.  Security Considerations

   TBD.

4.  IANA Considerations

   TBD.

5.  References

5.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

5.2.  Informative References

   [I-D.draft-agt-rtgwg-dragonfly-routing]
              Afanasiev, D., Roman, and J. Tantsura, "Routing in
              Dragonfly+ Topologies", Work in Progress, Internet-Draft,
              draft-agt-rtgwg-dragonfly-routing-00, 10 July 2023,
              <https://datatracker.ietf.org/doc/html/draft-agt-rtgwg-
              dragonfly-routing-00>.

Acknowledgements

Contributors

Authors' Addresses

   Haibo Wang
   Huawei
   Email: rainsword.wang@huawei.com


   Hongyi Huang
   Huawei
   Email: hongyi.huang@huawei.com
