Internet DRAFT - draft-li-coinrg-compute-resource-scheduling

draft-li-coinrg-compute-resource-scheduling







Network Working Group                                              Z. Li
Internet-Draft                                                    K. Yao
Intended status: Informational                                     Y. Li
Expires: 26 April 2022                                      China Mobile
                                                         23 October 2021


  A Compute Resources Oriented Scheduling Mechanism based on Dataplane
                            Programmability
             draft-li-coinrg-compute-resource-scheduling-00

Abstract

   With massive data growing in the internet, how to effectively use the
   compute resources has become a quite hot topic.  In order to cool
   down the pressure in today's large data centers, some compute
   resources have been moved towards the edge, gradually forming a
   distributed Compute Force Network.  Force is a physical cause which
   can change the state of a motion or an object.  We refer the
   definition from physics and extend its philosophy to network that in
   future, the network can be a compute force which can facilitate the
   integration of different kinds of compute resources, no matter
   hardware or software, making the computation fast and effective.  In
   this draft, we present a compute resources oriented scheduling
   mechanism based on dataplane programmability, which can effectively
   schedule and manage compute resources in the network.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 26 April 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.



Li, et al.                Expires 26 April 2022                 [Page 1]

Internet-Draft            Network Working Group             October 2021


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Conventions Used in This Document . . . . . . . . . . . . . .   3
     2.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   3.  Design  . . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     3.1.  Network Topology  . . . . . . . . . . . . . . . . . . . .   3
     3.2.  Mechanism Statement . . . . . . . . . . . . . . . . . . .   5
   4.  Typical Way of Realization  . . . . . . . . . . . . . . . . .   7
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .   8
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   8
   7.  Normative References  . . . . . . . . . . . . . . . . . . . .   8
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   8

1.  Introduction

   As Moore's law has been gradually reaching its limitation, the
   computation of massive data and diverse computational requirements
   can not be satisfied by simply upgrading the computation resources on
   a single chip.  There become an emerging trend that domain specific
   computation resources like GPU, DPU and programmable switches are
   becoming more and more popular, generating diverse use cases in the
   network.  For example, in network computing and in memory computing.
   In network computing means using programmable switches or DPUs to
   offload network functions so as to accelerate network speed.  And in
   memory computing means that the computer memory does not only serve
   as the storage, but also provide the computation.  With the
   development of these domain specific architectures, network should
   serve as a force which could facilitate the integration of all these
   different types of computation resources, in turn forming a Compute
   Force Network.  In CFN, how to effectively schedule these computation
   resources is a topic that's worthy of studying.

   Current ways to do compute resources allocation include extending
   protocols like DNS so as to realize the awareness and scheduling of
   compute resources, but the management of these compute resources must
   be done in the centralized controller. a DNS client wants to do some
   computing tasks, e.g.  Machine learning models training, and the



Li, et al.                Expires 26 April 2022                 [Page 2]

Internet-Draft            Network Working Group             October 2021


   client will send a request to DNS server.  Then, DNS server will
   inform the client which compute node is available at the moment.
   However, activating and deactivate this compute node to work, e.g.
   creating a virtual machine, is done by centralized controller, which
   we think is not very efficient and timely, considering massive data
   waits to be computed in the network.  The weakness above has provoked
   an idea to realize the scheduling and management of compute resources
   by extending current routing protocols like SRv6 with the help of
   programmable network elements.  The detailed design is presented in
   this draft.

2.  Conventions Used in This Document

2.1.  Terminology

   CFN Compute Force Network

   DNS Domain Name Service

   SRv6 Segment Routing over IPv6

   GPU Graphics Processing Unit

   DPU Data Processing Unit

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14[RFC2119][RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Design

   The detailed design of the mechanism is presented in this section.  A
   typical topology will be shown below and the definition of each part
   of the network topology will be given, and then the whole procedure
   will be explained clearly the second subsection.

3.1.  Network Topology

   The network topology is shown in figure below where there are several
   major parts inside, namely consumer, computation manament node,
   compute node with programmable DPU, and programmable network element.






Li, et al.                Expires 26 April 2022                 [Page 3]

Internet-Draft            Network Working Group             October 2021


                            +------------------+
                            |Compute node with |
                            |programmable DPU  |
   +------------------+     +---------+--------+    +-----------------+
   |Compute node with |               |             |Compute node with|
   |programmable DPU  |      +--------+------+      |programmable DPU |
   +--------+---------+      | programmable  |      +--------+--------+
            |             +--+network element+---+           |
            |             |  +---------------+   |           |
     +------+-------+     |                      |   +-------+-------+
     | programmable +-----+                      +---+ programmable  |
     |network element+----+                      +---+network element|
     +--------------+     |                      |   +---------------+
                          |                      |
                          |    +-----------+     |
                          +----+ Consumer  +-----+
                               +-----------+   |   +-----------------+
                                               ----+   Computation   |
                                                   | management node |
                                                   +-----------------+

                    Figure 1: Figure 1: Network Topology

   - Consumer: End node generating computing tasks which need to be done
   by compute resources

   - Compute node: A network node that has the resources to finish
   computing tasks generated by consumers,e.g. a server or a cluster of
   servers.

   - Programmable DPU: An unit that is connected to a compute node and a
   programmable element, responsible for the lifetime management of
   compute node and the communication with programmable element.

   - Programmable network element: A network device which communicates
   with customers and programmable DPU, forwarding messages
   bidirectionaly including requests for computing resources, activating
   or deactivating specific compute resource, and other routing
   messages.

   - Computation management node: A network node that has the full view
   of the computation resources in the network, dynamically managing
   these resources and generate consuming receipt.








Li, et al.                Expires 26 April 2022                 [Page 4]

Internet-Draft            Network Working Group             October 2021


3.2.  Mechanism Statement

   In this section, the detailed procedure of the communication between
   the consumer and the compute management node which passes through
   programmable DPU, programmable network element, and compute node will
   be declared step by step .

             1.Computation
                     Request   +---------------+
   +----------+ +------------> | Programmable  |
   | Consumer |                |               |
   +----------+ <------------+ |Network Element|
                 4.Compuation  +---+-+---------+
                   Response        ^ |
                                   | |
                                   | |
        2.Compute Resource         | |  3.Registration
         Consuming Request         | |   Response
          Registration             | |
                     +-------------+ |
                     |               |
               +-----+------+        |
               |  Compute   +<-------+
               | Management |
               |    Node    |
               +------------+

             Figure 2: Figure 2: Computation Request Procedure

   * Step1: computation request registration.  When a consumer wants to
   do some computing tasks, e.g. machine learning model training, it
   first needs to send a request message to the compute management node
   for computation resource pre-allocation.  The message is passed
   through programmable network element where some modification on the
   packet header can be done on the dataplane.  Information like
   computation category, configuration template can be added into packet
   header, which could notify the compute management node that what kind
   of computation resource it needs to shedule,e.g. how many GPUs are
   needed in the task.  Afterwards, The management node will send back a
   message in which the specific computation node IP address is
   inserted.  If no such comptation node is available at the moment, the
   manament node will send back a refusal.  And at last, the
   programmable network element will forward the message to the
   consumer.







Li, et al.                Expires 26 April 2022                 [Page 5]

Internet-Draft            Network Working Group             October 2021


                  1.computation
                      task     +---------------+
   +----------+ +------------> | Programmable  |
   | Consumer |                |               |
   +----------+ <------------+ |Network Element|
                               +-----+--+------+
                                     |  ^
                                     |  | 2.Computation
                                     |  | Message Routing
                                     |  |
                  3.Activation       v  |
     +----------+              +-----+--+------+
     | Compute  | <----------+ | Programmable  |
     |   Node   |              |     DPU       |
     +----------+ +----------> +---------------+

                 Figure 3: Figure 3: Computation Activation

   * Step 2:Computation activation.  Consumer will send the actual
   computation task to programmable network element which will do some
   modification on the packet.  The activation message of the compute
   node will be encapsulated into the packet which could enable the
   lifetime management of the computation and the working progress of
   the compute node.  And then, the message will be forwarded to the
   programmable DPU directly connected to the compute node where the
   decapsulation of the packet will be done.  The DPU will tell the
   compute node to work and dynamically monitor the state of the compute
   node until the task is finished.























Li, et al.                Expires 26 April 2022                 [Page 6]

Internet-Draft            Network Working Group             October 2021


                               +---------------+
                               |  Computation  |
                               |Management Node|
                               +----+---+------+
                                    |   ^
                        3.Response  |   | 2.Finish
                                    |   |  Notification
                                    |   |
                 1.Consumption      |   |
                    Finish          v   |
                    Request    +----+---+------+
   +----------+ +------------> | Programmable  |
   | Consumer |                |               |
   +----------+ <------------+ |Network Element|
                               +------+--+-----+
                                      |  ^
                                      |  |
                                      |  |
                       4.Deactivation |  |
                                      v  |
   +----------+                +------+--+-----+
   | Compute  | +------------> | Programmable  |
   |   Node   |                |     DPU       |
   +----------+ <------------+ +---------------+
                 5.Resource
                   Reclaim

                   Figure 4: Figure 4: Consumption Finish

   * Step 3: When the compute node notify the consumer that the task has
   been finished, the consumer will decide whether there is any waiting
   task, if not, the consumer will send a consumption finish request to
   the computation management node.  Like computation request
   registration, the programmable network element will then insert
   information of the compute node and forward the notification message
   to the computation management node. when the programmable network
   element receives a response message, it will start deactivation
   procedure and tell the compute node to collect back the resource used
   for previous computation.  This is the end the lifetime of
   computation of a single task.

4.  Typical Way of Realization

   The mechanism stated in above section can be realized by extending
   protocols like SRv6.  The lifetime management message can be inserted
   dynamically in dataplane with the help of those programmable
   hardware.  Such modification can be done flexibly and in line rate.




Li, et al.                Expires 26 April 2022                 [Page 7]

Internet-Draft            Network Working Group             October 2021


5.  Security Considerations

   TBD.

6.  IANA Considerations

   TBD.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

   Zhiqiang Li
   China Mobile
   Beijing
   100053
   China

   Email: lizhiqiangyjy@chinamobile.com


   Kehan Yao
   China Mobile
   Beijing
   100053
   China

   Email: yaokehan@chinamobile.com


   Yang Li
   China Mobile
   Beijing
   100053
   China

   Email: liyangzn@chinamobile.com





Li, et al.                Expires 26 April 2022                 [Page 8]