sfc                                                               D. Lou
Internet-Draft                                                L. Iannone
Intended status: Experimental                                    Y. Zhou
Expires: 26 April 2023                                          C. Zhang
                                                                  Huawei
                                                         23 October 2022


            Signaling In-Network Computing operations (SINC)
                         draft-zhou-sfc-sinc-00

Abstract

   This memo introduces "Signaling In-Network Computing operations"
   (SINC), a mechanism to enable in-packet operation signaling for
   in-network computing in specific scenarios like NetReduce,
   NetDistributedLock, NetSequencer, etc.  In particular, this solution
   allows computation parameters to be flexibly communicated in
   conjunction with the packets' payload, in order to signal to
   in-network SINC-enabled devices the computing operations to be
   performed.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 26 April 2023.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components



Lou, et al.               Expires 26 April 2023                 [Page 1]

Internet-Draft              SINC Architecture               October 2022


   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Terminology
   4.  SINC Relevant Use Cases
     4.1.  NetReduce
     4.2.  NetDistributedLock
     4.3.  NetSequencer
   5.  Simple Generic Operations
   6.  SINC Overview
   7.  SINC Header
   8.  SFC for Signal In-Network Computing
     8.1.  SFC Elements
     8.2.  SINC NSH encapsulation
     8.3.  NSH Base Header
     8.4.  NSH Service Path Header
     8.5.  Complete SINC NSH Header
   9.  SFC-based SINC Workflow
   10. SINC Control Plane
   11. Security Considerations
   12. IANA Considerations
   Acknowledgements
   References
     Normative References
     Informative References
   Appendix A.  Computing Capability Operation abstraction
   Authors' Addresses

1.  Introduction

   According to the original design, the Internet performs just "store
   and forward" of packets, and leaves more complex operations at the
   end-points.  However, new emerging applications could benefit from
   in-network packet processing to improve the overall system efficiency
   ([GOBATTO], [ZENG]).

   The formation of the COIN Research Group [COIN] in the IRTF
   encourages people to explore this emerging technology and its impact
   on the Internet architecture.  The "Use Cases for In-Network
   Computing" draft [I-D.irtf-coinrg-use-cases] introduces some use
   cases to demonstrate how real applications can benefit from COIN and
   to show the essential requirements demanded by COIN applications.

   Recent research has shown that having network devices undertake some
   computing tasks can greatly improve network and application
   performance in scenarios such as in-network aggregation [NetReduce],
   key-value (K-V) cache [NetLock], and strong consistency [GTM].  These
   implementations are mainly based on programmable network devices,
   using P4 or other languages.  Given such heterogeneity of scenarios,
   it is desirable to have a generic and flexible protocol that
   explicitly signals the computing operations to be performed by
   network devices.  Such a protocol would be applicable to many use
   cases and would ease the deployment of these research results.

   This document specifies a signaling architecture for in-network
   computing operations.  The computing functions are hosted on network
   devices, which can be perceived as network SINC service instances.

   The document focuses on the design of the data plane; the control
   plane will be described in a separate draft.  Service Function
   Chaining (SFC) [RFC7665] is used as a running example of how to
   tunnel the SINC header to the in-network device and implement the
   desired in-network computation.  Nevertheless, the mechanism can be
   adapted to other transport protocols, such as Remote Direct Memory
   Access (RDMA) [ROCEv2], but such adaptation is out of the scope of
   this document.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] and [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Terminology

   This document uses the terms as defined in [RFC7498], [RFC7665], and
   [RFC8300].  This document assumes that the reader is familiar with
   the Service Function Chaining architecture.

4.  SINC Relevant Use Cases

   A few relevant use cases, namely NetReduce, NetDistributedLock, and
   NetSequencer, are described hereafter to help in understanding the
   requirements for a general framework.  Such a framework should be
   generic enough to accommodate a large variety of use cases besides
   the ones described in this document.

4.1.  NetReduce

   Over the last decade, the rapid development of Deep Neural Networks
   (DNN) has greatly improved the performance of many Artificial
   Intelligence (AI) applications like computer vision and natural
   language processing.  However, DNN training is a time-consuming and
   computation-intensive task whose computing demand has increased
   exponentially over the past 10 years (doubling every 3.4 months
   [OPENAI]).  Scale-up techniques that concentrate on the computing
   capability of a single device cannot meet this demand.  Distributed
   DNN training approaches with synchronous data parallelism, like
   Parameter Server and All-Reduce, are commonly employed in practice.
   These approaches, however, increasingly become network-bound
   workloads, since communication becomes a bottleneck at scale
   ([PARAHUB], [MGWFBP]).

   Compared with host-oriented solutions, in-network aggregation
   approaches like SwitchML [SwitchML] and SHARP [SHARP] could
   potentially reduce the bandwidth needed for data aggregation by
   nearly half, by offloading gradient aggregation from the host to the
   network switch.  The SwitchML solution uses UDP for network
   transport.  The system relies solely on application-layer logic to
   trigger retransmission upon packet loss, which adds latency and
   reduces the training performance.  The SHARP solution, on the
   contrary, uses Remote Direct Memory Access (RDMA) to provide reliable
   transmission [ROCEv2].  As the InfiniBand (IB) technology requires
   specific hardware support, this solution is not very cost-effective.
   NetReduce [NetReduce] does not depend on dedicated hardware and
   provides a general in-network aggregation solution that is suitable
   for Ethernet networks.

4.2.  NetDistributedLock

   In the majority of distributed systems, the lock primitive is a
   widely used concurrency control mechanism.  Large distributed systems
   commonly have a dedicated lock manager, through which nodes compete
   to gain read and/or write permissions on a resource.  The lock
   manager is often abstracted as Compare And Swap (CAS) or
   Fetch-and-Add (FA) operations.

   The lock manager typically runs on a server, so its performance is
   limited by the speed of disk I/O transactions.  When the load
   increases, for instance in the case of database transactions
   processed on a single node, the lock manager becomes a major
   performance bottleneck, consuming nearly 75% of transaction time
   [OLTP].  Multi-node distributed lock processing adds the
   communication latency between nodes, which makes the performance
   even worse.  Therefore, offloading the lock manager function from the
   server to the network switch might be a much better choice, as the
   switch is capable of managing the lock function efficiently.
   Meanwhile, it frees the server for other computation tasks.

   The test results in NetLock [NetLock] show that the lock manager
   running on a switch is able to answer 100 million requests per
   second, nearly 10 times more than what a lock server can do.

4.3.  NetSequencer

   Transaction managers are centralized solutions to the consistency
   issue for distributed transactions, such as GTM in Postgres-XL
   ([GTM], [CALVIN]).  However, as centralized modules, transaction
   managers have become a bottleneck in large-scale high-performance
   distributed systems.

   The work in [HPRDMA] introduces a server-based networked sequencer,
   which is a kind of task manager that assigns monotonically increasing
   sequence numbers to transactions.  In [HPRDMA], the authors show
   that the maximum throughput is 122 million requests per second
   (Mrps), at the cost of an increased average latency.  This bounded
   throughput will impact the scalability of distributed systems.  The
   authors also test the bottlenecks for various optimization methods,
   including CPU, DMA bandwidth, and PCIe RTT, which are introduced by
   the CPU-centric architecture.

   For a programmable switch, a sequencer is a rather simple operation
   to implement, while the pipeline architecture can avoid these
   bottlenecks.  It is worth trying to implement a switch-based
   sequencer, with performance goals of hundreds of Mrps in throughput
   and latency in the order of microseconds.

5.  Simple Generic Operations

   The COIN use case draft [I-D.irtf-coinrg-use-cases] illustrates some
   general requirements for scenarios like in-network control and
   distributed AI, to which the aforementioned use cases belong.  One of
   the requirements defined in [I-D.irtf-coinrg-use-cases] is that any
   in-network computing system must provide means to specify the
   constraints for placing execution logic in certain logical execution
   points (and their associated physical locations).  In the case of
   NetReduce, NetDistributedLock, and NetSequencer, the data
   aggregation, lock management, and sequence number generation
   functions, respectively, can be offloaded onto the network switch.

   Those functions are based on some "simple" and "generic" operators,
   as shown in Table 1.  Programmable switches are capable of performing
   those basic operations by executing one or more operators, without
   impacting the forwarding performance ([NetChain], [ERIS]).

    +==============+===============+=================================+
    | Use Case     | Operation     | Description                     |
    +==============+===============+=================================+
    | NetReduce    | Sum value     | The network device sums the     |
    |              | (SUM)         | collected parameters together   |
    |              |               | and outputs the resulting       |
    |              |               | value.                          |
    +--------------+---------------+---------------------------------+
    | NetLock      | Compare And   | By comparing the request value  |
    |              | Swap or       | with the status of its own      |
    |              | Fetch-and-Add | lock, the network device sends  |
    |              | (CAS or FA)   | out whether the host has        |
    |              |               | acquired the lock.  Through the |
    |              |               | CAS and FA, hosts can implement |
    |              |               | shared and exclusive locks.     |
    +--------------+---------------+---------------------------------+
    | NetSequencer | Fetch-and-Add | The network device offers a     |
    |              | (FA)          | counter service and provides a  |
    |              |               | monotonically increasing        |
    |              |               | sequence number for the host.   |
    +--------------+---------------+---------------------------------+

                Table 1: Example of in-network operators.
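
   The following is a minimal sketch of the three operators in Table 1,
   modeled in Python on a dictionary that stands in for switch
   registers.  The class name, method names, register keys, and the
   lock encoding (0 = free, 1 = held) are assumptions made for
   illustration only; they are not defined by SINC, and a real
   programmable switch would implement these operators in its pipeline.

```python
# Hypothetical model of the Table 1 operators acting on switch registers.
# All names below are illustrative; none of them is defined by SINC.


class SwitchRegisters:
    """A dictionary standing in for the stateful registers of a switch."""

    def __init__(self):
        self.regs = {}

    def sum_values(self, values):
        """SUM: add the collected parameters and output the result."""
        return sum(values)

    def compare_and_swap(self, key, expected, new):
        """CAS: store `new` only if the register equals `expected`.

        Returns the value seen before the operation, so the request
        succeeded if and only if the returned value equals `expected`.
        """
        old = self.regs.get(key, 0)
        if old == expected:
            self.regs[key] = new
        return old

    def fetch_and_add(self, key, delta=1):
        """FA: add `delta` to the register and return the previous value."""
        old = self.regs.get(key, 0)
        self.regs[key] = old + delta
        return old


sw = SwitchRegisters()

# NetReduce: aggregate one parameter collected from three workers.
total = sw.sum_values([1, 2, 3])

# NetDistributedLock: assumed encoding, 0 = free and 1 = held.
first = sw.compare_and_swap("lock-A", expected=0, new=1)   # lock acquired
second = sw.compare_and_swap("lock-A", expected=0, new=1)  # lock already held

# NetSequencer: each request receives the next counter value.
seqs = [sw.fetch_and_add("seq") for _ in range(3)]
```

   In this sketch, a host learns whether it acquired the lock by
   comparing the value returned by the CAS with the value it expected.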

6.  SINC Overview

   This section describes the various elements and functional modules in
   the SINC system and explains how they work together.

   The SINC computing protocol and extensions are designed for limited
   domains such as the data center network instead of across the
   Internet.  The requirements and semantics are specifically limited,
   as defined in the previous sections.

   The main deployment model is to place SINC-capable switches/routers
   in the network, so that they take over part of the data computing
   operations during data transmission.  For instance, in the case of
   NetLock, Top-of-Rack switches can be equipped with SINC capabilities
   to manage I/O locks.  In the case of NetReduce, SINC-capable switches
   can be deployed at a central point through which all data has to
   pass, to achieve on-path aggregation/reduction.

   Figure 1 shows the architecture of a SINC network.  In the computing
   service chain, a host sends out packets containing data operations to
   be executed in the network.  The data operation description should be
   carried in the packet itself by using the SINC header.

   Once the packet is in the SINC domain, it includes a SINC header, so
   that SINC-enabled switches and routers have access to this header and
   can perform the desired operation directly on the in-network device.
   Note that hosts can also be SINC-enabled, in which case the proxies
   are not necessary.

   +---------+                    +---------+
   | Hosts   |                    | Hosts   |
   +---------+                    +---------+
        |        +-------------+       |
        |        |  SINC SW/R  |       |
   +---------+   |  +-------+  |   +---------+
   | SINC    |   |  |SINC   |  |   | SINC    |
   | Ingress |-->|->|Service|->|-->| Egress  |
   | Proxy   |   |  +-------+  |   | Proxy   |
   +---------+   +-------------+   +---------+

                        Figure 1: SINC Architecture.

7.  SINC Header

   The SINC header has a fixed length of 16 octets and is placed right
   after the Service Path Header.  It carries the data operation
   information used by on-path, in-switch SFs.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Reserved    |L|                    Group ID                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     No. of Data Sources       |    Data Source ID             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SeqNum                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Data Operation          |    Data Offset                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 2: SINC Context Header.

   *  Reserved: Flags field reserved for future use.  MUST be set to
      zero on transmission and ignored on reception.

   *  Loopback flag (L): Zero (0) indicates that the packet should be
      sent to the destination after the data operation.  One (1)
      indicates that the packet should be sent back to the source node
      after the data operation.

   *  Group ID: The group ID identifies different groups.  Each group is
      associated with one task.

   *  Number of Data Sources: Total number of data source nodes that are
      part of the group.

   *  Data Source ID: Unique identifier of the data source node of the
      packet.

   *  Sequence Number (SeqNum): The SeqNum is used to identify different
      requests within one group.

   *  Data Operation: The operation to be executed by the SF (see
      Appendix A).

   *  Data Offset: The in-packet offset from the SINC context header to
      the data required by the operation.
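
   As an illustration of the layout in Figure 2, the sketch below packs
   and unpacks the 16-octet SINC header with Python's struct module.
   The field widths (7 reserved bits, 1 L bit, 24-bit Group ID, then
   16-, 16-, 32-, 16-, and 16-bit fields) are read off the figure; the
   function names and the operation code used in the example are
   hypothetical, since the operation values are defined in Appendix A.

```python
import struct

# Layout read off Figure 2, in network byte order:
# one 32-bit word (Reserved | L | Group ID), two 16-bit fields,
# one 32-bit SeqNum, and two more 16-bit fields.
SINC_FORMAT = "!IHHIHH"
SINC_LEN = struct.calcsize(SINC_FORMAT)  # 16 octets


def pack_sinc(loopback, group_id, num_sources, source_id, seq_num,
              data_op, data_offset):
    """Build the 16-octet SINC header (function name is illustrative)."""
    if not 0 <= group_id < (1 << 24):
        raise ValueError("Group ID must fit in 24 bits")
    # Reserved bits are sent as zero; L is the last bit of the first octet.
    first_word = ((1 if loopback else 0) << 24) | group_id
    return struct.pack(SINC_FORMAT, first_word, num_sources, source_id,
                       seq_num, data_op, data_offset)


def unpack_sinc(data):
    """Parse the first 16 octets of `data` as a SINC header."""
    (first_word, num_sources, source_id,
     seq_num, data_op, data_offset) = struct.unpack(SINC_FORMAT,
                                                    data[:SINC_LEN])
    return {
        "loopback": bool((first_word >> 24) & 0x1),  # reserved bits ignored
        "group_id": first_word & 0xFFFFFF,
        "num_sources": num_sources,
        "source_id": source_id,
        "seq_num": seq_num,
        "data_op": data_op,
        "data_offset": data_offset,
    }


# Example: group 0x00ABCD, 4 data sources, this packet from source 2.
# The operation code 0x0001 is a placeholder, not a value from Appendix A.
header = pack_sinc(loopback=True, group_id=0x00ABCD, num_sources=4,
                   source_id=2, seq_num=7, data_op=0x0001, data_offset=16)
fields = unpack_sinc(header)
```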

8.  SFC for Signal In-Network Computing

   As previously stated, Service Function Chaining (SFC) [RFC7665] is
   used as a running example on how to tunnel the SINC header to the in-
   network device and implement the desired in-network computation.

   Figure 3 shows the architecture of an SFC-based SINC network.  In the
   computing service chain, a host sends out packets containing data
   operations to be executed in the network.  The data operation
   description should be carried in the packet itself by using a
   SINC-specific NSH encapsulation.

   Once the SINC packet is in the SFC domain, the Service Function
   Forwarder (SFF) [RFC7665] is responsible for forwarding packets to
   one or more connected service functions according to information
   carried in the SFC encapsulation.  The Service Function (SF)
   [RFC7665] is responsible for implementing data operations.

    +---------+                                         +---------+
    | Hosts   |                                         | Hosts   |
    +---------+                                         +---------+
         |                    +-----------+                 |
         |                    | SINC SW/R |                 |
    +-----------+   +-----+   |  +-----+  |   +-----+   +-----------+
    | Ingress   |   |     |   |  |     |  |   |     |   | Egress    |
    | SFC Proxy |-->| SFF |-->|  | SFF |  |-->| SFF |-->| SFC Proxy |
    +-----------+   +-----+   |  +-----+  |   +-----+   +-----------+
                              |     |     |
                              |  +-----+  |
                              |  |  SF |  |
                              |  +-----+  |
                              +-----------+

                    Figure 3: SINC for SFC Architecture.

8.1.  SFC Elements

   As shown in Figure 3, the elements used are the SFC proxy, the SFF,
   and the SINC switch/router, which contains an SFF and an SF.

   The SFC proxy is required to support SFC-unaware hosts: it
   encapsulates their packets with the correct NSH header and SINC
   context header and forwards the packets to the correct SFF.  The SFF
   forwards packets based on the Service Path Header (SPH), as specified
   in [RFC8300].  SFC-unaware hosts can only add the SINC information in
   the payload, after the transport layer encapsulation.

   The SFC proxy needs to associate packets to a group and, hence, to a
   specific operation to be done in-network.  For TCP and UDP packets,
   the five-tuple is sufficient for flow identification.  For RoCEv2
   packets, the destination port number is set to 4791 for the
   indication of the InfiniBand Base Transport Header (IB BTH), which
   cannot be used for flow identification.  Therefore, a combination of
   source IP address, destination IP address, and Destination Queue Pair
   number [ROCEv2] should be used for flow identification.
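
   The flow identification rule above can be sketched as follows.  The
   packet is represented as a plain dictionary of already-parsed header
   fields; the dictionary keys and the function name are hypothetical
   and chosen only for this example.

```python
ROCEV2_UDP_PORT = 4791  # UDP destination port that indicates an IB BTH


def flow_key(pkt):
    """Return the tuple a proxy could use to map a packet to a group.

    `pkt` is a dict standing in for parsed header fields; its key
    names are assumptions made for this sketch.
    """
    if pkt["proto"] == "udp" and pkt["dst_port"] == ROCEV2_UDP_PORT:
        # All RoCEv2 packets share destination port 4791, so the
        # Destination Queue Pair number from the BTH is used instead.
        return (pkt["src_ip"], pkt["dst_ip"], pkt["dest_qp"])
    # TCP and UDP: the classic five-tuple is sufficient.
    return (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
            pkt["src_port"], pkt["dst_port"])


tcp_pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "proto": "tcp",
           "src_port": 40000, "dst_port": 5000}
roce_a = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "proto": "udp",
          "src_port": 49152, "dst_port": 4791, "dest_qp": 17}
roce_b = dict(roce_a, src_port=49153)  # same QP, different source port
roce_c = dict(roce_a, dest_qp=18)      # different QP
```

   With this rule, roce_a and roce_b map to the same flow because they
   target the same Destination Queue Pair, while roce_c is treated as a
   separate flow.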

   For packets from SFC-unaware hosts that require a SINC operation,
   the ingress SFC proxy will copy the SINC information to a SINC
   context header and set the Data Offset value accordingly (see
   Section 7).

   Based on the Group ID, the SPI is matched and the NSH-based header is
   built.  With an SFC encapsulation, the SINC packet will be forwarded
   to the SFF.

   The egress SFC proxy removes the NSH header, including the SINC
   context header, before forwarding the packets to the destination.

   With the standardized context header, the SFs can be decoupled from
   transport layer encapsulation.  The SFs perform the data operation as
   defined in the headers, update the original payload with the results,
   and forward the packets to the next hop.

8.2.  SINC NSH encapsulation

   This section defines the SINC header fields as part of the NSH
   [RFC8300] encapsulation for SFC [RFC7665].

8.3.  NSH Base Header

   The SINC NSH header is essentially another type of NSH MD header.
   SINC NSH encapsulation uses an NSH Metadata (MD) fixed-length context
   header to carry the data operation information.  Please refer to the
   NSH specification [RFC8300] for a detailed description of the NSH
   Base Header.  This draft suggests that the Base Header specify MD
   Type = 0x4, to allow a fixed-length context header immediately
   following the Service Path Header.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |ver|O|U|    TTL    |   Length  |U|U|U|U|MD Type| Next Protocol |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 4: NSH Base Header, where "MD Type" is set to 0x4.

8.4.  NSH Service Path Header

   Following the NSH Base Header is the Service Path Header, shown in
   Figure 5, as defined in [RFC8300].

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Service Path Identifier (SPI)        | Service Index |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 5: NSH Service Path Header.

8.5.  Complete SINC NSH Header

   Stacking the previously shown headers (the NSH Base Header, the NSH
   Service Path Header, and the SINC Header) yields the complete SINC
   NSH header shown in Figure 6.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |ver|O|U|    TTL    |   Length  |U|U|U|U|MD Type| Next Protocol |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Service Path Identifier (SPI)        | Service Index |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Reserved    |L|                    Group ID                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     No. of Data Sources       |    Data Source ID             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SeqNum                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Data Operation          |    Data Offset                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 6: SINC NSH Header.
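
   A minimal sketch of assembling the header in Figure 6 is given below.
   It assumes the MD Type value 0x4 proposed in this document (subject
   to IANA assignment), the default TTL of 63 and the IPv4 Next Protocol
   value 0x1 from [RFC8300], and a Length field counting the whole
   24-octet header in 4-octet words.  The function name is illustrative,
   and the SINC header is passed in as 16 opaque octets.

```python
import struct

NSH_VERSION = 0          # NSH version defined in RFC 8300
MD_TYPE_SINC = 0x4       # value proposed in this document, pending IANA
NEXT_PROTO_IPV4 = 0x1    # RFC 8300 Next Protocol value for IPv4
DEFAULT_TTL = 63         # RFC 8300 default initial TTL


def build_sinc_nsh(spi, si, sinc_header, ttl=DEFAULT_TTL,
                   next_proto=NEXT_PROTO_IPV4):
    """Return Base Header + Service Path Header + SINC header as bytes."""
    if len(sinc_header) != 16:
        raise ValueError("SINC header must be exactly 16 octets")
    if not 0 <= spi < (1 << 24) or not 0 <= si < (1 << 8):
        raise ValueError("SPI is 24 bits and SI is 8 bits")
    # Length is expressed in 4-octet words: (4 + 4 + 16) / 4 = 6.
    length_words = (4 + 4 + len(sinc_header)) // 4
    # Bit layout: ver(2) O(1) U(1) TTL(6) Length(6) U(4) MD Type(4)
    # Next Protocol(8).  The O bit and unassigned bits are left at zero.
    base = ((NSH_VERSION & 0x3) << 30
            | (ttl & 0x3F) << 22
            | (length_words & 0x3F) << 16
            | (MD_TYPE_SINC & 0xF) << 8
            | (next_proto & 0xFF))
    service_path = (spi << 8) | si
    return struct.pack("!II", base, service_path) + sinc_header


# Example with an all-zero SINC header used as a placeholder.
nsh = build_sinc_nsh(spi=0x000102, si=255, sinc_header=bytes(16))
```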

9.  SFC-based SINC Workflow

   This section describes the SINC system workflow, focusing on elements
   and key information changes through the workflow.  Since SINC's use
   cases will use a programmable switch to host the SF, it is assumed
   that both the SFF and the SF are colocated on the same switch, as
   shown in Figure 7.

    +---------+                                         +---------+
    | Host A  |                                         | Host B  |
    +---------+                                         +---------+
         |                    +-----------+                 |
         |                    | SINC SW/R |                 |
    +-----------+   +-----+   |  +-----+  |   +-----+   +-----------+
    | Ingress   |   |     |   |  |     |  |   |     |   | Egress    |
    | SFC Proxy |-->| SFF |-->|  | SFF |  |-->| SFF |-->| SFC Proxy |
    +-----------+   +-----+   |  +-----+  |   +-----+   +-----------+
                              |     |     |
                              |  +-----+  |
                              |  |  SF |  |
                              |  +-----+  |
                              +-----------+

                    Figure 7: An Example of SINC system.

   For the sake of clarity, a simple example with one sender (Host A)
   and one receiver (Host B) is provided.  Packet processing goes
   through the following steps:

   1.  Host A transmits the packet containing data that can be processed
       by the SF on the switch.

   2.  The SFC proxy encapsulates the original packet with the NSH
       header containing the SINC Header.  Based on the information
       obtained from the control plane, the SFC proxy builds a SINC
       context header and prepends it to the original packet.  The SFC
       proxy then encapsulates the packet in the transport protocol
       indicated by the SFC.

   3.  The SFF forwards the SINC packets to the specified SF.  As shown
       in Figure 7, when the packet reaches the SINC switch, it has
       reached the egress point of the tunnel and the tunnel header is
       removed.  The SFF looks up the SPI table and SI table and
       forwards the packet to the SF.

   4.  The SF performs the computing operation according to the content
       of the SINC header.  The SF verifies the Group ID and Data Source
       ID in the SINC context header, then performs the required
       computation according to the Data Operation field.  When the
       computation is done, the payload is replaced with the result.
       The packet is re-encapsulated with the NSH SINC header.  The SI
       is reduced by 1 while other fields are untouched.  Then, the
       packet is forwarded to the SFC egress.

   5.  Packets are forwarded to Host B, the final destination.  When
       the packet reaches the SFC egress, the egress looks up the SPI
       table and SI table and determines that it is the egress.  It
       removes the NSH encapsulation and forwards the inner packet to
       the final destination.
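
   Step 4 can be sketched as follows for a SUM operation.  This is a
   hedged illustration only: the packet and table representations, the
   function name, the choice to drop packets from unknown groups or
   sources, and the choice to hold contributions until all data sources
   of a group have been seen are assumptions of this example and are
   not specified by this document.

```python
def sf_process(pkt, groups, pending):
    """Sketch of SF processing for a SUM operation (step 4).

    pkt:     dict with "si", "sinc" (header fields) and "payload"
             (a list of integers); this shape is hypothetical.
    groups:  maps a Group ID to the set of allowed Data Source IDs.
    pending: holds partial contributions per (Group ID, SeqNum).
    Returns the outgoing packet, or None if nothing is sent yet.
    """
    sinc = pkt["sinc"]
    members = groups.get(sinc["group_id"])
    # Verify the Group ID and the Data Source ID.
    if members is None or sinc["source_id"] not in members:
        return None  # assumed policy: drop unknown groups or sources
    key = (sinc["group_id"], sinc["seq_num"])
    slot = pending.setdefault(key, {})
    slot[sinc["source_id"]] = pkt["payload"]
    if len(slot) < sinc["num_sources"]:
        return None  # assumed policy: wait for the remaining sources
    del pending[key]
    # Element-wise sum of the contributions replaces the payload.
    result = [sum(column) for column in zip(*slot.values())]
    # The SI is reduced by 1; the other header fields are untouched.
    return {"si": pkt["si"] - 1, "sinc": sinc, "payload": result}


def make_pkt(source_id, payload):
    """Helper for the example: a packet of group 7, SeqNum 0."""
    return {"si": 255,
            "sinc": {"group_id": 7, "source_id": source_id,
                     "seq_num": 0, "num_sources": 2},
            "payload": payload}


groups = {7: {1, 2}}
pending = {}
out_first = sf_process(make_pkt(1, [1, 2]), groups, pending)
out_second = sf_process(make_pkt(2, [10, 20]), groups, pending)
out_unknown = sf_process(make_pkt(9, [5, 5]), groups, pending)
```

   In this example, the first packet is held, the second packet
   completes the group and carries the aggregated payload onward with a
   decremented SI, and the packet from an unknown source is dropped.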

10.  SINC Control Plane

   SINC networks need to deploy computing tasks and control their whole
   life cycle, from initialization to the end of the task, and to give
   support to the tasks while they run.  The detailed design of the
   control plane will be discussed in a separate document.

11.  Security Considerations

   In-network computing exposes computing data to network devices, which
   inevitably raises security and privacy considerations.  The security
   problems faced by in-network computing include, but are not limited
   to:

   *  Trustworthiness of participating devices

   *  Data hijacking and tampering

   *  Private data exposure

   This document assumes that the deployment is done in a trusted
   environment, for example, in a data center network or a private
   network.

   A detailed security analysis will be provided in future revisions of
   this memo.

12.  IANA Considerations

   This document defines a new NSH fixed-length context header.  As
   such, IANA is requested to add the entry depicted in Table 2 to the
   "NSH MD Types" subregistry of the "Network Service Header Parameter"
   registry.  [Note to RFC Editor: If IANA assigns a different value,
   the authors will update the document accordingly.]

            +=========+====================+=================+
            | MD Type | Description        | Reference       |
            +=========+====================+=================+
            | 0x4     | NSH SINC MD Header | [This Document] |
            +---------+--------------------+-----------------+

               Table 2: NSH MD type allocation for SINC NSH
                             Context Header.

Acknowledgements

   Dirk Trossen's feedback was of great help in improving this document.

References

Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC7498]  Quinn, P., Ed. and T. Nadeau, Ed., "Problem Statement for
              Service Function Chaining", RFC 7498,
              DOI 10.17487/RFC7498, April 2015,
              <https://www.rfc-editor.org/info/rfc7498>.

   [RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
              Chaining (SFC) Architecture", RFC 7665,
              DOI 10.17487/RFC7665, October 2015,
              <https://www.rfc-editor.org/info/rfc7665>.





   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8300]  Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed.,
              "Network Service Header (NSH)", RFC 8300,
              DOI 10.17487/RFC8300, January 2018,
              <https://www.rfc-editor.org/info/rfc8300>.

Informative References

   [CALVIN]   Thomson, A., Diamond, T., Weng, S., Ren, K., Shao, P., and
              D. Abadi, "Calvin: fast distributed transactions for
              partitioned database systems", Proceedings of the 2012
              international conference on Management of Data -
              SIGMOD '12, DOI 10.1145/2213836.2213838, 2012,
              <https://doi.org/10.1145/2213836.2213838>.

   [COIN]     "Computing in the Network, COIN, proposed IRTF group",
              n.d., <https://datatracker.ietf.org/rg/coinrg/about/>.

   [ERIS]     Li, J., Michael, E., and D. R. K. Ports, "Eris:
              Coordination-Free Consistent Transactions Using In-Network
              Concurrency Control", SOSP '17: Proceedings of the 26th
              Symposium on Operating Systems Principles, 2017.

   [GOBATTO]  Reinehr Gobatto, L., Rodrigues, P., Tirone, M., Cordeiro,
              W., and J. Azambuja, "Programmable Data Planes meets In-
              Network Computing: A Review of the State of the Art and
              Prospective Directions", Journal of Integrated Circuits
              and Systems vol. 16, no. 2, pp. 1-8,
              DOI 10.29292/jics.v16i2.497, August 2021,
              <https://doi.org/10.29292/jics.v16i2.497>.

   [GTM]      "GTM and Global Transaction Management", n.d.,
              <https://www.postgres-xl.org/documentation/xc-overview-
              gtm.html>.

   [HPRDMA]   "Design Guidelines for High Performance RDMA Systems",
              n.d., <https://www.usenix.org/conference/atc16/technical-
              sessions/presentation/kalia>.










   [I-D.irtf-coinrg-use-cases]
              Kunze, I., Wehrle, K., Trossen, D., Montpetit, M., de Foy,
              X., Griffin, D., and M. Rio, "Use Cases for In-Network
              Computing", Work in Progress, Internet-Draft, draft-irtf-
              coinrg-use-cases-02, 7 March 2022,
              <https://www.ietf.org/archive/id/draft-irtf-coinrg-use-
              cases-02.txt>.

   [MGWFBP]   Shi, S., Chu, X., and B. Li, "MG-WFBP: Efficient data
              communication for distributed synchronous SGD algorithms",
              IEEE INFOCOM 2019 - IEEE Conference on Computer
              Communications, IEEE, 2019.

   [NetChain] Jin, X., Li, X., and H. Zhang, "NetChain: Scale-free sub-
              RTT coordination", 2018.

   [NetLock]  Z, Y., Y, Z., and B. V, "NetLock: Fast, centralized lock
              management using programmable switches", Proceedings of
              the Annual conference of the ACM Special Interest Group on
              Data Communication on the applications, technologies,
              architectures, and protocols for computer communication,
              2020.

   [NetReduce]
              Liu, S., Wang, Q., and J. Zhang, "NetReduce: RDMA-
              compatible in-network reduction for distributed DNN
              training acceleration", 2020.

   [OLTP]     R, J., I, P., and A. A, "Improving OLTP scalability using
              speculative lock inheritance", Proceedings of the VLDB
              Endowment, 2009.

   [OPENAI]   "OpenAI. AI and compute", 2018,
              <https://openai.com/blog/ai-and-compute/>.

   [PARAHUB]  L, L., J, N., and C. L, "Parameter hub: a rack-scale
              parameter server for distributed deep neural network
              training", Proceedings of the ACM Symposium on Cloud
              Computing, 2018.

   [ROCEv2]   "InfiniBand Architecture Specification Release 1.2.1 Annex
              A17 RoCEv2", InfiniBand Trade Association, September
              2014, <https://cw.infinibandta.org/document/dl/7781>.

   [SHARP]    L, G. R., L, L., and B. D, "Scalable hierarchical
              aggregation and reduction protocol (SHARP) TM streaming-
              aggregation hardware design and evaluation", International
              Conference on High Performance Computing, 2020.



   [SwitchML] A, S., M, C., and C. Ho, "Scaling distributed machine
              learning with in-network aggregation", 2019.

   [ZENG]     Zeng, D., Ansari, N., Montpetit, M., Schooler, E., and D.
              Tarchi, "Guest Editorial: In-Network Computing: Emerging
              Trends for the Edge-Cloud Continuum", IEEE Network vol.
              35, no. 5, pp. 12-13, DOI 10.1109/mnet.2021.9606835,
              September 2021,
              <https://doi.org/10.1109/mnet.2021.9606835>.

Appendix A.  Computing Capability Operation abstraction

   Computing tasks and applications are becoming increasingly complex,
   driven by the growth of the underlying models.  Offloading a complete
   computing task directly onto network devices reduces the generality
   of those devices.  Complex models can, however, be decomposed into
   basic operations such as addition, subtraction, and maximum.  A more
   appropriate offloading method is therefore to decompose complex tasks
   into basic computing operations.

   The DOIN network needs to provide a general abstraction framework for
   computing capabilities.  Applications, management, and computing
   network nodes can then negotiate and calculate resources in terms of
   these abstract capabilities.  Each operation, such as addition,
   subtraction, or maximum, should have a corresponding definition in
   the abstraction scheme, and nodes should implement that definition.
   The abstraction ensures that network nodes produce the same output
   for the same input and operation.

        +========+================================================+
        | OpName | Operation Explanation                          |
        +========+================================================+
        | MAX    | Maximum value of several parameters            |
        +--------+------------------------------------------------+
        | MIN    | Minimum value                                  |
        +--------+------------------------------------------------+
        | SUM    | Sum value                                      |
        +--------+------------------------------------------------+
        | PROD   | Product value                                  |
        +--------+------------------------------------------------+
        | LAND   | Logical and                                    |
        +--------+------------------------------------------------+
        | BAND   | Bit-wise and                                   |
        +--------+------------------------------------------------+
        | LOR    | Logical or                                     |
        +--------+------------------------------------------------+
        | BOR    | Bit-wise or                                    |
        +--------+------------------------------------------------+
        | LXOR   | Logical xor                                    |
        +--------+------------------------------------------------+
        | BXOR   | Bit-wise xor                                   |
        +--------+------------------------------------------------+
        | WRITE  | Write the value associated with the key        |
        +--------+------------------------------------------------+
        | READ   | Read the value associated with the key         |
        +--------+------------------------------------------------+
        | DELETE | Delete the value associated with the key       |
        +--------+------------------------------------------------+
        | CAS    | Compare and swap.  Compare the key value with  |
        |        | the expected value.  If same, write the new    |
        |        | value to the key.  Return the old key value.   |
        +--------+------------------------------------------------+
        | CAADD  | Compare and add.  Compare the key value with   |
        |        | the expected value.  If same, add add-value    |
        |        | to the key value.  Return the old key value.   |
        +--------+------------------------------------------------+
        | CASUB  | Compare and subtract.  Compare the key value   |
        |        | with the expected value.  If same, subtract    |
        |        | sub-value from it.  Return the old key value.  |
        +--------+------------------------------------------------+
        | FA     | Fetch and add.  Fetch the value for the key.   |
        |        | Add add-value to the key value.  Return the    |
        |        | old key value.                                 |
        +--------+------------------------------------------------+
        | FASUB  | Fetch and subtract.  Fetch the value for the   |
        |        | key.  Subtract sub-value from the key value.   |
        |        | Return the old key value.                      |
        +--------+------------------------------------------------+
        | FAOR   | Fetch and OR.  Fetch the value for the key.    |
        |        | Set the key value to its logical OR with the   |
        |        | parameter.  Return the old key value.          |
        +--------+------------------------------------------------+
        | FAAND  | Fetch and AND.  Fetch the value for the key.   |
        |        | Set the key value to its logical AND with the  |
        |        | parameter.  Return the old key value.          |
        +--------+------------------------------------------------+
        | FANAND | Fetch and NAND.  Fetch the value for the key.  |
        |        | Set the key value to its logical NAND with the |
        |        | parameter.  Return the old key value.          |
        +--------+------------------------------------------------+
        | FAXOR  | Fetch and XOR.  Fetch the value for the key.   |
        |        | Set the key value to its logical XOR with the  |
        |        | parameter.  Return the old key value.          |
        +--------+------------------------------------------------+




                    Table 3: Examples of DOIN Operations
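
   As an informal illustration, the key-based operations in Table 3 can
   be modelled as read-modify-write primitives on a per-node key-value
   store.  The Python sketch below is not part of the SINC
   specification: the class name, method names, and in-memory dictionary
   are hypothetical, only a subset of the operations is shown, and CAS
   is assumed to follow the conventional semantics (the new value is
   written only when the current value equals the expected one).

```python
class KeyValueStore:
    """Hypothetical per-node state for the key-based operations."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        """WRITE: store `value` under `key`."""
        self._data[key] = value

    def read(self, key):
        """READ: return the value for `key`, or None if absent."""
        return self._data.get(key)

    def delete(self, key):
        """DELETE: remove `key` if present."""
        self._data.pop(key, None)

    def cas(self, key, expected, new):
        """CAS: write `new` only if the current value equals `expected`.

        Always returns the value seen before the operation.
        """
        old = self._data.get(key)
        if old == expected:
            self._data[key] = new
        return old

    def caadd(self, key, expected, add_value):
        """CAADD: add `add_value` only if the value equals `expected`."""
        old = self._data[key]
        if old == expected:
            self._data[key] = old + add_value
        return old

    def fa(self, key, add_value):
        """FA: unconditionally add `add_value`; return the old value."""
        old = self._data[key]
        self._data[key] = old + add_value
        return old


# Usage: a lock taken with CAS and a sequence counter driven by FA.
store = KeyValueStore()
store.write("lock", 0)
first = store.cas("lock", 0, 1)    # lock was free, so it is acquired
second = store.cas("lock", 0, 1)   # lock already held, value unchanged
store.write("seq", 10)
ticket = store.fa("seq", 1)        # returns the pre-increment value
```

   Returning the old value in every case lets the caller tell whether a
   conditional operation took effect without a second round trip.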

   An appropriate abstract model of computing capability helps
   interoperability between computing devices.  It is also a necessary
   condition for the practical application of in-network computing
   technology.  Most existing papers address a single computing task and
   propose a corresponding private protocol.  The lack of unified
   protocols makes equipment complex and unstable.  It also prevents
   in-network computing research from being split into separable
   sub-tasks.  For example, researchers who study hardware prefer to
   focus on optimizing the processing efficiency of a single operator in
   the device, and are less well placed to design the message protocol
   that carries the operator.  The computing capability abstraction
   model should support a variety of operators and allow for operator
   extension.
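
   One way to picture operator extension is a registry that maps an
   operation name to a binary function, so that new operators can be
   added without changing how existing ones are applied.  The Python
   sketch below is purely illustrative: the registry, the function
   names, and the ABSMAX operator are hypothetical and are not defined
   by this document.

```python
import operator
from functools import reduce

# Hypothetical registry: operation name -> binary function.
OPERATORS = {
    "MAX": max,
    "MIN": min,
    "SUM": operator.add,
    "PROD": operator.mul,
    "BAND": operator.and_,
    "BOR": operator.or_,
    "BXOR": operator.xor,
}


def register_operator(name, func):
    """Add a new operator.

    Redefinition is refused so that a given name keeps one meaning on
    every node, which the abstraction relies on.
    """
    if name in OPERATORS:
        raise ValueError(f"operator {name} is already defined")
    OPERATORS[name] = func


def apply_operator(name, values):
    """Fold `values` pairwise with the named operator."""
    return reduce(OPERATORS[name], values)


# Extension example: ABSMAX is an invented operator, not part of SINC.
register_operator("ABSMAX", lambda a, b: max(abs(a), abs(b)))
total = apply_operator("SUM", [1, 2, 3])
largest_magnitude = apply_operator("ABSMAX", [-7, 3, 5])
```

   Because every operator is a pure function of its inputs, two nodes
   holding the same registry produce the same output for the same input
   and operation.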

Authors' Addresses

   Zhe Lou
   Huawei Technologies
   Riesstrasse 25
   80992 Munich
   Germany
   Email: zhe.lou@huawei.com


   Luigi Iannone
   Huawei Technologies France S.A.S.U.
   18, Quai du Point du Jour
   92100 Boulogne-Billancourt
   France
   Email: luigi.iannone@huawei.com


   Yujing Zhou
   Huawei Technologies
   Beiqing Road, Haidian District
   Beijing
   100095
   China
   Email: zhouyujing3@huawei.com


   Cuimin Zhang
   Huawei Technologies
   Huawei base in Bantian, Longgang District
   Shenzhen
   China
   Email: zhangcuimin@huawei.com

















































