                    Middle Ware Facilities for CATS


   This draft proposes a method to perceive and process the running
   status of computing resources by introducing a logical Middle Ware
   facility.  The method aims to avoid directly reflecting continuous
   and dynamic computing resource status in the network domain, to
   match service requirements with instance conditions, and ultimately
   to achieve computing-aware traffic steering that is applicable to
   various possible scheduling strategies.

1.  Introduction

   With computing resources continuously migrating to the edge,
   services deployed in a distributed manner are increasingly delivered
   in a dynamic way.  More fine-grained scheduling strategies that are
   aware of service SLA requirements and current computing status are
   urgently required.

   A framework to fulfill computing-status-aware traffic steering and
   service provisioning is illustrated in related work, for instance
   [I-D.ldbc-cats-framework].  Since learning the network conditions
   and computing status is the premise for steering traffic properly,
   a concise and effective learning and processing scheme is required.

   Unlike the collection of network attributes, the learning procedure
   for computing status has its own characteristics and objectives,
   which impose additional requirements:

   1.  Compared to relatively stable network capabilities, network
       topologies for instance, the status of computing resources
       varies quite dynamically, as illustrated in
       [I-D.huang-cats-two-segment-routing].  It is unwise to impose the
       dynamicity of the computing status or the distribution of
       computing resources directly on the network.

   2.  Attributes to describe network status and conditions are
       relatively simple and explicit while massive metadata of
       computing status is heterogeneous and pluralistic.  Various
       computing related services may correlate with different
       attributes of computing resources.  A computing information
       description method is studied in
       [I-D.du-cats-computing-modeling-description].  Furthermore, a
       method to evaluate the performance of a service instance based on
       computing modelling is also associated with the specific service
       and an applied scheduling strategy, and thus is correspondingly

   3.  Metadata collected from the network domain and from service
       instances located in distributed sites contains both attributes
       of the same kind and properties of different dimensions.  Values
       of the same attribute should be analyzed in an accumulative
       manner, while attributes of different dimensions should be
       processed in a unified way determined by the specific scheduling
       strategy.

   4.  Overly detailed or fine-grained metadata collected from service
       instances located in distributed sites has no directly
       interpretable semantics for a network domain.  It is suggested
       to provide simple and specific indications for the network to
       follow.

   Currently, the perception and detection of computing resources is
   commonly achieved by several schemes, some of which are listed below:

   *  Prometheus, an open-source system monitoring and alerting
      toolkit, is able to collect and store metrics as time series data.
      Prometheus metrics cover various aspects, for instance metrics
      collected from the Kubernetes API Server and kubelet.  These
      metrics include typical information such as node capacity, pod
      scheduling duration and pods in queue, which can reflect the
      detailed conditions of CPU, memory, queue, delay, etc.  However,
      Prometheus is designed and deployed for monitoring and
      visualization and cannot satisfy the requirements mentioned above.

   *  A DNS and GSLB scheme or a CLB may apply a "Health Check"
      mechanism to detect whether a server is available.  Specific
      methods may be implemented through TCP, UDP and HTTP.  A
      round-robin or weighted selection strategy may be further applied
      to provision the required service.  However, the results of such
      detection are relatively coarse-grained and lack the ability to
      evaluate the performance of services.

   *  Some studies also propose extending IGP or BGP to carry the
      information of computing resources, aiming to be compatible with
      the current IP routing network.  It should be noted that
      overutilization of L3 protocols may place an extra burden on the
      network and may not adapt well to services that are highly
      sensitive to computing resources and future

   Thus, this draft proposes a computing resources perception and
   processing method based on a logical Middle Ware facility to solve
   the mentioned problems and to satisfy the corresponding requirements.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Terminology

   *  SLA: Service Level Agreement

   *  DNS: Domain Name System

   *  GSLB: Global Server Load Balancing

   *  CLB: Cloud Load Balancer

   *  IGP: Interior Gateway Protocol

   *  BGP: Border Gateway Protocol

   *  SNMP: Simple Network Management Protocol

   *  FTP: File Transfer Protocol

   *  PCEP: Path Computation Element Communication Protocol

   *  OAM: Operations, Administration, and Maintenance

   *  DB-Agent: Agent of a database

   *  BE: Best Effort

   *  TE: Traffic Engineering

4.  Framework

   According to the requirements of computing status perception analyzed
   in the previous sections, a framework of metadata collection and
   processing based on Middle Ware Facilities is proposed.

                +---------| Middle Ware |--------+
                |         +-------------+        |
                |                                |
                |                                |
                |                                |
        Network Attributes      Network Attributes+Computing Status
                |                       |               |
                |                       |               |
                |                       |               |
           +----------+             +-------+           |
           | Network  |             |Service|           |
           |Controller|             | Agent |           |
           +----------+             +-------+           |
                |                       |               |
                |                    -------            |
                |                   (       )       +-------+
                |                 (           )     |Service|
                |            +---(  Instances  )    | Agent |
        (---------------)    |    (           )     +-------+
       (                 )---+     -----------          |
       (     Network     )         Cloud  Site       -------
       (                 )---+                      (       )
        (---------------)    |                    (           )
                             +-------------------(  Instances  )
                                                  (           )
                                                   Cloud  Site

     Figure 1: Framework of Metadata Collection Based on a Middle Ware

   The Middle Ware proposed here is a logical facility that has
   knowledge of the computing status and network conditions, and thus
   the ability to process them.  In a physical implementation, a Middle
   Ware can be mapped to multiple physical entities or combinations of
   them.  The entities involved may include a network controller, a
   superior orchestrator, a distributed database, distributed devices,
   an introduced application monitoring system, constructed service
   agents, etc.  The logical modules of a Middle Ware are organized and
   defined as follows:

                  |                  |  NorthBound      |
                  |                  |  Interface       |
|                 |                                     |
|          +--------------+     Middle Ware     +--------------+          |
|          | Service      |                     | Scheduling   |          |
|          | Registration |---------------------| Strategy     |          |
|          | & Management |                     | Configuration|          |
|          +--------------+                     +--------------+          |
|                 |          +---------------+          |                 |
|                 +----------| ORChestration |----------+                 |
|                            +---------------+                            |
|                                    |                     Other Modules: |
|                              +-----------+               OAM,AI,...     |
|                              | Network & |                              |
|         +--------------------| Computing |--------------------+         |
|         |                +---| Status    |---+                |         |
|         |                |   | DataBase  |   |                |         |
|         |                |   +-----------+   |                |         |
|         |                |                   |                |         |
| +---------------+  +-----------+       +-----------+  +---------------+ |
| | Network       |  | Network   |       | Computing |  | Computing     | |
| | Configuration |  | Status    |       | Status    |  | Configuration | |
| | & Control     |  | Collector |       | Collector |  | & Control     | |
| |               |  |           |       |           |  |               | |
| |  +---------+  |  |+---------+|       |+---------+|  |  +---------+  | |
| |  | Protocol|  |  || Protocol||       || Protocol||  |  | Protocol|  | |
| |  | Service |  |  || Service ||       || Service ||  |  | Service |  | |
| |  +---------+  |  |+---------+|       |+---------+|  |  +---------+  | |
| +---------------+  +-----------+       +-----------+  +---------------+ |
|         |                |                   |                |         |
          |                |     SouthBound    |                |
       (--------------------)    Interface    (--------------------)
      (                      )               (                      )
     (                        )             (                        )
     (        Network         )-------------(         Service        )
     (        Domain          )-------------(         Domain         )
     (                        )             (                        )
      (                      )               (                      )
       (--------------------)                 (--------------------)

               Figure 2: Inner Modules in a Middle Ware

   The logical modules and components are designed with the following
   respective functions and abilities:

   *  NSC (Network Status Collector): NSC collects network status
      through a Protocol Service including telemetry, SNMP, BGP, etc.

   *  CSC (Computing Status Collector): CSC collects the status of
      computing resources through a Protocol Service including FTP,
      gRPC, RESTful APIs, etc.  An application monitoring system may be
      deployed, in which case corresponding interfaces need to be
      designed.

   *  NCC (Network Configuration and Control): NCC publishes network
      configuration including SRv6 policies and other information
      through PCEP and BGP for instance.

   *  CCC (Computing Configuration and Control): CCC publishes computing
      related configuration information and participates in the process
      of resources deployment and scaling.

   *  NCSDB (Network and Computing Status DataBase): NCSDB stores the
      metadata of network and computing status reported by NSC and CSC
      respectively and further integrates relevant information.  The
      metadata is arranged in a hierarchical form for later lookup.

   *  SRM (Service Registration and Management): Computing related
      services required by service clients are registered at SRM with
      corresponding service requirements including both network and
      computing attributes.  Evaluation methods mapped to services are
      configured at SRM.  SRM may communicate with external components
      to receive relevant information.

   *  SSC (Scheduling Strategy and Configuration): SSC processes the
      scheduling strategies specified for services.  It obtains its
      configuration from the administration plane through a NorthBound
      Interface.  SSC also interacts with SRM, which may influence the
      configured evaluation methods.

   *  ORC (ORChestration): With the registered evaluation scheme and
      configured scheduling strategies, ORC applies the corresponding
      functions to the metadata stored in NCSDB.  The performance of
      service instances is evaluated, and appropriate entries are
      selected and further distributed to NCC.

   *  There may be other possible logical modules in a Middle Ware,
      including OAM, AI, Portal, etc.

   With the functions defined, the workflow in the control plane to
   fulfill computing aware traffic engineering and service routing is
   described as follows:

   1.  SRM handles service subscription.  The corresponding service
       metadata modeling methods, which are variable and controllable,
       are registered and configured through the NorthBound Interface
       or through a local or injected configuration profile.

   2.  SSC performs scheduling strategy configuration.  SRM and SSC
       jointly determine specific evaluation methods for registered
       services.

   3.  NSC and CSC collect the network and computing status with
       respective Protocol Service modules.  NSC and CSC may communicate
       with network controllers and distributed or centralized service
       agents among multiple sites.

   4.  NCSDB organizes the metadata collected by NSC and CSC in a
       hierarchical manner for further processing.

   5.  ORC processes the metadata stored in NCSDB with respective
       evaluation methods determined by SRM and SSC, and then generates
       corresponding entries.  The results are further distributed to
       NCC and CCC.

   6.  NCC ultimately distributes the entries and configurations to the
       underlay network with its Protocol Service module.
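
   The six steps above can be sketched in a few lines of Python.  This
   is only a minimal illustration under stated assumptions, not a
   normative interface: the class MiddleWare, its method names, the
   attribute names and the placeholder evaluation function are all
   hypothetical and are not defined by this draft.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an evaluation method maps metadata to one metric.
Metadata = Dict[str, float]
Evaluator = Callable[[Metadata], float]


@dataclass
class MiddleWare:
    # Evaluation methods determined jointly by SRM and SSC (steps 1-2).
    evaluators: Dict[str, Evaluator] = field(default_factory=dict)
    # NCSDB: metadata keyed by (service ID, instance) (step 4).
    ncsdb: Dict[Tuple[str, str], Metadata] = field(default_factory=dict)
    # Entries handed to NCC for distribution (steps 5-6).
    distributed: List[Tuple[str, str, float]] = field(default_factory=list)

    def register(self, service_id: str, evaluator: Evaluator) -> None:
        """Steps 1-2: register a service and its evaluation method."""
        self.evaluators[service_id] = evaluator

    def collect(self, service_id: str, instance: str,
                status: Metadata) -> None:
        """Steps 3-4: NSC/CSC report status; NCSDB stores it."""
        self.ncsdb.setdefault((service_id, instance), {}).update(status)

    def orchestrate(self) -> None:
        """Steps 5-6: ORC evaluates each entry for later distribution."""
        self.distributed = [
            (sid, inst, self.evaluators[sid](meta))
            for (sid, inst), meta in sorted(self.ncsdb.items())
            if sid in self.evaluators
        ]


mw = MiddleWare()
# Placeholder evaluation method, chosen only to keep the example short.
mw.register("Service ID1", lambda m: m["delay_ms"] + m["cpu_load"])
mw.collect("Service ID1", "Instance1", {"delay_ms": 20.0})  # from NSC
mw.collect("Service ID1", "Instance1", {"cpu_load": 35.0})  # from CSC
mw.orchestrate()
print(mw.distributed)  # [('Service ID1', 'Instance1', 55.0)]
```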

   Referring to [I-D.ldbc-cats-framework] and
   [I-D.yao-cats-ps-usecases], this draft proposes additional
   requirements for the CATS framework:

   *  "R6 MUST realize means for rate control for distributing of
      metrics."  Thus, specific logical modules SHOULD be introduced to
      preprocess running computing status before being distributed to
      the network.

   *  "R4: MUST include network metrics."  "R5 MUST provide mechanisms
      to distribute the metrics."  Thus, specific logical modules SHOULD
      be introduced to record the information of network capabilities
      and computing resources.

   *  "R8: there MUST exist flexibility in term of metrics definition
      and utilization for the selection of service instance."  "R9: MUST
      set up metric information that can be understood by CATS
      components."  Thus, specific logical modules SHOULD be introduced
      to organize and manage service requirements and scheduling
      strategies.

   NSC and NCC mentioned above are similar or identical to existing
   subfunctions of a network controller and thus are not discussed
   further in this draft.  The detailed design of the functions of SRM,
   SSC, NCSDB and ORC is described as Part 1 to Part 3 in the following
   sections.

5.  Part 1: Service Registration and Modelling Configuration at SRM and

   Service clients submit service requests and receive responses that
   include corresponding service identifications issued by the
   administration plane.  For instance, a Service ID representing a
   globally unique service semantic identification is defined in
   [I-D.ma-intarea-identification-header-of-san].  With the issued
   Service IDs, the constraints and sensitive attributes of each
   service should be considered to generate the corresponding modelling
   and evaluation methods for the service represented by a Service ID.
   The generation patterns of the modelling methods include but are not
   limited to:

   *  Direct configuration by administrators, through Portal operations
      or through the NorthBound Interface (NBI).

   *  Reading a pre-prepared local or distributed configuration profile
      through the NBI.

   The metadata of network and computing status can be summarized as
   the following typical scheduling attributes:

   *  Experience attributes, end-to-end delay, jitter and packet loss
      for instance, which influence the quality of experience.

   *  Cost attributes consist of economic cost, energy consumption, etc.

   *  Resource attributes consist of load of CPUs, load of the network,

   According to the scheduling attributes mentioned above, typical
   scheduling strategies can be summarized as:

   *  Experience first: optimize the quality of experience.

   *  Cost first: optimize the cost attributes while guaranteeing the
      thresholds of experience attributes.

   *  Resource first: optimize the resource attributes while
      guaranteeing the thresholds of experience attributes and cost
      attributes.
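
   The three strategies share one pattern: keep the candidates that
   meet the thresholds of the guaranteed attribute classes, then
   optimize the target attribute.  The following Python sketch shows
   that pattern under simplifying assumptions.  The candidate records,
   the attribute names (delay_ms, cost, cpu_load), the threshold values
   and the helper select() are hypothetical, and "optimize" is reduced
   to picking the minimum of a single attribute.

```python
from typing import Dict, List, Optional

# Hypothetical candidates; attribute names and values are illustrative.
candidates: List[dict] = [
    {"id": 1, "delay_ms": 30.0, "cost": 5.0, "cpu_load": 70.0},
    {"id": 2, "delay_ms": 45.0, "cost": 2.0, "cpu_load": 40.0},
    {"id": 3, "delay_ms": 80.0, "cost": 1.0, "cpu_load": 10.0},
]


def select(cands: List[dict], optimize: str,
           thresholds: Dict[str, float]) -> Optional[dict]:
    """Keep candidates below every threshold, then minimize `optimize`."""
    ok = [c for c in cands
          if all(c[attr] < limit for attr, limit in thresholds.items())]
    return min(ok, key=lambda c: c[optimize]) if ok else None


# Experience first: optimize the experience attribute itself.
experience_first = select(candidates, "delay_ms", {})
# Cost first: optimize cost while guaranteeing the experience threshold.
cost_first = select(candidates, "cost", {"delay_ms": 50.0})
# Resource first: optimize load while guaranteeing experience and cost.
resource_first = select(candidates, "cpu_load",
                        {"delay_ms": 50.0, "cost": 6.0})

print(experience_first["id"], cost_first["id"], resource_first["id"])
# prints: 1 2 2
```

   Candidate 3 has the lowest cost and load but is never selected by
   the cost-first or resource-first strategy, because it violates the
   experience threshold.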

   Based on the specified scheduling strategies, corresponding
   evaluation methods are determined.  With the metadata calculated
   through specific functions, the most appropriate instance or all
   satisfying instances can be identified.  Then, a preferred or
   balanced strategy can be performed, which selects a single entry or
   a set of entries to
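      +----------------+------------------+------------------+-----+
      |                |    Service ID1   |    Service ID2   | ... |
      +----------------+------------------+------------------+-----+
      |End-to-end Delay|      <50ms       |      <100ms      |     |
      |     Jitter     |                  |       <15ms      |     |
      |      Loss      |      <0.1%       |                  |     |
      |     ......     |                  |                  |     |
      |    CPU Cores   |                  |        >6C       |     |
      |      Load      |       <80%       |                  |     |
      |     ......     |                  |                  |     |
      +----------------+------------------+------------------+-----+
      |    Strategy    |  Resource first  | Experience first |     |
      +----------------+------------------+------------------+-----+
      |    Metric=     |                  |                  |     |
      |   Function()   | Function1(Delay, | Function2(Delay, |     |
      |                |    Loss,Load)    |   Jitter,CPU)    |     |
      +----------------+------------------+------------------+-----+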

         Figure 3: Service Registration and Modelling Configuration
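
   As an illustration, the registration shown in Figure 3 could be held
   as one record per Service ID.  The sketch below is an assumption
   about a possible representation, not a format defined by this draft:
   the class Registration, its field names and the attribute keys
   (delay_ms, jitter_ms, loss_pct, load_pct, cpu_cores) are
   hypothetical, while the threshold values and strategies are those of
   Figure 3.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Registration:
    upper_bounds: Dict[str, float]  # attribute must stay below the value
    lower_bounds: Dict[str, float]  # attribute must stay above the value
    strategy: str
    inputs: Tuple[str, ...]         # arguments of the evaluation function

    def satisfied_by(self, status: Dict[str, float]) -> bool:
        """Check a status record against every registered threshold."""
        return (
            all(status[a] < v for a, v in self.upper_bounds.items())
            and all(status[a] > v for a, v in self.lower_bounds.items())
        )


REGISTRY: Dict[str, Registration] = {
    "Service ID1": Registration(
        upper_bounds={"delay_ms": 50, "loss_pct": 0.1, "load_pct": 80},
        lower_bounds={},
        strategy="Resource first",
        inputs=("delay_ms", "loss_pct", "load_pct"),
    ),
    "Service ID2": Registration(
        upper_bounds={"delay_ms": 100, "jitter_ms": 15},
        lower_bounds={"cpu_cores": 6},
        strategy="Experience first",
        inputs=("delay_ms", "jitter_ms", "cpu_cores"),
    ),
}

# Hypothetical status of one instance and path.
status = {"delay_ms": 60, "jitter_ms": 5, "loss_pct": 0.01,
          "load_pct": 30, "cpu_cores": 8}
print({sid: reg.satisfied_by(status) for sid, reg in REGISTRY.items()})
# {'Service ID1': False, 'Service ID2': True}
```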

   Figure 3 shows a typical evaluation and modelling configuration.
   The functions used to calculate a metric value can be defined as
   follows.  A to F are preliminary functions that process individual
   metadata values, while Function1() and Function2() are evaluation
   functions.

       A(Delay)             B(Loss)              C(Load)
       ^                    ^                    ^
       |                    |                    |
    MAX|     +----       MAX|     +----       MAX+       +----
       |     |              |     |              |      /
       |     |              |     |              |     /
    MIN+-----+           MIN+-----+           MIN|----+
       |                    |                    |
       +------------->      +------------->      +------------->
             50      Delay       0.1%     Loss       40% 80%   Load

                            MAX,if max{A(Delay),B(Loss)}=MAX,

       D(Delay)             E(Jitter)            F(Cores)
       ^                    ^                    ^
       |                    |                    |
    MAX|       +----     MAX|       +----     MAX+----+
       |      /             |      /             |     \
       |     /              |     /              |      \
    MIN+----+            MIN+----+            MIN|       +----
       |                    |                    |
       +------------->      +------------->      +------------->
           20  100   Delay       5  15    Jitter      6  12    Cores

                             MAX,if max{D(Delay),E(Jitter),F(Cores)}=MAX,

      Figure 4: Service Registration and Modelling Configuration
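
   The shapes of A, B and C and the MAX condition of Function1() can be
   written down directly.  The following Python sketch does so under
   two loudly stated assumptions.  First, the numeric values MIN=0 and
   MAX=100 are placeholders, since the figure leaves them open.
   Second, the figure as reproduced here shows only the MAX condition
   of Function1(); returning C(Load) otherwise is an assumption made
   for this example.  The helper names step() and ramp() are
   hypothetical.

```python
MIN, MAX = 0.0, 100.0  # assumed numeric range; not fixed by the figure


def step(x: float, threshold: float) -> float:
    """MIN below the threshold, MAX at or above it (shape of A and B)."""
    return MAX if x >= threshold else MIN


def ramp(x: float, low: float, high: float) -> float:
    """MIN up to `low`, linear rise to MAX at `high` (shape of C)."""
    if x <= low:
        return MIN
    if x >= high:
        return MAX
    return MIN + (MAX - MIN) * (x - low) / (high - low)


def function1(delay_ms: float, loss_pct: float, load_pct: float) -> float:
    a = step(delay_ms, 50.0)  # A(Delay): breakpoint at 50
    b = step(loss_pct, 0.1)   # B(Loss): breakpoint at 0.1%
    if max(a, b) == MAX:
        return MAX            # a registered requirement is violated
    # Assumed "otherwise" branch: C(Load) with breakpoints 40% and 80%.
    return ramp(load_pct, 40.0, 80.0)


print(function1(30.0, 0.05, 60.0))  # 50.0
print(function1(55.0, 0.05, 10.0))  # 100.0
```

   D, E and F could be expressed with the same ramp() helper, with F
   falling from MAX to MIN between 6 and 12 cores.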

   The design of the functions also correlates with the semantics of
   the calculated metric value.  As indicated above, if any requirement
   registered with a service is not satisfied, for instance if the
   end-to-end delay reaches 100ms in Function2(), the overall function
   value reaches MAX, which indicates that the corresponding entry
   fails to satisfy the service SLA represented by Service ID2.  Also,
   a smaller metric value represents better performance.  Therefore,
   according to a simple metric, the performance of instances can be
   easily
6.  Part 2: Computing Status Collection and Updates at NCSDB

   Based on the overall set of subscribed services and the sensitive
   attributes configured for each service in the set, the set of
   attributes that require status collection and updates is derived.
   CSC then queries or subscribes to the service agents responsible for
   metadata collection at each cloud site.

   Services differ in their sensitivity and tolerance to changes in
   computing status as well as in their priorities, so their
   requirements for metadata collection and update frequency differ
   from one another.  The frequency of collecting a type of metadata
   should be no lower than the highest frequency among the overall
   requirements.
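
   A small Python sketch of this rule follows.  The per-service
   requirement table, the attribute names and the unit (updates per
   minute) are hypothetical.  The function returns, for each attribute,
   the highest frequency requested by any subscribed service, which is
   the lower bound for the collection frequency at CSC.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical requirements: service -> attribute -> updates per minute.
requirements: Dict[str, Dict[str, float]] = {
    "Service ID1": {"cpu_load": 6.0, "memory": 1.0},
    "Service ID2": {"cpu_load": 12.0, "cpu_cores": 0.1},
}


def collection_frequency(
        reqs: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """For each attribute, take the highest frequency any service needs."""
    freq: Dict[str, float] = defaultdict(float)
    for per_service in reqs.values():
        for attr, f in per_service.items():
            freq[attr] = max(freq[attr], f)
    return dict(freq)


print(collection_frequency(requirements))
# {'cpu_load': 12.0, 'memory': 1.0, 'cpu_cores': 0.1}
```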

   The metadata collected by CSC is further organized and stored in
   NCSDB.  A distributed database is introduced here as a sample
   physical entity that fulfills the functions of the corresponding
   logical module.  A distributed database offers high performance,
   high availability and simple extensibility.  It is highly
   partitionable and allows horizontal scaling, which suits practical
   scenarios with a large number of service instances.  Also, both keys
   and values can be anything from simple objects to complex compound
   objects, and thus heterogeneous computing resources can be described
   and stored.

   As shown below, the status of computing resources is modeled as a
   collection of key-value pairs.

                                         ---        ---
                                       ( +------------+ )
                                        (| Instance 1 |)
                    +---------+         (+------------+)
                    |   PE1   |--------( +------------+ )
                    +---------+        ( | Instance 2 | )
                                          Cloud Site 1

                                         ---        ---
                                       ( | Instance 3 | )
                                       ( +------------+ )
                    +---------+        ( +------------+ )
                    |   PE2   |---------(| Instance 4 |)
                    +---------+         (+------------+)
                                       ( +------------+ )
                                       ( | Instance 5 | )
                                          Cloud Site 2

     | ID |  Instance  | Gateway |    Computing Status Index(1-n)    |
     | 01 | Instance 1 |   PE1   |   CPU 1   | Memory  1 |   O/I 1   |
     | 01 | Instance 4 |   PE2   |   CPU 4   | Memory  4 |   O/I 4   |
     | 01 | Instance 5 |   PE2   |   CPU 5   | Memory  5 |   O/I 5   |
     | 02 | Instance 2 |   PE1   |   CPU 2   | Memory  2 |   O/I 2   |
     | 02 | Instance 3 |   PE2   |   CPU 3   | Memory  3 |   O/I 3   |

               Figure 5: Status Table of Computing Resources

   With the introduction of a distributed database, the data of the
   computing resources can be stored in hierarchically organized
   directories.  Typical key paths for retrieving the information of
   interest are as follows:

   *  /service ID/service instance

   *  /service ID/service instance/Gateway

   *  /service ID/service instance/CPU Load

   *  /service ID/service instance/Memory Remains
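
   The key layout above can be illustrated with a plain Python
   dictionary standing in for the distributed database.  This is a
   sketch only: the helpers key() and get_prefix() are hypothetical,
   and a real deployment would use the range or prefix query of the
   chosen database instead.  The gateway values follow Figure 5.

```python
from typing import Dict


def key(service_id: str, instance: str, attribute: str = "") -> str:
    """Build a '/service ID/service instance[/attribute]' key."""
    parts = [service_id, instance] + ([attribute] if attribute else [])
    return "/" + "/".join(parts)


# A plain dict stands in for the distributed database.
ncsdb: Dict[str, str] = {
    key("01", "Instance 1", "Gateway"): "PE1",
    key("01", "Instance 1", "CPU Load"): "70",
    key("01", "Instance 4", "Gateway"): "PE2",
    key("02", "Instance 2", "Gateway"): "PE1",
}


def get_prefix(db: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Return every entry stored under a directory-like prefix."""
    prefix = prefix.rstrip("/") + "/"
    return {k: v for k, v in db.items() if k.startswith(prefix)}


# All entries of service 01, then only those of one instance.
print(get_prefix(ncsdb, "/01"))
print(get_prefix(ncsdb, key("01", "Instance 1")))
```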

   NCSDB can also enable additional functions.  For instance, a pub-sub
   scheme and a 'Watch' mechanism can be introduced to fulfill service
   OAM and service protection.

    |    Involved Modules     |
    +-------------------------+               +-----------------------+
    |+-------------+          |               |          +-----------+|
    ||Network      |          |               |          | Computing ||
    ||Configuration|          |               |          | Status    ||
    ||& Control    |+--------+|               |+--------+| Collector ||
    || +---------+ ||DB-Agent|| +-----------+ ||DB-Agent||+---------+||
    || | Protocol| |+--------+| | Network & | |+--------+|| Protocol|||
    || | Service | |          | | Computing | |          || Service |||
    || +---------+ |          | | Status    | |          |+---------+||
    |+-------------+          | | Database  | |          +-----------+|
    +-------------------------+ +-----------+ +-----------------------+
           |            |             |              |           |
           |            | Watch       |              |           |
           |            | prefix      |              |           |
           |            |------------>|              |           |
           |            |             |              |           |
           |            |             |<-------------|           |
           |            |             | Write        |           |
           |            |             | (/Service    |           |
           |            |<------------| Instance 1/  |           |
           |            | Notify      | CPU Load 70) |           |
           |            | updates     |              |           |
           |            |             |              |           |
           |            |             |              |           |
           | Notify     |             |              |           |
           | updates    |             |              |           |
           |<-----------|             |              |           |
           |            |             |              |           |

      Figure 6: A 'Watch' Mechanism Applied for a Distributed Database

   The procedure of learning and processing updated computing resource
   status is described as follows:

   *  The CPU load of the container or VM reaches the threshold of 70%.
      After being collected by CSC, the updated status is written into
      the database as a key-value pair.

   *  Relevant modules, NCC for instance, subscribe to the information
      by watching the prefix of the key-value pair.

   *  On learning that the CPU load has reached 70%, the control plane
      performs a recalculation and the service routing entries are
      updated or regenerated.
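
   The interaction in Figure 6 can be mimicked with a small in-memory
   store.  The Python sketch below is an illustration only and assumes
   a simplified behaviour: the class WatchableStore and its methods are
   hypothetical, notifications are delivered synchronously, and an
   unchanged value does not trigger a notification.  A real distributed
   database would provide its own watch interface.

```python
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, str], None]


class WatchableStore:
    """In-memory stand-in for a key-value database with prefix watches."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: List[Tuple[str, Callback]] = []

    def watch(self, prefix: str, callback: Callback) -> None:
        """Register a callback for every key that starts with `prefix`."""
        self._watchers.append((prefix, callback))

    def put(self, key: str, value: str) -> None:
        """Write a value and notify watchers only if it changed."""
        changed = self._data.get(key) != value
        self._data[key] = value
        if changed:
            for prefix, callback in self._watchers:
                if key.startswith(prefix):
                    callback(key, value)


store = WatchableStore()
notifications: List[Tuple[str, str]] = []

# NCC (through its DB-Agent) watches a prefix before updates arrive.
store.watch("/Service ID1/", lambda k, v: notifications.append((k, v)))

# CSC (through its DB-Agent) writes the updated status.
store.put("/Service ID1/Instance 1/CPU Load", "70")
store.put("/Service ID1/Instance 1/CPU Load", "70")  # unchanged value
store.put("/Service ID2/Instance 4/CPU Load", "90")  # other prefix

print(notifications)
# [('/Service ID1/Instance 1/CPU Load', '70')]
```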

7.  Part 3: Metadata Processing and Calculation at ORC

   The Middle Ware processes the metadata collected from the network
   domain and multiple cloud sites at ORC which follows the following

               End-to-End Delay=Delay1+Delay2+Delay3+Delay4

                  +-----------+         +---------+
                  +Ingress  PE+---------+Egress PE|
                  +-----------+         +----+----+
                                             | Delay2
                                        (  +-+--+  )
                                       (   | LB |   )
                                       (   +-+--+   )
                                     (       |Delay3  )
                                    (    +---+----+    )
                                     (   |Instance|   )
                                     (   +--------+   )
                                      (    Delay4    )
                                         Cloud Site

                         Figure 7: End-to-end Delay
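
   The sum in Figure 7 is an example of processing attributes of the
   same class in an accumulative manner.  A minimal Python sketch
   follows.  The delay values are hypothetical, and so is the
   assumption that Delay1 to Delay3 are reported through NSC and Delay4
   through CSC.  The bound of 50 ms is the one registered for Service
   ID1 in Figure 3.

```python
from typing import Dict

# Hypothetical one-way measurements in milliseconds for one path.
network_segments: Dict[str, float] = {   # assumed to come from NSC
    "Delay1": 12.0, "Delay2": 3.0, "Delay3": 1.5,
}
processing_delay: Dict[str, float] = {   # assumed to come from CSC
    "Delay4": 8.0,
}


def end_to_end_delay(*parts: Dict[str, float]) -> float:
    """Accumulate delay contributions that share the same dimension."""
    return sum(value for part in parts for value in part.values())


total = end_to_end_delay(network_segments, processing_delay)
print(total)         # 24.5
print(total < 50.0)  # True: within the <50ms bound of Service ID1
```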

   *  For instances that provide a certain set of services over
      corresponding network paths, ORC integrates the collected
      metadata of the same class.  For instance, as shown above, the
      unidirectional end-to-end delay consists of the segmented network
      latencies Delay1, Delay2 and Delay3 and the processing delay
      Delay4 caused by possible queue backlog and logical processing.

   *  For a specific service, ORC identifies and filters out the
      sensitive attributes from the integrated attributes as the input
      variables for the corresponding function registered at SRM and
      SSC.

   *  For a service instance and all possible network forwarding paths
      that reach it, ORC calculates its ability to provide a specific
      type of service in conjunction with a TE policy or BE path, and
      represents it as a single metric value.

   *  According to the designated semantics of metrics, ORC evaluates
      the validity and performance of every entry, then selects the
      appropriate entries and distributes them to NCC, where they
      ultimately work in the forwarding plane of computing aware network

              Service ID1  Instance1  SRv6 Policy1  Metric=15
              Service ID1  Instance3     BE Path    Metric=30
              Service ID1  Instance2  SRv6 Policy2  Metric=10
              Service ID2  Instance4  SRv6 Policy3  Metric=25
              Service ID2  Instance5  SRv6 Policy4  Metric=20
              Service ID2  Instance6     BE Path    Metric=30

                               Control Plane
                              Forwarding Plane

                 |    Index    | Next  Hop |  Interface   |
                 | Service ID1 | Instance2 | SRv6 Policy2 |
                 | Service ID2 | Instance5 | SRv6 Policy4 |

      Figure 8: Entries in the Control Plane and the Forwarding Plane
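
   The reduction from control-plane entries to forwarding-plane entries
   in Figure 8 can be sketched as follows.  The entries are copied from
   the figure, and the rule "a smaller metric value is better" follows
   the semantics described in Part 1.  The names Entry and
   forwarding_plane(), the numeric value of MAX and the choice of a
   single preferred entry per service are assumptions of this example.

```python
from typing import Dict, List, NamedTuple

MAX = 100  # assumed value that marks an entry as violating the SLA


class Entry(NamedTuple):
    service_id: str
    instance: str
    path: str
    metric: int


# Control-plane entries copied from Figure 8.
control_plane: List[Entry] = [
    Entry("Service ID1", "Instance1", "SRv6 Policy1", 15),
    Entry("Service ID1", "Instance3", "BE Path", 30),
    Entry("Service ID1", "Instance2", "SRv6 Policy2", 10),
    Entry("Service ID2", "Instance4", "SRv6 Policy3", 25),
    Entry("Service ID2", "Instance5", "SRv6 Policy4", 20),
    Entry("Service ID2", "Instance6", "BE Path", 30),
]


def forwarding_plane(entries: List[Entry]) -> Dict[str, Entry]:
    """Keep, per service, the valid entry with the smallest metric."""
    best: Dict[str, Entry] = {}
    for e in entries:
        if e.metric >= MAX:
            continue  # entry fails the registered SLA
        current = best.get(e.service_id)
        if current is None or e.metric < current.metric:
            best[e.service_id] = e
    return best


for sid, e in forwarding_plane(control_plane).items():
    print(sid, e.instance, e.path)
# Service ID1 Instance2 SRv6 Policy2
# Service ID2 Instance5 SRv6 Policy4
```

   A balanced strategy would instead keep every valid entry per service
   and share traffic among them.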

8.  Conclusion

   With the aforementioned logical functions and modules designed in a
   Middle Ware, the additional requirements raised by the learning
   process of computing status can be satisfied:

   *  The dynamicity of running computing status can be restrained and
      controlled at CSC, NCSDB and ORC.

   *  Service instances can be evaluated by registered and configured
      methods in a differentiated manner.  SRM and SSC are capable of
      adjusting scheduling strategies and switching evaluation methods.

   *  Identical metadata can be processed in an accumulative manner
      while attributes of different dimensions are integrated by the
      registered evaluation methods.

   *  Metadata is not exposed directly but converted into simple metric
      values.  With properly designed semantics of a metric value,
      appropriate entries can be simply determined.

12.  Normative References

   [I-D.du-cats-computing-modeling-description]
              Du, Z., Fu, Y., Li, C., Huang, D., and Z. Fu, "Computing
              Information Description in Computing-Aware Traffic
              Steering", Work in Progress, Internet-Draft, draft-du-
              cats-computing-modeling-description-02, 23 October 2023.

   [I-D.huang-cats-two-segment-routing]
              Huang, D., Du, Z., and C. Zhang, "Hierarchical segment
              routing solution of CATS", Work in Progress, Internet-
              Draft, draft-huang-cats-two-segment-routing-01, 5
              September 2023.

   [I-D.ldbc-cats-framework]
              Li, C., Du, Z., Boucadair, M., Contreras, L. M., Drake,
              J., Huang, D., and G. S. Mishra, "A Framework for
              Computing-Aware Traffic Steering (CATS)", Work in
              Progress, Internet-Draft, draft-ldbc-cats-framework-04, 8
              December 2023.

   [I-D.ma-intarea-identification-header-of-san]
              Ma, L., 付华楷, Zhou, F., lihesong, and D. Yang, "Service
              Identification Header of Service Aware Network", Work in
              Progress, Internet-Draft, draft-ma-intarea-identification-
              header-of-san-01, 4 May 2023.

   [I-D.yao-cats-ps-usecases]
              Yao, K., Trossen, D., Boucadair, M., Contreras, L. M.,
              Shi, H., Li, Y., and S. Zhang, "Computing-Aware Traffic
              Steering (CATS) Problem Statement, Use Cases, and
              Requirements", Work in Progress, Internet-Draft, draft-
              yao-cats-ps-usecases-03, 30 June 2023.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

Authors' Addresses

   Dongyu Yuan
   ZTE Corporation
   No.50 Software Avenue
   Jiangsu, 210012

   Fenlin Zhou
   ZTE Corporation
   No.50 Software Avenue
   Jiangsu, 210012

