Internet DRAFT - draft-rcr-opsawg-operational-compute-metrics
Network Working Group S. Randriamasy
Internet-Draft Nokia Bell Labs
Intended status: Informational L. M. Contreras
Expires: 5 September 2024 Telefonica
J. Ros-Giralt
Qualcomm Europe, Inc.
R. Schott
Deutsche Telekom
4 March 2024
Joint Exposure of Network and Compute Information for Infrastructure-
Aware Service Deployment
draft-rcr-opsawg-operational-compute-metrics-03
Abstract
Service providers are starting to deploy computing capabilities
across the network for hosting applications such as distributed AI
workloads, AR/VR, vehicle networks, and IoT, among others. In this
network-compute environment, knowing information about the
availability and state of the underlying communication and compute
resources is necessary to determine both the proper deployment
location of the applications and the most suitable servers on which
to run them. Further, this information is used by numerous use cases
with different interpretations. This document proposes an initial
approach towards a common understanding and exposure scheme for
metrics reflecting compute and communication capabilities.
About This Document
This note is to be removed before publishing as an RFC.
The latest revision of this draft can be found at
https://giralt.github.io/draft-rcr-opsawg-operational-compute-
metrics/draft-rcr-opsawg-operational-compute-metrics.html. Status
information for this document may be found at
https://datatracker.ietf.org/doc/draft-rcr-opsawg-operational-
compute-metrics/.
Source for this draft and an issue tracker can be found at
https://github.com/giralt/draft-rcr-opsawg-operational-compute-
metrics.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Randriamasy, et al. Expires 5 September 2024 [Page 1]
Internet-Draft TODO - Abbreviation March 2024
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 5 September 2024.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4
3. Problem Space and Needs . . . . . . . . . . . . . . . . . . . 4
4. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.1. Distributed AI Workloads . . . . . . . . . . . . . . . . 6
4.2. Open Abstraction for Edge Computing . . . . . . . . . . . 8
4.3. Optimized Placement of Microservice Components . . . . . 9
5. Production and Consumption Scenarios of Compute-related
Information . . . . . . . . . . . . . . . . . . . . . . . 9
5.1. Producers of Compute-Related Information . . . . . . . . 9
5.2. Consumers of Compute-Related Information . . . . . . . . 10
6. Metrics Selection and Exposure . . . . . . . . . . . . . . . 10
6.1. Edge Resources . . . . . . . . . . . . . . . . . . . . . 11
6.2. Network Resources . . . . . . . . . . . . . . . . . . . . 11
6.3. Cloud Resources . . . . . . . . . . . . . . . . . . . . . 11
6.4. Considerations about Metrics . . . . . . . . . . . . . . 12
6.5. Metric Dimensions . . . . . . . . . . . . . . . . . . . . 13
6.6. Abstraction Level and Information Access . . . . . . . . 14
6.7. Distribution and Exposure Mechanisms . . . . . . . . . . 14
6.7.1. Metric Distribution Computing-Aware Traffic Steering
(CATS) . . . . . . . . . . . . . . . . . . . . . . . 15
6.7.2. Metric Exposure with Extensions of ALTO . . . . . . . 15
6.7.3. Exposure of Abstracted Generic Metrics . . . . . . . 15
7. Related Work . . . . . . . . . . . . . . . . . . . . . . . . 16
8. Guiding Principles . . . . . . . . . . . . . . . . . . . . . 17
9. GAP Analysis . . . . . . . . . . . . . . . . . . . . . . . . 17
10. Security Considerations . . . . . . . . . . . . . . . . . . . 18
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
12. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
12.1. Normative References . . . . . . . . . . . . . . . . . . 18
12.2. Informative References . . . . . . . . . . . . . . . . . 19
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 20
1. Introduction
Operators are starting to deploy distributed computing environments
in different parts of the network with the objective of addressing
different service needs including latency, bandwidth, processing
capabilities, storage, etc.  This translates into the emergence of a
number of data centers (both in the cloud and at the edge) of
different sizes (e.g., large, medium, small), characterized by
distinct dimensions of CPU, memory, and storage capabilities, as well
as bandwidth capacity for forwarding the traffic generated in and out
of the corresponding data center.
The proliferation of the edge computing paradigm further increases
the potential footprint and heterogeneity of the environments where a
function or application can be deployed, resulting in different unit
costs for CPU, memory, and storage.  This increases the
complexity of deciding the location where a given function or
application should be best deployed or executed. This decision
should be jointly influenced on the one hand by the available
resources in a given computing environment, and on the other hand by
the capabilities of the network path connecting the traffic source
with the destination.
Network- and compute-aware function placement and selection has
become of utmost importance in the last decade.  The availability of
such information is taken for granted by the numerous service
providers and bodies that are specifying these functions.  However,
deployments may reach out to data centers running different
implementations, with different understandings and representations of
compute capabilities, so smooth operation is a challenge.  While
standardization efforts on the representation and exposure of network
capabilities are well advanced, similar efforts on compute
capabilities are in their infancy.
This document proposes an initial approach towards a common
understanding and exposure scheme for metrics reflecting compute
capabilities.  It aims to leverage existing IETF work on compute
metrics definitions to build synergies.  It also aims to reach out to
working or research groups in the IETF that would consume such
information and have particular requirements.
2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
3. Problem Space and Needs
Visibility and exposure of both (1) network and (2) compute resources
to the application is critical to enable the proper functioning of
the new class of services arising at the edge (e.g., distributed AI,
driverless vehicles, AR/VR, etc.). To understand the problem space
and the capabilities that are lacking in today's protocol interfaces
needed to enable these new services, we focus on the life cycle of a
service.
At the edge, compute nodes are deployed near communication nodes
(e.g., co-located in a 5G base station) to provide computing services
that are close to users with the goal to (1) reduce latency, (2)
increase communication bandwidth, (3) enable privacy/personalization
(e.g., federated AI learning), and (4) reduce cloud costs and energy.
Services are deployed on the communication and compute infrastructure
through a two-phase life cycle that involves first a service
_deployment stage_ and then a _service selection_ stage (Figure 1).
+-------------+ +--------------+ +-------------+
| | | | | |
| New +------> Service +------> Service |
| Service | | Deployment | | Selection |
| | | | | |
+-------------+ +--------------+ +-------------+
Figure 1: Service life cycle.
*Service deployment.* This phase is carried out by the service
provider, and consists in the deployment of a new service (e.g., a
distributed AI training/inference, an XR/AR service, etc.) on the
communication and compute infrastructure. The service provider needs
to properly size the amount of communication and compute resources
assigned to this new service to meet the expected user demand. The
decision on where the service is deployed and how many resources are
requested from the infrastructure depends on the levels of QoE that
the provider wants to guarantee to the user base. To make a proper
deployment decision, the provider must have visibility on the
resources available from the infrastructure, including communication
resources (e.g., latency and bandwidth) and compute (e.g., CPU, GPU,
memory, storage). For instance, to run a Large Language Model (LLM)
with 175 billion parameters, a total aggregated memory of 400GB and 8
GPUs are needed. The service provider needs an interface to query
the infrastructure, extract the available compute and communication
resources, and decide which subset of resources are needed to run the
service.
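The sizing step described above can be sketched as a simple
feasibility filter over candidate sites, using the LLM figures from
the example.  The site inventory and field names below are
hypothetical illustrations, not a proposed interface:

```python
# Illustrative sketch: filter infrastructure sites that can host a
# service.  The site inventory and field names are hypothetical.

def feasible_sites(sites, need_mem_gb, need_gpus):
    """Return the names of sites with enough free memory and GPUs."""
    return [s["name"] for s in sites
            if s["free_mem_gb"] >= need_mem_gb
            and s["free_gpus"] >= need_gpus]

sites = [
    {"name": "edge-1",  "free_mem_gb": 128,  "free_gpus": 2},
    {"name": "metro-1", "free_mem_gb": 512,  "free_gpus": 8},
    {"name": "cloud-1", "free_mem_gb": 2048, "free_gpus": 64},
]

# The 175-billion-parameter LLM example: 400 GB of memory and 8 GPUs.
print(feasible_sites(sites, 400, 8))  # ['metro-1', 'cloud-1']
```

A real deployment decision would combine such a filter with
communication metrics (latency, bandwidth) exposed by the same
interface.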
*Service selection.* This phase is initiated by the user, through a
client application that connects to the deployed service.  Two main
decisions must be made in the service selection stage: compute node
selection and path selection.  In the compute
node selection step, as the service is generally replicated in N
locations (e.g., by leveraging a microservices architecture), the
application must decide which of the service replicas it connects to.
Similar to the service deployment stage, this decision requires
knowledge about communication and compute resources available in each
replica. On the other hand, in the path selection decision, the
application must decide which path it chooses to connect to the
service. This decision depends on the communication properties
(e.g., bandwidth and latency) of the available paths. Similar to the
service deployment case, the service provider needs an interface to
query the infrastructure and extract the available compute and
communication resources, with the goal to make informed node and path
selection decisions. It is also important to note that, ideally, the
node and path selection decisions should be jointly optimized, since
in general the best end-to-end performance is achieved by jointly
taking into account both decisions. In some cases, however, such
decisions may be owned by different players. For instance, in some
network environments, the path selection may be decided by the
network operator, whereas the node selection may be decided by the
application. Even in these cases, it is crucial to have a proper
interface (for both the network operator and the service provider) to
query the available compute and communication resources from the
system.
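The benefit of jointly optimizing node and path selection can be
sketched as follows; the metric names and values are hypothetical,
and a real decision would weigh additional compute and communication
metrics:

```python
# Illustrative sketch of joint node and path selection: choose the
# (replica, path) pair minimizing estimated end-to-end latency.

def select(replicas, paths):
    """Return (site, path) with the lowest compute + network latency."""
    best = min(
        ((r, p) for r in replicas for p in paths if p["to"] == r["site"]),
        key=lambda rp: rp[0]["compute_ms"] + rp[1]["net_ms"],
    )
    return best[0]["site"], best[1]["name"]

replicas = [{"site": "edge-1",  "compute_ms": 30},
            {"site": "cloud-1", "compute_ms": 5}]
paths = [{"name": "p1", "to": "edge-1",  "net_ms": 5},
         {"name": "p2", "to": "cloud-1", "net_ms": 40}]

# Joint optimization prefers the slower compute node here, because its
# path is much faster end to end (30+5 < 5+40).
print(select(replicas, paths))  # ('edge-1', 'p1')
```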
Table 1 summarizes the problem space, the information that needs to
be exposed, and the stakeholders that need this information.
+====================+===============+==========================+
| Action to take | Information | Who needs it |
| | needed | |
+====================+===============+==========================+
| Service placement | Compute and | Service provider |
| | communication | |
+--------------------+---------------+--------------------------+
| Service selection/ | Compute | Network/service provider |
| node selection | | and/or application |
+--------------------+---------------+--------------------------+
| Service selection/ | Communication | Network/service and/or |
| path selection | | application |
+--------------------+---------------+--------------------------+
Table 1: Problem space, needs, and stakeholders.
4. Use Cases
4.1. Distributed AI Workloads
Generative AI is a technological feat that opens up many applications
such as holding conversations, generating art, developing a research
paper, or writing software, among many others. Yet this innovation
comes with a high cost in terms of processing and power consumption.
While data centers are already running at capacity, it is projected
that transitioning current search engine queries to leverage
generative AI will increase costs by 10 times compared to traditional
search methods [DC-AI-COST].  As (1) computing nodes (CPUs, GPUs, and
NPUs) are deployed to build the edge cloud leveraging technologies
like 5G, and (2) billions of mobile user devices globally provide a
large untapped computational platform, shifting part of the
processing from the cloud to the edge becomes a viable and necessary
step towards enabling the AI transition.  There are at least four
drivers supporting this trend:
* Computational and energy savings: Due to savings from not needing
large-scale cooling systems and the high performance-per-watt
efficiency of the edge devices, some workloads can run at the edge
at a lower computational and energy cost [EDGE-ENERGY], especially
when considering not only processing but also data transport.
* Latency: For applications such as driverless vehicles which
require real-time inference at very low latency, running at the
edge is necessary.
* Reliability and performance: Peaks in cloud demand for generative
AI queries can create large queues and latency, and in some cases
even lead to denial of service. In some cases, limited or no
connectivity requires running the workloads at the edge.
* Privacy, security, and personalization: A "private mode" allows
users to strictly utilize on-device (or near-the-device) AI to
enter sensitive prompts to chatbots, such as health questions or
confidential ideas.
These drivers lead to a distributed computational model that is
hybrid: some AI workloads will run fully in the cloud, some will run
fully at the edge, and some will run both at the edge and in the
cloud.  Being able to efficiently run these workloads in this hybrid,
distributed, cloud-edge environment is necessary given the
aforementioned massive energy and computational costs.  To make
optimized service and workload placement decisions, information about
both the compute and communication resources available in the network
is also necessary.
Consider as an example a large language model (LLM) used to generate
text and hold intelligent conversations. LLMs produce a single token
per inference, where a token is almost equivalent to a word.
Pipelining and parallelization techniques are used to optimize
inference, but this means that a model like GPT-3 could potentially
go through all 175 billion parameters that are part of it to generate
a single word. To efficiently run these computational-intensive
workloads, it is necessary to know the availability of compute
resources in the distributed system. Suppose that a user is driving
a car while conversing with an AI model. The model can run inference
on a variety of compute nodes, ordered from lower to higher compute
power as follows: (1) the user's phone, (2) the computer in the car,
(3) the 5G edge cloud, and (4) the datacenter cloud.
Correspondingly, the system can deploy four different models with
different levels of precision and compute requirements.  The simplest
model, with the fewest parameters, can run on the phone, requiring
less compute power but yielding lower accuracy.  Three other models
ordered in increasing value of accuracy and computational complexity
can run in the car, the edge, and the cloud. The application can
identify the right trade-off between accuracy and computational cost,
combined with metrics of communication bandwidth and latency, to make
the right decision on which of the four models to use for every
inference request. Note that this is similar to the resolution/
bandwidth trade-off commonly found in the image encoding problem,
where an image can be encoded and transmitted at different levels of
resolution depending on the available bandwidth in the communication
channel. In the case of AI inference, however, not only bandwidth is
a scarce resource, but also compute. ALTO extensions to support the
exposure of compute resources would allow applications to make
optimized decisions on selecting the right computational resource,
supporting the efficient execution of hybrid AI workloads.
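The accuracy/cost trade-off described above can be sketched as
follows.  The model tiers, resource units, and threshold values are
purely illustrative assumptions, not measured requirements:

```python
# Sketch of the accuracy/cost trade-off: pick the most accurate model
# whose compute and uplink-bandwidth needs fit current availability.

TIERS = [  # ordered from most to least capable (illustrative values)
    {"name": "cloud", "tflops": 1000, "uplink_mbps": 20, "accuracy": 0.95},
    {"name": "edge",  "tflops": 100,  "uplink_mbps": 10, "accuracy": 0.90},
    {"name": "car",   "tflops": 10,   "uplink_mbps": 1,  "accuracy": 0.80},
    {"name": "phone", "tflops": 1,    "uplink_mbps": 0,  "accuracy": 0.70},
]

def pick_model(avail_tflops, avail_mbps):
    """Return the most capable tier that fits compute and bandwidth."""
    for t in TIERS:
        if (avail_tflops.get(t["name"], 0) >= t["tflops"]
                and avail_mbps >= t["uplink_mbps"]):
            return t["name"]
    return "phone"  # on-device fallback

# With weak connectivity, inference falls back to the car's computer
# even though the cloud has far more compute available.
print(pick_model({"cloud": 2000, "edge": 500, "car": 50, "phone": 2}, 5))
# 'car'
```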
4.2. Open Abstraction for Edge Computing
Modern applications such as AR/VR, V2X, or IoT require bringing
compute closer to the edge in order to meet strict bandwidth,
latency, and jitter requirements.  While this deployment process
resembles the path taken by the main cloud providers (notably, AWS,
Facebook, Google and Microsoft) to deploy their large-scale
datacenters, the edge presents a key difference: datacenter clouds
(both in terms of their infrastructure and the applications run by
them) are owned and managed by a single organization, whereas edge
clouds involve a complex ecosystem of operators, vendors, and
application providers, all striving to provide a quality end-to-end
solution to the user. This implies that, while the traditional cloud
has been implemented for the most part by using vertically optimized
and closed architectures, the edge will necessarily need to rely on a
complete ecosystem of carefully designed open standards to enable
horizontal interoperability across all the involved parties. This
document envisions ALTO playing a role as part of the ecosystem of
open standards that are necessary to deploy and operate the edge
cloud.
As an example, consider a user of an XR application who arrives at
his/her home by car. The application runs by leveraging compute
capabilities from both the car and the public 5G edge cloud. As the
user parks the car, 5G coverage may diminish (due to building
interference) making the home local Wi-Fi connectivity a better
choice. Further, instead of relying on computational resources from
the car and the 5G edge cloud, latency can be reduced by leveraging
computing devices (PCs, laptops, tablets) available from the home
edge cloud. The application's decision to switch from one domain to
another, however, demands knowledge about the compute and
communication resources available both in the 5G and the Wi-Fi
domains, therefore requiring interoperability across multiple
industry standards (for instance, IETF and 3GPP on the public side,
and IETF and LF Edge [LF-EDGE] on the private home side). ALTO can
be positioned to act as an abstraction layer supporting the exposure
of communication and compute information independently of the type of
domain the application is currently residing in.
Future versions of this document will elaborate further on this use
case.
4.3. Optimized Placement of Microservice Components
Current applications are transitioning from a monolithic service
architecture towards the composition of microservice components,
following cloud-native trends.  The set of microservices can have
associated SLOs, which impose constraints not only in terms of
required compute resources (CPU, storage, ...), dependent on the
compute facilities available, but also in terms of performance
indicators such as latency, bandwidth, etc., which impose
restrictions on the networking capabilities connecting the computing
facilities.  Even more complex constraints, such as affinity among
certain microservice components, could require complex calculations
for selecting the most appropriate compute nodes, taking into
consideration both network and compute information.
Thus, service/application orchestrators can benefit from the
information exposed by ALTO at the time of deciding the placement of
the microservices in the network.
5. Production and Consumption Scenarios of Compute-related Information
It is important to understand the scenarios of production and
consumption of compute-related information in combination with
information related to communication. Leveraging such combination
enables the possibility of resource and workload placement
optimization, leading to both operational cost reductions to the
operator and service provider as well as an improvement on the
service level experienced by the end users.
5.1. Producers of Compute-Related Information
The information relative to compute (i.e., processing capabilities,
memory, and storage capacity) can be structured in two ways: on the
one hand, the information corresponding to the raw compute resources;
on the other hand, the information on resources allocated to or
utilized by a specific application or service function.
The former is typically provided by the management systems enabling
the virtualization of the physical resources for a later assignment
to the processes running on top. Cloud Managers or Virtual
Infrastructure Managers are the entities that manage those resources.
These management systems offer APIs to access the available resources
in the computing facility. Thus, it can be expected that these APIs
can be used for the consumption of such information. Once the raw
resources are retrieved from the various compute facilities, it could
be possible to generate topological network views of them, as
proposed in [I-D.llc-teas-dc-aware-topo-model].
Regarding the resources allocated to or utilized by a specific
application or service function, two situations apply: (1) the total
allocation and (2) the allocation per service or application.  In the
first case, the information can be supplied by the virtualization
management systems described before.  For the specific per-service
allocation, it can be expected that the specific management systems
of the service or application are capable of providing the resources
being used at run time, typically as a subset of the allocated ones.
In this last scenario, it is also reasonable to expect the
availability of APIs offering this information, even though they may
be specific to the service or application.
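As an illustration of how raw capacity and per-service allocations
could be combined by a consumer, the sketch below derives available
resources per node.  The record layout is hypothetical; real Cloud
Manager or Virtual Infrastructure Manager APIs differ:

```python
# Sketch: combine raw capacity (as reported by a virtualization
# manager) with per-service allocations to derive what remains
# available.  Field names are illustrative, not a defined model.

raw = {"node-a": {"cpu": 64, "mem_gb": 256}}
allocated = [
    {"node": "node-a", "service": "svc-1", "cpu": 16, "mem_gb": 64},
    {"node": "node-a", "service": "svc-2", "cpu": 8,  "mem_gb": 32},
]

def available(raw, allocated):
    """Subtract per-service allocations from raw node capacity."""
    out = {node: dict(cap) for node, cap in raw.items()}
    for a in allocated:
        out[a["node"]]["cpu"] -= a["cpu"]
        out[a["node"]]["mem_gb"] -= a["mem_gb"]
    return out

print(available(raw, allocated))
# {'node-a': {'cpu': 40, 'mem_gb': 160}}
```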
5.2. Consumers of Compute-Related Information
The consumption of compute-related information relates to the
different phases of the service lifecycle.  This means that this
information can be consumed at different points in time and for
different purposes.
The expected consumers can be either external or internal to the
network.  As external consumers, it is possible to consider external
application management systems requiring resource availability
information for service function placement decisions or workload
migration (in the case of consuming raw resources), or requiring
information on the usage of resources for service assurance or
service scaling, among others.
As internal consumers, it is possible to consider network management
entities requiring information on the level of resource utilization
for traffic steering (such as the Path Selector in
[I-D.ldbc-cats-framework]), load balancing, or analytics, among
others.
6. Metrics Selection and Exposure
Regarding metrics exposure, one can distinguish the topics of (1) how
the metrics are exposed and (2) which kinds of metrics need to be
exposed.  The infrastructure resources can be divided into network-
and compute-related resources.  Network-based resources can roughly
be subdivided, according to the network structure, into edge,
backbone, and cloud resources.
This section intends to give a brief outlook regarding these
resources for stimulating additional discussion with related work
going on in other IETF working groups or standardization bodies.
6.1. Edge Resources
Edge resources refer to metrics such as latency, bandwidth, compute
latency, or traffic breakout.
6.2. Network Resources
Network resources relate to the traditional network infrastructure.
The next table provides an overview of some of the commonly used
metrics.
+==================+
| Kind of Resource |
+==================+
| QoS |
+------------------+
| Latency |
+------------------+
| Bandwidth |
+------------------+
| RTT |
+------------------+
| Packet Loss |
+------------------+
| Jitter |
+------------------+
Table 2: Examples of
network resource
metrics.
6.3. Cloud Resources
The next table provides an example of parameters that could be
exposed:
+============+==========+=================================+
| Parameter  | Category | Description                     |
+============+==========+=================================+
| CPU        | Compute  | Available CPU resources         |
+------------+----------+---------------------------------+
| Memory     | Compute  | Available memory                |
+------------+----------+---------------------------------+
| Storage    | Storage  | Available storage               |
+------------+----------+---------------------------------+
| Configmaps | Object   | Configuration and topology maps |
+------------+----------+---------------------------------+
| Secrets    | Object   | Possible secrets                |
+------------+----------+---------------------------------+
| Pods       | Object   | Possible pods                   |
+------------+----------+---------------------------------+
| Jobs       | Object   | Concurrent jobs                 |
+------------+----------+---------------------------------+
| Services   | Object   | Concurrent services             |
+------------+----------+---------------------------------+
Table 3: Examples of cloud resource parameters.
6.4. Considerations about Metrics
The metrics considered in this document should be used to support
decisions for selection and deployment of services and applications.
Further iterations of this document may consider additional life
cycle operations such as assurance and relevant metrics.
The network metrics listed above are specified in a number of IETF
documents, such as RFC 9439 [I-D.ietf-alto-performance-metrics],
which itself leverages RFC 7679.  The work on compute metrics at the
IETF, on the other hand, is in its first stages and mostly relates to
low-level infrastructure metrics, such as in [RFC7666].  However:
* service deployment and selection also involve decisions that
require an aggregated view, for instance at the service level,
* deciding entities may have only partial access to the compute
information and actually do not need to have all the details.  A
number of public tools and methods to test compute facility
performance are made available by cloud service providers or service
management businesses; see [UPCLOUD] and [IR] to name a few.
However, for the proposed performance metrics, their definition and
acquisition method may differ from one provider to another, thus
making it challenging to compare performance across different
providers.  The latter aspect is particularly problematic for
applications running at the edge, where a complex ecosystem of
operators, vendors, and application providers is involved, and calls
for a common standardized definition.
6.5. Metric Dimensions
Upon exploring existing work, this draft proposes to consider a
number of dimensions before identifying the compute metrics needed to
make a service operation decision.  This list is initial and is to be
updated upon further discussion.
Dimensions helping to identify needed compute metrics:
+===========+==================+=================================+
| Dimension | Definition | Examples |
+===========+==================+=================================+
| Decision | what are the | monitoring, benchmarking, |
| | metrics used for | service selection and placement |
+-----------+------------------+---------------------------------+
| Driving | what is assessed | speed, scalability, cost, |
| KPI | with the metrics | stability |
+-----------+------------------+---------------------------------+
| Decision | different | infrastructure node/cluster, |
| scope | granularities | compute service, end-to-end |
| | | application |
+-----------+------------------+---------------------------------+
| Receiving | receiving | router, centralized controller, |
| entity | metrics | application management |
+-----------+------------------+---------------------------------+
| Deciding | computing | router, centralized controller, |
| entity | decisions | application management |
+-----------+------------------+---------------------------------+
Table 4: Dimensions to consider when identifying compute metrics.
When metrics are documented according to their life cycle action,
this allows for a more reliable interpretation and informed
utilization of the metrics.  The table below provides some examples:
+====================+=============================================+
| Lifecycle action | Example |
+====================+=============================================+
| Acquisition method | telemetry, estimation |
+--------------------+---------------------------------------------+
| Value processing | aggregation, abstraction |
+--------------------+---------------------------------------------+
| Exposure | in-path distribution, off-path distribution |
+--------------------+---------------------------------------------+
Table 5: Metrics documented by life cycle action.
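One possible way to attach the dimensions of Table 4 and the life
cycle actions of Table 5 to a metric is sketched below.  The field
names and example values are illustrative, not a proposed schema:

```python
# Sketch of a metric descriptor carrying the dimensions of Table 4
# and the life cycle actions of Table 5.  Illustrative only.

from dataclasses import dataclass

@dataclass
class ComputeMetric:
    name: str
    decision: str     # what the metric is used for (Table 4)
    driving_kpi: str  # what is assessed with the metric
    scope: str        # node/cluster, compute service, application
    acquisition: str  # telemetry or estimation (Table 5)
    processing: str   # aggregation or abstraction
    exposure: str     # in-path or off-path distribution

m = ComputeMetric(
    name="gpu_utilization",
    decision="service placement",
    driving_kpi="speed",
    scope="infrastructure node",
    acquisition="telemetry",
    processing="aggregation",
    exposure="off-path distribution",
)
print(m.name, m.scope)
```

Documenting metrics in such a structured way would let a consumer
check whether a metric matches its decision scope before using it.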
6.6. Abstraction Level and Information Access
One important aspect to consider is that receiving entities that need
to consume metrics to make selection or placement decisions do not
always have access to computing information.  In particular, several
scenarios (to be completed upon further discussion) may need to be
considered, among which:
* the consumer is an ISP that does not own the compute infrastructure
or has no access to full information; in this case the compute
metrics will likely be estimated,
* the consumer is an application that has no direct access to full
information, while the ISP has access to both network and compute
information; however, the ISP is willing to provide guidance to the
application with abstract information,
* the consumer has access to full network and compute information and
wants to use it for fine-grained decision making, e.g., at the
node/cluster level,
* the consumer has access to full information but essentially needs
guidance with abstracted information,
* the consumer has access to information that is abstracted or
detailed depending on the metrics.
These scenarios further drive the selection of metrics upon the above
mentioned dimensions.
6.7. Distribution and Exposure Mechanisms
This section reviews mechanisms to distribute and expose metrics so
that network and compute information can be jointly integrated in
decisions.
6.7.1.  Metric Distribution in Computing-Aware Traffic Steering (CATS)
Existing work at the IETF CATS WG has explored the collection and
distribution of computing metrics in [I-D.ldbc-cats-framework].  Its
deployment considerations describe three deployment models:
*  distributed: computing metrics are distributed directly among
   network devices,
*  centralized: computing metrics are collected by a centralized
   control plane,
*  hybrid: part of the computing metrics are distributed among the
   involved network devices, while others are collected by a
   centralized control plane.
In the hybrid mode, the draft suggests that some static information
(e.g., capabilities information) can be distributed among network
devices since it is quite stable.  Frequently changing information
(e.g., resource utilization) can be collected by a centralized
control plane to avoid frequent flooding in the distributed control
plane.
Besides the required extensions to the routing protocols, the hybrid
mode highlights the impact of the dynamicity of the distributed
metrics and the need to carefully choose the metric exposure mode
with respect to their dynamicity.
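The hybrid mode described above amounts to classifying each metric
by its dynamicity; the sketch below illustrates this with a
hypothetical set of stable capability metrics (the names and the
classification are assumptions for illustration only):

```python
# Stable capability information is distributed among network
# devices, while frequently changing utilization metrics are
# collected by a centralized control plane.
STABLE_METRICS = {"cpu_cores", "gpu_model", "memory_total_gb"}

def plane_for(metric: str) -> str:
    """Return which control plane handles a given metric."""
    return "distributed" if metric in STABLE_METRICS else "centralized"

print(plane_for("cpu_cores"))        # distributed
print(plane_for("cpu_utilization"))  # centralized
```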
6.7.2. Metric Exposure with Extensions of ALTO
The ALTO protocol has been defined in [RFC7285] to expose an
abstracted network topology and related path costs.  Its extension
RFC 9240 allows entities to be defined on which properties can be
attached, while [I-D.contreras-alto-service-edge] introduces a
proposed entity property that allows an entity to be considered as
both a network element with network-related costs and properties and
an element of a data center with compute-related properties.  Such
an exposure mechanism is particularly useful for decision-making
entities that are centralized and located off the network paths.
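An entity carrying both network and compute properties could be
exposed along the lines of the sketch below, modeled after the ALTO
property map style of RFC 9240; the property names and values are
illustrative assumptions, not registered ALTO properties:

```python
import json

# Hypothetical ALTO-style property map entry attaching a network
# cost and a compute-related property to a single edge entity.
property_map = {
    "property-map": {
        "ipv4:203.0.113.10": {
            "routingcost": 10,     # network-related property
            "computingcost": 3,    # compute-related property
        }
    }
}
print(json.dumps(property_map, indent=2))
```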
6.7.3. Exposure of Abstracted Generic Metrics
In some cases, whether due to unavailable information details or for
the sake of simplicity, a consumer may need reliable but simple
guidance to select a service. To this end, abstracted generic
metrics may be useful.
One can consider a generic metric named "computingcost", applied to
a contact point to one or more edge servers, such as a load
balancer, referred to for short as an edge server, to reflect the
network operator policy and preferences.  The metric "computingcost"
results from an abstraction method that is hidden from users,
similarly to the metric "routingcost" defined in [RFC7285].  For
instance, "computingcost" may be higher for an edge server located
far away, in undesired geographical areas, or owned by a provider
that does not share information with the Internet Service Provider
(ISP) or with which the ISP has a poorer commercial agreement.
"computingcost" may also reflect environmental preferences in terms,
for instance, of energy source, average consumption vs. local
climate, or location adequacy vs. climate.
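A possible abstraction along these lines is sketched below; the
inputs, weights, and penalty values are illustrative assumptions
that an operator would keep hidden from consumers, as with
"routingcost" in [RFC7285]:

```python
# Hypothetical abstraction collapsing operator policy inputs into a
# single "computingcost" value (lower is preferred).
def computing_cost(distance_km: float,
                   provider_shares_info: bool,
                   renewable_fraction: float) -> int:
    cost = distance_km / 100.0
    if not provider_shares_info:
        cost += 5.0  # penalize providers that do not share information
    # environmental preference: penalize non-renewable energy supply
    cost += (1.0 - renewable_fraction) * 2.0
    return round(cost)

print(computing_cost(100.0, True, 1.0))   # nearby, transparent, green
print(computing_cost(100.0, False, 1.0))  # opaque provider costs more
```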
One may also consider a generic metric named "computingperf",
applied to an edge server, that reflects its performance based on
measurements or estimations by the ISP, or a combination thereof.
An edge server with a higher "computingperf" value will be
preferred.  "computingperf" can be based on a vector of one or more
metrics reflecting, for instance, responsiveness or reliability of
cloud services (based on metrics such as latency, packet loss,
jitter, or time to first and/or last byte), or on a single value
reflecting a global performance score.
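Collapsing such a vector of measurements into a single score could
be sketched as follows; the weighting and the score formula are
illustrative assumptions, not a recommended method:

```python
# Hypothetical "computingperf" score (higher is better) combining a
# vector of per-server measurements into one value.
def computing_perf(latency_ms: float,
                   loss_pct: float,
                   jitter_ms: float) -> float:
    penalty = 0.1 * latency_ms + 5.0 * loss_pct + 0.5 * jitter_ms
    return round(100.0 / (1.0 + penalty), 2)

print(computing_perf(0.0, 0.0, 0.0))   # ideal server scores 100.0
print(computing_perf(10.0, 0.0, 0.0))  # added latency halves the score
```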
7. Related Work
Some existing work has explored compute-related metrics.  It can be
categorized as follows:
*  References providing raw compute infrastructure metrics:
   [I-D.contreras-alto-service-edge] includes references to cloud
   management solutions (e.g., OpenStack, Kubernetes) which
   administer the virtualization infrastructure, providing
   information about raw compute infrastructure metrics.
   Furthermore, [NFV-TST] describes processor, memory, and network
   interface usage metrics.
*  References providing compute virtualization metrics: [RFC7666]
   provides several metrics as part of the Management Information
   Base (MIB) definition for managing virtual machines controlled by
   a hypervisor.  The objects defined there refer to the resources
   consumed by a particular virtual machine serving as host for
   services or applications.  Moreover, [NFV-INF] provides metrics
   associated with virtualized network functions.
*  References providing service metrics including compute-related
   information: [I-D.dunbar-cats-edge-service-metrics] proposes
   metrics associated with services running in compute
   infrastructures.  Some of these metrics do not depend on the
   infrastructure behavior itself but on where such compute
   infrastructure is topologically located.
Other existing work at the IETF CATS WG has explored the collection
and distribution of computing metrics in [I-D.ldbc-cats-framework].
In their deployment considerations, the authors consider three
models: distributed, centralized, and hybrid.
8. Guiding Principles
The driving principles for designing an interface to jointly extract
network and compute information are as follows:
P1. Leverage metrics across working groups to avoid reinventing the
wheel. For instance:
* RFC 9439 [I-D.ietf-alto-performance-metrics] leverages IPPM
metrics from RFC 7679.
* Section 5.2 of [I-D.du-cats-computing-modeling-description]
considers delay as a good metric, since it is easy to use in both
compute and communication domains. RFC 9439 also defines delay as
part of the performance metrics.
* Section 6 of [I-D.du-cats-computing-modeling-description] proposes
to represent the network structure as graphs, which is similar to
the ALTO map services in [RFC7285].
P2.  Aim for simplicity, while ensuring the combined efforts do not
leave technical gaps in supporting the full life cycle of service
deployment and selection.  For instance, the CATS working group is
covering path selection from a network standpoint, while ALTO (e.g.,
[RFC7285]) covers the exposure of network information to the service
provider and the client application.  However, there is currently no
effort being pursued to expose compute information to the service
provider and the client application for service placement or
selection.
9.  Gap Analysis
From this related work, it is evident that compute-related metrics
can serve several purposes, ranging from service instance
instantiation to service instance behavior, and then to service
instance selection.  Some of the metrics could refer to the same
object (e.g., CPU) but with a particular usage and scope.
In contrast, network metrics are more uniform and straightforward.
It is therefore necessary to consistently define a set of metrics
that could assist the operation in the different concerns identified
so far, so that networks and systems can have a common understanding
of the perceived compute performance.  When combined with network
metrics, the resulting network-plus-compute performance view will
support informed decisions particular to each of the operational
concerns related to the different parts of a service life cycle.
10. Security Considerations
TODO Security
11. IANA Considerations
This document has no IANA actions.
12. References
12.1. Normative References
[I-D.du-cats-computing-modeling-description]
Du, Z., Fu, Y., Li, C., Huang, D., and Z. Fu, "Computing
Information Description in Computing-Aware Traffic
Steering", Work in Progress, Internet-Draft, draft-du-
cats-computing-modeling-description-02, 23 October 2023,
<https://datatracker.ietf.org/doc/html/draft-du-cats-
computing-modeling-description-02>.
[I-D.ietf-alto-performance-metrics]
Wu, Q., Yang, Y. R., Lee, Y., Dhody, D., Randriamasy, S.,
and L. M. Contreras, "Application-Layer Traffic
Optimization (ALTO) Performance Cost Metrics", Work in
Progress, Internet-Draft, draft-ietf-alto-performance-
metrics-28, 21 March 2022,
<https://datatracker.ietf.org/doc/html/draft-ietf-alto-
performance-metrics-28>.
[I-D.ldbc-cats-framework]
Li, C., Du, Z., Boucadair, M., Contreras, L. M., and J.
Drake, "A Framework for Computing-Aware Traffic Steering
(CATS)", Work in Progress, Internet-Draft, draft-ldbc-
cats-framework-06, 8 February 2024,
<https://datatracker.ietf.org/doc/html/draft-ldbc-cats-
framework-06>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.
[RFC7285] Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S.,
Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
"Application-Layer Traffic Optimization (ALTO) Protocol",
RFC 7285, DOI 10.17487/RFC7285, September 2014,
<https://www.rfc-editor.org/rfc/rfc7285>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.
12.2. Informative References
[DC-AI-COST]
"Generative AI Breaks The Data Center - Data Center
Infrastructure And Operating Costs Projected To Increase
To Over $76 Billion By 2028", Forbes, Tirias Research
Report, 2023.
[EDGE-ENERGY]
"Estimating energy consumption of cloud, fog, and edge
computing infrastructures", IEEE Transactions on
Sustainable Computing, 2019.
[I-D.contreras-alto-service-edge]
Contreras, L. M., Randriamasy, S., Ros-Giralt, J., Perez,
D. A. L., and C. E. Rothenberg, "Use of ALTO for
Determining Service Edge", Work in Progress, Internet-
Draft, draft-contreras-alto-service-edge-10, 13 October
2023, <https://datatracker.ietf.org/doc/html/draft-
contreras-alto-service-edge-10>.
[I-D.dunbar-cats-edge-service-metrics]
Dunbar, L., Majumdar, K., Mishra, G. S., Wang, H., and H.
Song, "5G Edge Services Use Cases", Work in Progress,
Internet-Draft, draft-dunbar-cats-edge-service-metrics-01,
6 July 2023, <https://datatracker.ietf.org/doc/html/draft-
dunbar-cats-edge-service-metrics-01>.
[I-D.llc-teas-dc-aware-topo-model]
Lee, Y., Liu, X., and L. M. Contreras, "DC aware TE
topology model", Work in Progress, Internet-Draft, draft-
llc-teas-dc-aware-topo-model-03, 10 July 2023,
<https://datatracker.ietf.org/doc/html/draft-llc-teas-dc-
aware-topo-model-03>.
[IR] "Cloud Performance Testing Best Tips and Tricks", n.d.,
<https://www.ir.com/guides/cloud-performance-testing>.
[LF-EDGE] "Linux Foundation Edge", March 2023,
<https://www.lfedge.org/>.
[NFV-INF] "ETSI GS NFV-INF 010, v1.1.1, Service Quality Metrics", 1
December 2014, <https://www.etsi.org/deliver/etsi_gs/NFV-
INF/001_099/010/01.01.01_60/gs_NFV-INF010v010101p.pdf>.
[NFV-TST] "ETSI GS NFV-TST 008 V3.3.1, NFVI Compute and Network
Metrics Specification", 1 June 2020,
<https://www.etsi.org/deliver/etsi_gs/NFV-
TST/001_099/008/03.03.01_60/gs_NFV-TST008v030301p.pdf>.
[RFC7666] Asai, H., MacFaden, M., Schoenwaelder, J., Shima, K., and
T. Tsou, "Management Information Base for Virtual Machines
Controlled by a Hypervisor", RFC 7666,
DOI 10.17487/RFC7666, October 2015,
<https://www.rfc-editor.org/rfc/rfc7666>.
[UPCLOUD] "How to benchmark Cloud Servers", May 2023,
<https://upcloud.com/resources/tutorials/how-to-benchmark-
cloud-servers>.
Acknowledgments
TODO acknowledge.
Authors' Addresses
S. Randriamasy
Nokia Bell Labs
Email: sabine.randriamasy@nokia-bell-labs.com
L. M. Contreras
Telefonica
Email: luismiguel.contrerasmurillo@telefonica.com
Jordi Ros-Giralt
Qualcomm Europe, Inc.
Email: jros@qti.qualcomm.com
Roland Schott
Deutsche Telekom
Email: Roland.Schott@telekom.de