Internet DRAFT - draft-ietf-idr-5g-edge-service-metadata
draft-ietf-idr-5g-edge-service-metadata
Network Working Group L. Dunbar
Internet-Draft Futurewei
Intended status: Standards Track K. Majumdar
Expires: 24 August 2024 Microsoft Azure
H. Wang
Huawei
G. Mishra
Verizon
Z. Du
China Mobile
21 February 2024
BGP Extension for 5G Edge Service Metadata
draft-ietf-idr-5g-edge-service-metadata-15
Abstract
This draft describes a new Metadata Path Attribute and some Sub-TLVs
for egress routers to advertise the Metadata about the attached edge
services (ES). The edge service Metadata can be used by the ingress
routers in the 5G Local Data Network to make path selections not only
based on the routing cost but also the running environment of the
edge services. The goal is to improve latency and performance for 5G
edge services.
The extension enables an edge service at one specific location to be
more preferred than the others with the same IP address (ANYCAST) to
receive data flow from a specific source, like a specific User
Equipment (UE).
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119] [RFC8174] when, and only when, they appear in all capitals,
as shown here.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Dunbar, et al. Expires 24 August 2024 [Page 1]
Internet-Draft Metadata Path February 2024
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 24 August 2024.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions used in this document . . . . . . . . . . . . . . 3
3. Metadata Influenced Ingress Node Behavior . . . . . . . . . . 4
3.1. Metadata Influenced BGP Path Selection . . . . . . . . . 5
3.2. Ingress Router Forwarding Behavior . . . . . . . . . . . 5
3.3. Forwarding Behavior when UEs Move . . . . . . . . . . . . 5
4. Edge Service Metadata Encoding . . . . . . . . . . . . . . . 5
4.1. Metadata Path Attribute . . . . . . . . . . . . . . . . . 6
4.1.1. Metadata Path Attribute Handling Procedure . . . . . 6
4.1.2. TLV Format . . . . . . . . . . . . . . . . . . . . . 7
4.2. The Site Preference Index Sub-TLV . . . . . . . . . . . . 7
4.3. Site Physical Availability Index Metadata . . . . . . . . 8
4.3.1. Site Index Associated to Routes . . . . . . . . . . . 10
4.3.2. BGP UPDATE with standalone Site Availability Index . 10
4.4. Service Delay Prediction Index . . . . . . . . . . . . . 11
4.4.1. Service Delay Prediction Sub-TLV . . . . . . . . . . 12
4.4.2. Service Delay Prediction Based on Load Measurement . 13
4.4.3. Raw Load Measurement Sub-TLV . . . . . . . . . . . . 14
5. Service Metadata Influenced Decision Process . . . . . . . . 14
5.1. Egress Node Behavior . . . . . . . . . . . . . . . . . . 15
5.2. Integrating Network Delay with the Service Metrics . . . 16
5.3. Integrating with BGP decision process . . . . . . . . . . 17
6. Service Metadata Propagation Scope . . . . . . . . . . . . . 18
7. Minimum Interval for Metrics Change Advertisement . . . . . . 19
8. Validation and Error Handling . . . . . . . . . . . . . . . . 19
9. Manageability Considerations . . . . . . . . . . . . . . . . 19
10. Security Considerations . . . . . . . . . . . . . . . . . . . 20
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
11.1. Metadata Path Attribute . . . . . . . . . . . . . . . . 21
11.2. Metadata Path Attribute Sub-Types . . . . . . . . . . . 21
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 22
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 22
13.1. Normative References . . . . . . . . . . . . . . . . . . 22
13.2. Informative References . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24
1. Introduction
This document describes a new Metadata Path Attribute added to a BGP
UPDATE message [RFC4271] for egress routers to advertise the Metadata
about 5G low latency edge services directly attached to the egress
routers. 5G [TS.23.501-3GPP] is characterized by having edge services
closer to the Cell Towers reachable by Local Data Networks (LDN).
From an IP network perspective, the 5G LDN is a limited domain [RFC8799]
with edge services a few hops away from the ingress nodes. Only
selected UE services are considered 5G low latency edge services.
Note: The proposed edge service Metadata Path Attribute is not
intended for the best-effort services reachable via the public
internet. The information carried by the Metadata Path Attribute can
be used by the ingress routers to make path selections for selected
low latency services based on not only the network distance but also
the running environment of the edge cloud sites. The goal is to
improve latency and performance for 5G ultra-low latency services.
The extension is targeted at a single domain with a Route Reflector
(RR) controlling the propagation of the BGP UPDATE. The edge service
Metadata Path Attribute is only attached to the low latency services
(routes) hosted in the 5G edge cloud sites, which are only a small
subset of the services initiated from UEs. It is not intended for UEs
accessing general internet sites.
2. Conventions used in this document
The following conventions are used in this document.
Edge DC: Edge Data Center, which provides the hosting environment
for the edge services. An Edge DC might host 5G core functions in
addition to the frequently used edge services.
gNB: next generation Node B [TS.23.501-3GPP]
RTT: Round-trip Time
PSA: PDU Session Anchor (UPF) [TS.23.501-3GPP]
UE: User Equipment
UPF: User Plane Function [TS.23.501-3GPP]
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC8174] when, and only when, they appear in all capitals, as
shown here.
3. Metadata Influenced Ingress Node Behavior
The goal of this edge service Metadata Path Attribute is for egress
routers to propagate the metrics about the running environment for a
subset of edge services to ingress routers so that the ingress
routers can make path selections based on not only the routing cost
but also the running environment for those edge services. The BGP
speakers that do not support the Metadata Path Attribute can ignore
the Metadata Path Attribute in a BGP UPDATE Message. All
intermediate nodes can forward the entire BGP UPDATE as it is.
Multiple metrics can be attached to one Metadata Path Attribute. One
Metadata Path Attribute can contain computing service capability
information, computing service states, computing resource states of
the corresponding edge site, or more. Computing service capability
information can record information about the computing node or the
deployment information needed for computing service initialization.
Computing service states can include the number of service
connections, service duration, and so on. Computing resource states
can be detailed information on computing resources such as CPU/GPU,
or an abstract metric derived from these detailed parameters to
indicate the resource status of the edge site.
There could be more metrics about the running environment being
attached to the Metadata Path Attribute, e.g., some of the metrics
being discussed by the CATS WG. This document illustrates a few
examples of Sub-TLVs of the metrics under the edge service Metadata
Path Attribute:
- the site physical availability index
- the site preference index
- the service delay prediction index, and
- the raw load measurement.
This section specifies how those Metadata impact the ingress node's
path selections.
3.1. Metadata Influenced BGP Path Selection
When an ingress router receives BGP updates for the same IP prefix
from multiple egress routers, all these egress routers' loopback
addresses are considered as the next hops for the IP prefix. For the
selected low latency edge services, the ingress router BGP engine
would call an edge service Management function that can select paths
based on the edge service Metadata received. Section 5.2 has an
exemplary algorithm to compute the weighted path cost based on the
edge service Metadata carried by the Sub-TLV(s) specified in this
document.
Section 5 has the detailed description of the edge service Metadata
influenced optimal path selection.
3.2. Ingress Router Forwarding Behavior
When the ingress router receives a packet and does a lookup on the
route in the FIB, it gets the destination prefix's whole path. It
encapsulates the packet destined towards the optimal egress node.
For subsequent packets belonging to the same flow, the ingress router
needs to forward them to the same egress router unless the selected
egress router is no longer reachable. Keeping packets from one flow
to the same egress router, a.k.a. Flow Affinity, is supported by
many commercial routers. Most registered edge computing (EC) services have relatively
short flows.
How Flow Affinity is implemented is out of the scope for this
document.
3.3. Forwarding Behavior when UEs Move
When a UE moves to a new 5G gNB which is anchored to the same UPF,
the packets from the UE traverse to the same ingress router. Path
selection and forwarding behavior are the same as before.
If the UE maintains the same IP address when anchored to a new UPF,
the directly connected ingress router might use the information
passed from a neighboring router to derive the optimal Next Hop for
this route. The detailed algorithm is out of the scope of this
document.
4. Edge Service Metadata Encoding
4.1. Metadata Path Attribute
The Metadata Path Attribute is an optional BGP Path attribute to
carry metrics and metadata about the edge services attached to the
egress router. The Metadata Path Attribute, to be assigned by IANA
[RFC2042], consists of a set of Sub-TLVs, and each Sub-TLV contains
information for specific metrics of the edge services.
4.1.1. Metadata Path Attribute Handling Procedure
Most BGP UPDATE messages don't include the Metadata Path Attribute.
For the limited edge services that need to advertise the metadata
about the services, the Metadata Path Attribute can be included in a
BGP UPDATE message [RFC4271] together with other BGP Path Attributes
[IANA-BGP-PARAMS], such as Extended Communities [RFC4360], NEXT_HOP, Tunnel
Encapsulation Path Attribute [RFC9012], etc.
The BGP Metadata Path attribute MAY be attached to BGP IPv4/IPv6
Unicast prefixes, BGP Labeled IPv4/IPv6 prefixes [RFC8277], and IPv4/
IPv6 Anycast prefixes [RFC4786]. In order to prevent distribution of
the BGP Metadata Path Attribute beyond its intended scope of
applicability, attribute filtering SHOULD be deployed to remove the
BGP Metadata Path attribute at the administrative boundary.
A BGP speaker that advertises a path received from one of its
neighbors SHOULD advertise the BGP Metadata Path attribute received
with the path without modification as long as the BGP Metadata Path
attribute was acceptable. If the path did not come with a BGP
Metadata Path attribute, the speaker MAY attach a BGP Metadata Path
attribute to the path if configured to do so.
The Metadata Path Attribute MUST contain at least one metadata Sub-
TLV. Multiple Metadata Sub-TLVs can be included in a Metadata Path
Attribute in one BGP UPDATE message. The content of the Sub-TLVs
present in the BGP Metadata Path attribute is determined by the
configuration. When a BGP Speaker does not recognize some of the
Sub-TLVs within one Metadata Path Attribute in a BGP UPDATE message,
the BGP Speaker should forward the received BGP UPDATE message
without any change if the transitive bit is set to 1 [RFC4271]. The
domain ingress nodes SHOULD process the recognized Sub-TLVs carried
by the Metadata Path Attribute and ignore the unrecognized Sub-TLVs.
By default, a BGP speaker does not report any unrecognized Sub-TLVs
within a Metadata Path Attribute unless configured to send a
notification to its management system. The ingress node should be
configured with an algorithm to combine the recognized metrics
carried by the Sub-TLVs within a Metadata Path Attribute of the
received BGP UPDATE message.
The metrics Sub-TLVs included in the Metadata Path Attribute apply to
all the address families carried in the NLRI field of the BGP UPDATE
message [RFC4271]. For a multi-protocol BGP UPDATE message [RFC4760]
[RFC7606], the metrics Sub-TLVs included in the Metadata Path
Attribute apply to all the AFI/SAFI address families carried by the
MP_REACH_NLRI.
4.1.2. TLV Format
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Attr. Flags  |MetaDataPathAtt|       Length (2 Octets)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|              Value (multiple Metadata Sub-TLVs)               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Metadata Path Attribute
Attr.Flags: Attribute flags, defined as:
- The high-order bit (bit 0): set to 1.
- The second high-order bit (bit 1): set to 0 to indicate that the
service-metadata is not transitive. Only intended for the
receiving router.
- The third high-order bit (bit 2): same as specified by [RFC4271].
- The fourth high-order bit (bit 3): set to 1 to indicate there are
two octets for the Length field.
MetaDataPathAtt: Metadata Path Attribute: TBD1 (assigned by IANA).
Length: the total number of octets of the value field.
All values in the Sub-TLVs are unsigned 32-bit integers.
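As a rough illustration of the layout in Figure 1, the following Python sketch places the attribute header in front of an already-encoded Sub-TLV value. The type code 0xFE is a hypothetical placeholder for TBD1, the function name is illustrative only, and leaving the Partial bit at 0 is an assumption of this sketch.

```python
import struct

# Hypothetical placeholder: the real type code (TBD1) is to be assigned by IANA.
METADATA_PATH_ATTR_TYPE = 0xFE

FLAG_OPTIONAL = 0x80         # bit 0: set to 1
FLAG_TRANSITIVE = 0x40       # bit 1: left at 0 (non-transitive)
FLAG_PARTIAL = 0x20          # bit 2: per RFC 4271; left at 0 in this sketch
FLAG_EXTENDED_LENGTH = 0x10  # bit 3: set to 1, so Length is two octets


def encode_metadata_path_attribute(sub_tlvs: bytes) -> bytes:
    """Prefix encoded Sub-TLVs with flags, type code, and a 2-octet Length."""
    if not sub_tlvs:
        raise ValueError("the attribute must carry at least one Sub-TLV")
    if len(sub_tlvs) > 0xFFFF:
        raise ValueError("value does not fit in a 2-octet Length field")
    flags = FLAG_OPTIONAL | FLAG_EXTENDED_LENGTH
    header = struct.pack("!BBH", flags, METADATA_PATH_ATTR_TYPE, len(sub_tlvs))
    return header + sub_tlvs
```

With these settings the flags octet is 0x90, and the Length field counts only the octets of the value field.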
4.2. The Site Preference Index Sub-TLV
Different services might have different preference index values
configured for the same site. For example, Service-A requires high
computing power, Service-B requires high bandwidth among its
microservices, and Service-C requires high volume storage capacity.
For a DC with relatively low storage capacity but high bisection
bandwidth, its preference index value is higher for Service-B and
lower for Service-C. Site Preference Index can also be used to
achieve stickiness for some services.
It is out of the scope of this document how the preference index is
determined or configured.
The Preference Index Sub-TLV has the following format:
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Site-Preference-Index Sub-Type |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Preference Index value                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Preference Index Sub-TLV
- Site-Preference-Index Sub-Type =1 (specified in this document).
- Preference Index value: 1 .. (2^32-1); the higher the value, the
more preferred the site. Preference Index value == 0 is
reserved.
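The Sub-TLV in Figure 2 can be sketched in Python as follows. This is a minimal illustration, not a reference encoder: the function names are hypothetical, and the sketch assumes that the Length field counts only the 4-octet value field, which this document does not state explicitly.

```python
import struct

SITE_PREFERENCE_SUBTYPE = 1  # Sub-Type value given in this document


def encode_site_preference(value: int) -> bytes:
    """Pack Sub-Type, Length, and a 32-bit Preference Index value."""
    if not 1 <= value <= 0xFFFFFFFF:
        raise ValueError("Preference Index must be 1..2^32-1 (0 is reserved)")
    # Assumption: Length counts only the 4-octet value field.
    return struct.pack("!HHI", SITE_PREFERENCE_SUBTYPE, 4, value)


def decode_site_preference(data: bytes) -> int:
    """Return the Preference Index value from an encoded Sub-TLV."""
    sub_type, length, value = struct.unpack("!HHI", data[:8])
    if sub_type != SITE_PREFERENCE_SUBTYPE or length != 4:
        raise ValueError("not a Site Preference Index Sub-TLV")
    if value == 0:
        raise ValueError("Preference Index value 0 is reserved")
    return value
```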
4.3. Site Physical Availability Index Metadata
The Site Physical Availability Index indicates the percentage of
impact on a group of routes associated with a common physical
characteristic, for example, a pod, a row of server racks, a floor,
or an entire DC. The purpose is to use one UPDATE message to
indicate a group of routes of different NLRIs impacted by a physical
event. For example, a power outage to a pod can cause the Site
Physical Availability Index to be 0% for all the routes in the pod.
A partial fiber cut to a row of shelves can cause the Site Physical
Availability Index to be 50% for all the routes in those shelves. The
value is 0-100, with 100% indicating the site is fully functional, 0%
indicating the site is entirely out of service, and 50% indicating
the site is 50% degraded.
It is recommended to assign each route one Site-ID. Depending
on the deployment, one DC can use the pod number as the Site-ID, while
another DC can use a row of shelves as the Site-ID.
Cloud Site/Pod failures and degradation include, but are not limited
to, a site degradation or an entire site going down caused by a
variety of reasons, such as a fiber cut connecting to the site or among
pods, cooling failures, insufficient backup power, cyber attacks,
too many changes outside of the maintenance window, etc.
Fiber cuts are not uncommon within a Cloud site or between sites.
When those failure events happen, the edge (egress) router itself
might still be running fine. Therefore, the ingress routers with paths
to the egress router can't use BFD to detect the failures.
When a failure occurs at an edge site (or a pod), many
instances can be impacted. In addition, the routes (i.e., the IP
addresses) in the site might not be aggregated nicely. Instead of
sending many BGP UPDATE messages to the ingress routers for all the
impacted instances (i.e., routes), the egress router can send one
single BGP UPDATE to indicate the capacity availability of the site.
The ingress routers can switch all or a portion of the instances
associated with the site depending on how much the site is degraded.
The BGP UPDATE for the individual instances (i.e., the routes) can
include the Capacity Availability Index solely for ingress routers to
associate the routes with the Site-ID. The actual Capacity
Availability Index value, i.e., the percentage for all the routes
associated with the Site-ID, is generated by the egress routers with
the egress routers' loopback address as the NLRI.
The Site Physical Availability Index Sub-TLV has a fixed length of 4
Octets. Therefore, there is no Length field.
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     PhyAvailIdx Sub-Type      |I|          Reserved           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Site-ID (2 octets)       | Site Availability Percentage  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: Site Physical Availability Index Sub-TLV
- PhyAvailIdx: Site-Physical-Availability-Index Sub-Type=2
(Specified in this document).
- Site ID: is an identifier for a group of routes associated with a
common physical characteristic, for example, a pod, a row of
server racks, a floor, or an entire DC. The purpose is to use one
UPDATE message to indicate a group of routes impacted by a
physical event. Those routes might be from different address
families or NLRIs. There could be multiple sites connected to one
egress router (a.k.a. Edge DC GW).
- RouteFlag-I: When set to 1, the Site Availability Index is for BGP
speakers (receivers) to associate the routes with the Site-ID.
The Site Availability Percentage value is ignored. When set to 0,
the BGP speakers (receivers) should apply the Site Availability
Index value to all the routes associated with the Site-ID.
- Site Availability Percentage: When RouteFlag-I is set to 1, the Site
Availability Percentage is ignored by the ingress routers. When
RouteFlag-I is set to 0, the Site Availability Percentage
represents the percentage of the site availability for all the
routes associated with the Site-ID, e.g., 100%, 50%, or 0%. When a
site goes dark, the value is set to 0. A value of 50 means the site
is 50% functioning. When the value is outside the 0-100 range, the
value carried in this Sub-TLV is ignored.
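The field rules above can be sketched in Python as follows. The names are hypothetical, and the bit positions assume the layout of Figure 3: a 16-bit Sub-Type, the I flag as the top bit of the next 16 bits, then a 16-bit Site-ID and a 16-bit percentage.

```python
import struct
from typing import NamedTuple, Optional

PHY_AVAIL_SUBTYPE = 2  # Sub-Type value given in this document
ROUTE_FLAG_I = 0x8000  # assumed position: top bit of the flags/Reserved field


class SiteAvailability(NamedTuple):
    site_id: int
    associate_only: bool       # True when RouteFlag-I is 1
    percentage: Optional[int]  # None when the value must be ignored


def encode_site_availability(site_id: int, associate_only: bool,
                             percentage: int = 0) -> bytes:
    """Pack the Sub-TLV; Reserved bits are left at 0."""
    flags = ROUTE_FLAG_I if associate_only else 0
    return struct.pack("!HHHH", PHY_AVAIL_SUBTYPE, flags, site_id, percentage)


def decode_site_availability(data: bytes) -> SiteAvailability:
    """Unpack the Sub-TLV and apply the ignore rules for the percentage."""
    sub_type, flags, site_id, pct = struct.unpack("!HHHH", data[:8])
    if sub_type != PHY_AVAIL_SUBTYPE:
        raise ValueError("not a Site Physical Availability Index Sub-TLV")
    associate_only = bool(flags & ROUTE_FLAG_I)
    # The percentage is ignored when I=1 or when it falls outside 0-100.
    if associate_only or pct > 100:
        return SiteAvailability(site_id, associate_only, None)
    return SiteAvailability(site_id, associate_only, pct)
```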
4.3.1. Site Index Associated to Routes
An egress router can attach the Site Physical Availability Index
Sub-TLV (RouteFlag-I=1) to a BGP UPDATE message for the
registered low latency edge services so that the ingress routers can
associate the Site-ID with the route in the routing table.
However, it is unnecessary to include the Site Physical Availability
Index in every BGP UPDATE message if there is no change to the
Site-ID or the Site Physical Availability value for the
service instances.
4.3.2. BGP UPDATE with standalone Site Availability Index
When an ingress router receives a BGP UPDATE message from Router-X
with a prefix of the loopback for Router-X and the Metadata Path
Attribute with the Site Physical Availability Index Sub-TLV, the new
Site Physical Availability Index value is applied to all the routes
that meet the following two constraints: a) they have Router-X as
their next hop, and b) they are associated with the Site-ID carried
in the Sub-TLV. When there are failures or degradation at a site, the
corresponding egress router can send one BGP UPDATE with the Site
Physical Availability Index with the egress router's loopback
address as the NLRI.
The BGP UPDATE with a standalone Site Availability Index is NOT
intended for resolving NextHop.
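One possible way for an ingress router to apply a standalone Site Availability Index is sketched below. The Route record and the function name are hypothetical; a real implementation would operate on its own RIB structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    prefix: str
    next_hop: str            # loopback address of the egress router
    site_id: int             # learned from a Sub-TLV with RouteFlag-I = 1
    availability: int = 100  # percentage, 100 means fully functional


def apply_site_availability(routes: List[Route], router_x: str,
                            site_id: int, percentage: int) -> int:
    """Update every route matching both constraints; return the count."""
    if not 0 <= percentage <= 100:
        return 0  # values outside 0-100 are ignored
    updated = 0
    for route in routes:
        # Constraint a) next hop is Router-X; b) route belongs to the Site-ID.
        if route.next_hop == router_x and route.site_id == site_id:
            route.availability = percentage
            updated += 1
    return updated
```

A single call therefore stands in for what would otherwise be one UPDATE per impacted route.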
4.4. Service Delay Prediction Index
It is desirable for an ingress router to select a site with the
shortest processing time for an ultra-low latency service. But it is
not easy to predict which site has "the fastest processing time" or
"the shortest processing delay" for an incoming service request
because:
- The given service instance shares the same physical infrastructure
with many other applications and service instances. Service
requests from other applications or UEs, or the runtime behavior
of other applications, can impact the processing time for the
given service instance.
- The given service instance can be served by a cluster of servers
behind a Load Balancer. To the network, the service is identified
by one service ID.
- Service complexity varies. One service may call many
microservices, access multiple backend databases, and go through
sophisticated security scrubbing functions, etc. Another service
can be processed in a few simple steps. Without knowledge of the
application's internal logic, it is not easy to estimate the
processing time for future service requests.
Even though utilization measurements, like those below, are collected
by most data centers, they cannot indicate which site has the
shortest processing time. A service request might be processed
faster on Site-A even if Site-A is overutilized.
o Server utilization for the server where the instance is
instantiated.
o The network utilization for the links to the server where the
instance is instantiated.
o The number of databases that the service instance will access.
o The memory utilization of the databases.
The remaining available resource at a site is a more reasonable
indication of the processing delay for future service requests.
o The remaining available Server resources.
o The remaining available network capacity for the links to the
server where the instance is instantiated.
o The number of databases that the service instance will access.
o The remaining storage available for the databases.
The Service Delay Prediction Index is a value that predicts
processing delays at the site for future service requests. The
higher the value, the longer the delay.
4.4.1. Service Delay Prediction Sub-TLV
While it is out of the scope of this document, we assume there is an
algorithm that can derive the Service Delay Prediction Index that can
be assigned to the egress router. When the Service Delay Prediction
value is updated, which can be triggered by changes in the available
resources, etc., the egress router can attach the updated Service
Delay Prediction value in a Sub-TLV under the Metadata Path Attribute
of the BGP UPDATE message to the ingress routers.
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ServiceDelayPredict Sub-Type  |    Length     |F|  Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Service Delay Prediction Value                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Service Delay Prediction Index Sub-TLV
- ServiceDelayPredict: (Service Delay Prediction) Sub-Type=3
(specified in this document).
- Flag (F): A single bit flag to indicate the specific condition of
the Service Delay Prediction Value.
- Service Delay Prediction Value (when the Flag bit is set to 1):
an integer in the range of 0-100, with 0 indicating that the
service delay is negligible and 100 indicating that the site has
the most significant delay compared to all other sites for the
same service. When the value is outside the 0-100 range, the
value carried in this Sub-TLV is ignored.
- Service Delay Prediction Value (when the Flag bit is set to 0):
the estimated delay time as defined in RFC5905.
4.4.2. Service Delay Prediction Based on Load Measurement
When a data center's detailed running status is not exposed to the
network operator, historic traffic patterns through the egress nodes
can be utilized to predict the load on a specific service. For
example, when traffic volume to one service at one data center
suddenly increases by a large percentage compared with the average
over the past 24 hours, it is likely caused by a larger than normal
demand for the service. When this happens, another data center with
lower-than-average traffic volume for the same service might have a
shorter processing time for the same service.
Here are some measurements that can be utilized to derive the Service
Delay Prediction for a service ID:
- Total number of packets to the attached service instance
(ToPackets);
- Total number of packets from the attached service instance
(FromPackets);
- Total number of bytes to the attached service instance (ToBytes);
- Total number of bytes from the attached service instance
(FromBytes);
- The actual load measurement for the service instance attached to an
egress router can be based on one of the metrics above or on
all four metrics with a different weight applied to each,
such as:
LoadIndex = w1*ToPackets + w2*FromPackets + w3*ToBytes + w4*FromBytes
where w1, w2, w3, and w4 are each between 0 and 1, and
w1 + w2 + w3 + w4 = 1.
The weight of each metric contributing to the index of the
service instance attached to an egress router can be configured or
learned by self-adjusting based on user feedback.
The Service Delay Prediction Index can be derived from
LoadIndex/24Hour-Average. A higher value means a longer delay
prediction. The egress router can use the ServiceDelayPredict Sub-TLV
to indicate to the ingress routers the delay prediction derived
from the traffic pattern.
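The weighted load computation and the ratio against the 24-hour average can be sketched as below. The function names and the equal default weights are assumptions made for illustration; how the 24-hour average is maintained, and how the ratio is scaled into the value carried in the Sub-TLV, are left open here as they are in this document.

```python
from typing import Sequence


def load_index(to_packets: float, from_packets: float,
               to_bytes: float, from_bytes: float,
               weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """LoadIndex = w1*ToPackets + w2*FromPackets + w3*ToBytes + w4*FromBytes."""
    if len(weights) != 4 or any(not 0 <= w <= 1 for w in weights):
        raise ValueError("exactly four weights between 0 and 1 are required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w1, w2, w3, w4 = weights
    return w1 * to_packets + w2 * from_packets + w3 * to_bytes + w4 * from_bytes


def delay_prediction_ratio(current_load_index: float,
                           average_24h: float) -> float:
    """Ratio of the current LoadIndex to its 24-hour average."""
    if average_24h <= 0:
        raise ValueError("24-hour average must be positive")
    return current_load_index / average_24h
```

A ratio above 1 indicates a load higher than the recent average, which under the reasoning above suggests a longer predicted delay.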
Note: The proposed IP layer load measurement is only an estimate
based on the amount of traffic through the egress router, which might
not truly reflect the load of the servers attached to the egress
routers. These measurements are listed here only for some special deployments
where those metrics are helpful to the ingress routers in selecting
the optimal paths.
4.4.3. Raw Load Measurement Sub-TLV
When ingress routers have an embedded analytics tool that relies on the
raw measurements, it is useful for the egress router to send the raw
measurements.
Raw Load Measurement Sub-TLV has the following format:
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Raw-Load-Measurement Sub-Type |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Measurement Period                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          total number of packets to the Edge Service          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         total number of packets from the Edge Service         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           total number of bytes to the Edge Service           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          total number of bytes from the Edge Service          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: Service Delay Prediction Raw Measurements Sub-TLV
- Raw-Load-Measurement Sub-Type =4 (specified in this document): Raw
measurements of packets/bytes to/from the edge service address.
- The receiver nodes can compute the Service Delay Prediction for the
Service based on the raw measurements sent from the egress node and
preconfigured algorithms.
- Measurement Period: BGP Update period in Seconds or user-specified
period.
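The layout in Figure 5 can be sketched in Python as follows. The type and function names are hypothetical, and the sketch assumes that the Length field counts the 20 octets that follow it. Each counter is a 32-bit unsigned integer, so values above 2^32-1 cannot be carried.

```python
import struct
from typing import NamedTuple

RAW_LOAD_SUBTYPE = 4    # Sub-Type value given in this document
_RAW_LOAD_FORMAT = "!HH5I"  # Sub-Type, Length, then five 32-bit fields
_RAW_LOAD_VALUE_LEN = 20    # assumption: Length counts only the five fields


class RawLoad(NamedTuple):
    period_seconds: int
    to_packets: int
    from_packets: int
    to_bytes: int
    from_bytes: int


def encode_raw_load(measurement: RawLoad) -> bytes:
    """Pack the Sub-TLV header followed by the five measurement fields."""
    return struct.pack(_RAW_LOAD_FORMAT, RAW_LOAD_SUBTYPE,
                       _RAW_LOAD_VALUE_LEN, *measurement)


def decode_raw_load(data: bytes) -> RawLoad:
    """Unpack a Raw Load Measurement Sub-TLV into its five fields."""
    size = struct.calcsize(_RAW_LOAD_FORMAT)
    fields = struct.unpack(_RAW_LOAD_FORMAT, data[:size])
    if fields[0] != RAW_LOAD_SUBTYPE or fields[1] != _RAW_LOAD_VALUE_LEN:
        raise ValueError("not a Raw Load Measurement Sub-TLV")
    return RawLoad(*fields[2:])
```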
5. Service Metadata Influenced Decision Process
5.1. Egress Node Behavior
Multiple instances of the same service could be attached to one
egress router. When all instances of the same service are grouped
behind one application layer load balancer, they appear as one single
route to the egress router, i.e., the application load balancer's
prefix. Under this scenario, the compute metrics for all those
instances are aggregated by the load balancer and are visible to the
egress router as associated with the load balancer's prefix. However,
how the application layer load balancers distribute the traffic among
different instances is out of the scope of this document. When
multiple instances of the same service have different paths or links
reachable from the egress router, multiple groups of metrics from
the respective paths could be exposed to the egress router. The egress
router can have preconfigured policies for aggregating various metrics
from different paths and corresponding policies for selecting a
path for forwarding the packets received from ingress routers. The
aggregated metrics can be carried in the BGP UPDATE messages instead
of detailed measurements to reduce the entries advertised by the
control plane and dampen route updates in the forwarding plane.
Upon receiving packets from ingress routers, the egress router can
use its policies to choose an optimal path to one service instance.
It is out of the scope of this document how the measurements are
aggregated on egress routers and how ingress routers are configured
with the algorithms to integrate the aggregated metrics with network
layer metrics.
Many measurements could impact and correspondingly reflect service
performance. In order to simplify the optimal selection process,
egress routers can have preconfigured policies or algorithms to
aggregate multiple metrics into a single metric advertised to ingress
routers. Though out of the scope of this document, an egress router
can also have an algorithm to convert multiple metrics into a network
metric, such as an IGP cost for each instance, to pass to ingress
nodes. This decision-making process integrates network metrics
computed by traditional IGP/BGP and the service delay metrics from
egress routers so that path selection can adapt to both network and
service conditions. The intent is to improve the service's overall
performance and resource utilization across the distributed
infrastructure. When the egress has merged the compute metrics from
the local sites behind it, it can include one or more aggregated
compute metrics in the Metadata Path Attribute in the BGP UPDATE to
the ingress. Also, an identifier or flag can be carried to indicate
that the metrics are merged ones. After receiving the routes for the
Service ID with the identifier, the ingress would do the route
selection based on preconfigured algorithms (see Section 3 of this
document).
5.2. Integrating Network Delay with the Service Metrics
As the service metrics and network delays are in different units,
here is an exemplary algorithm for an ingress router to compare the
cost to reach the service instances at Site-i or Site-j.
                     ServD-i * CP-j              Pref-j * NetD-i
   Cost-i = min(w *(----------------) + (1-w) *(------------------))
                     ServD-j * CP-i              Pref-i * NetD-j
CP-i: Capacity Availability Index at Site-i. A higher value means
higher capacity available.
NetD-i: Network latency measurement (RTT) to the Egress Router at
the site-i.
Pref-i: Preference Index for Site-i, a higher value means higher
preference.
ServD-i: Service Delay Prediction Index at Site-i for the service,
i.e., the ANYCAST address [RFC4786] for the service.
w: Weight is a value between 0 and 1. If smaller than 0.5, Network
latency and the site Preference have more influence; otherwise,
Service Delay and capacity availability have more influence.
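The comparison above can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm. The names SiteMetrics, relative_cost, and pick_site are hypothetical, all index values are assumed to be strictly positive, and the "min" in the formula is read here as selecting the site with the lowest cost relative to a common reference site.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SiteMetrics:
    """Per-site inputs to the example formula (hypothetical container)."""
    name: str
    serv_d: float  # ServD: Service Delay Prediction Index
    cp: float      # CP: Capacity Availability Index (higher is better)
    net_d: float   # NetD: network latency (RTT) to the egress router
    pref: float    # Pref: Preference Index (higher is better)


def relative_cost(i: SiteMetrics, j: SiteMetrics, w: float) -> float:
    """Cost of Site-i relative to Site-j; a value below 1 favors Site-i.

    Assumes every index is strictly positive, so no division by zero.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    service_term = (i.serv_d * j.cp) / (j.serv_d * i.cp)
    network_term = (j.pref * i.net_d) / (i.pref * j.net_d)
    return w * service_term + (1.0 - w) * network_term


def pick_site(sites: Sequence[SiteMetrics], w: float) -> SiteMetrics:
    """Assumed reading of 'min': lowest cost against a common reference."""
    reference = sites[0]
    return min(sites, key=lambda s: relative_cost(s, reference, w))
```

With w below 0.5 the network term dominates the result, and with w above 0.5 the service term dominates, matching the description of w above.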
When a set of service Metadata is converted to a simple metric, a
decision process is determined by the metric semantics and deployment
situations. The goal is to integrate the conventional network
decision process with the service Metadata into a unified decision-
making process for path selection.
5.3. Integrating with BGP decision process
When an ingress router receives BGP updates for the same IP address
from multiple egress routers, all those egress routers are considered
as the next hops for the IP address. For the selected services
configured to be influenced by the edge service Metadata, the ingress
router's BGP Decision Process [IDR-CUSTOM-DECISION] would trigger the
edge service Management function to compute the weight to be applied
to the route's next hop in the forwarding plane. The decision
process is influenced by the edge service Metadata associated with
the client routes, such as the Capacity Availability Index, Site
Preference, and Service Delay Prediction Index, in addition to the
attributes used by the traditional BGP multipath computation, such as
Weight, Local Preference, Origin, and MED, as shown below:
             BGP ANYCAST Update
   +--------+   with Metadata  +---------------+
   |  BGP   |----------------->| EdgeServiceMgn|
   |Decision|< - - - - - - - - |               |
   +---^-|--+                  +-------|-------+
       | | BGP ANYCAST                 | Update Anycast
       | | Route                       | Route Nexthops
       | | Multi-path NH install       | with weight
   +---|-V--+                          |
   |  RIB   |                          |
   +----+---+                          |
        |                              |
    +---V------------------------------V-------+
    |             Forwarding Plane             |
    |                                          |
    +------------------------------------------+

            Figure 6: Metadata Influenced Decision
When any of those metadata values goes to 0, the effect is the same as
the routes via the egress router that originated the metadata UPDATE
becoming ineligible. When a metadata value merely degrades, the
egress router can still remain the optimal next hop, although with a
lower probability.
Suppose the destination address aa08::4450 can be reached via three
next hops (R1, R2, R3). Further, suppose the local BGP Decision
Process, based on the traditional network layer policies and metrics,
identifies R1 as the optimal next hop for this destination
(aa08::4450). If the edge service Metadata results in R2 as the
optimal next hop for the prefix, the Forwarding Plane will have R2 as
the next hop for the destination address aa08::4450.
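The R1/R2/R3 example can be sketched as below. This is only an illustration under stated assumptions: the scoring expression (capacity times preference divided by service delay), the dictionary keys, and the function names are hypothetical and are not defined by this document. The only behavior taken from the text is that a metadata value of 0 makes the routes via that egress router ineligible.

```python
from typing import Dict, Mapping


def eligible(meta: Mapping[str, float]) -> bool:
    """A next hop is ineligible as soon as any metadata value is 0."""
    return all(value > 0 for value in meta.values())


def next_hop_weights(
    candidates: Mapping[str, Mapping[str, float]]
) -> Dict[str, float]:
    """Return normalized forwarding weights for the eligible next hops.

    The score below is a hypothetical combination chosen for illustration.
    """
    scores = {}
    for next_hop, meta in candidates.items():
        if not eligible(meta):
            continue  # same effect as the route being withdrawn
        scores[next_hop] = (
            meta["capacity"] * meta["preference"] / meta["service_delay"]
        )
    total = sum(scores.values())
    if total == 0:
        return {}
    return {next_hop: score / total for next_hop, score in scores.items()}


# Metadata received for aa08::4450 from three egress routers (made-up values).
candidates = {
    "R1": {"capacity": 10, "preference": 1, "service_delay": 10},
    "R2": {"capacity": 60, "preference": 1, "service_delay": 20},
    "R3": {"capacity": 0, "preference": 1, "service_delay": 5},
}
weights = next_hop_weights(candidates)
best = max(weights, key=weights.get)
```

In this made-up data set R3 is excluded because its capacity value is 0, and R2 receives the largest weight even if the network layer metrics alone would have selected R1.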
The edge service Metadata influencing next hop selection is different
from the metric (or weight) to the next hop. The metric to a next
hop can impact many (sometimes, tens of thousands) routes that have
the node as their next hop, whereas the edge service Metadata only
impacts the optimal next hop selection for a subset of client routes
that are identified as the edge services.
When the BGP custom decision [IDR-CUSTOM-DECISION] is used, the edge
service Management function would have an algorithm to combine the
edge service Metadata attributes with the custom decision to derive
the optimal next hop for the edge service routes.
Note: For a BGP UPDATE message that includes the edge service
Metadata Path Attribute with the RouteFlag-I=0 and the egress
router's loopback prefix as the NLRI, the Site Capacity Availability
Index value is applied to all the routes associated with the Site-ID.
6. Service Metadata Propagation Scope
Service Metadata are only distributed to the relevant ingress nodes
interested in the Service, which can be configured or automatically
formed.
For each registered low-latency Service, BGP RT Constrained
Distribution [RFC4684] can be used to form the Group interested in
the Service. The "Service ID", an IP address prefix, is the Route
Target. When an ingress router receives the first packet of a flow
destined to a Service ID (i.e., IP prefix), the ingress router sends
a BGP UPDATE that advertises the Route Target membership NLRI per
[RFC4684]. The ingress router must assign a Timer for the Service
ID, as the UE that uses the Service ID might move away. Upon
receiving a packet destined for the Service ID, the ingress router
must refresh the Timer. The ingress router must send a BGP Withdraw
UPDATE for the Service ID upon expiration of the Timer.
[RFC4684] specifies SAFI=132 for the Route Target membership NLRI
Advertisements.
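The timer handling described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the class name and the advertise/withdraw callbacks are hypothetical hooks into a BGP speaker, and the timeout value is a local configuration choice that this document does not specify.

```python
from typing import Callable, Dict


class ServiceInterestTable:
    """Tracks the Service IDs an ingress router is currently interested in."""

    def __init__(self, timeout_s: float,
                 advertise: Callable[[str], None],
                 withdraw: Callable[[str], None]) -> None:
        self._timeout_s = timeout_s
        self._advertise = advertise  # hypothetical: send RT membership NLRI
        self._withdraw = withdraw    # hypothetical: send BGP Withdraw UPDATE
        self._expiry: Dict[str, float] = {}

    def on_packet(self, service_id: str, now: float) -> None:
        """Called for each packet destined to a Service ID."""
        is_first = service_id not in self._expiry
        # Start the Timer on the first packet, refresh it on later ones.
        self._expiry[service_id] = now + self._timeout_s
        if is_first:
            self._advertise(service_id)

    def on_tick(self, now: float) -> None:
        """Called periodically; withdraws Service IDs whose Timer expired."""
        expired = [sid for sid, t in self._expiry.items() if t <= now]
        for sid in expired:
            del self._expiry[sid]
            self._withdraw(sid)
```

Passing the current time explicitly keeps the sketch independent of any particular clock or event loop.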
7. Minimum Interval for Metrics Change Advertisement
As metrics changes can impact path selection, the Minimum
Interval for Metrics Change Advertisement is configured to control
the update frequency and avoid route oscillations. The default is 30
seconds.
Significant load changes at EC data centers can be triggered by
short-term gatherings of UEs, like conventions, lasting a few hours
or days, which are too short to justify adjusting EC server
capacities among DCs. Therefore, the load metrics change rate can be
in the magnitude of hours or days.
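One possible way for an egress router to enforce the Minimum Interval is sketched below. The class and method names are hypothetical. The sketch assumes that a change arriving inside the interval is held and that only the latest held value is advertised once the interval has elapsed; this document does not mandate that behavior.

```python
from typing import Optional


class MetricAdvertiser:
    """Rate-limits advertisements of one metric to one per minimum interval."""

    def __init__(self, min_interval_s: float = 30.0) -> None:
        self._min_interval_s = min_interval_s
        self._last_sent_at: Optional[float] = None
        self._last_sent: Optional[int] = None
        self._pending: Optional[int] = None

    def update(self, value: int, now: float) -> Optional[int]:
        """Record a new measurement; return the value to advertise, if any."""
        # A value equal to the one already advertised needs no UPDATE.
        self._pending = None if value == self._last_sent else value
        return self.flush(now)

    def flush(self, now: float) -> Optional[int]:
        """Advertise the held value once the minimum interval has elapsed."""
        if self._pending is None:
            return None
        if (self._last_sent_at is not None
                and now - self._last_sent_at < self._min_interval_s):
            return None  # still inside the interval, keep holding
        self._last_sent, self._last_sent_at = self._pending, now
        self._pending = None
        return self._last_sent
```

A caller would invoke update() on every new measurement and flush() from a periodic timer, sending a BGP UPDATE whenever either call returns a value.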
8. Validation and Error Handling
In addition to the Error Handling procedure described in [RFC7606], a
BGP speaker should ignore the Metadata Path Attribute if more than
one Metadata Path Attribute is within one BGP Update message.
The Metadata Path Attribute contains a sequence of Sub-TLVs. The
Metadata Path Attribute's length determines the total number of
octets for all the Sub-TLVs under the Metadata Path Attribute. The
sum of the lengths from all the Sub-TLVs under the Metadata Path
Attribute should equal the length of the Metadata Path Attribute. If
this is not the case, the TLV should be considered malformed, and the
"Treat-as-withdraw" procedure of [RFC7606] is applied.
When more than one sub-TLV is present in a Metadata Path Attribute,
they are processed independently. If a Metadata Path Attribute
can be parsed correctly but contains a Sub-TLV whose type is not
recognized by a particular BGP speaker, that BGP speaker MUST NOT
consider the attribute malformed. Instead, it MUST interpret the
attribute as if that Sub-TLV had not been present. Logging the error
locally or to a management system is optional. If the route carrying
the Metadata path attribute is propagated with the attribute, the
unrecognized Sub-TLV remains in the attribute.
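The length check and the handling of unrecognized Sub-TLVs can be sketched as follows. The Sub-TLV header layout used here (a 1-octet type followed by a 1-octet length that counts only the value octets) is an assumption made for illustration; the actual encoding is the one defined in the Sub-TLV sections of this document. The names parse_sub_tlvs and MalformedAttribute are hypothetical.

```python
from typing import List, Tuple

# Sub-Types this hypothetical speaker understands.
KNOWN_SUB_TYPES = {1, 2, 3, 4}

HEADER_LEN = 2  # assumed: 1-octet type + 1-octet length


class MalformedAttribute(Exception):
    """Raised when the Sub-TLV lengths do not add up to the attribute length.

    The caller would then apply the treat-as-withdraw procedure of RFC 7606.
    """


def parse_sub_tlvs(value: bytes) -> List[Tuple[int, bytes]]:
    """Return (sub_type, payload) pairs for the recognized Sub-TLVs."""
    recognized = []
    offset = 0
    while offset < len(value):
        if len(value) - offset < HEADER_LEN:
            raise MalformedAttribute("truncated Sub-TLV header")
        sub_type = value[offset]
        length = value[offset + 1]
        end = offset + HEADER_LEN + length
        if end > len(value):
            raise MalformedAttribute("Sub-TLV overruns the attribute")
        if sub_type in KNOWN_SUB_TYPES:
            recognized.append((sub_type, value[offset + HEADER_LEN:end]))
        # Unrecognized Sub-TLVs are skipped, not treated as errors.
        offset = end
    return recognized
```

Because the function only reads the attribute value, an unrecognized Sub-TLV stays in the original octets if the attribute is propagated unchanged.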
9. Manageability Considerations
The edge service Metadata described in this document are only
intended for propagation between ingress and egress routers of one
single BGP domain, i.e., the 5G Local Data Network (LDN), which is a
limited domain with edge services a few hops away from the ingress
nodes. Only selected services used by UEs are considered 5G edge
services. The 5G LDN is usually managed by one operator, even though
the routers can be from different vendors.
10. Security Considerations
The proposed edge service Metadata are advertised within the trusted
domain of 5G LDN's ingress and egress routers. The ingress routers
should not propagate the edge service Metadata to any nodes that are
not within the trusted domain.
To prevent the BGP UPDATE receivers (a.k.a. ingress routers in this
document) from leaking the Metadata Path Attribute by accident to
nodes outside the trusted domain [ATTRIBUTE-ESCAPE], the following
practice should be enforced:
- The Metadata Path Attribute originator sets the attribute as
Non-transitive when sending the BGP UPDATE message to its
corresponding RR. According to [RFC4271], Non-transitive Path
Attributes are only guaranteed to be dropped during BGP route
propagation by implementations that do not recognize them.
- The RR (Route Reflector) can append the NO-ADVERTISE well-known
community to the BGP UPDATE message with Metadata Path Attribute
when forwarding to the ingress routers. By doing so, the Route
Reflector signals to ingress nodes that the associated route's
Metadata Path Attribute should not be further advertised beyond
their scope. This precautionary measure ensures that the receiver
of the BGP UPDATE message refrains from forwarding the received
update to its peers, preventing the undesired propagation of the
information carried by the Metadata Path Attribute.
BGP Route Filtering or BGP Route Policies [RFC5291] can also be used
to ensure that BGP UPDATE messages with the Metadata Path Attribute
attached are not forwarded out of the administrative domain. Route
filtering allows network administrators to control the advertisement
and acceptance of BGP routes, ensuring that specific routes do not
leak outside the intended administrative domain. The following steps
achieve this:
- Use Route Filtering: Implement route filtering policies on the
ingress routers to restrict the propagation of BGP update messages
for the registered 5G edge services beyond the administrative
domain. You can use access control lists (ACLs), prefix lists, or
route maps to filter the BGP routes classified as the 5G edge
services, which need the Metadata Path Attributes to be
distributed from egress routers to ingress routers.
- Filter by Prefix: Use prefix filtering to specify which IP
prefixes should be advertised to peers and which should be
suppressed. This step ensures that only authorized routes are
sent to external peers.
- Use Route Maps: Route maps provide a flexible way to filter and
manipulate BGP route advertisements. You can create route maps to
match specific conditions and then apply them to the BGP
configuration.
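The filtering steps above can be sketched as a single export check. This is a vendor-neutral illustration, not router configuration: the function name, its parameters, and the prefix list contents are hypothetical, and a real deployment would express the same logic with the prefix lists and route maps of the router's own policy language.

```python
import ipaddress

# Hypothetical prefix list of registered 5G edge services.
EDGE_SERVICE_PREFIXES = [ipaddress.ip_network("aa08::4450/128")]


def may_advertise(prefix: str, has_metadata_attr: bool,
                  peer_in_trusted_domain: bool) -> bool:
    """Decide whether a route may be advertised to a given peer."""
    if peer_in_trusted_domain:
        return True
    # Never send the Metadata Path Attribute outside the trusted domain.
    if has_metadata_attr:
        return False
    # Suppress registered edge service prefixes toward external peers.
    network = ipaddress.ip_network(prefix)
    return not any(
        network.version == allowed.version and network.subnet_of(allowed)
        for allowed in EDGE_SERVICE_PREFIXES
    )
```

The address family is compared before calling subnet_of() because that method raises TypeError when an IPv4 network is compared with an IPv6 network.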
11. IANA Considerations
11.1. Metadata Path Attribute
IANA is requested to assign a new path attribute from the "BGP Path
Attributes" registry. The symbolic name of the attribute is
"Metadata", and the reference is [This Document].
+=======+======================================+=================+
| Value | Description                          | Reference       |
+=======+======================================+=================+
| TBD1  | Metadata Path Attribute              | [this document] |
+-------+--------------------------------------+-----------------+
11.2. Metadata Path Attribute Sub-Types
IANA is requested to create a new sub-registry under the Metadata
Path Attribute registry as follows:
Name: Sub-TLVs under the "Metadata Path Attribute"
Registration Procedure: Expert Review [RFC8126].
Detailed Expert Review procedure will be added per RFC8126.
Reference: [this document]
+========+==========================+=================+
|Sub-Type| Description              | Reference       |
+========+==========================+=================+
|   0    | reserved                 | [this document] |
+--------+--------------------------+-----------------+
|   1    | Site Preference Index    | [this document] |
+--------+--------------------------+-----------------+
|   2    | Site Availability Index  | [this document] |
+--------+--------------------------+-----------------+
|   3    | Service Delay Prediction | [this document] |
+--------+--------------------------+-----------------+
|   4    | Raw Load Measurement     | [this document] |
+--------+--------------------------+-----------------+
| 5-254  | unassigned               | [this document] |
+--------+--------------------------+-----------------+
|  255   | reserved                 | [this document] |
+--------+--------------------------+-----------------+
12. Acknowledgements
Acknowledgements to Jeff Haas, Tom Petch, Adrian Farrel, Alvaro
Retana, Robert Raszuk, Sue Hares, Shunwan Zhuang, Donald Eastlake,
Dhruv Dhody, Cheng Li, DongYu Yuan, and Vincent Shi for their
suggestions and contributions.
13. References
13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271,
DOI 10.17487/RFC4271, January 2006,
<https://www.rfc-editor.org/info/rfc4271>.
[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
February 2006, <https://www.rfc-editor.org/info/rfc4360>.
[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
"Multiprotocol Extensions for BGP-4", RFC 4760,
DOI 10.17487/RFC4760, January 2007,
<https://www.rfc-editor.org/info/rfc4760>.
[RFC4786] Abley, J. and K. Lindqvist, "Operation of Anycast
Services", BCP 126, RFC 4786, DOI 10.17487/RFC4786,
December 2006, <https://www.rfc-editor.org/info/rfc4786>.
[RFC5291] Chen, E. and Y. Rekhter, "Outbound Route Filtering
Capability for BGP-4", RFC 5291, DOI 10.17487/RFC5291,
August 2008, <https://www.rfc-editor.org/info/rfc5291>.
[RFC7606] Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
Patel, "Revised Error Handling for BGP UPDATE Messages",
RFC 7606, DOI 10.17487/RFC7606, August 2015,
<https://www.rfc-editor.org/info/rfc7606>.
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26,
RFC 8126, DOI 10.17487/RFC8126, June 2017,
<https://www.rfc-editor.org/info/rfc8126>.
[RFC8277] Rosen, E., "Using BGP to Bind MPLS Labels to Address
Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017,
<https://www.rfc-editor.org/info/rfc8277>.
[RFC9012] Patel, K., Van de Velde, G., Sangli, S., and J. Scudder,
"The BGP Tunnel Encapsulation Attribute", RFC 9012,
DOI 10.17487/RFC9012, April 2021,
<https://www.rfc-editor.org/info/rfc9012>.
13.2. Informative References
[ATTRIBUTE-ESCAPE]
J. Haas, "BGP Attribute Escape", July 2023,
<https://datatracker.ietf.org/doc/draft-haas-idr-bgp-
attribute-escape/>.
[IANA-BGP-PARAMS]
IANA, "BGP Path Attributes",
<https://www.iana.org/assignments/bgp-parameters/>.
[IDR-CUSTOM-DECISION]
A. Retana, R. White, "BGP Custom Decision Process", August
2017, <https://datatracker.ietf.org/doc/draft-ietf-idr-
custom-decision/>.
[RFC2042] Manning, B., "Registering New BGP Attribute Types",
RFC 2042, DOI 10.17487/RFC2042, January 1997,
<https://www.rfc-editor.org/info/rfc2042>.
[RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,
R., Patel, K., and J. Guichard, "Constrained Route
Distribution for Border Gateway Protocol/MultiProtocol
Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual
Private Networks (VPNs)", RFC 4684, DOI 10.17487/RFC4684,
November 2006, <https://www.rfc-editor.org/info/rfc4684>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8799] Carpenter, B. and B. Liu, "Limited Domains and Internet
Protocols", RFC 8799, DOI 10.17487/RFC8799, July 2020,
<https://www.rfc-editor.org/info/rfc8799>.
[TS.23.501-3GPP]
3rd Generation Partnership Project (3GPP), "System
Architecture for 5G System; Stage 2, 3GPP TS 23.501
v2.0.1", December 2017.
Authors' Addresses
Linda Dunbar
Futurewei
Dallas, TX,
United States of America
Email: ldunbar@futurewei.com
Kausik Majumdar
Microsoft Azure
California,
United States of America
Email: kmajumdar@microsoft.com
Haibo Wang
Huawei
Beijing
China
Email: rainsword.wang@huawei.com
Gyan Mishra
Verizon
United States of America
Email: gyan.s.mishra@verizon.com
Zongpeng Du
China Mobile
Beijing
China
Email: duzongpeng@chinamobile.com