Network Working Group LM. Contreras
Internet-Draft Telefonica
Intended status: Standards Track D. Trossen
Expires: 10 January 2024 Huawei Technologies
J. Finkhaeuser
Interpeer gUG
P. Mendes
Airbus
9 July 2023
Gap Analysis and Requirements for Routing on Service Addresses
draft-contreras-rtgwg-rosa-gaar-01
Abstract
The term 'service-based routing' (SBR) captures the set of mechanisms
for the steering of traffic in an application-level service scenario.
We position this steering as an anycast problem, requiring the
selection of one of the possibly many choices for service execution
at the very start of a service transaction.
This document builds on the issues and pain points identified across
a range of use cases, reported in [I-D.mendes-rtgwg-rosa-use-cases].
We summarize the key insights and provide a gap analysis with key
technologies related to the problem of SBR, developed by the IETF
over many years. We further outline the requirements to a system
that would adequately close those gaps and thus address the pain
points of our use cases. Those requirements will be used for
outlining a suitable architecture framework in a separate document.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 10 January 2024.
Contreras, et al. Expires 10 January 2024 [Page 1]
Internet-Draft ROSA July 2023
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Observations from Use Cases . . . . . . . . . . . . . . . . . 5
4. Gap Analysis . . . . . . . . . . . . . . . . . . . . . . . . 7
4.1. Domain Name System (DNS) . . . . . . . . . . . . . . . . 7
4.1.1. Technology Overview . . . . . . . . . . . . . . . . . 7
4.1.2. Relation to ROSA . . . . . . . . . . . . . . . . . . 8
4.1.3. Gaps . . . . . . . . . . . . . . . . . . . . . . . . 8
4.2. Compute-aware Traffic Steering (CATS) . . . . . . . . . . 10
4.2.1. Technology Overview . . . . . . . . . . . . . . . . . 10
4.2.2. Relation to ROSA . . . . . . . . . . . . . . . . . . 10
4.2.3. Gaps . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3. Locator-ID Separation Protocol (LISP) . . . . . . . . . . 12
4.3.1. Technology Overview . . . . . . . . . . . . . . . . . 12
4.3.2. Relation to ROSA . . . . . . . . . . . . . . . . . . 13
4.3.3. Gaps . . . . . . . . . . . . . . . . . . . . . . . . 13
4.4. Application-Layer Traffic Optimization (ALTO) . . . . . . 15
4.4.1. Technology Overview . . . . . . . . . . . . . . . . . 15
4.4.2. Relation to ROSA . . . . . . . . . . . . . . . . . . 16
4.4.3. Gaps . . . . . . . . . . . . . . . . . . . . . . . . 16
4.5. Technologies related to SBR . . . . . . . . . . . . . . . 17
4.5.1. Service Function Chaining (SFC) . . . . . . . . . . . 17
4.5.2. Multiplexed Application Substrate over QUIC Encryption
(MASQUE) . . . . . . . . . . . . . . . . . . . . . . 18
4.5.3. Time-Variant Routing (TVR) . . . . . . . . . . . . . 18
4.5.4. Source Packet Routing in Networking (SPRING) . . . . 19
5. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 19
6. Benefits from Addressing the SBR Problem . . . . . . . . . . 24
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 26
8. Security Considerations . . . . . . . . . . . . . . . . . . . 26
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26
11. Informative References . . . . . . . . . . . . . . . . . . . 26
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 29
1. Introduction
Virtualization and the proliferation of serverless service
provisioning methods have driven the capability to dynamically deploy
services in more than one network location, allowing for scaling both
horizontally and vertically in a number of use cases, some of which
can be found in [I-D.mendes-rtgwg-rosa-use-cases]. A key problem in
such use cases is that of steering the service requests stemming from
the applications, a mechanism we label as service-based routing
(SBR). A key constraint in realizing solutions for such a problem is
the possible distribution of more than one service instance across
several network locations, posing the SBR problem as an inherently
anycast one.
Unlike existing methods for SBR, some of which we will survey in this
document, we envision a system we call routing on service addresses
(ROSA), which allows suitable service-specific anycast decisions to
be made under a possibly high frequency of change to the notion of
the 'best' instance to be chosen, with the expectation of yielding
better performance, such as improved service completion latency and
utilization, among others.
At the same time, it is important to recognize that we do not aim to
replace existing service routing capabilities, most notably the DNS
as the main form of resolving a service name into a routing locator; we
see those capabilities working perfectly well for many Internet
services. However, it is important to understand the gaps that those
existing methods show in realizing the emerging use cases of high
dynamicity in service relations. This document surveys key
technologies, developed in the IETF over recent years, in order to
identify the gaps of those technologies to deliver suitable solutions
to the pain points identified in our use cases of
[I-D.mendes-rtgwg-rosa-use-cases].
Complementing our gap analysis, we also formulate requirements for a
solution to those pain points. We link the various requirements to
observed issues in our use cases [I-D.mendes-rtgwg-rosa-use-cases]
for better illustration and reasoning for their inclusion.
In the remainder of this document, we first introduce in Section 2 a
terminology that provides the common language used throughout the
remainder of the document; this terminology is kept in sync with the
other ROSA draft. We then summarize the key observations from our
use cases in [I-D.mendes-rtgwg-rosa-use-cases] as a recap for the
following gap analysis in Section 4. The insights from our gap and
use case analysis then lead us to the requirements in Section 5,
before outlining in Section 6 the expected benefits from realizing
those requirements in a suitable system.
2. Terminology
The following terminology is used throughout the remainder of this
document:
Service: A monolithic functionality that is provided according to
the specification for said service.
Composite Service: A composite service can be built by orchestrating
a combination of monolithic (or other composite) services. From a
client perspective, a monolithic or composite nature cannot be
determined, since both will be identified in the same manner for
the client to access.
Service Instance: A running environment (e.g., a node, a virtual
instance) that provides the expected service. One service can
involve several instances running within the same ROSA network at
different network locations.
Service Address: An identifier for a specific service.
Service Instance Address: A locator for a specific service instance.
Service Request: A request for a specific service, addressed to a
specific service address, which is directed to at least one of
possibly many service instances.
Affinity Request: A request to a specific service, following an
initial service request, requiring steering to the same service
instance chosen for the initial service request.
Service Transaction: A sequence of higher-layer requests for a
specific service, consisting of at least one service request,
addressed to the service address, and zero or more affinity
requests.
Service Affinity: Preservation of a relationship between a client
and one service instance, with the initial service request
creating said affinity and following affinity requests utilizing
said affinity.
ROSA Provider: Realizing the ROSA-based traffic steering
capabilities over at least one infrastructure provider by
deploying and operating the ROSA components within its defining
ROSA domain.
ROSA Domain: Domain of reachability for services supported by a
single ROSA provider.
ROSA Endpoint: A node accessing or providing one or more services
through one or more ROSA providers.
ROSA Client: A ROSA endpoint accessing one or more services through
one or more ROSA providers, thus issuing service requests
directed to one of possibly many service instances that have
previously announced the service address provided by the ROSA
client in the service request.
Service Address Router (SAR): A node supporting the operations for
steering service requests to one of possibly many service
instances, following the procedures outlined in a separate
architecture document.
Service Address Gateway (SAG): A node supporting the operations for
steering service requests to service addresses not announced to
SARs of the same ROSA domain to suitable endpoints in the Internet
or within other ROSA domains.
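The relation between a service request, affinity requests, and a
service transaction, as defined above, can be illustrated with a
minimal sketch. It is not part of any ROSA specification: the names
ServiceTransaction and select_instance, the example addresses, and
the placeholder selection rule (pick the first announced instance)
are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def select_instance(candidates: List[str]) -> str:
    """Hypothetical policy; a real one would be service-specific."""
    if not candidates:
        raise LookupError("no instance announced for this service address")
    return candidates[0]


@dataclass
class ServiceTransaction:
    """One service request followed by zero or more affinity requests."""
    service_address: str                     # identifier of the service
    instance_address: Optional[str] = None   # locator of the chosen instance
    log: List[Tuple[str, str]] = field(default_factory=list)

    def service_request(self, announced: Dict[str, List[str]]) -> str:
        # Initial request: addressed to the service address and steered
        # to one of possibly many service instances.
        self.instance_address = select_instance(announced[self.service_address])
        self.log.append(("service", self.instance_address))
        return self.instance_address

    def affinity_request(self) -> str:
        # Follow-up request: must reach the instance chosen initially.
        if self.instance_address is None:
            raise RuntimeError("no service request issued yet")
        self.log.append(("affinity", self.instance_address))
        return self.instance_address


# Hypothetical announcements: service address -> service instance addresses.
announced = {"video.example": ["2001:db8::1", "2001:db8::2"]}
tx = ServiceTransaction("video.example")
chosen = tx.service_request(announced)
followups = [tx.affinity_request() for _ in range(2)]
```

In this sketch, the service affinity is simply the instance address
remembered by the client for the duration of the transaction.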
3. Observations from Use Cases
Several observations regarding their technical needs can be drawn
from the use case examples in [I-D.mendes-rtgwg-rosa-use-cases]:
1. Service instances for a specific service may exist in more than
one network location, e.g., for replication purposes to serve
localized demand, while reducing latency, as well as to increase
service resilience.
2. While the deployment of service instances may follow a longer
term planning cycle, e.g., based on demand/supply patterns of
content usage, it may also have an ephemeral nature, e.g.,
through scaling in and out dynamically to cope with temporary
load situations, enabled by the temporary nature of serverless
functions.
3. Knowing which are the best locations to deploy a service instance
is crucial and may depend on service-specific demands, realizing
a specific service level agreement (with an underlying decision
policy) that is tailored to the service and agreed upon between
the service platform provider and the communication service
provider.
4. Decisions for selecting the 'right' or 'best' service instance
may be highly dynamic under the given service-specific decision
policy and thus may change frequently with demand patterns driven
by the use case. For instance, in our examples on Distributed
Mobile applications and Metaverse in Sections 3.4 and 3.8 of
[I-D.mendes-rtgwg-rosa-use-cases], respectively, human
interaction may drive the requirement for selecting a suitable
service instance down to a few tens of milliseconds only, thus
creating a need for high frequency updates on the to-be-chosen
service instance. As a consequence, traffic following a specific
network path from a client to one service instance, may need to
follow another network path or even utilize an entirely different
service instance as a result of re-applying the decision policy.
5. Minimizing the latency from the initiating client request to the
actual service response arriving back at the client is crucial in
many of our scenarios. Any improvement on utilizing the best
service instance as quickly as possible, thus taking into account
any 'better' alternative to the currently used one, is crucial
for reducing service request completion latency.
6. The namespace for services and applications is separate from that
of routable identifiers used to reach the implementing endpoints,
i.e., the service instances. Resolution and gateway services are
often required to map between those namespaces, adding management
and thus complexity overhead, an observation also made in
[Namespaces2022].
7. A specific service may require the execution of more than one
service instance, in an intertwining way, which in turn requires
the coordination of the right service instances, each of which
can have more than one replica in the network.
We can conclude from our observations above that (i) distribution (of
service instances), (ii) dynamicity in the availability of and
choosing the 'best' service instance, and (iii) efficiency in
utilizing the best possible service instance are crucial for our use
cases.
4. Gap Analysis
We now discuss observations and suitability of existing technologies
for realizing the use cases in [I-D.mendes-rtgwg-rosa-use-cases]. We
first survey technologies that possibly provide similar SBR
functionality to our use cases. Here, we have currently identified
the DNS (and solutions based on it), CATS, LISP, and ALTO as such
technologies.
We then outline works that are related to certain aspects of SBR only
for the purpose of explaining differences and relations for possible
future integration or touching points in solutions to ROSA. Here, we
currently include technologies such as SFC, SPRING, and TVR. Future
discussions and work may extend on both of those areas for a more
comprehensive analysis.
4.1. Domain Name System (DNS)
The Domain Name System (DNS) is the most prevalent method being used
for service-based routing in that it supports the resolution of a
domain name, such as foo.com, to an IP address, which is then used
for subsequent message transfer between sender and receiver. We
thus see the DNS, and methods that extend but build on the DNS,
such as Global Server Load Balancing, as the baseline for SBR. In
the following, we provide insights into the main technology and the
gaps identified towards ROSA objectives.
4.1.1. Technology Overview
The DNS [RFC1035] provides an explicit method for mapping domain
names onto an IP locator, often referred to as 'early binding'.
Those mappings are provided based on previous DNS registrations of IP
locators to certain domain names.
There are many extensions to this basic lookup mechanism, some of
which are relevant to our discussion. For instance, DNS extensions
may be used to decide which of several IP addresses to pick based
on, e.g., geo-location or load information. For the
latter, load balancing is provided alongside the DNS resolver, e.g.,
in the form of Global Server Load Balancing (GSLB) [GSLB] solutions
in CDNs. Furthermore, a health check functionality may be provided
to resolve IP address failures, providing alternatives to detected
failures of reachability.
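The combination of load-based selection and health checking described
above can be sketched as follows. This is an illustration only and
does not reflect any particular GSLB product: the Record type, the
load values, and the resolve function are assumptions, and the
addresses are taken from documentation ranges.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    ip: str        # registered IP locator
    load: float    # reported load in [0, 1]; hypothetical metric
    healthy: bool  # outcome of the most recent health check


def resolve(name: str, zone: Dict[str, List[Record]]) -> str:
    """Return the least-loaded healthy IP locator registered for name."""
    healthy = [r for r in zone.get(name, []) if r.healthy]
    if not healthy:
        raise LookupError(f"no reachable locator for {name}")
    return min(healthy, key=lambda r: r.load).ip


zone = {
    "foo.com": [
        Record("192.0.2.10", load=0.2, healthy=False),  # failed health check
        Record("192.0.2.20", load=0.7, healthy=True),
        Record("192.0.2.30", load=0.4, healthy=True),
    ]
}
answer = resolve("foo.com", zone)
```

Note that the selection happens at resolution time only; the client
keeps using the returned address until it resolves the name again.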
4.1.2. Relation to ROSA
As mentioned upfront, the explicit resolution provided by the DNS is
our baseline for comparison due to its widespread use in the
Internet. Despite its rather static nature of assigning IP addresses
to domain names, it is sufficient for many of the use cases of the
Internet, where the initial selection of a suitable server address
suffices. We thus expect the DNS to remain a vital component of
the Internet and focus in our following gap analysis only on its
shortcomings in relation to our identified use cases.
4.1.3. Gaps
There are a number of key differences and gaps to the desired
properties of a ROSA system. Several of those gaps have already been
identified in [I-D.yao-cats-gap-reqs] and also apply here:
1. Resolution latency: The explicit resolution of a DNS name takes
additional time, which precedes the data transfer with the
selected IP address. It thus adds to the completion time of the
higher-layer request that is being made. Many measurements exist
for such latency, but its extent heavily depends on the
provisioning of the underlying resource that exposes the selected
IP address. [OnOff2022], for instance, outlines latencies ranging
from 15 to 45 milliseconds where the DNS-based systems used range
from local ISP-provided DNS to more complex CDN-provided GSLB
[GSLB] solutions, while resolutions that require several DNS
resolver steps may easily require 100 ms or more. For many of our
use cases in [I-D.mendes-rtgwg-rosa-use-cases], such latency is
prohibitive since it may either heavily contribute to or even
exceed the available delay budget of the application. Resolution
latency may also be cumulative, e.g., for web browsing, as
discussed in [OnOff2022], particularly when needing to resolve a
larger number of distinct (in terms of domain names) objects
within a given meta-object (such as a webpage). DNS latencies
may still become a decisive factor, negatively impacting the end
user experience. Through the in-band selection method in ROSA,
this explicit resolution latency is entirely avoided. This also
reduces the 4 messages (2 for resolution and 2 for the initial
data transfer) sent across the client access, which is often the
bottleneck in Internet access, to merely 2 messages for the
in-band discovery.
2. Acting on stale information: DNS applies a local caching model to
remove the burden on the DNS system when subsequently the same
request is issued again by the application. This can, however,
lead to acting on stale information for those cases where the
mapping has changed, more so for services where the mapping is
meant to change frequently. Applications may flush the local DNS
cache after every lookup, which may however lead to overburdening
the DNS with the number of renewed requests, possibly being
perceived as a denial-of-service attack by the DNS. ROSA aims at
avoiding any stale information or at least minimizing stale
information through more reactive routing or entirely local
scheduling selection methods.
3. Supporting dynamic resolution changes: Updating a mapping of a
domain name to an IP locator takes time to propagate. Unlike in
local environments, where extensions such as DNS-SD [RFC6763] and
Multicast DNS [RFC6762] may be used for a limited number of local
services, renewed mappings need to propagate through the hierarchy
of DNS servers in the system. Even local (e.g., CDN-local) mapping
updates do not happen frequently, although concrete numbers depend
on the various providers using those systems. With that, even when
resolving the domain name frequently, flushing the cache at the
client to avoid using stale information, and ignoring any possible
rate limitation of client requests in the first-hop DNS resolver,
the mapping update may not propagate to the client before seconds
or even longer have passed. For many of our use cases, such as the
multi-domain/homed use case in Section 3.3, the micro-service
based applications in Section 3.4, or the video-related ones in
Sections 3.5 and 3.6 of [I-D.mendes-rtgwg-rosa-use-cases], this
level of dynamicity does not suffice.
4. Supporting arbitrary application identifiers: As the name
suggests, domain names are the primary naming scheme for the DNS.
Any other application identifier scheme would need to utilize its
own resolution scheme, possibly mimicking the workings of the DNS.
This requires per-application support for each identifier scheme,
as done in the QUICr [I-D.jennings-moq-quicr-arch] work discussed
in our use case of Section 3.5 in
[I-D.mendes-rtgwg-rosa-use-cases]. This is unlike ROSA, which
aims at supporting arbitrary application identifiers rather than
only a one-size-fits-all scheme. With that, ROSA also provides
the ability to support application-specific naming schemes that
may want to explicitly avoid
the use of a centrally governed namespace as well as the use of a
central name resolution scheme that may reveal service usage
patterns to the resolver system itself, as discussed in our use
case of Section 3.10 in [I-D.mendes-rtgwg-rosa-use-cases].
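The effect of client-side caching on a frequently changing mapping
(gaps 2 and 3 above) can be illustrated with a small sketch. The
CachingResolver class, the 30 second TTL, and the addresses are
assumptions chosen for illustration; the sketch does not model the
DNS resolver hierarchy or any real resolver implementation.

```python
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Client-side cache that reuses an answer until its TTL expires."""

    def __init__(self, lookup: Callable[[str], str], ttl_s: float):
        self._lookup = lookup
        self._ttl_s = ttl_s
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry)

    def resolve(self, name: str, now_s: float) -> str:
        cached = self._cache.get(name)
        if cached is not None and now_s < cached[1]:
            return cached[0]             # possibly stale by now
        ip = self._lookup(name)          # explicit resolution round trip
        self._cache[name] = (ip, now_s + self._ttl_s)
        return ip


# Hypothetical authoritative mapping that changes after the first lookup.
mapping = {"foo.com": "192.0.2.20"}
resolver = CachingResolver(lambda n: mapping[n], ttl_s=30.0)

first = resolver.resolve("foo.com", now_s=0.0)
mapping["foo.com"] = "192.0.2.30"        # the 'best' instance has moved
stale = resolver.resolve("foo.com", now_s=5.0)
fresh = resolver.resolve("foo.com", now_s=31.0)
```

Under these assumptions the client keeps using the old locator until
the cached entry expires, even though the mapping changed within the
first few seconds.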
4.2. Compute-aware Traffic Steering (CATS)
The Compute-aware Traffic Steering (CATS) WG is a newly established
working group in the IETF, which aims at supporting the selection of
one of possibly many service instances for a particular service.
This similarity in objectives makes us draw out the main concepts and
gaps to the objectives for ROSA in the following.
4.2.1. Technology Overview
Let us provide a brief overview of CATS and its main concepts - for
more detail, we refer to, e.g., [I-D.ldbc-cats-framework].
CATS proposes compute-aware decisions in sending traffic between a
client and a set of possible egress sites or directly Internet-
connected service hosts. For this, CATS introduces the CS-ID as the
CATS service identifier, which is mapped onto the CB-ID as the CATS
binding identifier. The exact nature of those identifiers is still
work-in-progress with proposals currently being presented to the CATS
WG.
CATS proposes to use an ingress-egress tunneling approach, where
ingress CATS routers use metrics to decide upon the CB-ID to be used
for an incoming request to a CS-ID. The tunneling method is
currently still under discussion with SRv6, MPLS and other
technologies being considered.
As the name suggests, the aforementioned selection at the ingress
CATS router is based on compute metrics that are distributed to the
ingress CATS routers through suitable methods, which are still under
investigation together with the nature and extent of the metrics
themselves.
To support the steering of longer service transactions, CATS proposes
a CATS traffic classifier component, which associates several packets
with such a longer service transaction to ensure that those packets
are steered to the same selection as made for the initial packet.
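The interplay of metric-based selection and flow classification
outlined above can be sketched as follows. This is a toy model under
our own assumptions and not the mechanism of
[I-D.ldbc-cats-framework]: the IngressRouter class, the string form
of the CS-ID and CB-ID values, the single cost metric, and the flow
key are all hypothetical, since the nature of the identifiers and
metrics is still work in progress.

```python
from typing import Dict, Hashable


class IngressRouter:
    """Toy ingress router: picks a CB-ID per CS-ID and pins flows to it."""

    def __init__(self, metrics: Dict[str, Dict[str, float]]):
        # metrics[cs_id][cb_id] = compute cost, lower is better (assumed)
        self.metrics = metrics
        self.flows: Dict[Hashable, str] = {}  # classifier state: flow -> CB-ID

    def steer(self, flow_id: Hashable, cs_id: str) -> str:
        # Classifier role: packets of a known flow keep the earlier choice.
        if flow_id in self.flows:
            return self.flows[flow_id]
        candidates = self.metrics[cs_id]
        cb_id = min(candidates, key=candidates.get)
        self.flows[flow_id] = cb_id
        return cb_id


router = IngressRouter({"cs-1": {"cb-a": 0.8, "cb-b": 0.3}})
first = router.steer(("client-1", 5000), "cs-1")
router.metrics["cs-1"]["cb-b"] = 0.9      # a metric update arrives
same_flow = router.steer(("client-1", 5000), "cs-1")
new_flow = router.steer(("client-2", 5001), "cs-1")
```

In this model the per-flow table lives in the network element, so
affinity depends on state kept at the ingress router.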
4.2.2. Relation to ROSA
CATS proposes a similar anycast type of addressing as well as a
separation of service identifier from routing identifier, as done by
ROSA.
Furthermore, the ingress CATS router performs a traffic steering
decision among the set of possible service instances albeit with a
focus on such decisions to be compute-aware.
4.2.3. Gaps
There are a number of key differences and gaps to the desired
properties of a ROSA system:
1. Focus on compute-awareness: In contrast to CATS (considering the
architecture and solutions currently discussed), ROSA does not
specifically consider compute-awareness. This does not prevent
the CATS steering framework (and later solutions) from being used
with metrics other than compute-aware ones. For this, the
extensibility of future CATS metric distribution solutions to
general service-specific metrics will need to be studied.
2. Tunneling all traffic: As mentioned above, CATS proposes an
ingress-egress tunneling of ALL traffic. This is contrary to
ROSA, which merely selects the service instance initially through
the ROSA overlay, while all following packets are sent directly
to the service instance IP address, thus neither using the ROSA
overlay anymore nor tunneling any traffic; ROSA is thus
significantly more lightweight on the overlay.
3. Network- vs endpoint-controlled affinity: The aforementioned
tunneling of all traffic through the CATS overlay makes it
necessary to support affinity through functionality provided by
the CATS overlay network. Specifically,
[I-D.ldbc-cats-framework] proposes use of the CATS Traffic
Classifier for this purpose, interfacing with the ingress CATS
router to convey the suitable information for detecting those
packets belonging to a previously tunneled CATS flow. ROSA
instead proposes a purely endpoint-based method where the
initiation of another endpoint selection message signals the
beginning of a new transaction, possibly being sent to a
different choice of service instance than the previous one. This
removes not just state management from the network but also the
need for explicitly supporting future types of transactions and
their associated transport/network-level identification.
4. Dynamicity of selection changes: CATS does foresee changes in
service instance selections based on the metrics being
distributed to the ingress CATS router via the CATS Service
Metric Agent (C-SMA) and the CATS Network Metric Agent (C-NMA).
Currently, the necessary routing protocols (and their possible use
and/or extension) are being actively discussed. ROSA foresees the
use of ingress-based scheduling of selection messages, not
requiring frequent metric updates to the ingress point and
therefore allowing for higher frequencies of changes, such as
prescribed in the AR/VR use case in Section 3.6 of
[I-D.mendes-rtgwg-rosa-use-cases]. Relying on routing-based
approaches to metric changes makes the realization of such high
frequency changes difficult or impossible due to the associated
routing overhead and latency for propagation of updated metrics.
5. Adherence to underlay routing policy: ROSA performs endpoint
selection (from a set of possible choices), either routing- or
ingress-based, where any subsequent message(s) following the
selection message will traverse the IPv6 path defined by the
network provider(s). Here we see ROSA more aligned (conceptually)
with existing SBR methods, such as DNS+IP, where selection
precedes the subsequent data transfer, which is governed by
network provider policy.
CATS, instead, is currently looking into methods for active path
(selection) control for ALL tunnelled CATS messages, e.g., using
SRv6 or MPLS. However, a purely IP-in-IP tunneling at the
ingress CATS router would align CATS with ROSA in this respect.
Conversely, ROSA may provide such overlay path steering methods
by providing SRv6 path information as the result of the endpoint
selection message.
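The endpoint-controlled affinity and the limited use of the overlay
discussed in gaps 2 and 3 above can be sketched as follows. This is
a simplified illustration under our own assumptions, not a ROSA
design: make_overlay, Client, the load metric, and the message
counters are hypothetical, and only the client-to-instance direction
is counted.

```python
from typing import Callable, Dict, List


def make_overlay(instances: Dict[str, List[str]],
                 policy: Callable[[List[str]], str]) -> Callable[[str], str]:
    """Overlay that answers selection messages; keeps no per-client state."""
    def select(service_address: str) -> str:
        return policy(instances[service_address])
    return select


class Client:
    """Endpoint that holds the affinity itself."""

    def __init__(self, select: Callable[[str], str]):
        self._select = select
        self.overlay_messages = 0
        self.direct_messages = 0

    def run_transaction(self, service_address: str, followups: int) -> str:
        instance = self._select(service_address)  # selection via the overlay
        self.overlay_messages += 1
        self.direct_messages += followups         # sent straight to instance
        return instance


load = {"2001:db8::1": 0.9, "2001:db8::2": 0.1}   # hypothetical metric
overlay = make_overlay({"svc": list(load)}, lambda c: min(c, key=load.get))
client = Client(overlay)
a = client.run_transaction("svc", followups=10)
load["2001:db8::2"] = 1.0                          # conditions change
b = client.run_transaction("svc", followups=10)
```

Each new transaction starts with a fresh selection message, so a
different instance may be chosen without any flow table in the
network, and only one message per transaction touches the overlay in
this model.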
4.3. Locator-ID Separation Protocol (LISP)
The Locator-ID Separation Protocol (LISP) WG has been in existence
for many years, aiming at separating endpoint identifiers (called
EIDs) from routing locators (called RLOCs) for better scalability
when adjusting to changes in their relation. This similarity in focusing
on in-band dynamic assignments of EIDs to RLOCs positions LISP as a
possible technology to address the pain points identified in our use
case draft. Let us draw out the LISP concepts and the gaps to ROSA
objectives in the following.
4.3.1. Technology Overview
Let us provide a brief overview of LISP and its main concepts - for
more detail, we refer to, e.g., [RFC9299].
LISP introduces two namespaces, separating endpoint identifiers
(EIDs) from routing locators (RLOCs) for a device realizing the
service or resource represented by the EID. The EID may be
determined through mapping services such as the DNS, which resolve
it from other application-specific identifiers (such as a URL).
Endpoints communicate through their EIDs, sent domain-locally through
an intra-domain routing protocol either to a locally present EID or
to the ingress tunnel router (ITR) of their local domain. The ITR in
turn consults a mapping service [RFC9301] to resolve the EID to an
RLOC of an egress tunnel router (ETR), to which the incoming request
is then sent, while the ETR domain-locally forwards the packet to the
destination EID. LISP uses UDP for ITR-ETR tunnelling as well as for
accessing the mapping service.
Mapping service resolutions are usually cached at the ITR after
initially being resolved due to an incoming packet request. In
addition to this DNS-like pull operation, a pub/sub extension may
proactively pull EID->RLOC mappings from the mapping service (e.g.,
for planned handovers) or update previously resolved mappings in the
future.
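The pull-based resolution with caching at the ITR, together with a
later update of a cached mapping, can be sketched as follows. The
sketch is a toy model under our own assumptions: the class and method
names are hypothetical, and neither the LISP packet formats nor the
UDP encapsulation nor the actual mapping system protocol of [RFC9301]
are modelled.

```python
from typing import Dict, Tuple


class MappingService:
    """Toy EID -> RLOC database standing in for the mapping system."""

    def __init__(self, table: Dict[str, str]):
        self.table = table
        self.queries = 0

    def resolve(self, eid: str) -> str:
        self.queries += 1
        return self.table[eid]


class IngressTunnelRouter:
    def __init__(self, mapping_service: MappingService):
        self.ms = mapping_service
        self.map_cache: Dict[str, str] = {}

    def forward(self, dst_eid: str, payload: bytes) -> Tuple[str, str, bytes]:
        # Pull the mapping on a cache miss, then encapsulate towards the ETR.
        if dst_eid not in self.map_cache:
            self.map_cache[dst_eid] = self.ms.resolve(dst_eid)
        rloc = self.map_cache[dst_eid]
        return (rloc, dst_eid, payload)   # (outer dst, inner dst, data)

    def notify(self, eid: str, rloc: str) -> None:
        # Pub/sub-style update of a previously resolved mapping.
        self.map_cache[eid] = rloc


ms = MappingService({"eid-1": "198.51.100.1"})
itr = IngressTunnelRouter(ms)
first = itr.forward("eid-1", b"a")
second = itr.forward("eid-1", b"b")       # served from the map cache
itr.notify("eid-1", "198.51.100.2")       # mapping update reaches the ITR
third = itr.forward("eid-1", b"c")
```

In this model every packet towards the EID passes through the ITR,
and a change of RLOC only takes effect once the update has reached
the ITR's cache.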
4.3.2. Relation to ROSA
One could position an EID as a service address in ROSA, where the
mapping process in the ITR resembles the endpoint selection. The
proactive pub/sub mapping resolution would allow for changing RLOC
assignments and thus direct EID requests to other ETRs.
4.3.3. Gaps
There are a number of key differences and gaps to the desired
properties of a ROSA system:
1. Resolution latency: In its explicit resolution mode, as described
in [RFC9299], LISP is expected to experience latencies similar to those of other
resolution systems. Unlike DNS, the resolution is done, however,
at the ITR, thus not requiring explicit resolution at the client
with subsequent data transfer, therefore reducing the needed
client access link operations. Results from [LISPmon2017] show
early deployment insights for LISP, with resolvers replying to
EID mappings between 400ms and 1400ms. However, pub/sub
extensions to the mapping service [RFC9301] also allow for
reducing those latencies, e.g., proactively placing EID mappings
in ITRs in anticipation of future resolution requests, although
this requires suitable management and planning methods to
exist. Equally, for EID mapping updates to previously resolved
EID mappings, the pub/sub extensions may reduce the latency of
future resolution requests. However, scenarios such as those
outlined in Sections 3.5 and 3.6 of
[I-D.mendes-rtgwg-rosa-use-cases] are difficult to realize even
with those methods, since the update frequency needed for
per-transaction changes of EID mappings may not be achievable
through notification updates to EID mappings, due to the network
latencies experienced for the traversal of the EID mapping update
to the respective ITR(s). We can therefore expect that the
support for high dynamicity of service instance changes is likely
lower in LISP than what is required in some of our use cases, thus
limiting the required SBR capabilities, while the scheduled mode
of service instance selection in ROSA is expected to allow
per-transaction changes.
2. Lack of affinity support: LISP does not have a notion of affinity
to EID selections made for a service transaction, meaning that an
EID->RLOC mapping may change independent from any notion of a
service transaction. This is in contrast to ROSA, where affinity
is signalled directly by the originating endpoint through issuing
a new endpoint selection message, possibly resulting in a
different service instance being selected, with which the
endpoint continues to communicate throughout the transaction.
Through this, no client- and/or flow-specific state needs to
exist in the ROSA network elements.
3. Tunnelling all traffic: LISP is a network-level overlay to
separate the EID from RLOCs. As a consequence, ALL traffic from
an originating endpoint to an EID must be tunnelled via the ITR
to the resolved ETR. This is unlike the simpler problem of
identifying a service instance in ROSA, followed by any
subsequent traffic (of a transaction) being sent directly via the
underlying (possibly multi-domain) IP networks, similar to
explicit resolution SBR solutions like DNS. This simplicity is
reflected in less load on the ROSA elements (since only endpoint
selection messages need treatment while no direct endpoint-
instance message will traverse the ROSA element), while also
removing any tunnelling overhead.
4. Deployment as network-independent SBR overlay: LISP extends the
network-level routing capabilities through its separation of
address spaces. It does so, however, by requiring the ITR as a
border gateway to be part of the domain-local network deployment,
turning the otherwise 'LISP unaware' network into a 'LISP-aware'
one, consequently allowing LISP endpoints in this domain to
communicate with other LISP-aware domains. It thus requires the
participation of the local domain in the overall LISP deployment,
still allowing for gradual deployment (through traversing non-
LISP-aware domains through tunnelling) but nonetheless requiring
the endpoint-local domain to be LISP-enabled for using LISP-
enabled services. Proxy-xTRs allow, however, for the
internetworking of LISP-unaware with LISP-aware sites but still
require involvement of the provider edge network and need careful
deployment considerations on EID announcement (to the global
routing system) and placement in the network. This is unlike
ROSA, which is positioned as an L3.5 overlay, thus not requiring
endpoint-connected domains to participate in the ROSA
service. From a local network perspective, a client sends an
endpoint selection message to what looks like an IP endpoint to
the local domain. Those endpoint selection messages are routed
as true overlay messages, until arriving at an IP-enabled
endpoint that represents the selected service instance, followed
by direct client-instance exchanges for subsequent messages for
the service transaction. Thus, the burden of deployment in local
networks or the need for proxies does not exist here.
5. Service specificity of EID selections: The current methods of
selecting one of possibly several EID->RLOC mappings foresee a
priority- and weight-based mechanism, where those priorities and
weights are driven by the announcer of the EID mapping, with a
direct consequence on how traffic is being steered through the
network. Thus, the objective of those mapping policies is more
focused on traffic distribution, although RLOC priorities could
also be driven by service-specific policies. This is unlike the
explicit service specificity of the foreseen ROSA overlay routing
decision, where either a routed or scheduled endpoint selection
process is realized to disconnect the choice of service instance
selection from the network-level policy of steering traffic to
it, as linked to the routing locator of the service instance.
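The priority- and weight-based RLOC selection discussed in item 5 can
be sketched as follows. This is a minimal illustration and not a LISP
implementation: the names Rloc and select_rloc are hypothetical, and
the sketch assumes that a lower priority value is preferred, that a
priority of 255 marks an RLOC as unusable, and that weights split
traffic among RLOCs of equal priority.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rloc:
    address: str
    priority: int  # assumption: lower value is preferred, 255 = do not use
    weight: int    # relative traffic share among RLOCs of equal priority


def select_rloc(rlocs, rng=random):
    """Pick one RLOC: lowest priority value first, then a weighted choice."""
    usable = [r for r in rlocs if r.priority != 255]
    if not usable:
        raise ValueError("no usable RLOC in mapping")
    best = min(r.priority for r in usable)
    candidates = [r for r in usable if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # No weights given: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Note that both inputs are set by the announcer of the mapping; nothing
in this selection refers to the service transaction being served,
which is the gap described above.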
4.4. Application-Layer Traffic Optimization (ALTO)
ALTO, as defined in [RFC7285], provides the ability to select
suitable application-level servers for a client requesting it. It
thus seems aligned with the ROSA anycast problem; on closer
inspection, however, there are fundamental differences.
4.4.1. Technology Overview
ALTO follows other SBR methods in employing an explicit server
discovery step, defined in [RFC7286], thus conceptually aligning with
methods like DNS in that it employs an off-path method.
ALTO also follows more of a recommendation model, where the ALTO
client makes the final decision on which of the possible choices to
utilize in the data transfer, while ROSA advocates a decision driven
by the ROSA overlay.
Moreover, ALTO operates at the application level, currently
supporting HTTP/1, while ROSA advocates the use of any application
(and transport) protocol similar to using the DNS for resolution.
ALTO provides insights into server selection criteria through metric
work, as outlined in [RFC9274], [RFC9241], and [RFC8895]; work that is
already considered as input to the CATS WG. This consideration
equally applies to ROSA where metrics as well as metric distribution
are not in scope.
4.4.2. Relation to ROSA
Similar to the DNS, detailed in Section 4.1, ALTO provides an
explicit resolution step for selecting HTTP/1-based service instances
from a set of available servers. It thus provides a solution for an
anycast selection albeit limited to HTTP/1-based services. It also
allows for service-specific selection of the final server to be used
through a recommendation model, i.e., providing choices of suitable
servers to the client, which ultimately selects the server. With
this, it differs from the DNS model, where the DNS resolver makes the
ultimate selection.
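The recommendation model can be illustrated with a minimal sketch. It
does not use the ALTO protocol or its message formats; the functions
recommend and client_select, as well as the cost values, are
hypothetical and only show the split of roles, with the server ranking
candidates and the client making the final choice.

```python
def recommend(costs, limit=3):
    """Server side: return the `limit` cheapest candidates, cheapest first."""
    ranked = sorted(costs.items(), key=lambda item: item[1])
    return ranked[:limit]


def client_select(recommendations, is_acceptable):
    """Client side: take the first recommended server the client accepts."""
    for server, _cost in recommendations:
        if is_acceptable(server):
            return server
    return None  # no recommended server was acceptable to the client
```

In this sketch the client may reject the cheapest candidate for
reasons unknown to the server, which is the point at which the model
differs from a resolver-made selection.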
4.4.3. Gaps
There are a number of key differences and gaps to the desired
properties of a ROSA system. Several of those gaps are similar to
those that have already been identified in Section 4.1.3 and are
thus presented only briefly here:
1. Resolution latency: Similar to other explicit resolution
solutions, ALTO experiences a discovery latency through the
procedures defined in [RFC7285], leading to similar issues
outlined already for the DNS.
2. Acting on stale information: Due to the explicit resolution, the
client, in re-using a previous choice, may in fact act on stale
information in that the previously used server does not represent
the 'best' choice anymore. Only frequent repetition of the
discovery step would avoid this, with issues similar to those
outlined for the DNS.
3. Support dynamic resolution changes: ALTO defines methods for
cost-based selection of (ALTO) servers [RFC9274] as well as
advertising capabilities [RFC9241] and sending server events
impacting the selection [RFC8895]. However, apart from the
latencies involved in updating this information for a renewed and
thus dynamic resolution result, such a renewed result can only be
considered in a renewed resolution step, leading back to the
latency incurred for doing so; the two combined do not suffice in
terms of dynamicity, e.g., in the video-related use cases of
Sections 3.5 and 3.6 as well as for the mobile application
scenario in Section 3.4 of [I-D.mendes-rtgwg-rosa-use-cases].
4. Support for arbitrary application identifiers (and protocols): As
mentioned before, ALTO supports HTTP/1 only, thus limiting both
application identifiers and protocols to the specific HTTP-based
file sharing, media delivery and real-time comms scenarios that
are outlined in the ALTO problem statement [RFC5693], thus
providing no support for use cases outside the use of HTTP/1.
5. Multi-domain operation: Before the service-level communication
commences, an ALTO client discovers a suitable ALTO server, which
in turn provides guidance on the possible servers (for a
particular service) that may suit the client requirements,
provided as a recommendation to the ALTO client for its ultimate
choosing of the server. As outlined in [RFC7286], the discovery
of the ALTO server is domain-local, while explicit procedures as
defined in [RFC8686] are required for discovering an ALTO server
beyond the current domain. As outlined in Appendix A of
[RFC8686], a possibly multi-domain ALTO deployment would require
steps for discovering (and using) other ALTO servers so as to
enrich the information available to the locally discovered ALTO
server, much akin to the working of the DNS. The approach taken
by ROSA is that of an overlay, employing routing-based methods to
support those services advertised to it (akin to all those
services advertised to the overall ALTO system), while
interconnecting to other ROSA domains and the wider Internet
through an explicit gateway; a capability missing in ALTO.
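The trade-off between gaps 1 and 2 can be made concrete with a small
sketch of a client that caches an explicit resolution result. The
class CachedResolver and its parameters are hypothetical and not part
of ALTO or the DNS; the sketch only assumes that a result is reused
until a time-to-live expires.

```python
class CachedResolver:
    """Reuse an explicit resolution result until its time-to-live expires."""

    def __init__(self, resolve, ttl):
        self._resolve = resolve  # callable: service name -> chosen server
        self._ttl = ttl          # seconds for which a result is reused
        self._cache = {}         # service name -> (server, expiry time)
        self.resolutions = 0     # number of explicit resolution steps taken

    def lookup(self, service, now):
        entry = self._cache.get(service)
        if entry is not None and now < entry[1]:
            # No resolution latency, but the choice may be stale.
            return entry[0]
        # Fresh choice, at the price of one explicit resolution step.
        self.resolutions += 1
        server = self._resolve(service)
        self._cache[service] = (server, now + self._ttl)
        return server
```

A short time-to-live reduces staleness but increases the number of
resolution steps, each of which incurs the resolution latency of gap
1.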
4.5. Technologies related to SBR
Unlike the solutions in the previous sections, which provide
capabilities to address service-based routing overall, the works in
the next subsections relate to the SBR problem but often only in
parts, which may still be relevant to the wider discussion of
identifying works that may feed into the toolbox for ROSA solutions.
Most of the items on this list were suggested throughout discussions
with community members and they aim at answering their questions on
the relation to ROSA. As such, the list here may or may not increase
in the future.
4.5.1. Service Function Chaining (SFC)
SFC as defined in [RFC7665] allows for chaining the execution of
services at L2 or L3 level, targeting scenarios such as carrier-grade
NAT and others. The work in [RFC8677] extends the chaining onto the
name level, using service names to identify the individual services
of the chain, even allowing combinations of name and L2/L3-based
chains. However, [RFC8677] is tied into a realization of the SFF
(service function forwarder) using a path-based forwarding approach,
thus still relying on an explicit resolution process and therefore
experiencing similar latency and dynamicity issues as DNS, ALTO, and
LISP. The ROSA architecture framework draft includes an early
discussion on how to possibly realize name-based SFC without the need
for such explicit resolution, extending the basic functionality of
ROSA to invoke a single chain service.
4.5.2. Multiplexed Application Substrate over QUIC Encryption (MASQUE)
The work in the MASQUE WG aims at developing techniques for stream-
or datagram-based flow multiplexing in a single HTTP connection. For
this, the notion of a 'proxy' [I-D.schinazi-masque-proxy] is proposed
together with CONNECT-UDP and CONNECT-IP primitives to enable this
multiplexing. Typical use cases are tunnelling for increased privacy
or additional encryption. Although QUIC is assumed as the underlying
transport protocol, the WG will consider the working of its
primitives over TCP.
We can foresee the linkage to the proposed ROSA work in utilizing
MASQUE primitives for the in-band signalling of resolution requests,
utilizing the CONNECT-IP primitive. This effectively tunnels the
ROSA overlay over MASQUE, possibly improving on deployability. One
key aspect to consider, however, is the support for affinity, i.e.,
only utilizing the MASQUE proxy for initial endpoint selection
requests, then 'transferring' the client-endpoint relation onto a
direct relation, thus removing the proxy from the middle of the
connection for performance improvements and to adhere to the initial
routing policies defined for reaching the locator of the selected
service instance.
4.5.3. Time-Variant Routing (TVR)
The work in the newly established TVR WG addresses the problem of
scheduled, thus predictable changes in routing state within the
network. It plans on utilizing the exposure of agenda information to
feed into the routing protocols for accommodating such predictable
changes.
We can foresee two key linkages to the proposed ROSA work:
1. The use of agenda information not just for maintaining route but
possibly also endpoint availability information, which in turn
may feed into the endpoint selection message handling in ROSA.
2. The use of a TVR solution as a ROSA overlay routing solution, where
the forwarding of ROSA messages (i.e., the endpoint selection
message) may be subject to scheduled and thus predictable changes;
this could even be the case in the use cases currently identified
for TVR (e.g., satellite, mobile devices, etc.), where those use
cases may experience an anycast semantic for the endpoint
selection.
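The first linkage, using agenda information for endpoint
availability, can be sketched as follows. The agenda representation
(a mapping from endpoint to a list of availability windows) and the
function name are assumptions made for illustration only; the TVR WG
may define agenda information differently.

```python
def available_endpoints(agenda, t):
    """Return the endpoints with an availability window [start, end) covering t."""
    return sorted(
        endpoint
        for endpoint, windows in agenda.items()
        if any(start <= t < end for start, end in windows)
    )
```

An endpoint selection step could restrict its candidates to the
result of such a filter before applying any further, service-specific
selection.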
4.5.4. Source Packet Routing in Networking (SPRING)
Source routing solutions, such as developed in the SPRING WG, allow
for influencing the path across which a packet may traverse to a
final destination. Unlike ROSA, the destination selection itself is
not within scope of such consideration, thus SPRING and similar work
may complement the endpoint selection process of ROSA in that it
provides tools for further determining the path over which a packet
is sent.
5. Requirements
The following requirements for a routing on service addresses (ROSA)
solution (referred to as 'solution' for short) have been identified
from the analysis in the previous section of the use cases provided
in [I-D.mendes-rtgwg-rosa-use-cases].
One commonality of all use cases is the communication with a
'service', realized at one or more network locations as equivalent
'service instances'. Associating the service with an 'owner' is key
to preventing services from being announced by fake entities, which
would misdirect the client's traffic. In addition, the purpose of
communication (e.g., leaked through the specific name of a service)
as well as any policy for selecting one service instance over another
may need to be kept private; this is likely the case across all of
our use cases. Hence, any solution
REQ1: MUST provide means to associate service instances with a
single service address.
(a) MUST provide secure association of service address to
service owner.
(b) SHOULD provide means to obfuscate the purpose of
communication to intermediary network elements.
(c) MAY provide means to obfuscate the constraint parameters
used for selecting specific service instances.
Across all our use cases, the knowledge of where service instances
(realizing specific services) reside within the network, i.e.,
possibly at different network locations, is crucial for the
communication to happen, at least for the ROSA domain with which the
service is associated. Such knowledge may be created by a
service management platform, e.g., as part of the overall service
deployment, and thus may not be initiated by the deployed service
instance itself, such as in the example of mobile distributed
applications of Section 3.4 in [I-D.mendes-rtgwg-rosa-use-cases].
Furthermore, service deployment may be delegated to service or CDN
platforms, e.g., in the CDN, AR/VR and video distribution examples of
[I-D.mendes-rtgwg-rosa-use-cases], albeit with linkages needed to the
service routing capabilities of ROSA. Crucially, however, a
solution ought to proactively push suitable reachability
information for service instances into the ROSA system, i.e., pursuing
a routing-based approach, allowing for faster availability of
information to make suitable decisions on which service instance to
choose among those available. Hence, any solution
REQ2: MUST provide means to announce route(s) to specific instances
realizing a specific service address, thus enabling service
equivalence for this set of service instances.
(a) MUST provide scalable means for route announcements.
(b) MUST announce routes within a ROSA domain.
(c) SHOULD provide means to delegate route announcement.
(d) SHOULD provide means to announce routes at other than the
network attachment point realizing the announced service
address.
(e) MUST allow for removing service instances that are
intermittently available, i.e., revoking their service
announcement after a defined timeframe.
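A minimal sketch of the state implied by REQ2 is given below, in
particular the time-limited announcements of REQ2(e). The class
ServiceRouteTable, its method names and the use of a lifetime in
seconds are hypothetical; the sketch does not prescribe how a ROSA
solution distributes or stores announcements.

```python
class ServiceRouteTable:
    """Map a service address to its currently announced instance locators."""

    def __init__(self):
        self._routes = {}  # service address -> {instance locator: expiry time}

    def announce(self, service, locator, now, lifetime):
        """Add or refresh an announcement valid until now + lifetime."""
        self._routes.setdefault(service, {})[locator] = now + lifetime

    def withdraw(self, service, locator):
        """Explicitly revoke an announcement, if present."""
        self._routes.get(service, {}).pop(locator, None)

    def instances(self, service, now):
        """Return live locators, dropping announcements that have expired."""
        live = {
            loc: expiry
            for loc, expiry in self._routes.get(service, {}).items()
            if expiry > now
        }
        self._routes[service] = live
        return sorted(live)
```

All locators returned for one service address are treated as
equivalent, which reflects the service equivalence stated in REQ2.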
A client application may not just invoke services within a single
ROSA domain. While associating with different ROSA domains may be
possible, clients may simply invoke services through their existing
ROSA domain, e.g., for utilizing helper services in examples like
distributed mobile applications (Section 3.4 in
[I-D.mendes-rtgwg-rosa-use-cases]), expecting the service transaction
to be realized regardless. The same goes for invoking services that
may reside in the public Internet, without requiring the client to
be explicitly aware of which ROSA domain (or the public Internet)
to direct the invocation to. Thus, any solution
REQ3: MUST provide means to interconnect ROSA islands.
(a) MUST allow for announcing services across ROSA domains.
(b) MUST allow for announcing services outside ROSA domains.
Use cases like distributed mobile applications (Section 3.4 in
[I-D.mendes-rtgwg-rosa-use-cases]) but also video delivery ones such
as for replicated chunk retrieval or AR/VR (Sections 3.5 and 3.6 in
[I-D.mendes-rtgwg-rosa-use-cases], respectively) or the selection of
an appropriate UPF (user plane function) within a cellular
sub-system (Section 3.2 in [I-D.mendes-rtgwg-rosa-use-cases]), may want
to constrain the selection of 'suitable' service instances through
service-specific constraints, such as the computing load (on the
deployed service instances or their host platforms), service-level
latency, but also, e.g., HW or SW capabilities. This may also be
the case for multi-homed deployments (see Section 3.3 in
[I-D.mendes-rtgwg-rosa-use-cases]), where constraints on the multi-
connectivity of the service instance may constrain the suitability
for specific clients. Thus, any solution
REQ4: MUST provide constraint-based routing capability.
(a) MUST provide means to announce routing constraints
associated with specific service instances and their
realizing networking, computing and storage resources.
(b) SHOULD allow for providing constraints in the service
(address) announcement.
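Constraint-based selection as required by REQ4 can be sketched as a
filter over announced instance attributes. The attribute names (load,
latency_ms, caps) and the convention that numeric constraints are
upper bounds while set-valued constraints list required capabilities
are assumptions of this sketch, not part of any ROSA specification.

```python
def satisfies(announced, required):
    """True if the announced attributes meet every required constraint.

    Numeric constraints are treated as upper bounds (e.g., load, latency);
    set-valued constraints list capabilities that must all be present.
    """
    for key, bound in required.items():
        value = announced.get(key)
        if value is None:
            return False  # constraint on an attribute that was not announced
        if isinstance(bound, (set, frozenset)):
            if not bound <= set(value):
                return False
        elif value > bound:
            return False
    return True


def eligible_instances(announcements, required):
    """Return the locators of all instances that satisfy the constraints."""
    return sorted(
        locator
        for locator, attrs in announcements.items()
        if satisfies(attrs, required)
    )
```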
The work in [OnOff2022] has shown the potential gains in making
runtime decisions for every incoming service transaction, where
transaction lengths may be as small as single (application-level)
requests. For use cases such as for replicated chunk retrieval
(Section 3.5 in [I-D.mendes-rtgwg-rosa-use-cases]) or AR/VR
(Section 3.6 in [I-D.mendes-rtgwg-rosa-use-cases]), this may lead to
significant smoothing of the request completion latency, i.e.,
reducing the latency variance, thus enabling a better, smoother
experience at the client. However, the specific mechanism may vary
and, more importantly, may be highly service-specific, with solutions
such as [CArDS2022] providing a simple weighted round robin, while
other methods may rely on regular (service) metric reporting. Thus
any solution
REQ5: MUST provide an instance selection at ROSA domain ingress
nodes only.
(a) MUST allow for signalling selection mechanism and
necessary input parameters for selection to the ROSA
domain ingress nodes.
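As an illustration of a selection mechanism that could be signalled
to an ingress node, the following sketch implements a generic
weighted round robin over instance locators. It is not the mechanism
of [CArDS2022]; the function name and the use of integer weights as
input parameters are assumptions of this sketch.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator over locators, repeated per integer weight."""
    schedule = [
        locator
        for locator, weight in sorted(weights.items())
        for _ in range(weight)
    ]
    if not schedule:
        raise ValueError("no instance with a positive weight")
    return itertools.cycle(schedule)
```

For weights {"a": 2, "b": 1}, successive calls to next() on the
returned iterator yield a, a, b, and then repeat. Each call would
correspond to one incoming service transaction at the ingress node.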
Explicit resolution steps, such as those in DNS, GSLB, or ALTO,
suffer from the need for an explicit control plane exchange. This
causes additional latency before the data transfer to the chosen
service instance may start. In-band data, i.e., the inclusion of
application-level data in the control messages, is not supported due
to the layering of such solutions at the application level itself.
It is desirable, however, to already allow for the exchange of
application data, including that needed for establishing secure
connections, in the process that determines the most suitable service
instance to further reduce any latency for completing a given
application-level service transaction. Thus any solution
REQ6: MUST provide an in-band data transfer capability in the
process of determining the suitable service instance for any
following data transfer within the same service transaction.
While video delivery use cases like replicated chunk retrieval
(Section 3.5 in [I-D.mendes-rtgwg-rosa-use-cases]) or AR/VR
(Section 3.6 in [I-D.mendes-rtgwg-rosa-use-cases]) may exhibit
short-lived transactions of just one (service-level) request, due to
the replicated nature of the video content in each service instance,
other service transactions may last many requests after the initial
one has been sent. Ephemeral state may be created during such a
transaction, so that changing the (initial) service instance during
the transaction would require sharing this ephemeral state with any
new service instance being used. While service platforms, like K8S,
provide such ability through 'shared data layer' capabilities, those
are often limited to single site deployments. Any support across
sites would incur additional costs or even possibly latencies for
such state sharing, thus often leading to completing an ongoing
service transaction with the service instance that was originally
used (note that a service instance in ROSA may use
internal methods for serving incoming requests across which state
sharing would be applied - from a ROSA perspective, however, only one
service instance is being used). We call the capability to retain an
initial selection of a service instance for the length of a service
transaction 'affinity'. Thus, any solution
REQ7: MUST adhere to the affinity towards the service instance
chosen in the initial service request of the service
transaction, thus directing all subsequent service transaction
requests to the same instance.
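Affinity as required by REQ7 can be sketched from the client's
perspective. The class RosaClientTransaction and the "overlay" and
"direct" labels are hypothetical; the sketch only assumes that the
first request of a transaction triggers the endpoint selection and
that the client itself remembers the selected locator for all
subsequent requests.

```python
class RosaClientTransaction:
    """Keep the instance chosen by the first request for the whole transaction."""

    def __init__(self, service, select_instance):
        self.service = service
        self._select = select_instance  # callable: service -> instance locator
        self.locator = None             # set by the initial request

    def send(self, payload):
        if self.locator is None:
            # Initial request: the endpoint selection picks the instance.
            self.locator = self._select(self.service)
            return ("overlay", self.locator, payload)
        # Subsequent requests go directly to the same instance.
        return ("direct", self.locator, payload)
```

The affinity state lives in the client object only, so that no
per-client or per-flow state is needed in network elements. A new
transaction creates a new object and may therefore be served by a
different instance.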
All of our use cases are likely to be deployed over existing network
infrastructure, which makes it very important to consider using
existing solutions in any realization of ROSA. Specifically,
any solution
REQ8: SHOULD use IPv6 for the routing and forwarding of
service and affinity requests.
(a) MAY use IPv4 for the routing and forwarding of
service and affinity requests.
Most of our use cases, specifically on distributed mobile
applications (Section 3.4 in [I-D.mendes-rtgwg-rosa-use-cases]) but
also our video delivery examples, may be realized in inherently
mobile settings with clients moving about for their experience.
While mobile IP solutions exist, the service initialization in ROSA
needs to be equally supported in order to allow for invoking ROSA
services on the move. Thus, any solution
REQ9: SHOULD support in-request mobility for a ROSA client.
Mobility of clients, but also varying loads in scenarios of no client
mobility, may also lead to situations where moving an ongoing service
transaction to another service instance may be beneficial, termed
'transaction mobility'. In other words, service instances may be
replaced mid-transaction, in order to ensure the service level
agreement. This may happen if, for instance, the local node where
the service instance was initially installed is running out of
resources, or its accessibility is reduced (which may happen
periodically). Thus, any solution
REQ10: SHOULD support transaction mobility, i.e., changing service
instances during an ongoing service transaction.
With most service transactions likely being encrypted for privacy and
security reasons, supporting the appropriate transport layer methods
is crucial in all our scenarios in [I-D.mendes-rtgwg-rosa-use-cases].
While work in [OnOff2022] has shown that small service transactions
in scenarios like replicated chunk retrieval (Section 3.5 in
[I-D.mendes-rtgwg-rosa-use-cases]) or AR/VR (Section 3.6 in
[I-D.mendes-rtgwg-rosa-use-cases]) may be beneficial for
significantly reducing the service-level latency, the challenge lies
in initiating suitable transport layer security associations with
frequently changing service instances. Pre-shared certificates may
address this by allowing 0-RTT handshakes to be realized, but come
with well-known forward secrecy problems. Thus, any solution
REQ11: SHOULD support TLS 0-RTT handshakes without the need for pre-
shared certificates.
We envision the ROSA layer in ROSA endpoints to be transparently
integrated in the operation of transport protocols, and thus
applications, by providing suitable interfaces for accessing the ROSA
services of a specific ROSA domain. Thus, any solution
REQ12: SHOULD be transparent to applications in order to ensure a
smooth deployment.
6. Benefits from Addressing the SBR Problem
We expect the following benefits to be realized through providing a
solution to the problem statement presented in
[I-D.mendes-rtgwg-rosa-use-cases]:
* Remove explicit resolution latency: Current service-based routing
utilises an explicit resolution step with explicit off-path
operations before being able to utilise a specific service, thus
incurring an additional latency for requesting the resolution and
receiving its result. We aim at significantly reducing, even
removing this latency. The work in [OnOff2022] outlines the
possible impact of such reduction, while also evaluating the
capabilities enabled by a flexible (small affinity) traffic
steering under the constraint of a given latency budget that is
now enabled by the smaller endpoint selection latency.
* Dynamicity: Decisions to select one out of possibly many service
instances can be highly dynamic, done per service transaction,
even for single service requests. This is enabled by
the move from an explicit off-path resolution step to an in-band
mapping of a service address to its realizing service instance.
Such dynamicity aims at improving transaction completion latency
and variance, balancing load across service instances, as well as
possibly dealing with temporary network conditions. The work in
[OnOff2022] evaluates the impact of performing traffic steering
decisions through such in-band rather than explicit resolution
approaches.
* Service-specificity: The constraints for selecting a suitable
service instance should not be limited to network metrics like
delay or bandwidth. Instead, services should be able to define
service-specific constraints, allowing for either multi-optimality
routing or realising request-level and possibly compute-aware
request scheduling for selecting one of possibly several service
endpoints. The mechanism in [CArDS2022] outlines an example for
such steering decisions, taking into account service-specific
compute information. However, to avoid embedding full path
information into the service-based routing itself, the
consideration of service-specific constraints should be limited to
the selection of service instances, while the forwarding of
transaction data (in the form of subsequent affinity requests)
solely follows the routing policies defined by the underlay
network, similar to the workings of the DNS today.
* Avoiding in-network state: Mimicking the workings of the DNS, ROSA
seeks to keep any transaction state management entirely at the
endpoint, i.e., it is the endpoint that explicitly invokes the
(now in-band) endpoint selection, followed by end-to-end data
transfer throughout the transaction. This avoids the need for any
in-network or edge component to manage client- and
flow/transaction-specific state, such as envisioned in the CATS
architecture framework [I-D.ldbc-cats-framework] when relying on
explicit tunnel endpoints. This creates a deployment dependency
only for the endpoint selection itself, much like when using the
existing DNS, while any subsequent data transfer (within the
transaction) runs directly over the (possibly many) IP networks
that the IP packets will traverse, likely easing deployment of any
ROSA solution.
* Efficiently support higher degree of service distribution: Typical
application or also L4-level solutions, such as GSLB, QUIC-based
indirection, and others, lead effectively to egress hopping when
performed in a multi-site deployment scenario in that the client
request will be routed first to an egress as defined either
through the DNS resolution or the indirection through a central
server, from which the request is now resolved or redirected to
the most appropriate DC site. In deployments with a high degree
of distribution across many (e.g., smaller edge computing) sites,
this leads to inefficiencies through path stretch and additional
signalling that will increase the request completion time.
Instead, direct or on-path solutions such as ROSA are expected to
lead to more direct traffic towards the site where the service
will eventually be executed, while also allowing for application
data to be already carried as part of the service instance
selection process, thus keeping the request completion time close
to its optimum in respect to the best site being used for
execution of the request.
* Bring application namespace closer to communication relations:
Reid et al [Namespaces2022] outline insights into the aspects and
pain points experienced when deploying existing intra-DC service
platforms in multi-site settings, i.e., networked over the
Internet. The main takeaway is the lack of protocol support
for routing requests of microservices that would allow for mapping
application onto network address spaces without the need for
explicitly managed mapping and gateway services. While this
results in management overhead and thus costs, efficiency of such
additional mapping and gateway services is also seen as a
hindrance in scenarios with highly dynamic relationships between
distributed microservices, an observation aligned with the
findings in [OnOff2022]. The use cases presented in
[I-D.mendes-rtgwg-rosa-use-cases], among others, exhibit the
degrees of distribution in which relationship management (through
explicit mapping and/or gatewaying) may become complex and a
possible hindrance for service deployment and suitable
performance.
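The first benefit above, the removal of the explicit resolution
latency, can be illustrated with a deliberately simplified latency
model. The two functions below and all parameter names are
assumptions of this sketch: processing times are ignored, the
explicit case is modelled as one resolution round trip followed by
direct requests, and the in-band case carries the first request with
the endpoint selection over the overlay path.

```python
def completion_time_explicit(rtt_resolution_ms, rtt_instance_ms, requests=1):
    """One off-path resolution round trip, then every request goes direct."""
    return rtt_resolution_ms + requests * rtt_instance_ms


def completion_time_in_band(rtt_overlay_ms, rtt_instance_ms, requests=1):
    """First request travels with the endpoint selection via the overlay;
    the remaining requests of the transaction go direct."""
    return rtt_overlay_ms + (requests - 1) * rtt_instance_ms
```

Under this model, the in-band approach is faster whenever the overlay
round trip is shorter than the sum of the resolution round trip and
one direct round trip. Real deployments would need measurements such
as those reported in [OnOff2022] rather than this arithmetic.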
7. Conclusions
This draft provided a gap analysis of existing methods for service-
based routing in relation to the issues and pain points identified in
[I-D.mendes-rtgwg-rosa-use-cases].
Furthermore, we outlined requirements to fill those gaps in possible
realizations, a first of which is being described in a companion
document as the ROSA architecture.
8. Security Considerations
To facilitate the mapping between service information (i.e., the
service address) and the IP locator of the selected service instance,
information needs to be provided to the ROSA service address routers.
This is similar to the process of resolving domain names to IP
locators in today's solutions, such as the DNS. Similar to the
latter techniques, privacy in terms of which services the initiating
client is communicating with needs to be preserved against the
traversing underlay networks. For this,
suitable encryption of sensitive information needs to be provided as
an option. Furthermore, we assume that the choice of ROSA overlay to
use for the service to locator mapping is similar to that of choosing
the client-facing DNS server, thus we assume it being configurable by
the client, including falling back to the DNS for those cases
where services may be announced to ROSA methods and DNS-like
solutions alike.
9. IANA Considerations
This draft does not request any IANA action.
10. Acknowledgements
Many thanks go to Ben Schwartz, Luigi Iannone, Mohamed Boucadair,
Tommy Pauly, Joel Halpern, Daniel Huang, and Russ White for their
comments on the text to clarify several aspects of the motivation
for and technical details of ROSA.
11. Informative References
[CArDS2022]
Khandaker, K., Trossen, D., Khalili, R., Despotovic, Z.,
Hecker, A., and G. Carle, "CArDS: Dealing a New Hand in
Reducing Service Request Completion Times", Paper IFIP
Networking, 2022.
[GSLB] "What is GSLB?", Technical Report Efficient IP, 2022,
<https://www.efficientip.com/what-is-gslb/>.
[I-D.jennings-moq-quicr-arch]
Jennings, C. F. and S. Nandakumar, "QuicR - Media Delivery
Protocol over QUIC", Work in Progress, Internet-Draft,
draft-jennings-moq-quicr-arch-01, 11 July 2022,
<https://datatracker.ietf.org/doc/html/draft-jennings-moq-
quicr-arch-01>.
[I-D.ldbc-cats-framework]
Li, C., Du, Z., Boucadair, M., Contreras, L. M., Drake,
J., Huang, D., and G. S. Mishra, "A Framework for
Computing-Aware Traffic Steering (CATS)", Work in
Progress, Internet-Draft, draft-ldbc-cats-framework-02, 22
June 2023, <https://datatracker.ietf.org/doc/html/draft-
ldbc-cats-framework-02>.
[I-D.mendes-rtgwg-rosa-use-cases]
Mendes, P., Finkhäuser, J., Contreras, L. M., and D.
Trossen, "Use Cases and Problem Statement for Routing on
Service Addresses", Work in Progress, Internet-Draft,
draft-mendes-rtgwg-rosa-use-cases-00, 26 June 2023,
<https://datatracker.ietf.org/doc/html/draft-mendes-rtgwg-
rosa-use-cases-00>.
[I-D.schinazi-masque-proxy]
Schinazi, D., "The MASQUE Proxy", Work in Progress,
Internet-Draft, draft-schinazi-masque-proxy-00, 13 March
2023, <https://datatracker.ietf.org/doc/html/draft-
schinazi-masque-proxy-00>.
[I-D.yao-cats-gap-reqs]
Yao, K., Jiang, T., Eardley, P., Trossen, D., Li, C., and
D. Huang, "Computing-Aware Traffic Steering (CATS) Gap
Analysis and Requirements", Work in Progress, Internet-
Draft, draft-yao-cats-gap-reqs-00, 3 March 2023,
<https://datatracker.ietf.org/doc/html/draft-yao-cats-gap-
reqs-00>.
[LISPmon2017]
Li, Y., Iannone, L., and D. Saucez, "LISP-Views:
Monitoring LISP at Large Scale", Paper 29th International
Teletraffic Congress (ITC 29), 2017.
[Namespaces2022]
Reid, A., Eardley, P., and D. Kutscher, "Namespaces,
Security, and Network Addresses", Paper ACM SIGCOMM
workshop on Future of Internet Routing and Addressing
(FIRA), 2022.
[OnOff2022]
Khandaker, K., Trossen, D., Yang, J., Despotovic, Z., and
G. Carle, "On-path vs Off-path Traffic Steering, That Is
The Question", Paper ACM SIGCOMM workshop on Future of
Internet Routing and Addressing (FIRA), 2022.
[RFC1035] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
November 1987, <https://www.rfc-editor.org/info/rfc1035>.
[RFC5693] Seedorf, J. and E. Burger, "Application-Layer Traffic
Optimization (ALTO) Problem Statement", RFC 5693,
DOI 10.17487/RFC5693, October 2009,
<https://www.rfc-editor.org/info/rfc5693>.
[RFC6762] Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762,
DOI 10.17487/RFC6762, February 2013,
<https://www.rfc-editor.org/info/rfc6762>.
[RFC6763] Cheshire, S. and M. Krochmal, "DNS-Based Service
Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013,
<https://www.rfc-editor.org/info/rfc6763>.
[RFC7285] Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S.,
Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
"Application-Layer Traffic Optimization (ALTO) Protocol",
RFC 7285, DOI 10.17487/RFC7285, September 2014,
<https://www.rfc-editor.org/info/rfc7285>.
[RFC7286] Kiesel, S., Stiemerling, M., Schwan, N., Scharf, M., and
H. Song, "Application-Layer Traffic Optimization (ALTO)
Server Discovery", RFC 7286, DOI 10.17487/RFC7286,
November 2014, <https://www.rfc-editor.org/info/rfc7286>.
[RFC7665] Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
Chaining (SFC) Architecture", RFC 7665,
DOI 10.17487/RFC7665, October 2015,
<https://www.rfc-editor.org/info/rfc7665>.
[RFC8677] Trossen, D., Purkayastha, D., and A. Rahman, "Name-Based
Service Function Forwarder (nSFF) Component within a
Service Function Chaining (SFC) Framework", RFC 8677,
DOI 10.17487/RFC8677, November 2019,
<https://www.rfc-editor.org/info/rfc8677>.
[RFC8686] Kiesel, S. and M. Stiemerling, "Application-Layer Traffic
Optimization (ALTO) Cross-Domain Server Discovery",
RFC 8686, DOI 10.17487/RFC8686, February 2020,
<https://www.rfc-editor.org/info/rfc8686>.
[RFC8895] Roome, W. and Y. Yang, "Application-Layer Traffic
Optimization (ALTO) Incremental Updates Using Server-Sent
Events (SSE)", RFC 8895, DOI 10.17487/RFC8895, November
2020, <https://www.rfc-editor.org/info/rfc8895>.
[RFC9241] Seedorf, J., Yang, Y., Ma, K., Peterson, J., and J. Zhang,
"Content Delivery Network Interconnection (CDNI) Footprint
and Capabilities Advertisement Using Application-Layer
Traffic Optimization (ALTO)", RFC 9241,
DOI 10.17487/RFC9241, July 2022,
<https://www.rfc-editor.org/info/rfc9241>.
[RFC9274] Boucadair, M. and Q. Wu, "A Cost Mode Registry for the
Application-Layer Traffic Optimization (ALTO) Protocol",
RFC 9274, DOI 10.17487/RFC9274, July 2022,
<https://www.rfc-editor.org/info/rfc9274>.
[RFC9299] Cabellos, A. and D. Saucez, Ed., "An Architectural
Introduction to the Locator/ID Separation Protocol
(LISP)", RFC 9299, DOI 10.17487/RFC9299, October 2022,
<https://www.rfc-editor.org/info/rfc9299>.
[RFC9301] Farinacci, D., Maino, F., Fuller, V., and A. Cabellos,
Ed., "Locator/ID Separation Protocol (LISP) Control
Plane", RFC 9301, DOI 10.17487/RFC9301, October 2022,
<https://www.rfc-editor.org/info/rfc9301>.
Authors' Addresses
Luis M. Contreras
Telefonica
Ronda de la Comunicacion, s/n
Sur-3 building, 1st floor
28050 Madrid
Spain
Email: luismiguel.contrerasmurillo@telefonica.com
URI: http://lmcontreras.com/
Dirk Trossen
Huawei Technologies
80992 Munich
Germany
Email: dirk.trossen@huawei.com
URI: https://www.dirk-trossen.de
Jens Finkhaeuser
Interpeer gUG
86926 Greifenberg
Germany
Email: ietf@interpeer.io
URI: https://interpeer.io/
Paulo Mendes
Airbus
82024 Taufkirchen
Germany
Email: paulo.mendes@airbus.com
URI: http://www.airbus.com