Internet-Draft | Collaborative Host/Network Signaling: Us | March 2024 |
Rajagopalan, et al. | Expires 5 September 2024 | [Page] |
Host-to-network (and vice versa) signaling can improve the user experience by informing the network which flows are more important and which packets within a flow are more important without having to disclose the content of the packets being delivered. The differentiated service may be provided at the network (e.g., packet discard preference), the sender (e.g., adaptive transmission or session migration), or through cooperation of both the host and the network.¶
This document outlines a set of use-cases that highlight the need for a mechanism to share metadata about flows between a host and its network in order to enable different traffic treatment. Such a mechanism is typically implemented using a signaling protocol between the host and a set of trusted network elements.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://danwing.github.io/signaling-use-cases/draft-wing-tsvwg-signaling-use-cases.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-rwbr-tsvwg-signaling-use-cases/.¶
Discussion of this document takes place on the Transport and Services Working Group mailing list (mailto:tsvwg@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/tsvwg/. Subscribe at https://www.ietf.org/mailman/listinfo/tsvwg/.¶
Source for this draft and an issue tracker can be found at https://github.com/danwing/signaling-use-cases.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 5 September 2024.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Bandwidth constraints exist most predominantly at the access network (e.g., radio access networks). Users who are serviced via these networks use various hosts which run various applications; each having different connectivity needs for an optimal user experience. These needs are not frozen but change over time depending on the application and even depending on how an application is used (e.g., user's preferences).¶
The simple network diagram below shows where such bandwidth and performance constraints usually exist with a "B" (for Bottleneck). Other network bottlenecks may be experienced in other segments not shown in the figure, such as interconnection links or the infrastructure that hosts the service (e.g., flash crowds). A bottleneck may be limited in time, may or may not follow regular patterns, etc.¶
Complications that are induced by such phenomena may be eliminated by adequate dimensioning and upgrades. However, such upgrades may not always be immediately possible or economically justified.¶
Complementary mitigations are thus needed to soften these complications by introducing some collaboration between hosts and networks to adjust their behaviors.¶
For traffic sent in either direction, the network elements that terminate a bandwidth-constraining link (or that are located a few hops from such an element) can be fed with flow metadata. Such augmentation allows those network elements to make autonomous decisions to prioritize, delay, or drop packets, especially when performing reactive resource management. Absent such metadata, these network elements have no means to guide the enforcement of the reactive resource policy.¶
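As a rough illustration of the paragraph above, the following Python sketch shows how a network element might use flow metadata when a bottleneck queue overflows. The `Packet` fields, the `BottleneckQueue` class, and the eviction rule are assumptions made for this example only; this document does not define a data model or a queuing algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    flow_id: str
    # Hypothetical metadata fields; None means no metadata was signaled.
    importance: Optional[str] = None    # "high" or "low"
    discardable: Optional[bool] = None


class BottleneckQueue:
    """Toy FIFO queue for a bandwidth-constrained link (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets: List[Packet] = []

    def enqueue(self, pkt: Packet) -> Optional[Packet]:
        """Add pkt and return the packet dropped to make room, if any."""
        self.packets.append(pkt)
        if len(self.packets) <= self.capacity:
            return None
        # Prefer a victim that is marked discardable and of low importance.
        for i, queued in enumerate(self.packets):
            if queued.discardable and queued.importance == "low":
                return self.packets.pop(i)
        # Absent usable metadata, fall back to dropping the newest packet.
        return self.packets.pop()
```

With metadata, the element can shed a discardable packet instead of whichever packet happens to arrive last; without it, the only available behavior in this sketch is tail drop.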
There are several challenges with this metadata augmentation:¶
for hosts: which data to share without privacy breach or lowering confidentiality.¶
for network elements:¶
The metadata signals from a content provider are more likely to be authentic (if adequate authorization/validation is in place), but the metadata signals from other hosts may be "wrong", undesired by the peer host, or may maliciously carry improper metadata. Attempts to automate identification of content providers have included HTTP "Host" header inspection and TLS SNI inspection, which are expected to fail as encrypted SNI and privacy-enhancing proxies become more prevalent. Another mechanism to authorize metadata signals from a content provider is to configure the ISP equipment with the content network's source IP addresses (or other labels that may be visible on the packets) and provide a differentiated service to the traffic that matches these criteria. However, such an arrangement may have scalability issues. An approach to mitigate these issues is to limit the target content networks and networks that would put these arrangements in place. Such limitations would benefit large players (large ISPs and large content networks) and disadvantage small players (and new players). A more egalitarian approach would provide the same benefit to all parties -- large and small -- and also provide richer signaling to further improve user experience and metadata interoperability. This would allow all parties to become part of the "Internet fast lane".¶
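The source-address arrangement described above can be sketched as follows, using Python's standard `ipaddress` module. The prefix list and the function name `is_authorized_source` are hypothetical, and the prefixes are documentation ranges rather than real content networks.

```python
import ipaddress

# Hypothetical prefixes an ISP might configure for content networks.
# These are documentation ranges, not real deployments.
AUTHORIZED_CONTENT_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:100::/48"),
]


def is_authorized_source(src: str) -> bool:
    """Return True if src falls inside any configured content prefix."""
    addr = ipaddress.ip_address(src)
    return any(
        addr.version == net.version and addr in net
        for net in AUTHORIZED_CONTENT_PREFIXES
    )
```

Every content network needs its own entries in such a list, and each ISP has to maintain them, which is one way to see the scalability concern raised above.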
The authorization problem exists with technologies as relatively simple as DiffServ and the problem persists with many other recently discussed metadata signaling mechanisms, including embedding information in the UDP payload ([I-D.trammell-plus-spec]), UDP options ([I-D.kaippallimalil-tsvwg-media-hdr-wireless]), overloading the IPv6 Flow Label ([I-D.cc-v6ops-wlcg-flow-label-marking]), and Hop-by-Hop Options. One mechanism suggested occasionally is to encrypt or integrity protect the metadata with a key; such a key could be established using a signaling protocol, see Section 6.2.¶
There is some consensus that applications can benefit from collaborative signaling with the network ([IAB], [ATIS]). This document provides use-cases to further detail the need for such signaling.¶
This document does not intend to define any signaling protocol, nor to decide whether a new signaling protocol, a new extension, or one or more signaling protocols are needed.¶
However, this document provides a reference to digest the intended benefits of enabling collaboration between hosts and networks. These benefits are yet to be backed up with more evidence. It would be reasonable for the IETF to endorse some experimental work to solicit more feedback and assess the benefits under various setups.¶
network policy such as (monthly) bandwidth quota or bandwidth limit, or quality (delay and/or jitter) assurances.¶
network reactions to congestion events, with very short to very long durations (e.g., varying wireless and mobile air interface conditions).¶
Figure 1 depicts examples of approaches to establish channels to convey and share metadata between hosts, networks, and servers.¶
Metadata exchanges can occur in a single direction or in both directions of a flow.¶
The client-centric metadata sharing approach is preferred because it preserves privacy and also takes advantage of clients having a full view of their available network attachments.¶
Certain flows being received by a host (or by an application on a host) are less or more important than other flows of the same host. For example, a host downloading a software update is generally considered less important than another host doing interactive audio/video or gaming. By signaling the relative importance of flows to a network element, the network element can (de-)prioritize those flows to best accommodate the needs of the various applications (on the same host) and between hosts on a network.¶
Interactive Audio/Video has long been using [RTP] which runs over UDP. As described in Section 2.3.7.2 of [RFC7478], there is value in differentiating between voice, video and data. Today's video streaming is exclusively over TCP but will migrate to QUIC and eventually is likely to support unreliable transport ([RFC9221], [I-D.kpugin-rush]). With unreliable transport of video in RTP or QUIC, it is beneficial to differentiate the important video keyframes from other video frames. Other applications such as gaming and remote desktop also benefit from differentiating their packets to the network.¶
Many of these flows do not originate from a content provider's network. Thus, the flows originate from an IP address that is not known before connection establishment, so there needs to be a way for the client to authorize the network elements to receive and hopefully to honor the metadata of those packets.¶
Streaming video contains the occasional key frame ("i-frame") containing a full video frame. These are necessary to rebuild receiver state after loss of delta frames. The key frames are therefore more critical to deliver to the receiver than delta frames.¶
Streaming video also contains audio frames which can be encoded separately and thus can be signaled separately. Audio is more critical than video for almost all applications, but its importance (relative to other packets in the flow) is still an application decision. In the example below, the audio is more important than video (Importance=high, PacketNature=realtime, PacketType=reliable), video key frames have middle importance (Importance=low, PacketNature=realtime, PacketType=reliable), and both types of video delta frames (P-frame and B-frame) have least importance (Importance=low, PacketNature=discard, PacketType=unreliable).¶
Video Streaming Metadata:¶
Traffic type | Importance | PacketNature | PacketType |
---|---|---|---|
video I-frame (key frame) | low | realtime | reliable |
video delta P-frame | low | discard | unreliable |
video delta B-frame | low | discard | unreliable |
audio | high | realtime | reliable |
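To make the table concrete, the following Python sketch encodes its rows and derives one possible order in which a network element could shed traffic under congestion. The `FlowMetadata` type, the `shed_order` function, and the ordering rule are assumptions for illustration; the table itself does not prescribe any particular processing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FlowMetadata:
    importance: str      # "high" or "low"
    packet_nature: str   # "realtime", "discard", ...
    packet_type: str     # "reliable" or "unreliable"


# Rows copied from the video streaming table above.
VIDEO_STREAMING_METADATA: Dict[str, FlowMetadata] = {
    "video I-frame (key frame)": FlowMetadata("low", "realtime", "reliable"),
    "video delta P-frame": FlowMetadata("low", "discard", "unreliable"),
    "video delta B-frame": FlowMetadata("low", "discard", "unreliable"),
    "audio": FlowMetadata("high", "realtime", "reliable"),
}


def shed_order(table: Dict[str, FlowMetadata]) -> List[str]:
    """Traffic types ordered from first-to-drop to last-to-drop.

    Assumed rule: discardable traffic goes first, then low importance.
    The sort is stable, so ties keep the order of the table.
    """
    return sorted(
        table,
        key=lambda name: (
            table[name].packet_nature != "discard",
            table[name].importance != "low",
        ),
    )
```

Under this assumed rule the delta frames are shed first, the key frame next, and audio last, which matches the relative criticality described in the text.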
Examples: VoIP, gaming.¶
Requirement: Signal that the flow needs low jitter and low delay. However, the network can only provide a limited amount of low-jitter/low-delay service to each host, perhaps as few as one flow. This requires signaling feedback indicating that low-jitter and low-delay service is already fully subscribed. In response, the user and the application will likely continue, occasionally re-attempting to get the desired quality of service from the network.¶
In many scenarios a game or VoIP application will want to signal different metadata for the same type of packet in each direction. For example, for a game, video in the server-to-client direction might be more important than audio, whereas in the client-to-server direction input devices (e.g., keystrokes) might be more important than audio.¶
Both gaming (video in both directions, audio in both directions, input devices from client to server) and interactive audio/video (VoIP, video conference) involve important traffic in both directions, and are thus a slightly more complicated use-case than the previous example. Additionally, most Internet service providers constrain upstream bandwidth, so proper packet treatment is critical in the upstream direction.¶
Metadata:¶
Based on the metadata types listed in [I-D.rwbr-sconepro-flow-metadata-], the host-to-network metadata parameters for the interactive media type will be as given below.¶
Interactive A/V, downstream Metadata:¶
Traffic type | Importance | PacketNature | PacketType |
---|---|---|---|
video key frame | low | realtime | reliable |
video delta frame | low | discard | unreliable |
audio | high | realtime | reliable |
Traffic type | Importance | PacketNature | PacketType |
---|---|---|---|
video key frame | low | realtime | reliable |
video delta frame | low | discard | unreliable |
audio | high | realtime | reliable |
Many interactive audio/video applications also support sharing the presenter's screen, file, video, or pictures. During this sharing the presenter's video is less important but the screen or picture is more important. This change of importance can be conveyed in metadata to the network, as in the table below:¶
Interactive A/V, upstream Metadata:¶
Traffic type | Importance | PacketNature | PacketType |
---|---|---|---|
video key frame | low | realtime | reliable |
video delta frame | low | discard | unreliable |
audio | high | realtime | reliable |
picture sharing | high | realtime | reliable |
Todo: this section on cooperation needs editing.¶
Examples: backup/restore, software update, RSS feed update, email, printing to a print server¶
Requirement: Signal the flow as below best-effort.¶
Metadata:¶
Traffic type | Importance | PacketNature | PacketType | Comments |
---|---|---|---|---|
File copy | low | bulk | reliable | |
Printing | high | bulk | reliable | |
Examples: Desktop Virtualization, Office software in the cloud (editing local files, typing is interactive while save operation is bulk transfer)¶
Requirement: The signaled metadata will vary depending on the nature of the packet. With a variety of traffic going through the session, some packets can contain interactive traffic while others contain bulk transfer. There can be a combination of reliable and unreliable traffic within the same session through multiple streams. Host-to-network signaling plays a vital role in effectively routing mixed traffic for ideal user interactivity and network performance.¶
Example packet metadata for a Desktop Virtualization application (such as Citrix Virtual Apps and Desktops, CVAD) is shown in two tables: client-to-server traffic (Table 6) and server-to-client traffic (Table 7).¶
Remote Desktop Virtualization Metadata:¶
Traffic type | Importance | PacketNature | PacketType | Comments |
---|---|---|---|---|
User typing | high | realtime | reliable | |
Mouse click/End Position | high | realtime | reliable | The start point and endpoint of the pointer movement are vital to ensure the user action is completed correctly. So, the endpoints have to be reliably transmitted with real-time priority. ** |
Interactive audio | high | keep | unreliable | |
Authentication - Fingerprint, smart card | low | realtime | reliable | |
Interactive video key frame | low | keep | unreliable | Video key frames form the base frames of a video, upon which the next 'n' timeframes of video updates are applied. These frames are hence critical; without them, the video would not be coherent until the next critical frame is received. Retransmits of these are harmful to the UX. *** |
Mouse position tracking | low | discard | unreliable | When the pointer is moved from one point to another, the coordinates of the pointer between the two points can be lost without much of an impact to the UX as long as the start point and endpoint are delivered. This would ensure the user action is completed, even if the experience seems glitchy. |
Interactive video delta frame | low | discard | unreliable | |
Traffic type | Importance | PacketNature | PacketType | Comments |
---|---|---|---|---|
Glyph critical | high | realtime | reliable | The frames that form the base for the image are more critical and need to be transmitted as reliably as possible. Retransmits of these are harmful to the UX. ** |
Interactive (or streaming) audio | high | keep | unreliable | |
Haptic feedback | high | discard | unreliable | Virtualized haptic feedback is real-time and of high importance, but feedback delivered late is of no use. So dropping the packet altogether and not retransmitting it makes more sense. |
Interactive (or streaming) video key frame | low | keep | unreliable | Video key frames form the base frames of a video, upon which the next 'n' timeframes of video updates are applied. These frames are hence critical; without them, the video would not be coherent until the next critical frame is received. Retransmits of these are harmful to the UX. *** |
File copy | low | bulk | reliable | |
Interactive (or streaming) video predictive frame | low | discard | unreliable | Video predictive frames can be lost, which would result in a minor glitch but would not compromise the user activity, and the video would still be coherent and useful. The reception of a subsequent video key frame would mitigate the loss in quality caused by lost predictive frames. |
Glyph smoothing | low | discard | unreliable | The smoothing elements of the glyph can be lost and would still present a recognizable image, although with a lesser quality. Hence, these can be marked as loss tolerant, as the user action is still completed with a small compromise to the UX. Moreover, the reception of the next glyph critical frame would mitigate the loss in quality caused by lost glyph smoothing elements. |
*** A video key frame should be handled differently by the network depending on a streaming application versus a remote desktop application. The video streaming application's primary and only nature of traffic is video and audio. In contrast, a remote desktop application might be playing a video and its associated audio while at the same time the user is editing a document. The user's keystrokes and those glyphs need to be prioritized over the video lest the user think their inputs are being ignored (and type the same characters again). Hence, the values are different even for the same nature of traffic but a different application.¶
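The point that the same traffic type can carry different metadata in different applications can be sketched as a lookup keyed by application and traffic type. The `PROFILES` mapping and the `metadata_for` function are hypothetical names; the values are copied from the tables in this document.

```python
from typing import Dict, Optional, Tuple

# (Importance, PacketNature, PacketType)
Metadata = Tuple[str, str, str]

# Values taken from the video streaming and remote desktop tables.
PROFILES: Dict[Tuple[str, str], Metadata] = {
    ("video streaming", "video key frame"): ("low", "realtime", "reliable"),
    ("remote desktop", "video key frame"): ("low", "keep", "unreliable"),
    ("remote desktop", "user typing"): ("high", "realtime", "reliable"),
}


def metadata_for(application: str, traffic_type: str) -> Optional[Metadata]:
    """Return the metadata for a traffic type in a given application."""
    return PROFILES.get((application, traffic_type))
```

A key frame is looked up differently for a streaming player than for a remote desktop session, where user typing is signaled as more important than the video.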
There are cases (crisis) where "normal" network resources cannot be used at maximum and, thus, a network would seek to reduce or offload some of the traffic during these events -- often called 'reactive traffic policy'. An example of such a use case is cellular networks that are overly used (and radio resources exhausted) while alternative network attachments are available to the host.¶
Network-to-host signals are useful to put in place adequate traffic distribution policies (e.g., prefer the use of alternate paths, offload a network).¶
It is important that not every flow be prioritized; otherwise, the network devolves into the best-effort network that existed prior to metadata signaling. It is a requirement that mechanisms exist to prevent this occurrence.¶
Such a mechanism might be simple: for example, a cellular network might allow one flow from a subscriber to declare itself as important; other flows from that subscriber are denied attempts to prioritize themselves. The mechanism might be more complex, where authentication and authorization are performed by an enterprise network which, itself, decides which flows are important based on its policy, and only the enterprise network communicates flow priorities to the ISP network. The enterprise might prioritize certain users (e.g., IT staff), certain equipment (audio/video equipment in a conference room), or whatever else its policies require.¶
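The simple variant described above, in which a subscriber may mark only a limited number of flows as important, might look like the following Python sketch. The `PriorityAdmission` class and its method names are assumptions for illustration and do not correspond to any specified protocol.

```python
from typing import Dict, Set


class PriorityAdmission:
    """Toy admission control: at most `limit` prioritized flows per subscriber."""

    def __init__(self, limit: int = 1) -> None:
        self.limit = limit
        self.active: Dict[str, Set[str]] = {}

    def request(self, subscriber: str, flow_id: str) -> bool:
        """Return True if the flow is (or already was) granted priority."""
        flows = self.active.setdefault(subscriber, set())
        if flow_id in flows:
            return True
        if len(flows) >= self.limit:
            return False  # feedback to the host: priority already in use
        flows.add(flow_id)
        return True

    def release(self, subscriber: str, flow_id: str) -> None:
        """Free the priority slot when the flow ends."""
        self.active.get(subscriber, set()).discard(flow_id)
```

A denied request doubles as the feedback signal: the application learns that priority is unavailable and can continue at best effort or retry later.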
Various proposals have suggested establishing a key to validate per-packet metadata or to decrypt per-packet metadata. However, most proposals have not specified how this key would be established. A signaling protocol from the receiving host to its ISP could establish such a key. The host can then convey the key to the sending host to use to integrity protect or encrypt the per-packet metadata.¶
Note: The CPU overhead of validating or decrypting such per-packet metadata needs to be carefully considered (and further assessed via experiments) by the signaling protocol proposing such keying. Also, the required operational setup should be documented.¶
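As a minimal sketch of the integrity-protection idea, the following Python code uses HMAC-SHA256 from the standard library. The function names and the metadata encoding are assumptions; in particular, `establish_key` merely stands in for whatever exchange a signaling protocol between the receiving host and its ISP would perform, and the sketch does not address replay or binding the tag to a specific packet.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def establish_key() -> bytes:
    """Stand-in for the key the receiving host establishes with its ISP."""
    return secrets.token_bytes(32)


def protect(key: bytes, metadata: bytes) -> bytes:
    """Sender side: append an HMAC-SHA256 tag to the metadata."""
    tag = hmac.new(key, metadata, hashlib.sha256).digest()
    return metadata + tag


def verify(key: bytes, blob: bytes) -> Optional[bytes]:
    """Network element side: return the metadata if the tag is valid."""
    if len(blob) < TAG_LEN:
        return None
    metadata, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, metadata, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking tag bytes through timing.
    return metadata if hmac.compare_digest(tag, expected) else None
```

Each verification costs one hash computation per packet, which is the kind of overhead the note above asks to be assessed experimentally.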
The sender has to convey metadata in a way that is understood by the various network elements on the path -- each of which might be operated by different entities and have different capabilities. For example, the Wi-Fi access point might be operated by an enterprise network, hotel, or home user, whereas the upstream router is operated by the ISP. Each of those might support different versions of the same metadata, or might need the metadata expressed in different ways.¶
The signaling protocol would provide a way to learn the needs of those networks, and provide metadata signaling satisfying most or all of their needs.¶
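One simple way to picture this step is version selection across on-path elements. The sketch below assumes, hypothetically, that the host has learned the set of metadata versions each element supports; the element names and integer version numbers are illustrative only.

```python
from typing import Dict, Optional, Set


def common_version(supported: Dict[str, Set[int]]) -> Optional[int]:
    """Highest metadata version supported by every on-path element, if any."""
    if not supported:
        return None
    shared = set.intersection(*supported.values())
    return max(shared) if shared else None
```

For example, `common_version({"wifi-ap": {1, 2}, "isp-router": {2, 3}})` returns 2. When no common version exists, a sender would have to express the metadata separately for each element or satisfy only some of them.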
TODO summary.¶
TODO Security¶
This document has no IANA actions.¶
TODO acknowledge.¶