Network Working Group C. Jennings
Internet-Draft S. Nandakumar
Intended status: Informational M. Zanaty
Expires: 11 January 2024 Cisco
10 July 2023
MOQ Usages for audio and video applications
draft-jennings-moq-usages-00
Abstract
Media over QUIC Transport (MOQT) defines a publish/subscribe based
unified media delivery protocol for delivering media for streaming
and interactive applications over QUIC. This specification defines
details for building audio and video applications over MOQT, more
specifically, provides information on mapping application media
objects to the MOQT object model and possible mapping of the same to
underlying QUIC transport.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 11 January 2024.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
1.1. Requirements Notation and Conventions
2. MOQT QUIC Mapping
2.1. Stream per MOQT Group
2.2. Stream per MOQT Object
2.3. Stream per MOQT Track
2.4. Stream per Priority
2.5. Stream per multiple MOQT Tracks
3. MoQ Audio Objects
4. MoQ Video Objects
4.1. Encoded Frame
4.2. Encoded Slice
4.3. CMAF Chunk
4.4. CMAF Fragment
5. MOQT Track
5.1. Single Quality Media Streams
5.2. Multiple Quality Media Streams
5.2.1. Simulcast
5.2.2. Scalable Video Coding (SVC)
5.2.3. k-SVC
6. Object and Track Priorities
7. Relay Considerations
8. Bitrate Adaptation
9. Usage Mode identification
10. References
10.1. Normative References
10.2. Informative References
Appendix A. Security Considerations
Appendix B. IANA Considerations
Appendix C. Acknowledgments
Authors' Addresses
1. Introduction
Media Over QUIC Transport (MOQT) [MoQTransport] allows a set of
publishers and subscribers to participate in media delivery over
QUIC for streaming and interactive applications. The MOQT
specification defines the necessary protocol machinery for
endpoints and relays to participate; however, it does not provide
recommendations for media applications on using the MOQT object
model and mapping it to the underlying QUIC transport.
This document introduces the mapping of MOQT's object model to the
underlying QUIC transport in Section 2. Section 3 and Section 4
describe various groupings of application-level media objects and
their mapping to the MOQT object model. Section 5.2 discusses
considerations when using multiple-quality video applications, such
as simulcast and/or layered coding, over the MOQT protocol.
Section 8 describes considerations for adaptive bitrate techniques,
and finally Section 6 discusses interactions when priorities are
used on objects and tracks.
The picture below captures the conceptual model, showing the
mappings at various levels of a typical media application stack
using the MOQT delivery protocol.
+------------------------------+
| Application Data | ----+ frames, slices, segments
+---------------+--------------+ |
| v
| +-------------------------------+
| | Tracks, Groups, Objects |
| +-------------------------------+
+--------------v---------------+
| MOQT Object Model |
+--------------+---------------+
| +----------------------------------------+
| |Stream per Group, Stream per Object, .. |
| +----------------------------------------+
+---------------v--------------+
| QUIC |
+------------------------------+
1.1. Requirements Notation and Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119].
2. MOQT QUIC Mapping
In a typical MOQT media application, the captured media from a media
source is encoded (compressed), encrypted (based on the encryption
scheme), packaged (based on the container format), and mapped onto
the MOQT object model. Applications (such as media producers and
relays) deliver MOQT objects over the QUIC transport by choosing the
mapping appropriate to their context.
|
Encoded (and/or) Encrypted Media Stream
V
+-------------------+
| MOQT |
+-------------------+
|
MOQ Tracks
V / ===== QUIC Stream per Group
+---------------------+ |
| QUIC Transport | --| ===== QUIC Stream per Object
+---------------------+ |
| ===== QUIC Stream per Track
|
| ===== QUIC Stream per multiple Tracks
|
\ ===== QUIC Stream per Priority
The subsections below describe a few possibilities to consider when
mapping MOQT objects to QUIC streams.
2.1. Stream per MOQT Group
In this mode, a unidirectional QUIC stream is set up per MOQT Group
(Section 2.2 of [MoQTransport]). The following observations can be
made about such a setup:
* MOQT groups typically represent things that share some kind of
relationship (e.g., decodability, priority), and having the objects
of a group share the same underlying stream context allows them to
be delivered coherently.
* Media consumers can map each incoming QUIC stream into a decoding
context in the order of arrival per group.
* Since the objects within a group share the QUIC stream, there is
an increased likelihood of higher end-to-end latency due to
head-of-line blocking under losses.
2.2. Stream per MOQT Object
In this mode, a unidirectional QUIC stream is set up per MOQT Object
(Section 2.1 of [MoQTransport]). The following observations can be
made about such a setup:
* Using a single stream per object can help reduce latency at the
source, especially when objects represent smaller units of the
application data (say, a single encoded frame).
* The impact of on-path losses on end-to-end latency is scoped
to the object duration. The smaller the object duration, the lesser
the impact on end-to-end latency.
* One stream per object may end up creating a large number of
streams, especially when object durations are small.
* Media consumers may need to re-organize the incoming streams when
handling objects within a group, since streams may arrive out of
order.
2.3. Stream per MOQT Track
In this mode, there is one unidirectional QUIC stream per MOQT Track
(Section 2.3 of [MoQTransport]). The following observations can be
made about such a setup:
* This scheme is the simplest in its implementation, and the stream
stays active for as long as the track exists. Endpoints need to
maintain just one stream context per track.
* Since all the objects within the track share the same stream,
there may be an impact on end-to-end latency due to head-of-line
(HOL) blocking under loss.
2.4. Stream per Priority
In this mode, there is one unidirectional QUIC stream per MOQT Track/
Object priority. The following observations can be made about such a
setup:
* This scheme is relatively simple in its implementation, and the
number of stream contexts matches the number of priority levels.
* Such a scheme can be used at relays, where forwarding decisions can
be naturally mapped to priorities carried in the object header.
2.5. Stream per multiple MOQT Tracks
This mode is similar Section 2.3 but with more than one MOQT track
delivered over a single unidirectional QUIC stream, thus allowing
implementations to map multiple incoming QUIC streams to a few
outgoing QUIC Streams. Similar to Section 2.3, this mode is
relatively simple in its implementation and at the same time may
suffer from latency impacts under losses.
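To make the modes above concrete, the sketch below computes a stream
identity for an object under each mapping; objects that yield the same
key would share a unidirectional QUIC stream. The mode names and tuple
keys are illustrative assumptions, not MOQT protocol elements.

```python
# Illustrative sketch: objects computing the same key share a
# unidirectional QUIC stream under the named mapping mode.
def stream_key(mode, track, group_seq, object_seq, priority):
    if mode == "per_group":       # Section 2.1
        return (track, group_seq)
    if mode == "per_object":      # Section 2.2
        return (track, group_seq, object_seq)
    if mode == "per_track":       # Section 2.3
        return (track,)
    if mode == "per_priority":    # Section 2.4
        return (priority,)
    raise ValueError("unknown mapping mode: %s" % mode)
```

Under "per_group", for example, all objects of a group collapse onto one
key, while "per_object" gives every object its own stream.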
3. MoQ Audio Objects
Each chunk of encoded audio data, say 10ms, represents a MOQ Object.
In this setup, there is one MOQT Object per MOQT Group, where the
Group Sequence in the object header is incremented by one for each
encoded audio chunk and the Object Sequence defaults to 0.
When mapped to the underlying QUIC stream, each such unitary group is
sent over an individual unidirectional QUIC stream (similar to
Section 2.2/Section 2.1).
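The numbering described above can be sketched as follows (a minimal
illustration; the function name and tuple layout are assumptions for
exposition, not the MOQT wire format):

```python
# Illustrative sketch: one MOQT Object per Group for audio. The Group
# Sequence increments per encoded chunk; the Object Sequence is 0.
def number_audio_chunks(chunks):
    """Yield (group_sequence, object_sequence, chunk) tuples."""
    for group_seq, chunk in enumerate(chunks):
        yield (group_seq, 0, chunk)
```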
Future sub-2 kbps audio codecs may take advantage of a rapidly
updated model needed to decode the audio, which could result in audio
needing groups with multiple objects, to ensure all the objects
needed to decode some audio are in the same group.
4. MoQ Video Objects
The decision on what constitutes a MOQ object/group, and its
preferred mapping to the underlying QUIC transport for video streams,
is governed by the granularity of the encoded bitstream, as chosen by
the application. The smallest unit of such an application-defined
encoded bitstream is referred to as a "Video Atom" in this
specification, and video atoms are mapped 1:1 to MOQ Objects.
The size and duration of a video atom are application controlled and
follow various strategies driven by application requirements such as
latency, quality, bandwidth, and so on.
The following subsections identify various granularities defining the
video atoms and their corresponding mapping to the MOQT object model
and the underlying QUIC transport.
4.1. Encoded Frame
In this scheme, the video atom is a single encoded video frame. The
Group Sequence is incremented by 1 at IDR frame boundaries. The
Object Sequence is incremented by 1 for each video frame, starting at
0 and resetting to 0 at the start of a new group. The first video
frame (Object Sequence 0) should be an IDR frame; the rest of the
video frames within a MOQT group are typically dependent frames
(delta frames) and organized in decode order.
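This sequencing can be sketched as follows (an illustrative sketch
assuming the bitstream begins with an IDR frame; the names are not
drawn from MOQT):

```python
# Illustrative sketch: assign Group/Object Sequence numbers to encoded
# frames, starting a new group at each IDR frame boundary.
def number_video_frames(frames):
    """frames: iterable of (is_idr, payload) in decode order.
    Yields (group_sequence, object_sequence, payload)."""
    group_seq, object_seq = -1, 0
    for is_idr, payload in frames:
        if is_idr:
            group_seq += 1   # new group at each IDR boundary
            object_seq = 0   # Object Sequence resets to 0
        yield (group_seq, object_seq, payload)
        object_seq += 1
```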
When using the QUIC mapping scheme defined in Section 2.2, each
unidirectional QUIC stream is used to deliver one encoded frame. In
this mode, the receiver application should manage out-of-order
streams to ensure the MOQ Objects are delivered to the decoder in
increasing order of the Object Sequence within a group, and then in
increasing order of the Group Sequence.
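A minimal sketch of that receive-side ordering (illustrative only; it
assumes all objects of interest have already arrived, whereas a real
receiver would release contiguous objects to the decoder
incrementally):

```python
# Illustrative sketch: order MOQ objects that arrived out of order
# (one per QUIC stream) into decode order for the decoder.
def decode_order(objects):
    """objects: list of (group_seq, object_seq, payload) in arrival
    order. Returns payloads sorted by Group Sequence, then Object
    Sequence within each group."""
    return [p for _, _, p in sorted(objects, key=lambda o: (o[0], o[1]))]
```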
When using the QUIC mapping defined in Section 2.1, one
unidirectional QUIC stream is set up to deliver all the encoded
frames (objects) within a group.
4.2. Encoded Slice
In slice-based encoding, a single video frame is "sliced" into
separate sections that are encoded simultaneously in parallel. Once
encoded, each slice can be immediately streamed to a decoder instead
of waiting for the entire frame to be encoded first.
In this scheme, the video atom is an encoded slice, starting with the
IDR frame's slices at Object Sequence 0 and followed by delta-frame
slices with the Object Sequence incremented by 1 successively. A
MOQT Group is identified by the set of such objects within each IDR
frame boundary. To be able to successfully decode and render at the
media consumer, the identifier of the video frame containing the
slice needs to be carried end to end. This allows the media consumer
to map the slices to the right decoding context of the frame being
processed.
Note: The video frame identifier may be carried either as part of the
encoded object's payload header, or a group header may be introduced
for conveying the frame identifier.
When using the QUIC mapping scheme defined in Section 2.2, each
unidirectional QUIC stream is used to deliver one encoded slice of a
video frame. When using the QUIC mapping defined in Section 2.1,
each unidirectional QUIC stream is set up to deliver all the encoded
slices (objects) within a group.
4.3. CMAF Chunk
CMAF [CMAF] chunks are CMAF addressable media objects that contain a
consecutive subset of the media samples in a CMAF fragment. CMAF
chunks can be used by a delivery protocol to deliver media samples as
soon as possible during live encoding and streaming, i.e., typically
less than a second. CMAF chunks enable the progressive encoding,
delivery, and decoding of each CMAF fragment.
A given video application may choose a chunk duration that spans more
than one encoded video frame. When using CMAF chunks, the video atom
is a CMAF chunk. The CMAF chunk containing the IDR frame shall have
its Object Sequence set to 0, with each additional chunk's Object
Sequence incremented by 1. The Group Sequence is incremented at
every IDR interval, and all the CMAF chunks within a given IDR
interval shall be part of the same MOQT Group.
When using the QUIC mapping scheme defined in Section 2.2, each
unidirectional QUIC stream is used to deliver a CMAF chunk. When
using the QUIC mapping defined in Section 2.1, each unidirectional
QUIC stream is set up to deliver all the CMAF chunks (objects) within
a group. When using the QUIC mapping defined in Section 2.3, CMAF
chunks corresponding to a CMAF track are delivered over the same
unidirectional QUIC stream.
4.4. CMAF Fragment
CMAF fragments are the media objects that are encoded and decoded.
For scenarios where a fragment contains one or more complete coded
and independently decodable video sequences, each such fragment is
identified as a single MOQT Object and forms its own MOQT Group.
There is one unidirectional QUIC stream per such object
(Section 2.2). Media senders should stream the bytes of the object,
in decode order, as they are generated, in order to reduce latency.
5. MOQT Track
MOQT Tracks are typically characterized by having a single encoding
and, optionally, an encryption configuration. Applications can
encode a captured source stream into one or more qualities as
described in the subsections below.
5.1. Single Quality Media Streams
For scenarios where the media producer intends to publish
single-quality audio and video streams, applications shall map the
objects from such audio and video streams to individual tracks,
enabling each track to represent a single quality.
5.2. Multiple Quality Media Streams
It is not uncommon for applications to support multiple qualities
(renditions) per source stream to support receivers with varied
capabilities, enabling adaptive bitrate media flows, for example. We
describe two common approaches for supporting multiple qualities
(renditions/encodings): Simulcast and Layered Coding.
5.2.1. Simulcast
In simulcast, each MOQT track is a time-aligned alternate encoding
(say, at a different resolution) of the same source content.
Simulcasting allows consumers to switch between tracks at group
boundaries seamlessly.
A few observations:
* The catalog should identify the time-aligned relationship between
the simulcast tracks.
* All the alternate encodings shall have matching base timestamps and
durations.
* All the alternate encodings are for the same source media stream.
* Media consumers can pick the right quality by subscribing to the
appropriate track.
* Media consumers react to changing network/bandwidth situations by
subscribing to a different quality track at group boundaries.
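As an illustration, a consumer's track selection might look like the
sketch below (the track names and bitrates are hypothetical; a real
consumer would apply the switch only at the next group boundary so
decoding resumes at an IDR frame):

```python
# Illustrative sketch: pick the highest-bitrate simulcast track that
# fits the current bandwidth estimate, falling back to the lowest.
def pick_track(tracks, estimated_bps):
    """tracks: list of (track_name, bitrate_bps), sorted ascending
    by bitrate. Returns the chosen track name."""
    best = tracks[0][0]          # fallback: lowest quality
    for name, bps in tracks:
        if bps <= estimated_bps:
            best = name          # highest quality that still fits
    return best
```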
5.2.2. Scalable Video Coding (SVC)
SVC defines a coded video representation in which a given bitstream
offers representations of the source material at different levels of
fidelity (spatial, quality, temporal), structured in a hierarchical
manner. Such an organization allows a bitstream to be extracted at a
lower bit rate than the complete sequence, enabling the decoding of
pictures at multiple picture sizes (for sequences encoded with
spatial scalability), pictures at multiple picture rates (for
sequences encoded with temporal scalability), and/or pictures with
multiple levels of image quality (for sequences encoded with SNR/
quality scalability). Different layers can be separated into
different bitstreams. All decoders access the base stream; more
capable decoders can access enhancement streams.
5.2.2.1. All layers in a single MOQT Track
In this mode, the video application transmits all the SVC layers
under a single MOQT Track. When mapping to the MOQT object model,
any of the methods described in Section 4 can be leveraged to map
the encoded bitstream into MOQT groups and objects.
When transmitting all the layers as part of a single track, the
following properties need to be considered:
* The catalog should identify the SVC codec information in its codec
definition.
* The media producer should map each video atom to a MOQ object in
decode order and can utilize any of the QUIC mapping methods
described in Section 2.
* Dependency information for all the layers (such as spatial/
temporal layer identifiers and dependency descriptions) is encoded
in the bitstream and/or container.
The scheme of mapping all the layers to a single track is simple to
implement and allows subscribers/media consumers to make independent
layer-drop decisions without needing any protocol exchanges (as
needed in Section 5.2.1). However, such a scheme disallows selective
subscription to only the layers of interest.
5.2.2.2. One SVC layer per MOQT Track
In this mode, each SVC layer is mapped to a MOQT Track. Each unique
combination of fidelity (say, spatial and temporal) is identified by
a MOQT Track (see example below).
+-----------+ +-----------+
| S0T0 | --------> | Track1 |
+-----------+ +-----------+
+-----------+ +-----------+
| S0T1 | --------> | Track2 |
+-----------+ +-----------+
+-----------+ +-----------+
| S1T0 | --------> | Track3 |
+-----------+ +-----------+
+-----------+ +-----------+
| S1T1 | --------> | Track4 |
+-----------+ +-----------+
ex: 2-layer spatial and 2-layer temporal scalability encoding
The catalog should identify the complete list of dependent tracks for
each track that is part of layered coding for a given media stream.
For example, the figure below shows a sample layer dependency
structure (2 spatial and 2 temporal layers) and the corresponding
track dependencies.
+----------+
+----------->| S1T1 |
| | Track4 |
| +----------+
| ^
| |
+----------+ |
| S1T0 | |
| Track3 | |
+----------+ +-----+----+
^ | S0T1 |
| | Track2 |
| +----------+
| ^
+----------+ |
| S0T0 | |
| Track1 |---------+
+----------+
Catalog Track Dependencies:
Track2 depends on Track1
Track3 depends on Track1
Track4 depends on Track2 and Track3
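Given such catalog-declared dependencies, a consumer can compute the
full subscription set for a desired layer by walking the dependency
graph. The sketch below is illustrative only; the dictionary encoding
of the catalog is an assumption, not a catalog format.

```python
# Illustrative sketch: compute the transitive closure of catalog track
# dependencies, i.e., every track a consumer must subscribe to.
def subscription_set(track, deps):
    """deps: dict mapping a track to the tracks it directly depends
    on. Returns the track plus all its transitive dependencies."""
    needed, stack = set(), [track]
    while stack:
        t = stack.pop()
        if t not in needed:
            needed.add(t)
            stack.extend(deps.get(t, []))
    return needed
```

With the figure's dependencies, subscribing to Track4 requires all four
tracks, while Track1 (the base layer) stands alone.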
Within each track, the encoded media for the given layer can follow
the mappings defined in Section 4 and can choose from the options
defined in Section 2 for transporting the mapped objects over QUIC.
The bitstream and/or the container should carry the information
necessary to capture video-frame-level dependencies.
Media consumers need to consider information from the catalog to
group the related tracks and gather information from the bitstream
to establish frame-level dependencies. This allows the consumer
to appropriately map the incoming QUIC streams and MOQ objects to the
right decoder context.
5.2.3. k-SVC
k-SVC is a flavor of layered coding wherein the encoded frames within
a layer depend only on the frames within the same layer, with the
exception that the IDR frame in an enhancement layer depends on the
IDR frame in the next-lower-fidelity layer.
When each layer of a k-SVC encoded bitstream is mapped to a MOQT
track, the following needs to be taken into consideration:
* The catalog should identify that the tracks are related via a
k-SVC dependency.
* The MOQT protocol should be extended with a group header that
enables the enhancement-layer track to identify the group sequence
of the track it depends on, for satisfying the IDR frame
dependency.
6. Object and Track Priorities
Media producers are free to prioritize media delivery between tracks
by encoding priority information in the MOQT Object Header for a
given track. Relays can utilize these priorities to make forwarding
decisions. "draft-zanaty-moq-priority" specifies a prioritization
mechanism for objects delivered using the Media over QUIC Transport
(MOQT) protocol.
7. Relay Considerations
Relays are not allowed to modify the MOQT object header, as doing so
might break encryption and authentication. However, relays are free
to apply any of the transport mappings defined in Section 2 that they
see fit based on local decisions.
For example, a well-engineered relay network may choose to take
multiple incoming QUIC streams and map them to a few outgoing QUIC
streams (similar to the mapping defined in Section 2.3), or relays
may use MOQT object priorities (Section 2.4) to decide the necessary
transport mapping. It is important to observe that such decisions
can be made solely by considering the MOQT Object header information.
8. Bitrate Adaptation
TODO: add considerations for client side ABR and possible options for
server side ABR.
9. Usage Mode identification
This specification explores two possible usage modes for applications
to consider when using the MOQT media delivery protocol:
1. Transport mapping (Section 2)
2. MOQT object model mapping
For interoperability purposes, media producers should communicate
their usage modes to the media consumers. This can be achieved in
one of the following ways:
1. Via an out-of-band, application-specific mechanism. This approach
limits interoperability across applications, however.
2. Exchanging the usage modes via the catalog. This approach enables
consumers of the catalog to set up their transport/media stacks
appropriately based on the sender's preference.
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.
10.2. Informative References
[MoQTransport]
Curley, L., Pugin, K., Nandakumar, S., and V. Vasiliev,
"Media over QUIC Transport", Work in Progress, Internet-
Draft, draft-ietf-moq-transport-00, 5 July 2023,
<https://datatracker.ietf.org/doc/html/draft-ietf-moq-
transport-00>.
[CMAF] "Information technology -- Multimedia application format
(MPEG-A) -- Part 19: Common media application format
(CMAF) for segmented media", March 2020.
Appendix A. Security Considerations
This section needs more work.
Appendix B. IANA Considerations
This document doesn't recommend any changes to IANA registries.
Appendix C. Acknowledgments
Thanks to MoQ WG for all the discussions that inspired this
document's existence.
Authors' Addresses
Cullen Jennings
Cisco
Email: fluffy@iii.ca
Suhas Nandakumar
Cisco
Email: snandaku@cisco.com
Mo Zanaty
Cisco
Email: mzanaty@cisco.com