Independent Submission                                         L. Curley
Internet-Draft                                                    Twitch
Intended status: Informational                           9 February 2022
Expires: 13 August 2022


                 Warp - Segmented Live Video Transport
                         draft-lcurley-warp-00

Abstract

   This document defines the core behavior for Warp, a segmented live
   video transport protocol.  Warp maps live media to QUIC streams based
   on the underlying media encoding.  Media is prioritized to minimize
   latency during congestion.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 13 August 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.





Curley                   Expires 13 August 2022                 [Page 1]

Internet-Draft                    WARP                     February 2022


Table of Contents

   1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Terms and Definitions . . . . . . . . . . . . . . . . . .   3
   2.  Segments  . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Initialization  . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  Media . . . . . . . . . . . . . . . . . . . . . . . . . .   4
       2.2.1.  Video . . . . . . . . . . . . . . . . . . . . . . . .   4
       2.2.2.  Audio . . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  Streams . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
     3.1.  Messages  . . . . . . . . . . . . . . . . . . . . . . . .   5
     3.2.  Segments  . . . . . . . . . . . . . . . . . . . . . . . .   5
     3.3.  Prioritization  . . . . . . . . . . . . . . . . . . . . .   5
       3.3.1.  Live Content  . . . . . . . . . . . . . . . . . . . .   6
       3.3.2.  Recorded Content  . . . . . . . . . . . . . . . . . .   6
     3.4.  Cancellation  . . . . . . . . . . . . . . . . . . . . . .   6
     3.5.  Middleware  . . . . . . . . . . . . . . . . . . . . . . .   7
   4.  Messages  . . . . . . . . . . . . . . . . . . . . . . . . . .   7
     4.1.  init  . . . . . . . . . . . . . . . . . . . . . . . . . .   7
     4.2.  media . . . . . . . . . . . . . . . . . . . . . . . . . .   8
     4.3.  priority  . . . . . . . . . . . . . . . . . . . . . . . .   8
     4.4.  Extensions  . . . . . . . . . . . . . . . . . . . . . . .   8
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .   8
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     7.1.  Normative References  . . . . . . . . . . . . . . . . . .   9
     7.2.  Informative References  . . . . . . . . . . . . . . . . .   9
   Contributors  . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   9

1.  Overview

   Warp is a live video transport protocol that utilizes the [QUIC]
   network protocol.

   The live stream is split into segments (Section 2) at I-frame
   boundaries.  These are fragmented MP4 files as defined in [ISOBMFF].
   Initialization segments contain track metadata while media segments
   contain either video or audio samples.

   QUIC streams (Section 3) are used to transfer messages and segments
   between endpoints.  These streams are prioritized based on the
   contents, such that the most important media is delivered during
   congestion.

   Messages (Section 4) are sent over streams alongside segments.  These
   are used to carry necessary metadata and control messages.




Curley                   Expires 13 August 2022                 [Page 2]

Internet-Draft                    WARP                     February 2022


1.1.  Terms and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Commonly used terms in this document are described below.

   Frame:  An image to be rendered at a specific point in time.

   I-frame:  A frame that does not depend on the contents of other
      frames.

   Group of pictures (GOP):  A I-frame followed by a sequential series
      of dependent frames.

   Group of samples:  A sequential series of audio samples starting at a
      given timestamp.

   Segment:  A sequence of video frames and/or audio samples serialized
      into a container.

   Presentation Timestamp (PTS):  A point in time when video/audio
      should be presented to the viewer.

   Media producer:  An endpoint sending media over the network.

   Media consumer:  An endpoint receiving media over the network.

   Congestion:  Packet loss and queuing caused by degraded or overloaded
      networks.

2.  Segments

   The live stream is split into segments before being transferred over
   the network.  Segments are fragmented MP4 files as defined by
   [ISOBMFF].

   There are two types of segments: initialization and media.

2.1.  Initialization

   Initialization segments contain track metadata but no sample data.






Curley                   Expires 13 August 2022                 [Page 3]

Internet-Draft                    WARP                     February 2022


   Initialization segments MUST consist of a File Type Box ('ftyp')
   followed by a Movie Box ('moov').  This Movie Box consists of Movie
   Header Boxes ('mvhd'), Track Header Boxes ('tkhd'), Track Boxes
   ('trak'), followed by a final Movie Extends Box ('mvex').  These
   boxes MUST NOT contain any samples and MUST have a duration of zero.

   Note that a Common Media Application Format Header [CMAF] meets all
   these requirements.

2.2.  Media

   Media segments contain media samples for a single track.

   Media segments MUST consist of a Segment Type Box ('styp') followed
   by at least one media fragment.  Each media fragment consists of a
   Movie Fragment Box ('moof') followed by a Media Data Box ('mdat').
   The Media Fragment Box MUST contain a Movie Fragment Header Box
   ('mfhd') and Track Box ('trak') with a Track ID ('track_ID') matching
   a Track Box in the initialization segment.

   Note that a Common Media Application Format Segment [CMAF] meets all
   these requirements.

2.2.1.  Video

   Media segments containing video data MUST start with an I-frame.
   Media fragments MAY contain a single frame, minimizing latency at the
   cost of a small increase in segment size.  Video frames MUST be in
   decode order.

2.2.2.  Audio

   Media fragments MAY contain a single group of audio samples,
   minimizing latency at the cost of a small increase in segment size.

3.  Streams

   Warp uses unidirectional QUIC streams to transfer messages and
   segments over the network.  The establishment of the QUIC connection
   is outside the scope of this document.

   An endpoints MAY both send media (producer) and receive media
   (consumer).  This is accomplished by sending messages and segments
   over unidirectional streams.  Streams contain any number of messages
   and segments concatenated together.






Curley                   Expires 13 August 2022                 [Page 4]

Internet-Draft                    WARP                     February 2022


3.1.  Messages

   Messages are used to control playback or carry metadata about
   upcoming segments.

   A Warp Box ('warp') is a top-level MP4 box as defined in [ISOBMFF].
   The contents of this box is a warp message.  See the messages section
   (Section 4) for the encoding and types available.

3.2.  Segments

   Segments are transferred over streams alongside messages.  Each
   segment MUST be preceded by an init (Section 4.1) or media
   (Section 4.2) message, indicating the type of segment and providing
   additional metadata.

   The media producer SHOULD send each segment as a unique stream to
   avoid head-of-line blocking.  The media producer MAY send multiple
   segments over a single stream, for simplicity, when head-of-line
   blocking is desired.

   A segment is the smallest unit of delivery, as the tail of a segment
   can be safely delayed/dropped without decode errors.  A future
   version of Warp will support layered coding (additional QUIC streams)
   to enable dropping or downscalling frames in the middle of a segment.

3.3.  Prioritization

   Warp utilizes precedence to deliver the most important content during
   congestion.

   The media producer assigns a numeric presidence to each stream.  This
   is a strict prioritzation scheme, such that any available bandwidth
   is allocated to streams in descending order.  QUIC supports stream
   prioritization but does not standardize any mechanisms; see
   Section 2.3 in [QUIC].  The media producer MUST support sending
   priorized streams.  The media producer MAY choose to delay
   retransmitting lower priority streams when possible within QUIC flow
   control limits.

   The media consumer determines how long to wait for a given segment
   (buffer size) before skipping ahead.  The media consumer MAY cancel a
   skipped segment to save bandwidth, or leave it downloading in the
   background (ex. to support rewind).

   Prioritization allows a single media producer to support multiple
   media consumers with different latency targets.  For example, one
   consumer could have a 1s buffer to minimize latency, while another



Curley                   Expires 13 August 2022                 [Page 5]

Internet-Draft                    WARP                     February 2022


   conssumer could have a 5s buffer to improve quality, while a yet
   another consumer could have a 30s buffer to receive all media (ex.
   VOD recorder).

3.3.1.  Live Content

   Live content is encoded and delivered in real-time.  Media delivery
   is blocked on the encoder throughput, except during congestion
   causing limited network throughput.  To best deliver live content:

   *  Audio streams SHOULD be prioritized over video streams.  This
      allows the media consumer to skip video while audio continues
      uninterupted during congestion.

   *  Newer video streams SHOULD be prioritized over older video
      streams.  This allows the media consumer to skip older video
      content during congestion.

   For example, this formula will prioritze audio segments, but only up
   to 3s in the future:

     if is_audio:
       precedence = timestamp + 3s
     else:
       precedence = timestamp

3.3.2.  Recorded Content

   Recorded content has already been encoded.  Media delivery is blocked
   exclusively on network throughput.

   Warp is primarily designed for live content, but can switch to head-
   of-line blocking by changing stream prioritization.  This is also
   useful for content that should not be skipped over, such as
   advertisements.  To enable head-of-line blocking:

   *  Older streams SHOULD be prioritized over newer streams.

   For example, this formula will prioritize older segments:

     precedence = -timestamp

3.4.  Cancellation

   During congestion, prioritization intentionally cause stream
   starvation for the lowest priority streams.  Some form of starvation
   will last until the network fully recovers, which may be indefinite.




Curley                   Expires 13 August 2022                 [Page 6]

Internet-Draft                    WARP                     February 2022


   The media consumer SHOULD cancel a stream (via a QUIC STOP_SENDING
   frame) after it has been skipped to save bandwidth.  The media
   producer SHOULD reset the lowest priority stream (via QUIC
   RESET_STREAM frame) when nearing resource limits.  Both of these
   actions will effectively drop the tail of the segment.

3.5.  Middleware

   Media may go through multiple hops and processing steps on the path
   from the broadcaster to player.  The full effectiveness of warp as an
   end-to-end protocol depends on middleware support.

   *  Middleware MUST maintain stream idependence to avoid introducing
      head-of-line blocking.

   *  Middleware SHOULD maintain stream prioritization when traversing
      networks susceptible to congestion.

   *  Middleware MUST forward the priority message (Section 4.3) for
      downstream servers.

4.  Messages

   Warp endpoints communicate via messages contained in the top-level
   Warp Box (warp).

   A warp message is JSON object, where the key defines the message type
   and the value depends on the message type.  Unknown messages MUST be
   ignored.

   An endpoint MUST send messages sequentially over a single stream when
   ordering is required.  Messages MAY be combined into a single JSON
   object when ordering is not required.

4.1.  init

   The init message indicates that the remainder of the stream contains
   an initialization segment.

   {
     init: {
       id: int
     }
   }

   id:  Incremented by 1 for each initialization segment.





Curley                   Expires 13 August 2022                 [Page 7]

Internet-Draft                    WARP                     February 2022


4.2.  media

   The media message contains metadata about the next media segment in
   the stream.

   {
     segment: {
       init: int,
       timestamp: int,
     }
   }

   init:  The id of the cooresponding initialization segment.  A decoder
      MUST block until the coorespending init message to arrive.

   timestamp:  The presentation timestamp in milliseconds for the first
      frame/sample in the next segment.  This timestamp MUST be used
      when it does not match the timestamp in the media container.

4.3.  priority

   The priority message informs middleware about the intended priority
   of the current stream.  Any middleware MAY ignore this value but
   SHOULD forward it.

   {
     priority: {
       precedence: int,
     }
   }

   precedence:  An integer value, indicating that any available
      bandwidth SHOULD be allocated to streams in descending order.

4.4.  Extensions

   Custom messages MUST start with x-. Unicode LATIN SMALL LETTER X
   (U+0078) followed by HYPHEN-MINUS (U+002D).

   Custom messages SHOULD use a unique prefix to reduce collisions.  For
   example: x-twitch-load would contain identification required to start
   playback of a Twitch stream.

5.  Security Considerations

   TODO





Curley                   Expires 13 August 2022                 [Page 8]

Internet-Draft                    WARP                     February 2022


6.  IANA Considerations

   This document has no IANA actions.

7.  References

7.1.  Normative References

   [ISOBMFF]  "Information technology — Coding of audio-visual objects —
              Part 12: ISO Base Media File Format", December 2015.

   [QUIC]     Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", RFC 9000,
              DOI 10.17487/RFC9000, May 2021,
              <https://www.rfc-editor.org/rfc/rfc9000>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

7.2.  Informative References

   [CMAF]     "Information technology -- Multimedia application format
              (MPEG-A) -- Part 19: Common media application format
              (CMAF) for segmented media", March 2020.

Contributors

   *  Michael Thornburgh

Author's Address

   Luke Curley
   Twitch

   Email: lcurley@twitch.tv










Curley                   Expires 13 August 2022                 [Page 9]