Network Working Group C. Jennings
Internet-Draft Cisco
Intended status: Standards Track March 18, 2018
Expires: September 19, 2018
Modular Media Stack
draft-jennings-dispatch-new-media-01
Abstract
A sketch of a proposal for a modular media stack for interactive
communications.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 19, 2018.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Jennings Expires September 19, 2018 [Page 1]
Internet-Draft new-media March 2018
Table of Contents
1. Introduction
2. Goals
3. Overview
4. Terminology
5. Architecture
6. Connectivity Layer
6.1. Snowflake - New ICE
6.2. STUN2
6.2.1. STUN2 Request
6.2.2. STUN2 Response
6.3. TURN2
7. Transport Layer
8. Media Layer - RTP3
8.1. RTP Meta Data
8.2. Securing the messages
8.3. Sender requests
8.4. Data Codecs
8.5. Media Keep Alive
8.6. Forward Error Correction
8.7. MTI Codecs
8.7.1. Audio
8.7.2. Video
8.7.3. Annotation
8.7.4. Application Data Channels
8.7.5. Reverse Requests & Stats
8.8. Message Key Agreement
9. Control Layer
9.1. Transport Capabilities API
9.2. Media Capabilities API
9.3. Transport Configuration API
9.4. Media Configuration API
9.5. Transport Metrics
9.6. Flow Metrics API
9.7. Stream Metrics API
10. Call Signalling - JABBER2
11. Signalling Examples
11.1. Simple Audio Example
11.1.1. simple audio advertisement
11.1.2. simple audio proposal
11.2. Simple Video Example
11.2.1. Proposal sent to camera
11.3. Simulcast Video Example
11.4. FEC Example
11.4.1. Advertisement includes a FEC codec.
11.4.2. Proposal sent to camera
12. Switched Forwarding Unit (SFU)
12.1. Software Defined Networking
12.2. Vector Packet Processors
12.3. Information Centric Networking
13. Acknowledgements
14. Other Work
15. Style of specification
16. Informative References
Author's Address
1. Introduction
This draft is an accumulation of various ideas that some people are
thinking about. Most of them are fairly separable and could be
morphed into existing protocols, though this draft takes a blank sheet
of paper approach to considering what would be the best thing if we
were starting from scratch. With that in place, it is possible to
ask which of these ideas make sense to back patch into existing
protocols.
2. Goals
o Better connectivity by enabling situations where asymmetric media is
possible.
o Design for SFUs (Switched Forwarding Units). Design for multiparty
calls first, then consider two party calls as a specialized subcase
of that.
o Designed for client-server deployments with server based control of
clients
o Faster setup
o Pluggable congestion control
o much much simpler
o end to end security
o remove ability to use STUN / TURN in DDoS reflection attacks
o ability for receiver of video to tell the sender about size
changes of display window such that the sender can match them
o Eliminate the problems with ROC in SRTP
o address reasons people have not migrated from SDES to DTLS-SRTP
o separation of call setup and ongoing call / conference control
o make codec negotiation more generic so that it works for future
codecs
o remove ICE's need for global pacing, which is more or less
impossible on general purpose devices like PCs
3. Overview
This draft proposes a new media stack to replace the existing stack
of RTP, DTLS-SRTP, and SDP Offer/Answer. The key parts of this stack
are the connectivity layer, the transport layer, the media layer, a
control API, and the signalling layer.
The connectivity layer uses a simplified version of ICE, called
snowflake [I-D.jennings-dispatch-snowflake], to find connectivity
between endpoints and change the connectivity from one address to
another as different networks become available or disappear. It is
based on ideas from [I-D.jennings-mmusic-ice-fix].
The transport layer uses QUIC to provide a hop by hop encrypted,
congestion controlled transport of media. Although QUIC does not
currently have all of the partial reliability mechanisms to make this
work, this draft assumes that they will be added to QUIC.
The media layer uses existing codecs and packages their output along
with extra header information that tells the receiver when the
sequence needs to be played back, which camera it came from, and
which media streams are to be synchronized.
The control API is an abstract API that provides a way for the media
stack to report its capabilities and features and a way for an
application to tell the media stack how it should be configured.
Configuration includes what codec to use, the size and frame rate of
video, and where to send the media.
The signalling layer is based on an advertisement and proposal model.
Each endpoint can create an advertisement that describes what it
supports, including things like supported codecs and maximum bitrates.
A proposal can be sent to an endpoint that tells the endpoint exactly
what media to send and receive and where to send it. The endpoint
can accept or reject this proposal as a whole but cannot change any
part of it.
4. Terminology
o media stream: Stream of information from a single sensor. For
example, a video stream from a single camera. A stream may have
multiple encodings, for example video at different resolutions.
o encoding: An encoded version of a stream. A given stream may have
several encodings at different resolutions. One encoding may
depend on other encodings, as with forward error correction or
scalable video codecs.
o flow: A logical transport between two computers. Many media
streams can be transported over a single flow. The actual IP
addresses and ports used to transport data in the flow may change
over time as connectivity changes.
o message: some data or media that is to be sent across the network
along with metadata about it. Similar to an RTP packet.
o media source: a camera, microphone or other source of data on an
endpoint
o media sink: a speaker, screen, or other destination for data on an
endpoint
o TLV: Tag Length Value. When used in the draft, the Tag, Length,
and any integer values are coded as variable length integers
similar to how this is done in CBOR.
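One way to read the TLV definition above is sketched below in Python. It is only an illustration: it borrows the CBOR unsigned integer encoding (major type 0) for the Tag and Length, which is an assumption, since this draft only says the coding is "similar to" CBOR. The tag numbers in the usage example are hypothetical and are not assigned anywhere in this draft.

```python
import struct
from typing import List, Tuple


def encode_uint(n: int) -> bytes:
    """Encode a non-negative integer the way CBOR encodes major type 0."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n < 24:
        return bytes([n])                      # value fits in the initial byte
    if n <= 0xFF:
        return bytes([24, n])                  # 1-byte argument follows
    if n <= 0xFFFF:
        return bytes([25]) + struct.pack(">H", n)
    if n <= 0xFFFFFFFF:
        return bytes([26]) + struct.pack(">I", n)
    if n <= 0xFFFFFFFFFFFFFFFF:
        return bytes([27]) + struct.pack(">Q", n)
    raise ValueError("integer does not fit in 64 bits")


def decode_uint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one integer at buf[pos]; return (value, position after it)."""
    if pos >= len(buf):
        raise ValueError("truncated integer")
    first = buf[pos]
    if first < 24:
        return first, pos + 1
    size = {24: 1, 25: 2, 26: 4, 27: 8}.get(first)
    if size is None:
        raise ValueError("not a CBOR unsigned integer")
    end = pos + 1 + size
    if end > len(buf):
        raise ValueError("truncated integer")
    return int.from_bytes(buf[pos + 1:end], "big"), end


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Tag, then Length, then the raw Value bytes."""
    return encode_uint(tag) + encode_uint(len(value)) + value


def decode_tlvs(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split a buffer holding back-to-back TLVs into (tag, value) pairs."""
    headers = []
    pos = 0
    while pos < len(buf):
        tag, pos = decode_uint(buf, pos)
        length, pos = decode_uint(buf, pos)
        if pos + length > len(buf):
            raise ValueError("truncated value")
        headers.append((tag, buf[pos:pos + length]))
        pos += length
    return headers


# Hypothetical tag numbers, for illustration only.
message = encode_tlv(1, b"abc") + encode_tlv(300, b"")
print(decode_tlvs(message))
```

Small tags and lengths cost a single byte each in this coding, which matters when most headers are a few bytes long.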
5. Architecture
Much of the deployment architecture of IETF media designs is based
on a distributed controller for the media stack that runs peer to
peer in each client. Nearly all deployments, be they cloud based
conferencing systems or an enterprise PBX, use a central controller
that acts as an SBC to try to control each client. The goal here
would be a deployment architecture that:
o supports a single controller that controls all the devices in a
given conference or call. The controller could be in the cloud or
running on one of the endpoints.
o is designed for multi party conference calls first and treats 2
party calls as a specialized sub case of that
o is designed with the assumption that a light weight SFU (Switched
Forwarding Unit) is used to distribute media for conference
calls.
6. Connectivity Layer
6.1. Snowflake - New ICE
All that is needed to discover the connectivity is a way:
o for a client to gather some IP/ports that may work, using a TURN2
relay, STUN2, and local addresses.
o for a controller, which might be running in the cloud, to inform a
client to send a STUN2 packet from a given source IP/port to a
given destination IP/port.
o for the receiver to notify the controller about information on
received STUN2 packets.
o for the controller to tell the sender the secret that was in the
packet, which proves the consent of the receiver to receive data.
The sending client can then allow media to flow over that
connection.
The actual algorithm used to decide which pairs of addresses are
tested, and in what order, does not need to be agreed on by both
sides of the call; only the controller needs to know it. This
allows the controller to use machine learning, past history, and
heuristics to find an optimal connection much faster than something
like ICE.
The details of this approach are described in
[I-D.jennings-dispatch-snowflake]. Many of the ideas in this can be
traced back to [I-D.kaufman-rtcweb-traversal].
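The controller driven flow described in this section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm from [I-D.jennings-dispatch-snowflake]: the Candidate and Controller names, the cost table that prefers local addresses over STUN2 and relay addresses, and the choice to have the controller generate the secret are all hypothetical. The point is only that pair selection and ordering live entirely in the controller.

```python
import secrets
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical preference: direct paths are tried before relayed ones.
KIND_COST = {"local": 0, "stun": 1, "relay": 2}


@dataclass(frozen=True)
class Candidate:
    kind: str      # "local", "stun", or "relay"
    address: str   # "ip:port" as gathered by the client


Pair = Tuple[Candidate, Candidate]


def order_pairs(sender: List[Candidate], receiver: List[Candidate]) -> List[Pair]:
    """Order every (source, destination) pair, cheapest combination first."""
    pairs = list(product(sender, receiver))
    return sorted(pairs, key=lambda p: KIND_COST[p[0].kind] + KIND_COST[p[1].kind])


class Controller:
    def __init__(self) -> None:
        self._pending: Dict[str, Pair] = {}   # secret -> pair being checked
        self.consented: List[Pair] = []

    def start_check(self, pair: Pair) -> str:
        """Ask the sender to probe this pair; returns the secret to include."""
        secret = secrets.token_hex(8)
        self._pending[secret] = pair
        return secret

    def receiver_report(self, seen_secret: str) -> Optional[Pair]:
        """Called when the receiver reports a STUN2 packet it received."""
        pair = self._pending.pop(seen_secret, None)
        if pair is not None:
            self.consented.append(pair)       # media may now flow on this pair
        return pair
```

Because neither endpoint needs to know the ordering function, the controller could replace order_pairs with something learned from past calls without any change on the clients.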
6.2. STUN2
The speed of setting up a new media flow is often determined by how
many STUN2 checks need to be done. If the STUN2 packets are smaller,
then the STUN2 checks can be done faster without risk of causing
congestion.
6.2.1. STUN2 Request
A STUN2 request consists of, well, really nothing. The STUN client
just opens a QUIC connection to the STUN server.
6.2.2. STUN2 Response
When the STUN2 server receives a new QUIC connection, it responds with
the IP address and port that the connection came from.
The client can check it is talking to the correct STUN server by
checking the fingerprint of the certificate. Protocols like ICE
would need to exchange these fingerprints instead of all the crazy
STUN attributes.
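The fingerprint check mentioned above might look like the following sketch. It assumes the fingerprint is the SHA-256 digest of the DER encoded certificate, written as hex with optional colons; this draft does not pin down the exact representation, and the function names are illustrative.

```python
import hashlib
import hmac


def sha256_fingerprint(cert_der: bytes) -> str:
    """Lowercase hex SHA-256 digest of a DER encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()


def fingerprint_matches(cert_der: bytes, expected_hex: str) -> bool:
    """Compare against a fingerprint learned out of band.

    Accepts upper or lower case hex, with or without ':' separators.
    """
    normalized = expected_hex.replace(":", "").strip().lower()
    actual = sha256_fingerprint(cert_der)
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual.encode("ascii"), normalized.encode("utf-8"))
```

A client would obtain cert_der from the QUIC/TLS handshake and expected_hex from the control layer; how each is obtained depends on the QUIC library in use and is not shown here.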
Thanks to Peter Thatcher for proposing STUN over QUIC.
6.3. TURN2
TODO: make TURN2 run over QUIC
Out of band, the client tells the TURN2 server the fingerprint of the
cert it uses to authenticate with. The TURN2 server gives the client
two public IP:port address pairs. One is called inbound and the other
is called outbound. The client connects to the outbound port and
authenticates the TURN2 server using the TLS domain name of the
server. The TURN2 server authenticates the client using mutual TLS
with the fingerprint of the cert provided by the client. Any time a
message or STUN2 packet is received on the matched inbound port, the
TURN2 server forwards it to the client(s) connected to the outbound
port.
A single TURN2 connection can be used for multiple different calls or
sessions at the same time, and a client could choose to allocate the
TURN2 connection at the time that it started up. It does not need to
be done on a per session basis.
The client cannot send from the TURN2 server.
   Client A          Turn Server         Client B
(Media Receiver)                      (Media Sender)
       |                  |                  |
       |(1) OnInit Register (A's fingerprint)
       |----------------->|                  |
       |                  |                  |
       |(2) Register Response (Port Pair (L,R))
       |<-----------------|                  |
       |                  |                  |
       |    L (left of Server), R (right of Server)
       |                  |                  |
       |(3) Setup TLS Connection (L port)    |
       |..................|                  |
       |                  |                  |
       |                  |     B sends media to A
       |                  |                  |
       |                  |(4) Media Tx (Received on Port R)
       |                  |<-----------------|
       |                  |                  |
       |(5) Media Tx (Sent from Port L)      |
       |<-----------------|                  |
       |                  |                  |
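The bookkeeping a TURN2 server needs for the exchange above is small, as the following sketch suggests. It models only the state, with no sockets, QUIC, or TLS; the class and method names are hypothetical, ports are plain integers, and "inbound" and "outbound" follow the naming in the text (R and L in the figure).

```python
from typing import Dict, List, Tuple


class Turn2Relay:
    def __init__(self, first_port: int = 50000) -> None:
        self._next_port = first_port
        self._fingerprint_for_outbound: Dict[int, str] = {}
        self._outbound_for_inbound: Dict[int, int] = {}
        self._clients: Dict[int, List[str]] = {}

    def register(self, fingerprint: str) -> Tuple[int, int]:
        """Steps (1)/(2): record the fingerprint, return (inbound, outbound)."""
        inbound, outbound = self._next_port, self._next_port + 1
        self._next_port += 2
        self._fingerprint_for_outbound[outbound] = fingerprint
        self._outbound_for_inbound[inbound] = outbound
        self._clients[outbound] = []
        return inbound, outbound

    def connect(self, outbound: int, client_id: str, fingerprint: str) -> bool:
        """Step (3): accept only a client whose cert matches the registration."""
        if self._fingerprint_for_outbound.get(outbound) != fingerprint:
            return False
        self._clients[outbound].append(client_id)
        return True

    def on_inbound(self, inbound: int, payload: bytes) -> List[Tuple[str, bytes]]:
        """Steps (4)/(5): forward to every client on the matched outbound port."""
        outbound = self._outbound_for_inbound.get(inbound)
        if outbound is None:
            return []
        return [(client_id, payload) for client_id in self._clients[outbound]]
```

There is deliberately no method for sending from the client toward the inbound side, which reflects the statement that the client cannot send from the TURN2 server.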
7. Transport Layer
The responsibility of the transport layer is to provide an end to end
crypto layer equivalent to DTLS, and it must ensure adequate
congestion control. The transport layer brings up a flow between two
computers. This flow can be used by multiple media streams.
The MTI transport layer is QUIC with packets. It assumes that QUIC
has a way to deliver the packets in an efficient unreliable mode as
well as an optional way to deliver important metadata packets in a
reliable mode. It assumes that QUIC can report up to the rate
adaptation layer a current max target bandwidth that QUIC can
transmit at. It's possible these are all unrealistic characteristics
of QUIC in which case a new transport protocol should be developed
that provides these and is layered on top of DTLS for security.
This is secured by checking that the fingerprints of the DTLS
connection match the fingerprints provided at the control layer, or by
checking that the names of the certificates match what was provided
at the control layer.
The transport layer needs to be able to set the DSCP values in
transmitting packets as specified by the control layer.
The transport MAY provide a compression mode to remove the redundancy
of the non-encrypted portion of the media messages such as
GlobalEncodingID. For example, a GlobalEncodingID could be mapped to
a QUIC channel and then it could be removed before sending the
message and added back on the receiving side.
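The optional compression mode could be approximated as below. This is a sketch under an assumption the draft leaves open: how the receiving side learns which channel carries which GlobalEncodingID. Here a single ChannelMap object (a hypothetical name) stands in for state that both sides would somehow share, and headers are modelled as a plain dictionary.

```python
from typing import Dict, Tuple


class ChannelMap:
    """Maps each GlobalEncodingID to a small channel number and back."""

    def __init__(self) -> None:
        self._channel_for_id: Dict[int, int] = {}
        self._id_for_channel: Dict[int, int] = {}

    def bind(self, global_encoding_id: int) -> int:
        """Return the channel for this ID, allocating the next one if new."""
        channel = self._channel_for_id.get(global_encoding_id)
        if channel is None:
            channel = len(self._channel_for_id)
            self._channel_for_id[global_encoding_id] = channel
            self._id_for_channel[channel] = global_encoding_id
        return channel

    def strip(self, headers: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
        """Sender side: remove GlobalEncodingID, return (channel, the rest)."""
        remaining = dict(headers)
        channel = self.bind(remaining.pop("GlobalEncodingID"))
        return channel, remaining

    def restore(self, channel: int, headers: Dict[str, int]) -> Dict[str, int]:
        """Receiver side: add GlobalEncodingID back from the channel number."""
        restored = dict(headers)
        restored["GlobalEncodingID"] = self._id_for_channel[channel]
        return restored
```

Since the GlobalEncodingID is part of the authenticated headers, the receiving side has to restore it exactly before the end to end AEAD check can succeed.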
The transport needs to be able to ensure that it has a very small
chance of being confused with the STUN2 traffic it will be
multiplexed with. (Open issue - if STUN2 runs on top of the same
transport, this becomes less of an issue.)
The transport crypto needs to be able to export server state that can
be passed out of band to the client to enable the client to make a
zero RTT connection to the server.
8. Media Layer - RTP3
Each message consists of a set of TLV headers with metadata about the
packet, followed by payload data such as the output of an audio or
video codec.
There are several message headers that help the receiver understand
what to do with the media. The TLV headers are the following:
o Conference ID: Integer that is a globally unique identifier for
the conference across all applications using a common call
signalling system. This is set by the proposal.
o Endpoint ID: Integer to uniquely identify the endpoint within the
scope of the conference ID. This is set by the proposal.
o Source ID: integer to uniquely identify the input source within
the scope of an endpoint ID. A source could be a specific camera
or a microphone. This is set by the endpoint and included in the
advertisement.
o Sink ID: integer to uniquely identify the sink within the scope of
an endpoint ID. A sink could be a speaker or screen. This is set
by the endpoint and included in the advertisement. An endpoint
sending media can have this set. If it is set, the endpoint should
transmit it for 3 frames any time it changes and once every 5
seconds. An SFU can add, modify, or delete this from any media
packet. TODO - How to use this for SFU controlled layout - for
example, if we have 100 users in a conference and want to put the 10
most recent speakers in thumbnails. Do we need this at all?
o Encoding ID: integer to uniquely identify the encoding of the
stream within the scope of the source ID. Note there may be
multiple encodings of data from the same source. This is set by
the proposal.
o Salt: salt to use for forming the initialization vector for AEAD.
The salt shall be sent as part of the packet and need not be sent
in all the packets. This is created by the endpoint sending the
message.
o GlobalEncodingID: 64 bit hash of concatenation of conference ID,
endpoint ID, source ID, encoding ID
o Capture time: Time when the first sample in the message was
captured. It is an NTP time in ms with the high order bits
discarded. The number of bits in the capture time needs to be
large enough that it does not wrap during the lifetime of this
stream. This is set by the endpoint sending the message.
o Sequence ID: When the data captured for a single point in time is
too large to fit in a single message, it can be split into
multiple chunks which are sequentially numbered starting at 0
corresponding to the first chunk of the message. This is set by
the endpoint sending the message.
o GlobalMessageID: 64 bit hash of concatenation of conference ID,
endpoint ID, encoding ID, sequence ID
o Active level: a number from 0 to 100 indicating the probability
that the sender of this media wishes it to be considered active
media. For example, if it was voice, it would be 100 if the person
was clearly speaking, 0 if not, and perhaps a value in the middle
if it was uncertain. This allows a media switch to select the
active speaker in a conference call.
o Location: relative or absolute location, direction of view, and
field of view. With video coming from drones, 360 cameras, VR light
field cameras, and complex video conferencing rooms, this provides
the information about the camera or microphone that the receiver
can use to render the correct view. This is end to end encrypted.
o Reference Frame: bool to indicate if this message is part of a
reference frame. Typically, an SFU will switch to the new video
stream at the start of a reference frame.
o DSCP: DSCP to use on transmissions of this message and future
messages on this GlobalEncodingID
o Layer ID: Integer indicating which layer this message belongs to,
for scalable video codecs. An SFU may use this to selectively drop
a frame.
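The two derived identifiers in the list above, GlobalEncodingID and GlobalMessageID, can be sketched as follows. The draft does not name the hash function or how the IDs are serialized before concatenation, so this example assumes SHA-256 truncated to its first 8 bytes over 8-byte big-endian integers; both choices are placeholders.

```python
import hashlib


def _concat(*values: int) -> bytes:
    # Assumption: each ID is serialized as an 8-byte big-endian integer.
    return b"".join(v.to_bytes(8, "big") for v in values)


def _hash64(data: bytes) -> int:
    # Assumption: SHA-256 truncated to 64 bits stands in for the unnamed hash.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def global_encoding_id(conference_id: int, endpoint_id: int,
                       source_id: int, encoding_id: int) -> int:
    """64 bit hash of conference ID, endpoint ID, source ID, encoding ID."""
    return _hash64(_concat(conference_id, endpoint_id, source_id, encoding_id))


def global_message_id(conference_id: int, endpoint_id: int,
                      encoding_id: int, sequence_id: int) -> int:
    """64 bit hash of conference ID, endpoint ID, encoding ID, sequence ID."""
    return _hash64(_concat(conference_id, endpoint_id, encoding_id, sequence_id))


# IDs taken from the simple audio proposal example in this draft.
print(hex(global_encoding_id(4638572387, 23, 1, 1)))
```

An SFU that only needs to route or deduplicate messages can work from these two values without parsing the individual ID headers.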
The keys used for the AEAD are unique to a given conference ID and
endpoint ID.
If the message has any of the following headers, they must occur in
the following order followed by all other headers:
1. GlobalEncodingID,
2. GlobalMessageID,
3. conference ID,
4. endpoint ID,
5. encoding ID,
6. sequence ID,
7. active level,
8. DSCP
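The ordering rule above can be expressed as a stable sort, as in this sketch. Headers are modelled as (name, value) pairs with the names used in this section; the function name is illustrative. Headers outside the fixed list keep their original relative order because Python's sort is stable.

```python
from typing import List, Tuple

FIXED_ORDER = [
    "GlobalEncodingID",
    "GlobalMessageID",
    "conference ID",
    "endpoint ID",
    "encoding ID",
    "sequence ID",
    "active level",
    "DSCP",
]
_RANK = {name: index for index, name in enumerate(FIXED_ORDER)}


def order_headers(headers: List[Tuple[str, bytes]]) -> List[Tuple[str, bytes]]:
    """Put the listed headers first, in the required order, then the rest."""
    return sorted(headers, key=lambda h: _RANK.get(h[0], len(FIXED_ORDER)))


example = [
    ("capture time", b"t"),
    ("DSCP", b"d"),
    ("sequence ID", b"s"),
    ("salt", b"x"),
    ("GlobalEncodingID", b"g"),
]
print([name for name, _ in order_headers(example)])
```

Placing the routing related headers first lets a forwarding device stop parsing as soon as it has read the fields it needs.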
Every second there must be at least one message in each encoding that
contains:
o conference ID,
o endpoint ID,
o encoding ID,
o salt,
o and sequence ID headers
but they are not needed in every packet.
The sequence ID or GlobalMessageID is required in every message and
periodically there should be a message with the capture time.
8.1. RTP Meta Data
We tend to end up with a few categories of data associated with the
media:
o Stuff you need at the same time you get the media. For example,
whether this is a reference frame.
o Stuff you need soon but not instantly. For example, the name of
the speaker in a given rectangle of a video stream.
And it tends to change at different rates:
o Stuff that you need to process the media and may change but does
not change quickly and you don't need it with every frame. For
example, salt for encryption
o Stuff that you need to join the media but may never change. For
example, resolution of the video is
TODO - think about how to optimize design for each type of meta data
8.2. Securing the messages
The whole message is end to end secured with AEAD. The headers are
authenticated while the payload data is authenticated and encrypted.
Similar to how the IV for AES-GCM is calculated in SRTP, in this case
the IV is computed by XORing the salt with the concatenation of the
GlobalEncodingID and the low 64 bits of the sequence ID. The message
consists of the authenticated data, followed by the encrypted data,
then the authentication tag.
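The IV computation described above is sketched below. Two things are assumed because the draft does not state them: integers are serialized big-endian, and the salt is 16 bytes so that it matches the 128 bit concatenation. AES-GCM is most commonly used with a 96 bit IV, and the draft does not say how the 128 bit value would be reduced, so the sketch stops at the XOR and does not call a cipher.

```python
def compute_iv(salt: bytes, global_encoding_id: int, sequence_id: int) -> bytes:
    """XOR the salt with GlobalEncodingID || low 64 bits of the sequence ID."""
    low_sequence = sequence_id & 0xFFFFFFFFFFFFFFFF
    # Assumption: both 64 bit values are serialized big-endian.
    block = global_encoding_id.to_bytes(8, "big") + low_sequence.to_bytes(8, "big")
    if len(salt) != len(block):
        raise ValueError("this sketch assumes a 16-byte salt")
    return bytes(s ^ b for s, b in zip(salt, block))


# With an all-zero salt the IV is just the concatenation itself.
print(compute_iv(bytes(16), 1, 2).hex())
```

Because the sequence ID changes for every message within an encoding, each message gets a distinct IV under a given salt and key, as long as the low 64 bits of the sequence ID do not repeat.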
8.3. Sender requests
The control layer supports requesting retransmission of a particular
media message identified by IDs and capture time it would contain.
The control layer supports requesting a maximum rate for each given
encoding ID.
8.4. Data Codecs
Data messages including raw bytes, XML, and SenML can all be sent just
like media by selecting an appropriate codec and a software based
source or sink. An additional parameter to the codec can indicate
whether reliable delivery is needed and whether in order delivery is
needed.
8.5. Media Keep Alive
Provided by transport.
8.6. Forward Error Correction
A new Reed-Solomon based FEC scheme based on
[I-D.ietf-payload-flexible-fec-scheme] that provides FEC over
messages needs to be defined.
8.7. MTI Codecs
8.7.1. Audio
Implementations MUST support at least G.711 and Opus.
8.7.2. Video
Implementations MUST support at least H.264 and AV1.
Video codecs use square pixels.
Video codecs MUST support any aspect ratio within the limits of their
max width and height.
Video codecs can specify a maximum pixel rate, maximum frame rate, and
maximum image size. They can also specify a list of binary flags of
supported features which are defined by the codec and may be
supported by the codec for encode, decode, or neither, where each
feature can be independently controlled. They cannot impose
constraints beyond that. Some existing codecs like VP8 may easily
fit into that, while some codecs like H.264 may need some subsets
defined as new codecs to meet the requirements for this. It is not
expected that all the nuances that could be negotiated with SDP for
H.264 would be supported in this new media stack.
Video codecs MUST support a min width and min height of 1.
All video on the wire is oriented such that the first scan line in
the frame is up and first pixel in the scan line is on the left.
T.38 fax and DTMF are not supported. Fax can be sent as a TIFF
image over a data channel and DTMF can be done as application
specific information over a data channel.
TODO: Capture the list of what metadata video encoders produce:
o if it is a reference frame or not
o resolution
o frame-rate ?
o capture time of frame
TODO: Capture the list of what metadata video encoders need:
o capture timestamp
o source and target resolution
o source and target frame-rate
o target bitrate
o max bitrate
o max pixel rate
8.7.3. Annotation
Optional support for annotation based overlay using vector graphics
such as a subset of SVG.
8.7.4. Application Data Channels
Need support for application defined data in both a reliable and
unreliable datagram mode.
8.7.5. Reverse Requests & Stats
The hope is that this is not needed.
Much of what goes in the reverse direction of the media in RTCP is
used for congestion control, diagnostics, or control of the codec,
such as requesting to resend a frame or to send a new intra codec
frame for video. The design reduces the need for this.
The congestion control information which is needed quickly is all
handled at the QUIC layer.
The diagnostic type information can be reported from the endpoint to
the controller and does not need to flow at the media level.
Information that needs to be delivered reliably can be sent that way
at the QUIC level, removing the need for retransmit type requests.
Systems that use selective retransmission to recover from packet loss
of media do not tend to work as well for interactive media as
forward error correction schemes because of the large latency they
introduce.
Information like requesting a new intra codec frame for video often
needs to come from the controller and can be sent over the signalling
and control layer.
8.8. Message Key Agreement
The secret for encrypting messages can be provided in the proposal by
value or by a reference. The reference approach allows the client to
get it from a messaging system where the server creating the proposal
may not have access to the secret. For example, it might come
from a system like [I-D.barnes-mls-protocol].
9. Control Layer
The control layer needs an API to find out what the capabilities of
the device are, and then a way to set up sending and receiving
streams. All media flows are only in one direction. The control is
broken into control of connectivity and transports, and control of
media streams.
9.1. Transport Capabilities API
An API to get information for remote connectivity including:
o set the IP, port, and credential for each TURN2 server
o can return the IP, port tuple for the remote side to send to TURN2
server
o gather local IP, port, protocol tuples for receiving media
o report SHA256 fingerprint of local TLS certificate
o encryption algorithms supported
o report an error for a bad TURN2 credential
9.2. Media Capabilities API
Send and receive codecs are considered separate codecs and can have
separate capabilities, though they default to the same if not
specified separately.
For each send or receive audio codec, an API to learn:
o codec name
o the max sample rate
o the max sample size
o the max bitrate
For each send or receive video codec, an API to learn:
o codec name
o the max width
o the max height
o the max frame rate
o the max pixel depth
o the max bitrate
o the max pixel rate ( pixels / second )
9.3. Transport Configuration API
To create a new flow, the information that can be configured is:
o TURN2 server to use
o list of IP, Port, Protocol tuples to try connecting to
o encryption algorithm to use
o TLS fingerprint of far side
An API to allow modification of the following attributes of a flow:
o total max bandwidth for flow
o forward error correction scheme for flow
o FEC time window
o retransmission scheme for flow
o additional IP, Port, Protocol tuples to send to that may improve
connectivity
9.4. Media Configuration API
For all streams:
o set conference ID
o set endpoint ID
o set encoding ID
o salt and secret for AEAD
o flag to pause transmission
For each transmitted audio stream, a way to set the:
o audio codec to use
o media source to connect
o max encoded bitrate
o sample rate
o sample size
o number of channels to encode
o packetization time
o process as one of : automatically set, raw, speech, music
o DSCP value to use
o flag indicating whether to use constant bit rate
o optionally set a sinkID to periodically include in the media
For each transmitted video stream, a way to set
o video codec to use
o media source to connect to
o max width and max height
o max encoded bitrate
o max pixel rate
o sample rate
o sample size
o process as one of : automatically set, rapidly changing video,
fine detail video
o DSCP value to use
o for layered codecs, a layer ID and the set of layer IDs this
depends on
o optionally set a sinkID to periodically include in the media
For each transmitted video stream, a way to tell it to:
o encode the next frame as an intra frame
For each transmitted data stream:
o a way to send a data message and indicate reliable or unreliable
transmission
For each received audio stream:
o audio codec to use
o media sink to connect to
o lip sync flag
For each received video stream:
o video codec to use
o media sink to connect to
o lip sync flag
For each received data stream:
o notification of received data messages
Note on lip sync: For any streams that have the lip sync flag set to
true, the renderer attempts to synchronize their playback.
9.5. Transport Metrics
o report gathering state and completion
9.6. Flow Metrics API
For each flow, report:
o report connectivity state
o report bits sent
o report packets lost
o report estimated RTT
o report SHA256 fingerprint for certificate of far side
o current 5 tuple in use
9.7. Stream Metrics API
For sending streams:
o bits sent
o packets lost
For receiving streams:
o capture time of most recently received packet
o endpoint ID of most recently received packet
o bits received
o packets lost
For video streams (send & receive):
o current encoded width and height
o current encoded frame rate
10. Call Signalling - JABBER2
Call signalling is out of scope for usages like WebRTC but other
usages may want a common REST API they can use.
Call signalling works by having the client connect to a server when
it starts up, send its current advertisement, and open a web socket
to receive proposals from the server. A client can make a REST
call indicating the party or parties it wishes to connect to, and the
server will then send proposals to all clients that connect them.
The proposal tells each client exactly how to configure its media
stack and MUST be either completely accepted or completely rejected.
The signalling is based on the advertisement proposal ideas from
[I-D.peterson-sipcore-advprop].
We define one round trip of signalling to be a message going from a
client up to a server in the cloud, then down to another client which
returns a response along the reverse path. With this definition SIP
takes 1.5 round trips, or more if TURN is needed, to set up a call
while this takes 0.5 round trips.
11. Signalling Examples
11.1. Simple Audio Example
11.1.1. simple audio advertisement
{
"receiveAt":[
{
"relay":"2001:db8::10:443",
"stunSecret":"s8i739dk8",
"tlsFingerprintSHA256":"1283938"
},
{
"stun":"203.0.113.10:43210",
"stunSecret":"s8i739dk8",
"tlsFingerprintSHA256":"1283938"
},
{
"local":"192.168.0.2:443",
"stunSecret":"s8i739dk8",
"tlsFingerprintSHA256":"1283938"
}
],
"sources":[
{
"sourceID":1,
"sourceType":"audio",
"codecs":[
{
"codecName":"opus",
"maxBitrate":128000
},
{
"codecName":"g711"
}
]
}
],
"sinks":[
{
"sinkID":1,
"sourceType":"audio",
"codecs":[
{
"codecName":"opus",
"maxBitrate":256000
},
{
"codecName":"g711"
}
]
}
]
}
11.1.2. Simple audio proposal
{
  "receiveAt":[
    {
      "relay":"2001:db8::10:443",
      "stunSecret":"s8i739dk8"
    },
    {
      "stun":"203.0.113.10:43210",
      "stunSecret":"s8i739dk8"
    },
    {
      "local":"192.168.0.10:443",
      "stunSecret":"s8i739dk8"
    }
  ],
  "sendTo":[
    {
      "relay":"2001:db8::20:443",
      "stunSecret":"20kdiu83kd8",
      "tlsFingerprintSHA256":"9389739"
    },
    {
      "stun":"203.0.113.20:43210",
      "stunSecret":"20kdiu83kd8",
      "tlsFingerprintSHA256":"9389739"
    },
    {
      "local":"192.168.0.20:443",
      "stunSecret":"20kdiu83kd8",
      "tlsFingerprintSHA256":"9389739"
    }
  ],
  "sendStreams":[
    {
      "conferenceID":4638572387,
      "endpointID":23,
      "sourceID":1,
      "encodingID":1,
      "codecName":"opus",
      "AEAD":"AES128-GCM",
      "secret":"xy34",
      "maxBitrate":24000,
      "packetTime":20
    }
  ],
  "receiveStreams":[
    {
      "conferenceID":4638572387,
      "endpointID":23,
      "sinkID":1,
      "encodingID":1,
      "codecName":"opus",
      "AEAD":"AES128-GCM",
      "secret":"xy34"
    }
  ]
}
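The sketch below shows one way a client might decide whether to
accept or reject a proposal as a whole by comparing it with its own
advertisement. The matching rules are assumptions made for this
illustration and are not defined by this document: each sendStream
must name an advertised sourceID, use a codecName advertised for that
source, and stay at or below that codec's maxBitrate when one was
advertised. The function name is hypothetical, and the data is a
trimmed copy of the two examples above.

```python
def proposal_fits_advertisement(advertisement: dict, proposal: dict) -> bool:
    """Return True only if every sendStream fits an advertised source.

    A single mismatch rejects the whole proposal, mirroring the rule that a
    proposal is either completely accepted or completely rejected.
    """
    sources = {s["sourceID"]: s for s in advertisement.get("sources", [])}
    for stream in proposal.get("sendStreams", []):
        source = sources.get(stream["sourceID"])
        if source is None:
            return False  # proposal names a source that was never advertised
        codecs = {c["codecName"]: c for c in source["codecs"]}
        codec = codecs.get(stream["codecName"])
        if codec is None:
            return False  # codec was not advertised for this source
        limit = codec.get("maxBitrate")
        if limit is not None and stream.get("maxBitrate", 0) > limit:
            return False  # asks for more than the advertised bitrate
    return True


# Trimmed copies of the advertisement and proposal shown above.
ADVERTISEMENT = {
    "sources": [
        {
            "sourceID": 1,
            "sourceType": "audio",
            "codecs": [
                {"codecName": "opus", "maxBitrate": 128000},
                {"codecName": "g711"},
            ],
        }
    ]
}
PROPOSAL = {
    "sendStreams": [
        {"sourceID": 1, "encodingID": 1, "codecName": "opus", "maxBitrate": 24000}
    ]
}

if __name__ == "__main__":
    accepted = proposal_fits_advertisement(ADVERTISEMENT, PROPOSAL)
    print("accept" if accepted else "reject")
```

A real client would apply the same all-or-nothing check to
receiveStreams and to the transport entries before configuring
anything.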
11.2. Simple Video Example
Advertisement for a simple send-only camera with no audio:
{
  "sources":[
    {
      "sourceID":1,
      "sourceType":"video",
      "codecs":[
        {
          "codecName":"av1",
          "maxBitrate":20000000,
          "maxWidth":3840,
          "maxHeight":2160,
          "maxFrameRate":120,
          "maxPixelRate":248832000,
          "maxPixelDepth":8
        }
      ]
    }
  ]
}
11.2.1. Proposal sent to camera
{
  "sendTo":[
    {
      "relay":"2001:db8::20:443",
      "stunSecret":"20kdiu83kd8",
      "tlsFingerprintSHA256":"9389739"
    }
  ],
  "sendStreams":[
    {
      "conferenceID":0,
      "endpointID":0,
      "sourceID":0,
      "encodingID":0,
      "codecName":"av1",
      "AEAD":"NULL",
      "width":640,
      "height":480,
      "frameRate":30
    }
  ]
}
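The advertised limits interact: the example camera advertises a
maxPixelRate of 248832000, which equals 3840 x 2160 x 30, so it could
not deliver its maximum resolution at its maximum frame rate. The
sketch below checks a proposed video stream against the advertised
codec limits. It assumes that maxPixelRate is measured in pixels per
second and is compared against width x height x frameRate; that
reading, and the function name, are assumptions of this example.

```python
def stream_within_limits(codec: dict, stream: dict) -> bool:
    """Check a proposed video stream against one advertised codec entry."""
    width, height, fps = stream["width"], stream["height"], stream["frameRate"]
    return (
        stream["codecName"] == codec["codecName"]
        and width <= codec["maxWidth"]
        and height <= codec["maxHeight"]
        and fps <= codec["maxFrameRate"]
        # Assumed meaning of maxPixelRate: pixels per second.
        and width * height * fps <= codec["maxPixelRate"]
    )


# Codec entry copied from the camera advertisement above.
AV1_LIMITS = {
    "codecName": "av1",
    "maxWidth": 3840,
    "maxHeight": 2160,
    "maxFrameRate": 120,
    "maxPixelRate": 248832000,
}

if __name__ == "__main__":
    vga30 = {"codecName": "av1", "width": 640, "height": 480, "frameRate": 30}
    uhd120 = {"codecName": "av1", "width": 3840, "height": 2160, "frameRate": 120}
    print(stream_within_limits(AV1_LIMITS, vga30))   # fits every limit
    print(stream_within_limits(AV1_LIMITS, uhd120))  # exceeds maxPixelRate
```

The second stream respects maxWidth, maxHeight, and maxFrameRate
individually but fails the combined pixel rate check.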
11.3. Simulcast Video Example
The advertisement is the same as for the simple camera above, but the
proposal has two streams with different encodingID values.
{
  "sendTo":[
    {
      "relay":"2001:db8::20:443",
      "stunSecret":"20kdiu83kd8",
      "tlsFingerprintSHA256":"9389739"
    }
  ],
  "sendStreams":[
    {
      "conferenceID":0,
      "endpointID":0,
      "sourceID":0,
      "encodingID":1,
      "codecName":"av1",
      "AEAD":"NULL",
      "width":1920,
      "height":1080,
      "frameRate":30
    },
    {
      "conferenceID":0,
      "endpointID":0,
      "sourceID":0,
      "encodingID":2,
      "codecName":"av1",
      "AEAD":"NULL",
      "width":240,
      "height":240,
      "frameRate":15
    }
  ]
}
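Because simulcast streams from one source are distinguished only by
encodingID, a receiver of the proposal may want to confirm that the
values do not collide. The following sketch groups the encodingID
values of a proposal by source and raises an error on a duplicate.
The helper name and the choice of (conferenceID, endpointID,
sourceID) as the grouping key are assumptions of this example.

```python
def encodings_by_source(send_streams: list) -> dict:
    """Group encodingIDs by (conferenceID, endpointID, sourceID).

    Raises ValueError if one source lists the same encodingID twice.
    """
    grouped = {}
    for stream in send_streams:
        key = (stream["conferenceID"], stream["endpointID"], stream["sourceID"])
        encoding_id = stream["encodingID"]
        if encoding_id in grouped.setdefault(key, []):
            raise ValueError(f"duplicate encodingID {encoding_id} for {key}")
        grouped[key].append(encoding_id)
    return grouped


# The two simulcast streams from the proposal above, trimmed to the ID fields.
SIMULCAST_STREAMS = [
    {"conferenceID": 0, "endpointID": 0, "sourceID": 0, "encodingID": 1},
    {"conferenceID": 0, "endpointID": 0, "sourceID": 0, "encodingID": 2},
]

if __name__ == "__main__":
    print(encodings_by_source(SIMULCAST_STREAMS))
```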
11.4. FEC Example
11.4.1. Advertisement includes a FEC codec.
{
  "sources":[
    {
      "sourceID":1,
      "sourceType":"video",
      "codecs":[
        {
          "codecName":"av1",
          "maxBitrate":20000000,
          "maxWidth":3840,
          "maxHeight":2160,
          "maxFrameRate":120,
          "maxPixelRate":248832000,
          "maxPixelDepth":8
        },
        {
          "codecName":"flex-fec-rs"
        }
      ]
    }
  ]
}
11.4.2. Proposal sent to camera
{
  "sendTo":[
    {
      "relay":"2001:db8::20:443",
      "stunSecret":"20kdiu83kd8",
      "tlsFingerprintSHA256":"9389739"
    }
  ],
  "sendStreams":[
    {
      "conferenceID":0,
      "endpointID":0,
      "sourceID":0,
      "encodingID":1,
      "codecName":"av1",
      "AEAD":"NULL",
      "width":640,
      "height":480,
      "frameRate":30
    },
    {
      "conferenceID":0,
      "endpointID":0,
      "sourceID":0,
      "encodingID":2,
      "AEAD":"NULL",
      "codecName":"flex-fec-rs",
      "fecRepairWindow":200,
      "fecRepairEncodingIDs":[
        1
      ]
    }
  ]
}
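In this proposal the FEC stream (encodingID 2) protects the video
stream through fecRepairEncodingIDs. A minimal consistency check,
sketched below, confirms that every protected encodingID refers to
another stream in the same proposal. The function name, and the rule
that a FEC stream may not list itself, are assumptions of this
example rather than requirements stated in this document.

```python
def dangling_fec_references(send_streams: list) -> list:
    """Return (fecEncodingID, badEncodingID) pairs; empty if all resolve."""
    known = {s["encodingID"] for s in send_streams}
    problems = []
    for stream in send_streams:
        for protected in stream.get("fecRepairEncodingIDs", []):
            # Assumed rule: the target must exist and must not be the FEC
            # stream itself.
            if protected not in known or protected == stream["encodingID"]:
                problems.append((stream["encodingID"], protected))
    return problems


# The two streams from the proposal above, trimmed to the relevant fields.
FEC_STREAMS = [
    {"encodingID": 1, "codecName": "av1"},
    {
        "encodingID": 2,
        "codecName": "flex-fec-rs",
        "fecRepairWindow": 200,
        "fecRepairEncodingIDs": [1],
    },
]

if __name__ == "__main__":
    print(dangling_fec_references(FEC_STREAMS))
```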
12. Switched Forwarding Unit (SFU)
When several clients are in a conference call, the SFU can forward
packets by looking at which clients need a given
GlobalEncodingID. By looking at the "active level", the SFU can
figure out which endpoints are the active speakers and forward only
those. The SFU never changes anything in the message.
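The forwarding rule described above can be sketched as a lookup from
GlobalEncodingID to interested clients, gated by an active speaker
check. The class and method names are hypothetical. The sketch treats
GlobalEncodingID as an opaque key and assumes that a larger active
level means a more active endpoint; neither detail is taken from this
section.

```python
from collections import defaultdict


class Sfu:
    """Toy forwarding table; packets are passed through unmodified."""

    def __init__(self, max_active: int = 1):
        self.max_active = max_active
        self.subscribers = defaultdict(set)  # GlobalEncodingID -> client IDs
        self.active_level = {}               # endpoint ID -> latest level

    def subscribe(self, client_id, global_encoding_id):
        """Record that a client needs the given GlobalEncodingID."""
        self.subscribers[global_encoding_id].add(client_id)

    def report_level(self, endpoint_id, level):
        """Store the most recent active level seen from an endpoint."""
        self.active_level[endpoint_id] = level

    def active_speakers(self) -> set:
        """Return the max_active endpoints with the highest level."""
        ranked = sorted(self.active_level, key=self.active_level.get, reverse=True)
        return set(ranked[: self.max_active])

    def forward(self, endpoint_id, global_encoding_id, packet: bytes) -> dict:
        """Return {client_id: packet} for every client that should get it."""
        if endpoint_id not in self.active_speakers():
            return {}
        clients = self.subscribers.get(global_encoding_id, set())
        return {client: packet for client in clients}  # same bytes, unchanged


if __name__ == "__main__":
    sfu = Sfu(max_active=1)
    sfu.subscribe("bob", 101)
    sfu.subscribe("carol", 101)
    sfu.report_level("alice", 90)
    sfu.report_level("bob", 10)
    print(sorted(sfu.forward("alice", 101, b"media")))
```

Because the SFU only consults identifiers and the active level, the
payload is handed to each client exactly as it arrived.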
12.1. Software Defined Networking
Is it possible to use the packet recycling concepts in SDN to forward
a single packet to multiple endpoints? Can the way SDN forwarding
would work be adapted to use an SDN router as an SFU?
12.2. Vector Packet Processors
Can we use fast VPP systems like fd.io to create an SFU?
12.3. Information Centric Networking
What changes would be needed to map RTP2 into the prefix and suffix
of hICN?
13. Acknowledgements
Thank you for input from: Harald Alvestrand, Espen Berger, Matthew
Kaufman, Patrick Linskey, Eric Rescorla, Peter Thatcher, Malcolm
Walters, Martin Thomson.
14. Other Work
rfc7016
draft-kaufman-rtcweb-traversal
Consider using terminology from rfc7656
docs.google.com/presentation/d/1Sg_1TVCcKJvZ8Egz5oa0CP01TC2rNdv9HVu7W38Y4zA/edit#slide=id.g29a8672e18_22_120
docs.google.com/presentation/d/1o-o5jZBLw3Py1OuenzWDkxDG6NigSmLHvGw5KemKWLw/edit#slide=id.g2f8f4acff1_1_249
cs.chromium.org/chromium/src/third_party/webrtc/common_video/include/video_frame.h
15. Style of specification
Fundamentally driven by experiments. The proposal is to have a high
level overview document where we document some of the design - this
document could be a start of that. Then write a spec for each one
of the separable protocol parts such as STUN2, TURN2, etc.
The protocol specs would contain a high level overview like you might
find on a Wikipedia page, and the details of the protocol encoding
would be provided in an open source reference implementation. The
test code for the reference implementation helps test the spec. The
implementation is not optimized for performance but instead is simply
trying to clearly illustrate the protocol. A particular version of the
draft would be bound to a tagged version of the source code. All the
source code would be under normal IETF IPR rules, just as if it were
included directly in the draft.
16. Informative References
[I-D.barnes-mls-protocol]
Barnes, R., Millican, J., Omara, E., Cohn-Gordon, K., and
R. Robert, "The Messaging Layer Security (MLS) Protocol",
draft-barnes-mls-protocol-00 (work in progress), February
2018.
[I-D.ietf-payload-flexible-fec-scheme]
Zanaty, M., Singh, V., Begen, A., and G. Mandyam, "RTP
Payload Format for Flexible Forward Error Correction
(FEC)", draft-ietf-payload-flexible-fec-scheme-06 (work in
progress), March 2018.
[I-D.jennings-dispatch-snowflake]
Jennings, C. and S. Nandakumar, "Snowflake - A Lighweight,
Asymmetric, Flexible, Receiver Driven Connectivity
Establishment", draft-jennings-dispatch-snowflake-01 (work
in progress), March 2018.
[I-D.jennings-mmusic-ice-fix]
Jennings, C., "Proposal for Fixing ICE", draft-jennings-
mmusic-ice-fix-00 (work in progress), July 2015.
[I-D.kaufman-rtcweb-traversal]
Kaufman, M. and J. Rosenberg, "NAT Traversal Requirements
for RTCWEB", draft-kaufman-rtcweb-traversal-00 (work in
progress), June 2011.
[I-D.peterson-sipcore-advprop]
Peterson, J. and C. Jennings, "The Advertisement/Proposal
Model of Session Description", draft-peterson-sipcore-
advprop-01 (work in progress), March 2011.
Author's Address
Cullen Jennings
Cisco
Email: fluffy@iii.ca