Network Working Group                                        P. Thatcher
Internet-Draft                                                 J. Uberti
Intended status: Standards Track                                  Google
Expires: August 8, 2013                                 February 4, 2013
An argument for encoding multiple media sources per m= section of SDP.
draft-pthatcher-mmusic-many-sources-00
Abstract
This document explains why it is preferable to have multiple media
sources encoded in SDP as one m= section rather than many m=
sections, especially when there are a large number of video sources
which change frequently and need dynamic resolution changes, such as
in a video conferencing system.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 8, 2013.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Thatcher & Uberti Expires August 8, 2013 [Page 1]
Internet-Draft Argument for multiple sources per m-line February 2013
Table of Contents
1. Introduction
2. The downsides of many, frequently changing m= sections.
3. Responses to the benefits of many m= sections.
4. Other considerations for a video conferencing system.
4.1. Per-source negotiation is not a major factor
4.2. Source selection
4.3. Extensibility to very large or dynamic conferences
5. Use in Production
6. Conclusion
7. Security Considerations
8. IANA Considerations
9. Acknowledgements
Authors' Addresses
1. Introduction
There has been a long-running debate over whether to encode multiple
media sources in SDP as multiple m= sections or as a single m=
section, as in RFC5576.
When there are a large number of sources (10-100, such as in a video
conferencing system), multiple m= sections without BUNDLE would
require many (10-100) transports, which would incur significant
overhead and fragility. This alone would be a sufficient argument
for not using multiple m= sections when dealing with large numbers of
sources.
However, what if we can combine BUNDLE and multiple m= sections?
Would that be usable for scenarios with many sources that change
frequently (such as a video conferencing system)?
We believe that such an approach (of BUNDLE + many-m-sections) has a
number of significant problems when dealing with many sources, and
that an approach using multiple sources per m= section (as in
RFC5576) should be preferred.
2. The downsides of many, frequently changing m= sections.
When there are many sources, and when both sides can add or remove
sources at any time (which is common in a video conferencing system),
using many m= sections has a number of serious problems.
o If the answerer has more sources than the offerer, the answerer
cannot put them in the answer, since the answer must have the same
number of m= sections as the offer.
o If both ends wish to add or remove sources at the same time, the
result is signaling glare, since m= sections are identified by
index. It's unclear how this would be resolved efficiently (a
known unknown).
o If one side adds or removes a source at the same time that the
other refers to a source (such as signaling a desired resolution
for a specific source) the result is more complex signaling glare,
again due to the lookup by index. It's unclear how this would be
resolved efficiently (another known unknown).
o Adding or removing a source requires sending a full list of all
sources, which doesn't scale well to 10s or 100s of sources,
especially when they are added or removed frequently.
o Because this solution requires BUNDLE, which is still not widely
supported, it can be deployed in fewer situations, and will take
longer to reach usable support. There are many open questions
regarding BUNDLE, including whether partial BUNDLEing is allowed;
if it is, then adding a new media stream with BUNDLE will result
in having to generate new transport candidates, in the event the
remote side doesn't want to BUNDLE the new stream.
o With BUNDLE, use of RFC5576 a=ssrc attributes is required in order
for the recipient to be able to properly demultiplex incoming RTP.
Payload type isn't sufficient as a demux point, since RTCP packets
don't contain RTP payload types. Therefore, this approach
requires both sides to understand RFC5576, just like the single-m-
section approach.
o BUNDLE has many complex rules regarding how the various SDP
attributes in the different m= lines need to be constructed, since
there is only a single RTP session. For one, all payload types
need to be unique, which could complicate the use of per-source
attributes with many sources. If per-source attributes required
distinct payload types, and there were enough sources to exhaust the
payload type space, it is unclear what could be done. There are
lots of "unknown unknowns" here. Or, as we like to say, "thar be
dragons".
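As an illustration of the index-based glare described in the list
above, the following is a toy Python model, not real offer/answer
processing. The list-of-strings session, the source names, the SSRC
values, and the helper names are all hypothetical. It shows two sides
starting from the same agreed state and each appending a new source,
so that both offers claim the same m= section index for different
sources. Sources keyed by an identifier such as an SSRC do not collide
in this model, assuming the SSRCs are distinct.

```python
# Toy model (hypothetical): a session is a list of m= sections, and an
# m= section is identified only by its position in that list.
def offer_with_new_section(agreed_sections, new_source):
    """Return the offer one side would send to add a source as a new m= section."""
    return agreed_sections + [new_source]


def find_index_conflicts(offer_a, offer_b):
    """Return the indexes at which two simultaneous offers disagree."""
    shared = min(len(offer_a), len(offer_b))
    return [i for i in range(shared) if offer_a[i] != offer_b[i]]


agreed = ["video:alice", "video:bob"]

# Both sides add a source at the same time, starting from the same state.
offer_a = offer_with_new_section(agreed, "video:carol")
offer_b = offer_with_new_section(agreed, "video:dave")
conflicts = find_index_conflicts(offer_a, offer_b)
print("conflicting m= indexes:", conflicts)

# With sources keyed by an identifier (SSRC values here are made up), the
# two additions touch different keys and can be merged without ambiguity.
sources_a = {1111: "alice", 2222: "bob", 3333: "carol"}
sources_b = {1111: "alice", 2222: "bob", 4444: "dave"}
merged = {**sources_a, **sources_b}
print("merged sources:", sorted(merged))
```

The model deliberately ignores how a real implementation would detect
and retry colliding offers; it only shows why position is a weaker
identifier than an explicit per-source key.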
3. Responses to the benefits of many m= sections.
The idea of using many m= sections with BUNDLE is not without merit.
The major benefits that have been suggested are:
1. It allows per-source SDP attributes.
2. We have to support multiple m= sections anyway.
3. There's no advantage to multi-source m= sections.
4. There are many unknowns with having multiple sources per m=
section.
However:
1. For a multi-source m= section, support for most of the important
SDP attributes is already defined on a per-source basis in
RFC5576, and the few remaining attributes already have proposals.
As an example, if we wanted to have two audio sources, one with a
high bitrate and one with a low bitrate, we could describe that
using the per-source fmtp parameters defined in RFC5576 to
indicate, say, high bitrate Opus for one source and low bitrate
Opus for the other. In short, this is a solved problem for the
multi-source m= section approach, and so the multi-m=section
approach doesn't have much of an advantage.
2. While multiple m= sections will still be needed for a finite
number of legacy cases, that does not mean multiple m= sections
are suitable for all cases, including those with many, frequently
changing sources. As has been shown, there are a number of
problems. We can continue to use multiple m= sections in certain
cases for legacy interop while using multiple sources per m=
section going forward so that we can support many sources without
the stated problems.
3. As mentioned, using many m= sections has a number of significant
problems, and the multi-source m= section avoids them. Thus, it
does have advantages.
4. The RFC5576-based approach has already been in use for many years
in a major video conferencing system (Google+ Hangouts), and has
worked very well. We feel that we have searched the "unknown"
space and solved all of the major issues. In other words, we
feel that the multi-source m= section is mostly a solved problem
for the case of many sources that change frequently, such as in a
video conferencing system. On the other hand, the multi m=
section approach has never been tried, as far as we are aware,
with a large number of m= sections, and so almost by definition
has many unknown unknowns.
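The two-source Opus example from response 1 could be written roughly
as follows. This Python sketch only formats an SDP fragment for one m=
section using the RFC5576 "a=ssrc:<ssrc-id> <attribute>" form,
including the source-level fmtp attribute. The port, payload type 111,
SSRC values, and CNAME are made up, and the use of the Opus
"maxaveragebitrate" format parameter to express high and low bitrate
is an assumption for illustration, not something specified here.

```python
def audio_section(port, payload_type, sources):
    """Build the lines of one audio m= section carrying several sources.

    `sources` is a list of (ssrc, cname, max_bitrate_bps) tuples; all
    values passed below are hypothetical.
    """
    lines = [
        f"m=audio {port} RTP/SAVPF {payload_type}",
        f"a=rtpmap:{payload_type} opus/48000/2",
    ]
    for ssrc, cname, max_bitrate_bps in sources:
        # Source-level attributes in the RFC5576 a=ssrc form.
        lines.append(f"a=ssrc:{ssrc} cname:{cname}")
        lines.append(
            f"a=ssrc:{ssrc} fmtp:{payload_type} maxaveragebitrate={max_bitrate_bps}"
        )
    return lines


section = audio_section(
    49170,
    111,
    [
        (1111, "alice@example.com", 64000),  # high bitrate source
        (2222, "alice@example.com", 16000),  # low bitrate source
    ],
)
print("\n".join(section))
```

Both sources share a single m= line and a single payload type; only
the a=ssrc lines differ between them.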
4. Other considerations for a video conferencing system.
4.1. Per-source negotiation is not a major factor
In a video conferencing system, such as Google+ Hangouts, we have
found that negotiating per-source codecs, RTP header extensions, etc.
is not only unnecessary but actually something we don't want to do,
because it is simply more complexity that isn't needed. We have
found that it is best to negotiate the codecs, RTP header extensions,
etc. once, independent of sources, and then simply select from the
negotiated options the settings for each source. This selection can
be done by the receiver, but in many cases (e.g. selection of
bitrate), it is far better for the sender to make this choice. The
sender is aware of the details of the media it is sending, and the
bandwidth estimate to the receiver, so it is in a far better position
to decide with what options a given source should be encoded (and it
can change these options at any time with no signaling delay).
Thus, from our view, the per-source negotiation of codecs, RTP header
extensions, etc. in the multi m= section approach is not an
advantage, but rather an additional complexity.
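A minimal sketch of the "negotiate once, select per source" pattern
described above is shown here. It is not a description of how any
deployed system makes the choice: the SourceSettings type, the
select_settings helper, the SSRC values, and the two policies (use the
first negotiated codec, split the bandwidth estimate evenly) are all
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSettings:
    codec: str
    bitrate_bps: int


def select_settings(negotiated_codecs, source_ids, bandwidth_estimate_bps):
    """Pick per-source settings from options negotiated once for the session."""
    if not negotiated_codecs or not source_ids:
        return {}
    codec = negotiated_codecs[0]  # assumed policy: first negotiated codec
    share = bandwidth_estimate_bps // len(source_ids)  # assumed policy: even split
    return {ssrc: SourceSettings(codec, share) for ssrc in source_ids}


negotiated = ["VP8"]  # negotiated once, independent of sources
before = select_settings(negotiated, [1111, 2222], 1_000_000)
# The bandwidth estimate drops; the sender re-selects with no new signaling.
after = select_settings(negotiated, [1111, 2222], 400_000)
print(before[1111].bitrate_bps, after[1111].bitrate_bps)
```

The point of the sketch is that the second call needs nothing from the
signaling path: the negotiated options are unchanged and only the
sender's local selection differs.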
4.2. Source selection
It is desirable in a video conferencing system to allow the receiver
to tell the sender what resolution to send on a per-source basis, and
to change that value at any time throughout the call. As discussed,
the multiple m= section approach has a problem when one side selects
a source resolution while the other side modifies the list of
sources.
A multi-source m= section approach avoids this, and work has already
begun on defining source selection in SDP, as in
https://tools.ietf.org/html/draft-lennox-mmusic-sdp-source-selection-02
4.3. Extensibility to very large or dynamic conferences
SDP, by its nature, describes the full state of a session, with no
support for partial updates. Thus, any mechanism for describing or
controlling sources based on SDP will become impractical as the
number of sources becomes very large (hundreds or more) or the list
of sources changes quickly. In such scenarios, other mechanisms for
describing or controlling sources are preferred. Several mechanisms
along these lines have been proposed, including the CLUE signaling
channel (sent on an independent SCTP data channel), extensions to the
XCON conference event package (using SIP SUBSCRIBE/NOTIFY), or the
Codec Operation Point (COP) messages (sent over RTCP). In all these
cases, however, SDP mechanisms will be used to describe the RTP
sessions that carry the sources that are controlled or described
externally. It is very desirable that the syntax of the SDP
descriptions of these sessions be as similar as possible to that of
sessions with sources described in SDP. Multi-source m= sections
meet this naturally -- the sessions controlled externally would
simply omit the a=ssrc attributes. The multiple m= line proposal, by
contrast, does not appear to have any similarly natural way to extend
itself to external control.
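To make the "simply omit the a=ssrc attributes" point concrete, the
following Python sketch formats the same video m= section twice: once
with sources described in SDP and once for a session whose sources are
described externally. The port, payload type, SSRC, and CNAME are
hypothetical, and the helper is an illustration only.

```python
def video_section(payload_type, sources=()):
    """Build one video m= section; `sources` is a sequence of (ssrc, cname)."""
    lines = [
        f"m=video 49172 RTP/SAVPF {payload_type}",
        f"a=rtpmap:{payload_type} VP8/90000",
    ]
    # Sources described in SDP add a=ssrc lines; externally described
    # sources add nothing, leaving the rest of the section unchanged.
    lines += [f"a=ssrc:{ssrc} cname:{cname}" for ssrc, cname in sources]
    return lines


described_in_sdp = video_section(100, [(3333, "carol@example.com")])
described_externally = video_section(100)
print("\n".join(described_in_sdp))
print("\n".join(described_externally))
```

Everything except the a=ssrc lines is identical between the two
outputs, which is the syntactic similarity the text asks for.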
5. Use in Production
We'd like to reiterate that an RFC5576-based approach, with multiple
sources per m= section, is already in use by Google+ Hangouts for
a form of video conferencing. It has been used successfully for
several years for millions of users and conferences and has scaled
well for up to tens of simultaneous sources within a video
conference. It is used for logical sources of audio, video, and
arbitrary real-time data.
We feel that using multiple sources per m= section is a proven
solution for dealing with large numbers of sources that change
frequently, because we've used it and it works. The "unknown" space
has been thoroughly explored. On the other hand, we see large risk
in going down the multiple m= section path when dealing with many
sources.
6. Conclusion
We believe that encoding multiple sources as multiple m= sections
does not scale well to many sources, has many problems, and has many
unknowns. In contrast, encoding the sources into one m= section,
as per RFC5576, scales to many sources, provides a number of
advantages, and has fewer unknowns.
1. It allows per-source SDP attributes for most of the important
attributes.
2. It scales to many sources in cases where many m= sections don't.
3. It's been used already and has fewer unknowns.
7. Security Considerations
None.
8. IANA Considerations
None.
9. Acknowledgements
Jonathan Lennox provided the observations on large or dynamic
conferences.
Authors' Addresses
Peter Thatcher
Google
747 6th St S
Kirkland, WA 98033
USA
Email: pthatcher@gmail.com
Justin Uberti
Google
747 6th St S
Kirkland, WA 98033
USA
Email: justin@uberti.name