Last Version: draft-pthatcher-mmusic-many-sources-00.txt
Date: 05-Feb-2013
Disposition: expired




Network Working Group                                        P. Thatcher
Internet-Draft                                                 J. Uberti
Intended status: Standards Track                                  Google
Expires: August 8, 2013                                 February 4, 2013


 An argument for encoding multiple media sources per m= section of SDP.
                 draft-pthatcher-mmusic-many-sources-00

Abstract

   This document explains why it is preferable to have multiple media
   sources encoded in SDP as one m= section rather than many m=
   sections, especially when there are a large number of video sources
   which change frequently and need dynamic resolution changes, such as
   in a video conferencing system.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 8, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Thatcher & Uberti        Expires August 8, 2013                 [Page 1]

Internet-Draft  Argument for multiple sources per m-line   February 2013


Table of Contents

   1.  Introduction
   2.  The downsides of many, frequently changing m= sections.
   3.  Responses to the benefits of many m= sections.
   4.  Other considerations for a video conferencing system.
     4.1.  Per-source negotiation is not a major factor
     4.2.  Source selection
     4.3.  Extensibility to very large or dynamic conferences
   5.  Use in Production
   6.  Conclusion
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   Authors' Addresses


1.  Introduction

   There has been a long-running debate over whether to encode multiple
   media sources in SDP as multiple m= sections or as multiple sources
   within a single m= section, as in RFC5576.

   When there are a large number of sources (10-100, such as in a video
   conferencing system), multiple m= sections without BUNDLE would
   require many (10-100) transports, which would incur significant
   overhead and fragility.  This alone would be sufficient argument for
   not using multiple m= sections when dealing with large numbers of
   sources.

   However, what if we can combine BUNDLE and multiple m= sections?
   Would that be usable for scenarios with many sources that change
   frequently (such as a video conferencing system)?

   We believe that such an approach (BUNDLE plus many m= sections) has a
   number of significant problems when dealing with many sources, and
   that an approach using multiple sources per m= section (as in
   RFC5576) should be preferred.


2.  The downsides of many, frequently changing m= sections.

   When there are many sources, and when both sides can add or remove
   sources at any time (which is common in a video conferencing system),
   using many m= sections has a number of serious problems.
   o  If the answerer has more sources than the offerer, the answerer
      cannot put them in the answer, since the answer must have the same
      number of m= sections as the offer.
   o  If both ends wish to add or remove sources at the same time, the
      result is signaling glare, since m= sections are identified by
      index.  It's unclear how this would be resolved efficiently (a
      known unknown).
   o  If one side adds or removes a source at the same time that the
      other refers to a source (such as signaling a desired resolution
      for a specific source) the result is more complex signaling glare,
      again due to the lookup by index.  It's unclear how this would be
      resolved efficiently (another known unknown).
   o  Adding or removing a source requires sending a full list of all
      sources, which doesn't scale well to tens or hundreds of sources,
      especially when they are added or removed frequently.
   o  Because this solution requires BUNDLE, which is still not widely
      supported, it can be deployed in fewer situations, and will take
      longer to reach usable support.  There are many open questions
      regarding BUNDLE, including whether partial bundling is allowed;
      if it is, then adding a new media stream with BUNDLE will result
      in having to generate new transport candidates, in the event the
      remote side doesn't want to BUNDLE the new stream.
   o  With BUNDLE, use of RFC5576 a=ssrc attributes is required in order
      for the recipient to be able to properly demultiplex incoming RTP.
      Payload type isn't sufficient as a demux point, since RTCP packets
      don't contain RTP payload types.  Therefore, this approach
      requires both sides to understand RFC5576, just like the single-m-
      section approach.
   o  BUNDLE has many complex rules regarding how the various SDP
      attributes in the different m= lines need to be constructed, since
      there is only a single RTP session.  For one, all payload types
      need to be unique, which could complicate the use of per-source
      attributes with many sources.  If per-source attributes required
      distinct payload types, and there were so many sources that the
      payload type space was exhausted, it is unclear what could be
      done.  There are lots of "unknown unknowns" here.  Or, as we like
      to say, "thar be dragons".
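
   The index-based glare described in the list above can be illustrated
   with a short Python sketch.  It is not taken from any implementation:
   the source names and SSRC values are hypothetical, and it assumes the
   two sides pick distinct SSRCs.  It only shows why two simultaneous
   additions conflict when sources are identified by m= section index,
   but not when they are keyed by SSRC.

```python
# Hypothetical illustration: both sides add a source at the same time.
# Source names and SSRC values are made up for this sketch.

agreed_m_sections = ["audio", "video-a", "video-b"]
agreed_sources = {1111: "video-a", 2222: "video-b"}

# Index-based: each side appends its new source to its own copy, so
# both claim the same next index for different sources.
side_a_sections = agreed_m_sections + ["video-from-a"]
side_b_sections = agreed_m_sections + ["video-from-b"]
new_index = len(agreed_m_sections)
index_conflict = side_a_sections[new_index] != side_b_sections[new_index]

# SSRC-based: each side adds an entry under its own SSRC (assumed to be
# distinct), so the two updates touch different keys and can be merged.
side_a_update = {3333: "video-from-a"}
side_b_update = {4444: "video-from-b"}
merged = {**agreed_sources, **side_a_update, **side_b_update}
ssrc_conflict = bool(side_a_update.keys() & side_b_update.keys())

print(index_conflict)  # True
print(ssrc_conflict)   # False
print(sorted(merged))  # [1111, 2222, 3333, 4444]
```

   The sketch models only the identification of sources; it does not
   model the offer/answer exchange itself.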


3.  Responses to the benefits of many m= sections.

   The idea of using many m= sections with BUNDLE is not without merit.
   The major arguments that have been made in its favor are:
   1.  It allows per-source SDP attributes.
   2.  We have to support multiple m= sections anyway.
   3.  There's no advantage to multi-source m= sections.
   4.  There are many unknowns with having multiple sources per m=
       section.

   However:
   1.  For a multi-source m= section, support for most of the important
       SDP attributes is already defined on a per-source basis in
       RFC5576, and the few remaining attributes already have proposals.
       As an example, if we wanted to have two audio sources, one with a
       high bitrate and one with a low bitrate, we could describe that
       using the per-source fmtp parameters defined in RFC5576 to
       indicate, say, high bitrate Opus for one source and low bitrate
       Opus for the other.  In short, this is a solved problem for the
       multi-source m= section approach, and so the multi m= section
       approach doesn't have much of an advantage.
   2.  While multiple m= sections will still be needed for a finite
       number of legacy cases, that does not mean multiple m= sections
       are suitable for all cases, including those with many, frequently
       changing sources.  As has been shown, there are a number of
       problems.  We can continue to use multiple m= sections in certain
       cases for legacy interop while using multiple sources per m=
       section going forward so that we can support many sources without
       the stated problems.
   3.  As mentioned, using many m= sections has a number of significant
       problems, and the multi-source m= section avoids them.  Thus, it
       does have advantages.
   4.  The RFC5576-based approach has already been in use for many years
       in a major video conferencing system (Google+ Hangouts), and has
       worked very well.  We feel that we have searched the "unknown"
       space and solved all of the major issues.  In other words, we
       feel that the multi-source m= section is mostly a solved problem
       for the case of many sources that change frequently, such as in a
       video conferencing system.  On the other hand, the multi m=
       section approach has never, as far as we are aware, been tried
       with a large number of m= sections, and so almost by definition
       has many unknown unknowns.
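
   The two-audio-source example in response 1 can be sketched as
   follows.  This is a minimal illustration, not a normative encoding:
   it uses the RFC5576 "a=ssrc:<ssrc-id> <attribute>:<value>" form with
   the per-source fmtp attribute, and it assumes the Opus parameter name
   "maxaveragebitrate" from the Opus RTP payload format.  The port,
   payload type, SSRC values, and bitrates are made up, and the helper
   build_audio_section is hypothetical.

```python
# Sketch: one audio m= section carrying two sources, each with its own
# per-source fmtp line. All concrete values are made-up examples, and
# "maxaveragebitrate" is an assumed Opus parameter name.

def build_audio_section(payload_type, sources):
    """Return SDP lines for one m= section describing several sources.

    sources maps an SSRC to the Opus maxaveragebitrate for that source.
    """
    lines = [
        f"m=audio 49170 RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} opus/48000/2",
    ]
    for ssrc, bitrate in sources.items():
        lines.append(
            f"a=ssrc:{ssrc} fmtp:{payload_type} maxaveragebitrate={bitrate}"
        )
    return lines


# One high bitrate source and one low bitrate source.
section = build_audio_section(111, {1001: 64000, 1002: 16000})
print("\n".join(section))
```

   Both sources share one m= line and one payload type; only the
   a=ssrc lines differ.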


4.  Other considerations for a video conferencing system.

4.1.  Per-source negotiation is not a major factor

   In a video conferencing system, such as Google+ Hangouts, we have
   found that negotiating codecs, RTP header extensions, etc., on a
   per-source basis is not only unnecessary but undesirable, because it
   adds complexity that isn't needed.  We have found that it is best to
   negotiate the codecs, RTP header extensions, etc., once, independent
   of sources, and then simply select from the negotiated options the
   settings for each source.  This selection can be done by the
   receiver, but in many cases (e.g. selection of bitrate), it is far
   better for the sender to make this choice.  The sender is aware of
   the details of the media it is sending, and the bandwidth estimate
   to the receiver, so it is in a far better position to decide with
   what options a given source should be encoded (and it can change
   these options at any time with no signaling delay).

   Thus, from our view, the per-source negotiation of codecs, RTP header
   extensions, etc., in the multi m= section approach is not an
   advantage, but rather an additional complexity.
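
   The "negotiate once, select per source" pattern can be sketched in
   Python as below.  This is an illustration under stated assumptions,
   not a description of how Google+ Hangouts allocates bitrate: the
   negotiated settings, the per-source weights, and the function
   select_bitrates are all hypothetical.

```python
# Sketch: settings are negotiated once per session; the sender then picks
# a bitrate for each source from its bandwidth estimate, with no further
# signaling. All names and numbers are hypothetical.

NEGOTIATED = {"codec": "VP8", "max_bitrate_bps": 2_000_000}


def select_bitrates(source_weights, bandwidth_estimate_bps):
    """Split the estimated bandwidth across sources in proportion to weight.

    Each source is capped at the negotiated maximum bitrate.
    """
    total = sum(source_weights.values())
    cap = NEGOTIATED["max_bitrate_bps"]
    return {
        ssrc: min(cap, bandwidth_estimate_bps * weight // total)
        for ssrc, weight in source_weights.items()
    }


# Two sources: a large main view (weight 3) and a thumbnail (weight 1).
weights = {1111: 3, 2222: 1}
print(select_bitrates(weights, 1_200_000))  # {1111: 900000, 2222: 300000}
print(select_bitrates(weights, 4_000_000))  # {1111: 2000000, 2222: 1000000}
```

   When the bandwidth estimate changes, the sender simply calls the
   selection again; nothing is renegotiated.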

4.2.  Source selection

   It is desirable in a video conferencing system to allow the receiver
   to tell the sender what resolution to send on a per-source basis, and
   to change that value at any time throughout the call.  As discussed,
   the multiple m= section approach has a problem when one side selects
   a source resolution while the other side modifies the list of
   sources.

   A multi-source m= section approach avoids this, and work has already
   begun on defining source selection in SDP, as in
   https://tools.ietf.org/html/draft-lennox-mmusic-sdp-source-selection-02

4.3.  Extensibility to very large or dynamic conferences

   SDP, by its nature, describes the full state of a session, with no
   support for partial updates.  Thus, any mechanism for describing or
   controlling sources based on SDP will become impractical as the
   number of sources becomes very large (hundreds or more) or the list
   of sources changes quickly.  In such scenarios, other mechanisms for
   describing or controlling sources are preferred.  Several mechanisms
   along these lines have been proposed, including the CLUE signaling
   channel (sent on an independent SCTP data channel), extensions to the
   XCON conference event package (using SIP SUBSCRIBE/NOTIFY), or the
   Codec Operation Point (COP) messages (sent over RTCP).  In all these
   cases, however, SDP mechanisms will be used to describe the RTP
   sessions that carry the sources that are controlled or described
   externally.  It is very desirable that the syntax of the SDP
   descriptions of these sessions be as similar as possible to that of
   sessions with sources described in SDP.  Multi-source m= sections
   meet this naturally -- the sessions controlled externally would
   simply omit the a=ssrc attributes.  The multiple m= line proposal, by
   contrast, does not appear to have any similarly natural way to extend
   itself to external control.
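
   The point about omitting the a=ssrc attributes can be sketched as
   follows.  The SDP content is a made-up example, and the helper
   without_source_lines is hypothetical; the sketch only shows that the
   externally controlled description is the same m= section with the
   source-level lines removed.

```python
# Sketch: the same m= section with and without a=ssrc lines. The SDP
# content below is a made-up example.

def without_source_lines(sdp_lines):
    """Return the SDP lines with every a=ssrc attribute removed."""
    return [line for line in sdp_lines if not line.startswith("a=ssrc:")]


# A session whose sources are described in SDP.
described_in_sdp = [
    "m=video 49172 RTP/AVP 100",
    "a=rtpmap:100 VP8/90000",
    "a=ssrc:2001 cname:alice@example.com",
    "a=ssrc:2002 cname:bob@example.com",
]

# The same session when sources are described or controlled externally.
controlled_externally = without_source_lines(described_in_sdp)
print("\n".join(controlled_externally))
```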


5.  Use in Production

   We'd like to reiterate that an RFC5576-based approach, with multiple
   sources per m= section, is already in use by Google+ Hangouts for
   a form of video conferencing.  It has been used successfully for
   several years for millions of users and conferences and has scaled
   well for up to tens of simultaneous sources within a video
   conference.  It is used for logical sources of audio, video, and
   arbitrary real-time data.

   We feel that using multiple sources per m= section is a proven
   solution for dealing with large numbers of sources that change
   frequently, because we've used it and it works.  The "unknown" space
   has been thoroughly explored.  On the other hand, we see large risk
   in pursuing the multiple m= section approach when dealing with many
   sources.


6.  Conclusion

   We believe that encoding multiple sources as multiple m= sections
   does not scale well to many sources, has many problems, and has many
   unknowns.  By contrast, encoding the sources into one m= section,
   as per RFC5576, scales to many sources, provides a number of
   advantages, and has fewer unknowns.
   1.  It allows per-source SDP attributes for most of the important
       attributes.
   2.  It scales to many sources in cases where many m= sections don't.
   3.  It's been used already and has fewer unknowns.


7.  Security Considerations

   None.


8.  IANA Considerations

   None.


9.  Acknowledgements

   Jonathan Lennox provided the observations on large or dynamic
   conferences.


Authors' Addresses

   Peter Thatcher
   Google
   747 6th St S
   Kirkland, WA  98033
   USA

   Email: pthatcher@gmail.com


   Justin Uberti
   Google
   747 6th St S
   Kirkland, WA  98033
   USA

   Email: justin@uberti.name