Network Working Group                                    A. Mostafa, Ed.
Internet-Draft                                                     Avaya
Intended status: Standards Track                       December 23, 2011
Expires: June 25, 2012


 A Mechanism for Negotiating Multi-Stream Continuous Presence Video in
                                  SIP
                     draft-mostafa-mmusic-sip-cp-00

Abstract

   Next-generation video conferencing clients require multiple
   concurrent video streams to provide a User eXperience (UX) in which
   multiple participants can be viewed at the same time.  This user
   experience is called Continuous Presence (CP) video.  Multi-stream
   CP video gives the client more control over the UX and requires less
   processing on the conference server, since the server relays the
   video streams rather than mixing them to compose a single CP video
   stream.  The client's CP layout, processing power and bandwidth
   limitations require a per-stream bandwidth and resolution to be
   negotiated in the SIP Offer/Answer exchange with the conference
   server.  This negotiation uses standard methods plus a new SDP
   parameter.  This document explains the methodology and solution for
   achieving this in SIP and SDP.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 25, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.




Mostafa                   Expires June 25, 2012                 [Page 1]

Internet-Draft          SIP Multi-Stream CP Video          December 2011


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
     2.1.  Key Words  . . . . . . . . . . . . . . . . . . . . . . . .  3
     2.2.  Abbreviations  . . . . . . . . . . . . . . . . . . . . . .  3
     2.3.  Voice Activated Switching  . . . . . . . . . . . . . . . .  3
     2.4.  Continuous Presence  . . . . . . . . . . . . . . . . . . .  3
     2.5.  Video Shuffling  . . . . . . . . . . . . . . . . . . . . .  3
   3.  Multi-Stream Continuous Presence Video . . . . . . . . . . . .  4
   4.  Multi-Stream Continuous Presence video SIP and SDP
       negotiation  . . . . . . . . . . . . . . . . . . . . . . . . .  4
     4.1.  Basic SIP and SDP negotiation and flows for
           multi-stream CP Video  . . . . . . . . . . . . . . . . . .  5
       4.1.1.  Client Inbound CP Video  . . . . . . . . . . . . . . .  5
       4.1.2.  Client Outbound Video  . . . . . . . . . . . . . . . .  5
       4.1.3.  Audio  . . . . . . . . . . . . . . . . . . . . . . . .  5
     4.2.  Advanced SIP and SDP negotiation and flows for
           multi-stream CP Video  . . . . . . . . . . . . . . . . . .  5
       4.2.1.  SDP content attribute  . . . . . . . . . . . . . . . .  5
       4.2.2.  VAS Rank . . . . . . . . . . . . . . . . . . . . . . .  7
   5.  Active Talker Indication . . . . . . . . . . . . . . . . . . .  9
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   8.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     9.1.  Informative References . . . . . . . . . . . . . . . . . .  9
     9.2.  Normative References . . . . . . . . . . . . . . . . . . . 10
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10













1.  Overview

   This document describes the SIP and SDP negotiation required for the
   multi-stream CP video using video codecs such as H.264 SVC and AVC
   (SVC: Scalable Video Coding, AVC: Advanced Video Coding).  It covers
   the CP layout use cases, grouping, shuffling and bandwidth scaling
   for the CP streams.


2.  Terminology

2.1.  Key Words

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

2.2.  Abbreviations

   VAS: Voice Activated Switching
   CP: Continuous Presence
   UX: User eXperience
   BW: Bandwidth
   H.264 SVC: H.264 Scalable Video Coding
   H.264 AVC: H.264 Advanced Video Coding

2.3.  Voice Activated Switching

   Voice Activated Switching delivers the video of a single user in a
   conference to a participant.  This user is the current or most
   recent active speaker.  For example, Alice, Bob, Carol, Dave and
   John are video participants in a conference.  While Alice is
   talking, John sees Alice's video; when Bob starts talking, John sees
   Bob's video.

2.4.  Continuous Presence

   Continuous Presence delivers the video of multiple users in a
   conference to a participant.  For example, Alice, Bob, Carol, Dave
   and John are video participants in a conference.  John can see a
   Continuous Presence video that shows Alice, Bob, Carol and Dave at
   the same time on his video client.  The users shown are typically
   the most recent active speakers.

2.5.  Video Shuffling

   Video shuffling is used in Continuous Presence use cases.  For
   example, Alice, Bob, Carol, Dave, Mike and John are video
   participants in a conference.  John sees a four-window Continuous
   Presence video that shows Alice, Bob, Carol and Dave, the most
   recent active speakers, on his client.  When Mike starts talking he
   becomes the most recent active speaker, and the conference server
   sends the streams of Mike, Alice, Bob and Carol in place of the
   previous streams of Alice, Bob, Carol and Dave.  This results in a
   shuffling of participants in the four-window CP view on the client.


3.  Multi-Stream Continuous Presence Video

   Multi-Stream Continuous Presence video delivers multiple video
   streams (e.g., H.264 SVC or AVC) from a conference server to the
   client, which decodes and renders them for the user.  Continuous
   Presence video displays multiple participants' windows on the
   client's display, usually for the most recent active speakers.  The
   video streams are negotiated using n video m lines in the SDP, where
   n > 1.  For example, n=4 means the CP video contains 4
   participants/streams.  A single video m line (n=1) means no CP, and
   the client typically displays the most recent active speaker.
   Current video SDP negotiation covers only the codecs used (e.g.,
   H.264 SVC and AVC), the bit rate, the number of layers used (in SVC,
   per [RFC6190]) and the direction (recvonly, sendrecv, sendonly).  It
   does not address BW optimization, the shuffling mechanism, or the
   grouping and layout of the CP windows.


4.  Multi-Stream Continuous Presence video SIP and SDP negotiation

   This section describes the SIP and SDP negotiation required for
   multi-stream CP video, together with use cases, flows and examples.



        Audio/Video +------------+ Multistream CP video  +----------+
Alice  ------------>|            |---------Alice-------->|          |
                    |            |---------Bob---------->|          |
Bob    ------------>|            |---------Dave--------->|          |
                    |            |---------Mike--------->|          |
Carol  ------------>|            |                       |          |
                    | Conference |                       |  Client  |
Dave   ------------>|   Server   |------Mixed Audio----->|          |
                    |            |<--------Audio---------|          |
John   ------------>|            |                       |          |
                    |            |                       |          |
Mike   ------------>|            |<--------Video---------|          |
                    +------------+                       +----------+
                    Figure 1 - Multiple Video Streams Continuous Presence





4.1.  Basic SIP and SDP negotiation and flows for multi-stream CP Video

   Basic multi-stream CP negotiation is initiated or escalated by the
   client, which negotiates multiple video m lines to receive the CP
   video.  This can be done in the initial offer from the client to the
   conference server or in a re-INVITE.

4.1.1.  Client Inbound CP Video

   The conference server MAY accept all, some, one or none (audio-only
   call) of the video m lines, depending on its capabilities and
   policies.  The conference server should reject an m line by setting
   its port to zero in the answer (written here as m=0), as described
   in [RFC3264].  The conference server can send a re-INVITE to
   escalate or de-escalate the number of video streams (with m != 0) as
   participants join or leave.  The server SHOULD NOT add video m lines
   in the answer beyond the ones originally offered by the client.
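
   As an illustration of how a client might interpret such an answer,
   the following Python sketch reports which video m lines the server
   accepted.  It assumes that a rejected m line carries port zero, as
   in the offer/answer model of [RFC3264].  The function name
   accepted_video_lines is hypothetical and not part of this document.

```python
def accepted_video_lines(answer_sdp: str) -> list[bool]:
    """Return one flag per video m line in the answer, in SDP order.

    Assumption: a rejected media stream has its port set to zero.
    """
    accepted = []
    for raw in answer_sdp.splitlines():
        line = raw.strip()
        if not line.startswith("m=video "):
            continue
        # The second field is "<port>" or "<port>/<number of ports>".
        port_field = line.split()[1]
        port = int(port_field.split("/")[0])
        accepted.append(port != 0)
    return accepted
```

   A client that offered four video m lines and gets back two accepted
   entries could then render a two-window layout instead of 2x2.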

4.1.2.  Client Outbound Video

   The conference server SHOULD NOT use more than one video m line when
   it dials out to a client.  This gives better backward compatibility
   with older video clients that don't support multi-stream video.
   Only the client can escalate the number of video m lines it can
   receive, using a re-INVITE.  A separate m line MAY be negotiated for
   outbound video, or the outbound video MAY be negotiated in one of
   the inbound CP m lines (sendrecv).

4.1.3.  Audio

   A single audio stream is negotiated using a separate audio m line.
   The inbound audio to the client is mixed by the conference server.

4.2.  Advanced SIP and SDP negotiation and flows for multi-stream CP
      Video

   A new SDP attribute is discussed in this section.  This attribute
   communicates the client preferences for the CP streams.

4.2.1.  SDP content attribute

   The client sends a new content attribute in each video m line that
   it negotiates for CP video in the SDP offer/answer exchange, which
   follows the standard procedures in [RFC3261] and [RFC3264].

   a=content: window-id, group number, bw reduction limit, VAS Rank

   window-id = 1 digit ; window1, window2, window3, ..

   group number = 1-2 digits ; range 1-99, lower number = higher priority

   bw reduction limit = 1-3 digits ; range 0-100;
   0 = no reduction allowed, 100 = full reduction is allowed.

   VAS Rank = 1 digit ; range 0-9
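
   The field definitions above can be turned into a small parser.  The
   Python sketch below is illustrative only: the names CpContent and
   parse_cp_content are hypothetical, the tolerance for spaces after
   the colon and commas is an assumption based on the examples in this
   document, and treating a missing VAS Rank as 1 is an assumption
   taken from the default described in Section 4.2.2.

```python
import re
from typing import NamedTuple


class CpContent(NamedTuple):
    window: int    # digit from window-id (window1 -> 1)
    group: int     # 1-99, lower number = higher priority
    bw_limit: int  # 0-100, maximum allowed bandwidth reduction (percent)
    vas_rank: int  # 0-9


# Assumption: optional whitespace is allowed around the separators, and
# the VAS Rank field may be omitted.
_CONTENT_RE = re.compile(
    r"^a=content:\s*window(\d)\s*,\s*(\d{1,2})\s*,\s*(\d{1,3})\s*"
    r"(?:,\s*(\d))?\s*$"
)


def parse_cp_content(line: str) -> CpContent:
    """Parse one a=content line of the form described in this section."""
    match = _CONTENT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a CP content attribute: {line!r}")
    window = int(match.group(1))
    group = int(match.group(2))
    bw_limit = int(match.group(3))
    # Assumption: an omitted rank takes the default value 1.
    vas_rank = int(match.group(4)) if match.group(4) is not None else 1
    if not 1 <= group <= 99:
        raise ValueError("group number out of range 1-99")
    if not 0 <= bw_limit <= 100:
        raise ValueError("bw reduction limit out of range 0-100")
    return CpContent(window, group, bw_limit, vas_rank)
```

   For instance, parse_cp_content("a=content:window2,2,50,1") yields
   window 2, group 2, a 50 percent reduction limit and VAS rank 1.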

   The client uses the new content attribute to communicate its CP
   stream grouping, BW optimization and video shuffling preferences.
   The server's response carries no answer for this attribute; the
   answer is reflected in the response m lines and in the shuffling of
   the video RTP streams.  Conference servers that don't support this
   attribute will ignore it and will process the offered video m lines
   according to their own algorithms and preferences.

   The group number specifies the group that the stream belongs to.
   All streams (UI windows) in the same group have the same
   resolution/size.  A group with a lower number has higher priority
   than a group with a higher number.  The CP streams/windows are
   grouped within a layout.  Grouping allows the conference server to
   scale down all windows in the same group for BW optimization and to
   deliver a uniform user experience across those windows.  The
   conference server should scale down the group with the highest
   number first before scaling down the next group, e.g., group 2 first
   and then group 1.

   The bandwidth reduction limit sets the maximum percentage by which
   the conference server can reduce a stream's original bandwidth to
   satisfy the bandwidth constraints.
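
   One possible reading of the group-based scaling described above is
   sketched below in Python.  This is not a normative algorithm.  The
   function name scale_streams and the dictionary keys are
   hypothetical, and two points are assumptions: every stream in a
   group is reduced by the same fraction (to keep windows in a group
   uniform), and that fraction is capped by the strictest reduction
   limit in the group.

```python
def scale_streams(streams, budget):
    """Return per-stream bandwidths (bits/s) that try to fit the budget.

    streams: list of dicts with keys "group", "bw" (bits/s) and
             "limit" (maximum reduction in percent, 0-100).
    budget:  total bandwidth available for all streams (bits/s).
    """
    result = [s["bw"] for s in streams]
    excess = sum(result) - budget
    # Highest group number has the lowest priority, so scale it first.
    for group in sorted({s["group"] for s in streams}, reverse=True):
        if excess <= 0:
            break
        members = [i for i, s in enumerate(streams) if s["group"] == group]
        group_bw = sum(streams[i]["bw"] for i in members)
        if group_bw == 0:
            continue
        # Assumption: one uniform fraction per group, capped by the
        # strictest limit among the group's streams.
        max_fraction = min(streams[i]["limit"] for i in members) / 100
        fraction = min(max_fraction, excess / group_bw)
        for i in members:
            result[i] = round(streams[i]["bw"] * (1 - fraction))
        excess -= group_bw * fraction
    # If the limits are too strict, the result may still exceed budget.
    return result
```

   With a group 1 stream of 512000 bits/s (limit 25) and a group 2
   stream of 300000 bits/s (limit 50), a budget of 600000 bits/s first
   halves the group 2 stream and then takes the remaining reduction
   from group 1.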

   Client Offer SDP example.  For simplicity, audio and
   sprop-operation-point-info details are not shown:

   v=0
   o=svcsrv 289083124 289083124 IN IP4 192.0.2.2
   s=conference
   b=TIAS:812000
   t=0 0

   m=video 30000 RTP/AVP 98 97 96
   c=IN IP4 192.0.2.2
   b=TIAS:512000
   a=content:window1,1,25,1
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42401e
   a=rtpmap:97 H264-SVC/90000
   a=fmtp:97 profile-level-id=530016; sprop-operation-point-info..(VGA/30)
   a=rtpmap:98 H264-SVC/90000
   a=fmtp:98 profile-level-id=53001e; sprop-operation-point-info..(720p/30)
   a=sendrecv

   m=video 40000 RTP/AVP 101 100 99
   c=IN IP4 192.0.2.2
   b=TIAS:300000
   a=content:window2,2,50,1
   a=rtpmap:99 H264/90000
   a=fmtp:99 profile-level-id=42401e
   a=rtpmap:100 H264-SVC/90000
   a=fmtp:100 profile-level-id=530013; sprop-operation-point-info..(VGA/30)
   a=rtpmap:101 H264-SVC/90000
   a=fmtp:101 profile-level-id=530016; sprop-operation-point-info..(360/30)
   a=recvonly

4.2.2.  VAS Rank

   Rules: If the client wants the video stream/window from the
   conference server to be switched based on active speaker activity,
   it has to assign a VAS rank to the window.  The conference server
   assigns speakers to windows based on the active speaker history and
   the rank.  Rank 1 gets the most recent speaker, rank 2 the next most
   recent, etc.  Multiple windows can share the same rank, which
   minimizes the shuffling that takes place when speakers switch in and
   out.  If not specified, the default value is 1 for all windows
   (minimum shuffling).

   Example (1) of shuffling in a 2x2 or 1x4 layout (4 equal sized
   windows):
   a=content: window1,1,100, 1 (the most recent speaker)
   a=content: window2,1,100, 2 (2nd most recent)
   a=content: window3,1,100, 3 (3rd most recent)
   a=content: window4,1,100, 4 (4th most recent)

   In this example the client's offer to the conference server has 4
   video SDP m lines.  The group number (second parameter of a=content)
   is the same for the four m lines, indicating the same priority and 4
   equal sized windows.  The a=content for each m line has a different
   VAS rank value (last parameter in the examples above).  This means
   that the client is requesting the conference server to always send
   the most recent active speaker on the first video stream (negotiated
   in this example by the first video m line), the second most recent
   active speaker on the second video stream, the third most recent on
   the third video stream and the fourth most recent on the fourth
   video stream.

   Example (2) of shuffling in a 2x2 or 1x4 layout (4 equal sized
   windows):
   a=content: window1,1,100, 1
   a=content: window2,1,100, 1
   a=content: window3,1,100, 1
   a=content: window4,1,100, 1
   All four windows will get switched with active speaker streams.  The
   order will be determined by the conference server to minimize
   shuffling.

   In this example the client's offer to the conference server has 4
   video SDP m lines, and the a=content for each has the same VAS rank
   value (last parameter in the examples above).  This means that the
   client is requesting the conference server to always minimize the
   shuffling of speakers on the video streams sent to the client.  For
   example, if the most recent active speaker changes, the server sends
   his/her video on the fourth stream, replacing the least recent
   active speaker, and leaves the other three streams unchanged.

   Example (3) of shuffling in a 1+3 layout (1 big + 3 small windows):
   a=content: window1,1,100, 1 (the most recent speaker)
   a=content: window2,2,100, 2 (2nd most recent)
   a=content: window3,2,100, 3 (3rd most recent)
   a=content: window4,2,100, 4 (4th most recent)

   In this example the client's offer to the conference server has 4
   video SDP m lines.  The group number (second parameter of a=content)
   indicates two priorities and a 1+3 layout.  The a=content for each m
   line has a different VAS rank value (last parameter in the examples
   above).  This means that the client is requesting the conference
   server to always send the most recent active speaker on the first
   video stream (negotiated in this example by the first video m line),
   the second most recent active speaker on the second video stream,
   the third most recent on the third video stream and the fourth most
   recent on the fourth video stream.

   Example (4) of shuffling in a 1+3 layout (1 big + 3 small windows):
   a=content: window1,1,100, 1
   a=content: window2,2,100, 2
   a=content: window3,2,100, 2
   a=content: window4,2,100, 2

   The big window will always get the most recent active speaker.  The
   3 small windows will get the next 3 most recent active speakers.
   The order for these three small windows will be determined by the
   server to minimize shuffling.

   Example (5) of shuffling in a 1+3 layout with pinned video (1 big + 3
   small windows):
   a=content: window1,1,100, 1 (the most recent speaker)
   a=content: window2,2,100, 2 (2nd or 3rd most recent)
   a=content: window3,2,100, 2 (2nd or 3rd most recent)
   a=content: window4,2,100, 0 (pinned / not switched based on speaker
   activity)

   In this example the fourth m line has a VAS rank of 0, which means
   this video stream will not be switched and is pinned to a certain
   user regardless of his/her voice activity.
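
   The rank rules and the five examples in this section can be
   summarized by the following Python sketch of a server-side
   assignment step.  It is only one way to realize the behavior and is
   not taken from an implementation.  The function name assign_windows
   and its arguments are hypothetical.  Two points are assumptions:
   windows that share a rank jointly receive the next most recent
   speakers, and a speaker pinned to a rank 0 window is not shown again
   in a switched window.

```python
from collections import defaultdict


def assign_windows(ranks, history, current):
    """Map each window to the speaker it should show.

    ranks:   dict window id -> VAS rank (0 = pinned)
    history: list of speakers, most recent active speaker first
    current: dict window id -> speaker shown now (missing if empty)
    """
    result = {}
    by_rank = defaultdict(list)
    for window, rank in ranks.items():
        if rank == 0:
            result[window] = current.get(window)  # pinned, never switched
        else:
            by_rank[rank].append(window)

    # Assumption: pinned speakers are skipped when filling other windows.
    pinned = {speaker for speaker in result.values() if speaker is not None}
    candidates = [speaker for speaker in history if speaker not in pinned]

    position = 0
    for rank in sorted(by_rank):
        windows = by_rank[rank]
        wanted = candidates[position:position + len(windows)]
        position += len(windows)
        free_windows = []
        for window in windows:
            shown = current.get(window)
            if shown in wanted:
                # Speaker still belongs to this rank: leave it in place.
                result[window] = shown
                wanted.remove(shown)
            else:
                free_windows.append(window)
        for window in free_windows:
            result[window] = wanted.pop(0) if wanted else None
    return result
```

   With all four ranks set to 1 and Mike becoming the most recent
   speaker, only the window that showed Dave changes, which matches
   Example (2).  With ranks 1, 2, 3 and 4 every window is reassigned,
   as in Example (1).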


5.  Active Talker Indication

   The audio and video active talker indications use the CSRC list in
   the audio and video RTP packets [RFC3550].  The SSRCs in the RTP
   CSRC list are mapped to a userid/user name using the RFC 4575
   notifications.  Only one SSRC is sent in the video RTP CSRC list;
   the client can use it to display the user name on each CP video
   window.
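
   A client-side lookup along these lines might look like the Python
   sketch below.  It reads the CSRC count and CSRC list from the fixed
   RTP header layout defined in [RFC3550].  The function names and the
   roster dictionary (SSRC to user name, assumed to have been built
   from the conference notifications) are hypothetical.

```python
import struct


def csrc_list(rtp_packet: bytes) -> list[int]:
    """Return the CSRC identifiers carried in an RTP packet header."""
    if len(rtp_packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    if rtp_packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    count = rtp_packet[0] & 0x0F          # CC field: number of CSRC entries
    end = 12 + 4 * count                  # CSRC list follows the 12-byte header
    if len(rtp_packet) < end:
        raise ValueError("truncated CSRC list")
    return list(struct.unpack(f"!{count}I", rtp_packet[12:end]))


def window_label(rtp_packet: bytes, roster: dict):
    """Return the user name for a CP video window, or None if unknown.

    Assumption: roster maps SSRC values to user names and is kept up to
    date from the conference notifications.
    """
    csrcs = csrc_list(rtp_packet)
    if not csrcs:
        return None
    # The video stream carries a single entry in its CSRC list.
    return roster.get(csrcs[0])
```

   The client would call window_label on a packet from each video
   stream and redraw the name under a window whenever the result
   changes.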


6.  Security Considerations

   Multi-stream CP video uses the TLS and SRTP standards to secure SIP
   signaling and media, respectively.


7.  IANA Considerations

   This document has no actions for IANA.


8.  Acknowledgements

   Thanks to Alan Johnston, Dan Romascanu, Peter Musgrave and Rifaat
   Shekh-Yusef for their review of the document and comments.


9.  References

9.1.  Informative References

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.









9.2.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.


Author's Address

   Adel Mostafa (editor)
   Avaya
   Toronto, Ontario
   Canada

   Email: amostafa@avaya.com



























