Network Working Group A. Mostafa, Ed.
Internet-Draft Avaya
Intended status: Standards Track December 23, 2011
Expires: June 25, 2012
A Mechanism for Negotiating Multi-Stream Continuous Presence Video in
SIP
draft-mostafa-mmusic-sip-cp-00
Abstract
Next-generation video conferencing clients require multiple concurrent
video streams to provide a User eXperience (UX) in which multiple
participants can be viewed at the same time. This user experience is
called Continuous Presence (CP) video. Multi-stream CP video gives
the client more control of the UX and requires less processing on the
conference server, since the server relays the video streams rather
than mixing them to compose a single CP video stream. The client's CP
layout, processing power and bandwidth limitations require the
bandwidth and resolution of each stream to be negotiated in the SIP
Offer/Answer exchange with the conference server. This negotiation
uses standard methods together with a new SDP parameter. This
document explains the methodology and the solution in SIP and SDP.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on June 25, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
Mostafa Expires June 25, 2012 [Page 1]
Internet-Draft SIP Multi-Stream CP Video December 2011
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Overview
2. Terminology
  2.1. Key Words
  2.2. Abbreviations
  2.3. Voice Activated Switching
  2.4. Continuous Presence
  2.5. Video Shuffling
3. Multi-Stream Continuous Presence Video
4. Multi-Stream Continuous Presence video SIP and SDP negotiation
  4.1. Basic SIP and SDP negotiation and flows for multi-stream CP Video
    4.1.1. Client Inbound CP Video
    4.1.2. Client Outbound Video
    4.1.3. Audio
  4.2. Advanced SIP and SDP negotiation and flows for multi-stream CP Video
    4.2.1. SDP content attribute
    4.2.2. VAS Rank
5. Active Talker Indication
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. References
  9.1. Informative References
  9.2. Normative References
Author's Address
1. Overview
This document describes the SIP and SDP negotiation required for the
multi-stream CP video using video codecs such as H.264 SVC and AVC
(SVC: Scalable Video Coding, AVC: Advanced Video Coding). It covers
the CP layout use cases, grouping, shuffling and bandwidth scaling
for the CP streams.
2. Terminology
2.1. Key Words
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].
2.2. Abbreviations
VAS: Voice Activated Switching
CP: Continuous Presence
UX: User eXperience
BW: Bandwidth
H.264 SVC: H.264 Scalable Video Coding
H.264 AVC: H.264 Advanced Video Coding
2.3. Voice Activated Switching
Voice Activated Switching in video delivers the video of a single
user in a conference to a participant. This user is the current or
most recent active speaker. For example, Alice, Bob, Carol, Dave and
John are video participants in a conference. While Alice is talking,
John sees Alice's video; when Bob starts talking, John sees Bob's
video.
2.4. Continuous Presence
Continuous Presence in video delivers the video of multiple users in
a conference to a participant. For example, Alice, Bob, Carol, Dave
and John are video participants in a conference. John can see a
Continuous Presence video that shows Alice, Bob, Carol and Dave at
the same time on his video client. The users shown are typically the
most recent active speakers.
2.5. Video Shuffling
Video shuffling is used in Continuous Presence use cases. For
example, Alice, Bob, Carol, Dave, Mike and John are video participants
in a conference. John sees a four-window Continuous Presence video
showing Alice, Bob, Carol and Dave, the most recent active speakers,
on his client. When Mike starts talking, he becomes the most recent
active speaker, and the conference server sends the streams of Mike,
Alice, Bob and Carol in place of the previous Alice, Bob, Carol and
Dave streams. This results in a shuffling of participants in the
four-window CP view on the client.
3. Multi-Stream Continuous Presence Video
Multi-Stream Continuous Presence video delivers multiple video
streams (e.g. H.264 SVC or AVC) from a conference server to the
client, which decodes and renders them to the user. Continuous
Presence video displays multiple participants' windows on the
client's display, usually those of the most recent active speakers.
The video streams are negotiated using n video m lines in the SDP,
where n > 1. For example, n=4 means that the CP video contains 4
participants/streams. A single video m line (n=1) means no CP, and
the client typically displays the most recent active speaker.
Current video SDP negotiation covers only the codecs used (e.g. H.264
SVC and AVC), the bit rate, the number of layers used (in SVC, per
[RFC6190]) and the direction (recvonly, sendrecv, sendonly). It does
not address BW optimization, the shuffling mechanism, or the grouping
and layout of the CP windows.
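The relationship between the number of video m lines and CP can be illustrated with a minimal sketch. The function names are hypothetical and not part of this specification, and the sketch assumes that an m line with port zero is not an active stream.

```python
def count_video_streams(sdp: str) -> int:
    """Count the video m lines in an SDP body that have a non-zero port."""
    count = 0
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=video "):
            # m=<media> <port>[/<number of ports>] <proto> <fmt> ...
            port = int(line.split()[1].split("/")[0])
            if port != 0:
                count += 1
    return count


def is_continuous_presence(sdp: str) -> bool:
    """n > 1 active video m lines means CP; n = 1 means a single stream."""
    return count_video_streams(sdp) > 1
```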
4. Multi-Stream Continuous Presence video SIP and SDP negotiation
This section describes the SIP and SDP negotiation required for
multi-stream CP video, along with use cases, flows and examples.
       Audio/Video +------------+ Multistream CP video  +----------+
Alice ------------>|            |---------Alice-------->|          |
                   |            |---------Bob---------->|          |
Bob   ------------>|            |---------Dave--------->|          |
                   |            |---------Mike--------->|          |
Carol ------------>|            |                       |          |
                   | Conference |                       |  Client  |
Dave  ------------>|   Server   |------Mixed Audio----->|          |
                   |            |<--------Audio---------|          |
John  ------------>|            |                       |          |
                   |            |                       |          |
Mike  ------------>|            |<--------Video---------|          |
                   +------------+                       +----------+
Figure 1 - Multiple Video Streams Continuous Presence
4.1. Basic SIP and SDP negotiation and flows for multi-stream CP Video
Basic multi-stream CP negotiation is initiated or escalated by the
client, which negotiates multiple video m lines to receive the CP
video. This can be done in the initial offer from the client to the
conference server or in a re-INVITE.
4.1.1. Client Inbound CP Video
The conference server MAY accept all, some, one or none (audio-only
call) of the video m lines, depending on its capabilities and
policies. The conference server should reject an m line by setting
its port to zero in the answer, as described in [RFC3264]. The
conference server can send a re-INVITE to escalate or de-escalate the
number of active video streams (m lines with a non-zero port) as
participants join or leave. The server SHOULD NOT add video m lines
in the answer beyond those originally offered by the client.
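As an illustration of the accept/reject behaviour described above, the following sketch builds one answer m line per offered m line and rejects surplus video streams with port zero. The function name, the max_video policy parameter and the next_port allocator are assumptions made for the example. A real answer could also narrow the format list, which this sketch simply echoes.

```python
from typing import Callable, List


def answer_m_lines(offer_m_lines: List[str], max_video: int,
                   next_port: Callable[[], int]) -> List[str]:
    """Return one answer m line per offered m line, in the same order.

    Video m lines beyond max_video (a hypothetical server policy) are
    rejected by setting the port to zero.
    """
    answer = []
    accepted_video = 0
    for line in offer_m_lines:
        media, _offered_port, proto, *fmts = line[2:].split()
        if media == "video" and accepted_video >= max_video:
            port = 0  # rejected stream
        else:
            if media == "video":
                accepted_video += 1
            port = next_port()  # server-side receive port
        answer.append("m=" + " ".join([media, str(port), proto, *fmts]))
    return answer
```

The answer never contains more m lines than the offer, which matches the rule that the server does not add video m lines of its own.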
4.1.2. Client Outbound Video
The conference server SHOULD NOT use more than one video m line when
it dials out to a client. This achieves better backward
compatibility with older video clients that do not support
multi-stream video. Only the client can escalate the number of video
m lines it can receive, using a re-INVITE. A separate m line MAY be
negotiated for outbound video; alternatively, the outbound video MAY
be negotiated in one of the inbound CP m lines (sendrecv).
4.1.3. Audio
A single audio stream is negotiated by a separate audio m line. The
inbound audio to the client is mixed by the conference server.
4.2. Advanced SIP and SDP negotiation and flows for multi-stream CP
Video
A new SDP attribute is discussed in this section. This attribute
communicates the client preferences for the CP streams.
4.2.1. SDP content attribute
A new content attribute is sent by the client in each video m line
negotiated in the SDP offer/answer for CP video. The negotiation
follows the standard procedures of [RFC3261] and [RFC3264].
a=content: window-id, group number, bw reduction limit, VAS Rank
window-id = 1 digit ; window1, window2, window3, ..
group number = 1-2 digits ; range 1-99, lower number = higher priority
bandwidth reduction limit = 1-3 digits ; range 0-100,
0 = no reduction allowed, 100 = full reduction allowed
VAS Rank = 1 digit ; range 0-9
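The field layout above could be parsed along the following lines. This is a minimal sketch: the CpContent type and parse_cp_content function are hypothetical names, and the sketch assumes that the VAS rank may be omitted, in which case it falls back to the default of 1 given in Section 4.2.2.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CpContent:
    window_id: int
    group: int               # 1-99, lower number = higher priority
    bw_reduction_limit: int  # 0-100 percent
    vas_rank: int = 1        # 0-9


# Tolerates the optional spaces seen in the examples of this document.
_CONTENT_RE = re.compile(
    r"^a=content:\s*window(\d)\s*,\s*(\d{1,2})\s*,\s*(\d{1,3})\s*(?:,\s*(\d))?\s*$"
)


def parse_cp_content(line: str) -> CpContent:
    """Parse one a=content line and check the documented value ranges."""
    m = _CONTENT_RE.match(line.strip())
    if m is None:
        raise ValueError("not a CP content attribute: %r" % line)
    window_id, group, limit = int(m.group(1)), int(m.group(2)), int(m.group(3))
    vas_rank = int(m.group(4)) if m.group(4) is not None else 1
    if not 1 <= group <= 99:
        raise ValueError("group number out of range 1-99")
    if not 0 <= limit <= 100:
        raise ValueError("bandwidth reduction limit out of range 0-100")
    return CpContent(window_id, group, limit, vas_rank)
```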
The client uses the new content attribute to communicate its CP
stream grouping, BW optimization and video shuffling preferences.
The server does not answer this attribute explicitly; its answer is
reflected in the m lines of the response and in the shuffling of the
video RTP streams. A conference server that does not support this
attribute will ignore it and will process the offered video m lines
according to its own algorithms and preferences. The group number
specifies the group that the stream belongs to. All streams (UI
windows) in the same group have the same resolution/size. A group
with a lower number has higher priority than a group with a higher
number. The CP streams/windows are grouped within a layout. Grouping
allows the conference server to scale down all windows in the same
group for BW optimization and to deliver a uniform user experience
across those windows. The conference server should scale down the
group with the highest number first before scaling down the next
group, e.g. group 2 first and then group 1. The bandwidth reduction
limit sets the maximum percentage of the original bandwidth by which
the conference server can reduce a stream to satisfy the bandwidth
constraints.
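The scale-down order and the reduction limit can be sketched as follows. This is only one possible reading of the rules above: the Stream type, the budget parameter and the proportional split inside a group are assumptions, not something this document specifies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stream:
    window_id: int
    group: int      # lower number = higher priority
    bandwidth: int  # bits per second, e.g. taken from b=TIAS
    limit: int      # bandwidth reduction limit, 0-100 percent


def min_bandwidth(s: Stream) -> int:
    """Lowest bandwidth the server may use for this stream."""
    return s.bandwidth * (100 - s.limit) // 100


def scale_down(streams: List[Stream], budget: int) -> Dict[int, int]:
    """Reduce per-window bandwidth, highest group number first."""
    alloc = {s.window_id: s.bandwidth for s in streams}
    excess = sum(alloc.values()) - budget
    for group in sorted({s.group for s in streams}, reverse=True):
        if excess <= 0:
            break
        members = [s for s in streams if s.group == group]
        reducible = sum(s.bandwidth - min_bandwidth(s) for s in members)
        if reducible == 0:
            continue
        cut = min(excess, reducible)
        for s in members:
            headroom = s.bandwidth - min_bandwidth(s)
            # Ceiling division so the group sheds at least `cut` in total.
            alloc[s.window_id] -= -((-headroom * cut) // reducible)
        excess = sum(alloc.values()) - budget
    return alloc
```

With a 512000 bit/s stream in group 1 (limit 25) and a 300000 bit/s stream in group 2 (limit 50), a budget of 600000 bit/s first reduces the group 2 stream to its floor of 150000 and then takes the remaining 62000 from group 1. If the budget cannot be met even at the floors, the sketch returns the floors and leaves any further action to server policy.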
Client Offer SDP example. For simplicity, audio and sprop-operation-
point-info details are not shown:
v=0
o=svcsrv 289083124 289083124 IN IP4 192.0.2.2
s=conference
b=TIAS:812000
t=0 0
m=video 30000 RTP/AVP 98 97 96
c=IN IP4 192.0.2.2
b=TIAS:512000
a=content:window1,1,25,1
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42401e
a=rtpmap:97 H264-SVC/90000
a=fmtp:97 profile-level-id=530016; sprop-operation-point-info..(VGA/30)
a=rtpmap:98 H264-SVC/90000
a=fmtp:98 profile-level-id=53001e; sprop-operation-point-info..(720p/30)
a=sendrecv
m=video 40000 RTP/AVP 101 100 99
c=IN IP4 192.0.2.2
b=TIAS:300000
a=content:window2,2,50,1
a=rtpmap:99 H264/90000
a=fmtp:99 profile-level-id=42401e
a=rtpmap:100 H264-SVC/90000
a=fmtp:100 profile-level-id=530013; sprop-operation-point-info..(VGA/30)
a=rtpmap:101 H264-SVC/90000
a=fmtp:101 profile-level-id=530016; sprop-operation-point-info..(360/30)
a=recvonly
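A conference server might collect the per-stream settings from an offer like the one above as follows. This is a sketch only: video_sections is a hypothetical helper that keeps values as strings and ignores every line other than m=, b=TIAS and a=content.

```python
from typing import Dict, List, Optional


def video_sections(sdp: str) -> List[Dict[str, Optional[str]]]:
    """Return port, TIAS bandwidth and content value per video m line."""
    sections: List[Dict[str, Optional[str]]] = []
    current: Optional[Dict[str, Optional[str]]] = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media description starts; track it only if it is video.
            current = None
            if line.startswith("m=video "):
                current = {"port": line.split()[1], "tias": None, "content": None}
                sections.append(current)
        elif current is not None:
            # Session-level lines (before the first m line) are skipped.
            if line.startswith("b=TIAS:"):
                current["tias"] = line[len("b=TIAS:"):]
            elif line.startswith("a=content:"):
                current["content"] = line[len("a=content:"):].strip()
    return sections
```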
4.2.2. VAS Rank
Rules: If the client wants a video stream/window from the conference
server to be switched by active speaker activity, it has to assign a
VAS rank to the window. The conference server assigns speakers to
windows based on the active speaker history and the rank. Rank 1
gets the most recent speaker, rank 2 the next most recent, etc.
Multiple windows can share the same rank, which minimizes the
shuffling that takes place when speakers switch in and out. If the
rank is not specified, the default value is 1 for all windows
(minimum shuffling).
Example (1) of shuffling in a 2x2 or 1x4 layout (4 equal sized
windows):
a=content: window1,1,100, 1 (the most recent speaker)
a=content: window2,1,100, 2 (2nd most recent)
a=content: window3,1,100, 3 (3rd most recent)
a=content: window4,1,100, 4 (4th most recent)
In this example the client's offer to the conference server has 4
video SDP m lines. The group number (second parameter of a=content)
is the same for the four m lines, indicating the same priority and 4
equal sized windows. The a=content attribute of each m line has a
different VAS rank value (the last parameter in the examples above).
This means that the client is requesting the conference server to
always send the most recent active speaker on the first video stream
(negotiated in this example by the first video m line), the second
most recent active speaker on the second video stream, the third most
recent active speaker on the third video stream and the fourth most
recent active speaker on the fourth video stream.
Example (2) of shuffling in a 2x2 or 1x4 layout (4 equal sized
windows):
a=content: window1,1,100, 1
a=content: window2,1,100, 1
a=content: window3,1,100, 1
a=content: window4,1,100, 1
All four windows will be switched with active speaker streams. The
order will be determined by the conference server to minimize
shuffling. In this example the client's offer to the conference
server has 4 video SDP m lines, and the a=content attribute of each
has the same VAS rank value (the last parameter in the examples
above). This means that the client is requesting the conference
server to always minimize the shuffling of speakers on the video
streams sent to the client, i.e. if the most recent active speaker
changes, the server sends his/her video on the fourth stream,
replacing the least recent active speaker, and leaves the other three
streams unchanged.
Example (3) of shuffling in a 1+3 layout (1 big + 3 small windows):
a=content: window1,1,100, 1 (the most recent speaker)
a=content: window2,2,100, 2 (2nd most recent)
a=content: window3,2,100, 3 (3rd most recent)
a=content: window4,2,100, 4 (4th most recent)
In this example the client's offer to the conference server has 4
video SDP m lines. The group number (second parameter of a=content)
indicates two priorities and a 1+3 layout. The a=content attribute
of each m line has a different VAS rank value (the last parameter in
the examples above). This means that the client is requesting the
conference server to always send the most recent active speaker on
the first video stream (negotiated in this example by the first video
m line), the second most recent active speaker on the second video
stream, the third most recent active speaker on the third video
stream and the fourth most recent active speaker on the fourth video
stream.
Example (4) of shuffling in a 1+3 layout (1 big + 3 small windows):
a=content: window1,1,100, 1
a=content: window2,2,100, 2
a=content: window3,2,100, 2
a=content: window4,2,100, 2
The big window will always get the most recent active speaker. The 3
small windows will get the next 3 most recent active speakers. The
order for these three small windows will be determined by the server
to minimize shuffling.
Example (5) of shuffling in a 1+3 layout with pinned video (1 big + 3
small windows):
a=content: window1,1,100, 1 (the most recent speaker)
a=content: window2,2,100, 2 (2nd or 3rd most recent)
a=content: window3,2,100, 2 (2nd or 3rd most recent)
a=content: window4,2,100, 0 (pinned / not switched based on speaker
activity)
In this example the fourth m line has a VAS rank of 0, which means
that this video stream will not be switched and is pinned to a
certain user regardless of his/her voice activity.
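One way a conference server could apply the rank rules from the examples above is sketched below. The function and parameter names are hypothetical. The sketch also assumes that a window already showing a speaker who still belongs to its rank is left untouched, and it does not check whether a pinned participant is also among the recent speakers.

```python
from typing import Dict, List, Optional


def assign_windows(ranks: Dict[int, int],
                   current: Dict[int, Optional[str]],
                   speakers: List[str]) -> Dict[int, Optional[str]]:
    """Map window-id to participant after a change of active speaker.

    ranks:    window-id -> VAS rank (0 = pinned, never switched)
    current:  window-id -> participant shown now (None if empty)
    speakers: participants ordered from most to least recent speaker
    """
    result = dict(current)
    pos = 0  # next speaker not yet handed to a rank
    for rank in sorted({r for r in ranks.values() if r > 0}):
        windows = sorted(w for w, r in ranks.items() if r == rank)
        wanted = speakers[pos:pos + len(windows)]
        pos += len(windows)
        # Windows already showing a wanted speaker are not shuffled.
        keep = {w for w in windows if current.get(w) in wanted}
        shown = {current[w] for w in keep}
        newcomers = [s for s in wanted if s not in shown]
        for w in windows:
            if w not in keep:
                result[w] = newcomers.pop(0) if newcomers else None
    return result
```

With the ranks of Example (2) and Mike as the new most recent speaker, only the window that showed Dave changes. With the ranks of Example (1), all four windows shift by one position.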
5. Active Talker Indication
The audio and video active talker indications use the CSRC list in
the audio and video RTP packets [RFC3550]. The SSRCs in the RTP CSRC
list are mapped to a userid/user name using the RFC 4575
notifications. Only one SSRC is sent in the video RTP CSRC list; the
client can use it to display the user name on each CP video window.
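On the client side, reading the CSRC list from the fixed RTP header [RFC3550] and looking up a display name could be sketched as follows. The roster dictionary is an assumption: it stands for whatever SSRC-to-name mapping the client has built from the conference notifications, and the function names are hypothetical.

```python
import struct
from typing import Dict, List, Optional


def csrc_list(rtp_packet: bytes) -> List[int]:
    """Return the CSRC identifiers carried in an RTP packet header."""
    if len(rtp_packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first = rtp_packet[0]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = first & 0x0F  # CSRC count, low 4 bits of the first octet
    if cc == 0:
        return []
    end = 12 + 4 * cc  # CSRCs follow the 12-octet fixed header
    if len(rtp_packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    return list(struct.unpack("!%dI" % cc, rtp_packet[12:end]))


def window_label(rtp_packet: bytes, roster: Dict[int, str]) -> Optional[str]:
    """Name to show on a CP window, or None if it cannot be resolved."""
    csrcs = csrc_list(rtp_packet)
    if not csrcs:
        return None
    return roster.get(csrcs[0])
```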
6. Security Considerations
Multi-stream CP video uses the TLS and SRTP standards to secure SIP
signaling and media, respectively.
7. IANA Considerations
This document has no actions for IANA.
8. Acknowledgements
Thanks to Alan Johnston, Dan Romascanu, Peter Musgrave and Rifaat
Shekh-Yusef for their review of the document and comments.
9. References
9.1. Informative References
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
"RTP Payload Format for Scalable Video Coding", RFC 6190,
May 2011.
9.2. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
A., Peterson, J., Sparks, R., Handley, M., and E.
Schooler, "SIP: Session Initiation Protocol", RFC 3261,
June 2002.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
with Session Description Protocol (SDP)", RFC 3264,
June 2002.
Author's Address
Adel Mostafa (editor)
Avaya
Toronto, Ontario
Canada
Email: amostafa@avaya.com