Network Working Group                                        Lingli Deng
Internet Draft                                                  Jin Peng
Intended status: Informational                              China Mobile
Expires: April 2013                                     October 15, 2012
Sender Media Control based on Local Status Detection
draft-deng-rtcweb-svccontrol-01.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on April 15, 2013.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.
Abstract
Deng, et al. Expires April 15, 2013 [Page 1]
Internet-Draft svc-control October 2012
This document proposes adding sender-side media control, driven by
the browser's detection of local user status, to further reduce the
media bandwidth consumed by multiparty video conferencing that uses
an SVC codec in RTCWEB use-cases.
Table of Contents
1. Introduction
2. Problem Statement
   2.1. Existing Proposals in [use-case-draft]
      2.1.1. Receiver-harvested SVC-stream (SVC scheme)
      2.1.2. Receiver-selected multi-stream (Multi-stream scheme)
      2.1.3. Receiver-transcoded HD stream (transcoding scheme)
   2.2. Analysis of existing proposals
3. Enhanced SVC scheme with sender control
   3.1. Overview of the core idea
   3.2. A simple example
   3.3. Extensions to the simple example
   3.4. Applicable Scenarios
   3.5. Simulation Results
      3.5.1. Setup
      3.5.2. Results
4. Derived Requirements
5. Security Considerations
6. IANA Considerations
7. References
   7.1. Normative References
   7.2. Informative References
1. Introduction
As an important use-case of RTCWEB, multiparty video conferencing
has been discussed in depth in [use-case-draft], which states the
requirement that the receiving party (either a participant in the
p2p use-case or the mixing server in the centralized use-case)
renders the remote video input based on the audio input status of
the sending party (i.e. whether that user is the spokesman of the
conference at this moment).
Assume an organization wants to hold a multiparty video conference
over a video communication system, with the participating peers
connected to each other directly, point to point, for the duration
of the session. Each conference participant can send audio and video
media streams to the other participants. When a participant receives
the video and audio streams from the others, he wants the conference
spokesman to be presented in a large high-definition window in the
middle, and the other participants (including himself, as local
playback) in a group of small windows at the side. The spokesman
changes frequently as the conference proceeds, but the participants
always want the video stream of the current spokesman presented in a
large high-definition window in the local browser, and the video
streams of ordinary participants presented in small windows.
Given the traffic overhead of peer-to-peer video conferencing and
the traffic restrictions of terminals, how can the traffic overhead
of a video conference be minimized without degrading what the
participants see?
To satisfy the requirement that for each participant one high
resolution video is displayed in a large window, while a number of
low resolution videos are displayed in smaller windows, [use-case-
draft] provides three solutions:
1. The sender sends a high resolution SVC stream, and the
receiver/server selects all or part of it for presentation.
2. The sender sends both a high resolution and a low resolution
stream, and the receiver selects one to present.
3. The sender sends a high resolution stream, and the server/receiver
transcodes it into low or high resolution streams as required.
This document proposes to add sender media control based on determining
whether the local participant is speaking. On the sending side, only the
current spokesman sends the high resolution media stream, while the other
participants send the ordinary media stream. On the receiving side, a
participant receives and presents the HD media stream from the spokesman
only, and the ordinary media stream from everyone else in the conference.
The bandwidth consumption of both the sender and the receiver is thus
further reduced.
2. Problem Statement
2.1. Existing Proposals in [use-case-draft]
[use-case-draft] provides three solutions to the window resizing
requirement for multiparty conferencing scenarios. Despite the differences
elaborated below, they share one property: they rely on the receiver to do
the work, while the sender blindly offers HD media all the time.
2.1.1. Receiver-harvested SVC-stream (SVC scheme)
During a video conference, the sender uses SVC coding and sends SVC HD
media streams (base layer plus extended layer), and the receiver/server
selects what to present based on its requirements. For example, it presents
the media stream from the current conference spokesman in a high resolution
full window, and shows part of the media stream (base layer only) from
ordinary participants in small windows.
2.1.2. Receiver-selected multi-stream (Multi-stream scheme)
During a video conference, the sender sends both a high resolution and a
low resolution media stream, and the receiver chooses one to present in the
local browser according to its requirements. For example, the receiver
presents the high resolution media stream from the current conference
spokesman and shows the low resolution media stream from ordinary
participants.
2.1.3. Receiver-transcoded HD stream (transcoding scheme)
During a video conference, the sender sends a high resolution media stream,
and the server/receiver transcodes it as required. For example, the
receiver transcodes the media streams received from ordinary participants
into low resolution streams and then presents them in the local browser.
2.2. Analysis of existing proposals
The three existing schemes above can be analyzed along two dimensions:
transmission overhead and local processing cost.
First, in terms of high resolution traffic, the multi-stream scheme is the
least attractive. In the SVC and transcoding schemes, the local browser
sends the high resolution media stream captured by the local devices
regardless of whether the user is speaking. In a peer-to-peer conference
with N participants, each participant therefore consumes resources for
2*(N-1) high resolution media streams: N-1 high-definition video streams
sent to the remote peers and N-1 high-definition video streams received
from them. The multi-stream scheme adds further transmission overhead for
the additional low resolution media streams.
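The stream count above can be checked with a small calculation. The
following is a minimal sketch in Python, assuming a full-mesh conference
in which every sender always offers HD (as in the SVC and transcoding
schemes); the function names are illustrative and do not come from
[use-case-draft].

```python
def hd_streams_per_participant(n: int) -> dict:
    """HD streams handled by one participant in an n-party full mesh
    when every sender always sends HD."""
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    sent = n - 1       # one HD stream to each remote peer
    received = n - 1   # one HD stream from each remote peer
    return {"sent": sent, "received": received, "total": sent + received}


def hd_streams_in_conference(n: int) -> int:
    """Distinct HD streams on the wire: one per ordered pair of peers."""
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    return n * (n - 1)


if __name__ == "__main__":
    for n in (3, 4, 6):
        print(n, hd_streams_per_participant(n), hd_streams_in_conference(n))
```

The per-participant total is 2*(N-1), while the number of distinct HD
streams in the whole conference grows as N*(N-1).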
Second, we analyze the computation overhead from the perspectives of the
sender and the receiver, respectively.
For the sender, the transcoding scheme incurs the lowest cost for encoding
the outgoing media stream. In contrast, the multi-stream scheme, which
requires two versions of the local media stream to be encoded and sent
separately, incurs the highest cost. The encoding cost of the SVC scheme
for a local sender lies between the two.
For the receiver, the transcoding scheme requires real-time transcoding
and therefore imposes the highest computation overhead on the receiving
side (mixing server or participating peer). To switch to and from HD in a
timely manner, a receiver in the multi-stream scheme needs to keep the high
and low resolution media streams synchronized, at considerable cost. The
SVC scheme adjusts the processing cost by overlaying or removing the
extension layer sub-stream data, at minimal extra cost.
3. Enhanced SVC scheme with sender control
3.1. Overview of the core idea
The core idea of this proposal is:
o Both ends use SVC and negotiate with each other to establish a
  peer-to-peer media stream.
o The sender's browser detects the status of the local user through
  local devices, for example by detecting sustained voice input.
o The browser adjusts which coding layers it sends according to the
  detected local status:
   o If the local user is the spokesman, it sends the base layer and
     the extended layer of the media stream.
   o If the local user is silent, it sends only the base layer of the
     media stream.
o The receiver's browser is able to use the local devices, and decodes
  and presents the incoming media stream according to SVC.
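The sender-side rule in the list above can be sketched as a simple mapping
from the detected status to the set of SVC layers to send. This is a
minimal illustration in Python, not a browser implementation; the names
LocalStatus and layers_to_send are hypothetical.

```python
from enum import Enum


class LocalStatus(Enum):
    SPEAKING = "speaking"   # sustained voice input detected
    SILENT = "silent"       # no sustained voice input


BASE = "base"
EXTENDED = "extended"


def layers_to_send(status: LocalStatus) -> tuple:
    """Return the SVC layers the sender offers for a given local status."""
    if status is LocalStatus.SPEAKING:
        return (BASE, EXTENDED)   # spokesman: full HD stream
    return (BASE,)                # silent participant: base layer only


if __name__ == "__main__":
    for status in LocalStatus:
        print(status.value, "->", layers_to_send(status))
```

The receiver needs no extra signalling under this rule: it decodes
whatever layers arrive, as any SVC receiver would.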
3.2. A simple example
In a multiparty video conferencing session (three users participate in the
illustrative example), the workflow shown in Figure 1 proceeds as follows:
                A               B               C
                |               |               |
                |               |               |
A/B/C peer-to-  |<=============>|<=============>|
peer connection |<=============================>|
                |               |               |
                |               |               |
  Detect the +--| Detect the +--| Detect the +--|
  local user |  | local user |  | local user |  |
  status     +->| status     +->| status     +->|
                |               |               |
          +------------+  +------------+  +------------+
          | Continuous |  |   Silent   |  |   Silent   |
          |speech input|  |   status   |  |   status   |
          +------------+  +------------+  +------------+
                |               |               |
                |               |               |
                | Base layer +  |               |
                |extended layer |               |
                |-------------->|               |
                |               |               |
                |  Base layer + extended layer  |
                |------------------------------>|
                |               |  Base layer   |
                |               |-------------->|
                |               |               |
                |  Base layer   |               |
                |<--------------|               |
                |               |               |
                |               |               |
                |               |  Base layer   |
                |               |<--------------|
                |               |               |
                |          Base layer           |
                |<------------------------------|
                |               |               |
                |               |               |
  Detect the +--| Detect the +--| Detect the +--|
  local user |  | local user |  | local user |  |
  status     +->| status     +->| status     +->|
                |               |               |
          +------------+  +------------+  +------------+
          | Continuous |  |   Silent   |  |   Silent   |
          |speech input|  |   status   |  |   status   |
          +------------+  +------------+  +------------+
                |               |               |
                |------------+  |------------+  |------------+
                | Present A  |  | Present A  |  | Present A  |
                |(Large Size)|  |(Large Size)|  |(Large Size)|
                |------------+  |------------+  |------------+
                |------------+  |------------+  |------------+
                | Present B  |  | Present B  |  | Present B  |
                |(Small Size)|  |(Small Size)|  |(Small Size)|
                |------------+  |------------+  |------------+
                |------------+  |------------+  |------------+
                | Present C  |  | Present C  |  | Present C  |
                |(Small Size)|  |(Small Size)|  |(Small Size)|
                |------------+  |------------+  |------------+
                |               |               |
                |               |               |
  Detect the +--| Detect the +--| Detect the +--|
  local user |  | local user |  | local user |  |
  status     +->| status     +->| status     +->|
                |               |               |
          +------------+  +------------+  +------------+
          |   Silent   |  | Continuous |  |   Silent   |
          |   status   |  |speech input|  |   status   |
          +------------+  +------------+  +------------+
                |               |               |
                |               |               |
                |  Base layer   |               |
                |-------------->|               |
                |               |               |
                |          Base layer           |
                |------------------------------>|
                |               |               |
                |               |               |
                |               | Base layer +  |
                |               |extended layer |
                |               |-------------->|
                |               |               |
                | Base layer +  |               |
                |extended layer |               |
                |<--------------|               |
                |               |               |
                |               |               |
                |               |  Base layer   |
                |               |<--------------|
                |               |               |
                |          Base layer           |
                |<------------------------------|
                |               |               |
                |               |               |
  Detect the +--| Detect the +--| Detect the +--|
  local user |  | local user |  | local user |  |
  status     +->| status     +->| status     +->|
                |               |               |
          +------------+  +------------+  +------------+
          |   Silent   |  | Continuous |  |   Silent   |
          |   status   |  |speech input|  |   status   |
          +------------+  +------------+  +------------+
                |               |               |
                |------------+  |------------+  |------------+
                | Present A  |  | Present A  |  | Present A  |
                |(Small Size)|  |(Small Size)|  |(Small Size)|
                |------------+  |------------+  |------------+
                |------------+  |------------+  |------------+
                | Present B  |  | Present B  |  | Present B  |
                |(Large Size)|  |(Large Size)|  |(Large Size)|
                |------------+  |------------+  |------------+
                |------------+  |------------+  |------------+
                | Present C  |  | Present C  |  | Present C  |
                |(Small Size)|  |(Small Size)|  |(Small Size)|
                |------------+  |------------+  |------------+
                |               |               |
                |               |               |
           Figure 1: Three-party (A, B, C) session flow
First, A's browser establishes peer-to-peer media stream connections with
the other participants' browsers using an SVC codec. Subsequently, each
browser uses the capabilities of its local devices to monitor the audio
input status of the local user. Since A is the spokesman at the beginning,
A's browser detects continuous audio input from the local user and sends
both the base and extended layers of the SVC sub-streams to the others (B
and C). Meanwhile, B and C, as listeners to A, remain silent. B's browser
therefore detects a silent status through its local devices and sends only
the SVC base layer media stream to the others (A and C). Similarly, C sends
only the base layer to A and B.
The spokesman changes from one participant to another as the conference
continues. For example, when A's browser detects a silent status from the
local devices, A switches to sending only the base layer to the others (B
and C). Conversely, when B's browser detects continuous audio input from
the local user, B switches to sending both the base and extended layers to
the others (A and C). Since C detects no change locally, C continues
sending only the base layer to A and B.
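The two phases of Figure 1 can be reproduced with a short sketch. It is
written in Python under the assumption of a full mesh with at most one
spokesman at a time; mesh_flows and window_sizes are hypothetical helper
names used only for illustration.

```python
def mesh_flows(participants, speaker):
    """Map each (sender, receiver) pair to the layers sent on that link."""
    flows = {}
    for sender in participants:
        # Only the current spokesman adds the extended layer.
        layers = "base+extended" if sender == speaker else "base"
        for receiver in participants:
            if receiver != sender:
                flows[(sender, receiver)] = layers
    return flows


def window_sizes(participants, speaker):
    """Presentation chosen by every receiver for each participant."""
    return {p: ("large" if p == speaker else "small") for p in participants}


if __name__ == "__main__":
    peers = ["A", "B", "C"]
    for speaker in ("A", "B"):          # A speaks first, then B takes over
        print("spokesman:", speaker)
        for link, layers in sorted(mesh_flows(peers, speaker).items()):
            print("  %s -> %s: %s" % (link[0], link[1], layers))
        print("  windows:", window_sizes(peers, speaker))
```

In both phases only two of the six links carry the extended layer, whereas
the existing SVC scheme would carry it on all six.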
3.3. Extensions to the simple example
The above description of sender-controlled SVC transmission modes based on
local audio input status detection is a simple example of a more general
sender media control scheme. In the general scheme, the types of status
transition that act as control triggers, the mechanism that enforces the
media adjustment afterwards, and whether the JS API is given the ability to
influence the media behavior may all vary. In particular, possible
extensions include:
1. Options for implementing send-side state detection include:
   a. making use of the DTX module at the voice codec level, and
      deciding on video mode switching in accordance with a given
      strategy (note that the strategy for video mode switching, i.e.
      whether local high resolution presentation is needed, can differ
      from the strategy of the voice codec, i.e. silence / voice
      signals), or
   b. inferring the user's presence status indirectly, by monitoring
      whether the local audio input is muted from the web page, whether
      the conferencing page is currently active, etc.
2. Options for implementing send-side media control include:
   a. adjusting the definition of the local camera, or
   b. adjusting the bitrate of the codec in use (the layer composition
      in the case of SVC codecs, for instance), or
   c. triggering a re-negotiation for a new codec to replace the one in
      use.
3. Options for implementation on clients include:
   a. the local browser alone detects the transitions and conducts
      media control, or
   b. the browser offers callback APIs to JS for both the defined state
      transition events and media control, so that JS may supply the
      SP's specific control behavior.
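How the detection options (1a, 1b) and the control options (2a to 2c) might
combine can be sketched as follows. This is only an illustration in Python:
the LocalSignals fields, the rule that a muted microphone or an inactive
page overrides voice activity, and the action strings are all assumptions
made for the example, not behavior defined by this document.

```python
from dataclasses import dataclass


@dataclass
class LocalSignals:
    voice_active: bool   # voice codec currently sees speech, not silence (1a)
    mic_muted: bool      # local audio input muted from the web page (1b)
    page_active: bool    # conferencing page is currently active (1b)


def wants_high_resolution(s: LocalSignals) -> bool:
    """Assumed policy: presence signals can veto the voice-level signal."""
    if s.mic_muted or not s.page_active:
        return False
    return s.voice_active


# Hypothetical action names, one (high, low) pair per control option.
ACTIONS = {
    "camera": ("raise capture definition", "lower capture definition"),    # 2a
    "bitrate": ("send base + extended layers", "send base layer only"),    # 2b
    "renegotiate": ("negotiate HD codec", "negotiate low-rate codec"),     # 2c
}


def control_action(s: LocalSignals, option: str = "bitrate") -> str:
    """Pick the media control action for the chosen control option."""
    high, low = ACTIONS[option]
    return high if wants_high_resolution(s) else low


if __name__ == "__main__":
    talking = LocalSignals(voice_active=True, mic_muted=False, page_active=True)
    muted = LocalSignals(voice_active=True, mic_muted=True, page_active=True)
    print(control_action(talking))
    print(control_action(muted))
    print(control_action(talking, "camera"))
```

Keeping detection and control as separate functions mirrors the way the
options above can be chosen independently of each other.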
3.4. Applicable Scenarios
For simplicity, we used the fully distributed scenario as the example to
elaborate our proposal. Note that the proposed scheme is also applicable,
and beneficial, in a centralized mixer-based conferencing setting.
In a fully distributed P2P conference, with no centralized mixing server
in the media plane, each participating node performs the modulation of the
video streams locally. As mentioned above, applying our proposal reduces
the transmission overhead of both send-side and receive-side nodes, and
also reduces the processing and modulation overhead of the receive-side
nodes.
In a mixing server-based conference, the dedicated mixing server receives
the participants' locally captured video streams and renders a single
collective screen for display. Applying our proposal significantly reduces
the transmission overhead of uploading local video to the server, thus
saving the bandwidth of the server as a centralized sink.
3.5. Simulation Results
To better understand both the applicability and the effectiveness of the
proposed sender-control mechanism, we set up a set of simple experiments to
simulate common P2P conferencing scenarios.
3.5.1. Setup
The choices of experiment settings and the reasoning behind them are
summarized as follows:
1. Conferencing settings:
   a. We simulate up to 6-way P2P conferencing. From experience gained
      with a distributed conferencing service, we found that most
      conferences involve 3-4 participants, and that current mainstream
      laptops support at most 5-way conferencing using the H.264 codec.
   b. We use a Zipf distribution for the participants' speaking
      frequency and duration values, and distinguish one-to-mul,
      mul-to-mul, and roundtable conferencing. The first stands for a
      lecture-like conference with a dedicated lecturer, the second for
      a lecture without a dedicated lecturer, and the third for a free
      discussion. In other words, the first two types of conference
      have dedicated listeners.
   c. We use a uniform distribution for the transmission rate between a
      specific pair of peers, with separate variables for HD
      (uniform(500,900) KBps) and FD (uniform(100,300) KBps) streams.
2. Triggering timer settings:
   a. We evaluate the proposed scheme with triggering timer values
      ranging from 0 to 30 seconds. In practice, to avoid frequent
      oscillation caused by aggressive switching on short-lived
      activity, an implementation always has to trade off bandwidth
      reduction against traffic smoothness.
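The role of the triggering timer can be illustrated with a small hold-off
filter: a newly observed status is adopted only after it has persisted for
the timer duration. The sketch below is in Python and rests on assumptions
that this document does not state, namely that the same timer applies to
both directions (silent to speaking and back) and that status samples
arrive with timestamps in seconds. TriggerTimer is a hypothetical name.

```python
class TriggerTimer:
    """Adopt a status change only after it has lasted `hold_seconds`."""

    def __init__(self, hold_seconds: float, speaking: bool = False):
        self.hold = hold_seconds
        self.speaking = speaking   # status currently acted upon
        self._since = None         # when the differing observation began

    def update(self, observed_speaking: bool, now: float) -> bool:
        """Feed one observation; return the status to act upon."""
        if observed_speaking == self.speaking:
            self._since = None             # short-lived change, forget it
        else:
            if self._since is None:
                self._since = now          # start timing the new status
            if now - self._since >= self.hold:
                self.speaking = observed_speaking
                self._since = None
        return self.speaking


if __name__ == "__main__":
    timer = TriggerTimer(hold_seconds=2.0)
    samples = [(0.0, True), (1.0, True), (1.5, False), (2.0, True), (4.0, True)]
    for now, observed in samples:
        print(now, observed, "->", timer.update(observed, now))
```

With a timer of zero the filter switches on the first differing sample,
which corresponds to the most aggressive setting in the range above.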
3.5.2. Results
We ran repeated simulations and compared the mean traffic load of the
proposed scheme with that of the existing SVC scheme. The findings are
summarized as follows:
1. The theoretical traffic reduction (with the triggering timer equal
   to zero) is considerable and ranges from X to Y for different
   conferencing types. For instance, in 3-way scenarios, the achieved
   traffic reduction is 60% for one-to-mul cases, 70% for mul-to-mul
   cases, and 50% for roundtable cases.
2. Even with a conservative triggering timer setting, the proposed
   scheme outperforms the existing SVC scheme in the various
   conferencing types. For instance, a longer timer of up to 30
   seconds does not noticeably affect the reduction ratio.
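For intuition about where such reductions come from, the following
back-of-the-envelope sketch in Python compares the total mesh traffic of
the two schemes. It is not the simulation described above: it assumes
exactly one spokesman at all times, a zero triggering timer, a fixed rate
hd_rate for base plus extended layers and a fixed rate base_rate for the
base layer alone. The function names and the sample rates are
illustrative assumptions.

```python
def mesh_traffic(n: int, hd_rate: float, base_rate: float) -> tuple:
    """Return (existing_svc, proposed) total traffic over an n-party mesh."""
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    links = n * (n - 1)                  # ordered sender/receiver pairs
    existing = links * hd_rate           # every sender always sends HD
    speaker_links = n - 1                # the spokesman's outgoing links
    proposed = (speaker_links * hd_rate
                + (links - speaker_links) * base_rate)
    return existing, proposed


def reduction_ratio(n: int, hd_rate: float, base_rate: float) -> float:
    """Fraction of the existing scheme's traffic saved by the proposal."""
    existing, proposed = mesh_traffic(n, hd_rate, base_rate)
    return 1.0 - proposed / existing


if __name__ == "__main__":
    # Midpoints of the uniform ranges in the setup, used as an example only.
    for n in range(3, 7):
        print(n, round(reduction_ratio(n, hd_rate=700.0, base_rate=200.0), 3))
```

Under these assumptions the ratio simplifies to
1 - (hd_rate + (n-1)*base_rate) / (n*hd_rate), which is about 0.476 for
n = 3 with the example rates. This is an illustrative figure and not one
of the simulation results reported above.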
4. Derived Requirements
In our proposal, the video conferencing client detects the session state
(speaking/silent) of the local user and saves media plane transmission
overhead by adjusting the sending rate of the video media stream. To
realize this in an RTCWEB setting, the following additional requirements
are derived:
1. Function Requirement for Browser
Fxx: The browser SHOULD be able to detect the audio input status
(speaking/silent) of the local user.
2. API Requirements for Browser
Axx: It SHOULD be possible for the JS to be notified about the audio input
status (speaking/silent) of the local user, and to specify the media
control behavior in response.
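The shape of the notification described by Axx can be sketched with a
small observer. The example is in Python for brevity and does not propose
an API surface: AudioStatusNotifier, on_status_change and report are
hypothetical names, with report standing in for the browser's internal
detector (Fxx) and the registered callback standing in for application JS.

```python
class AudioStatusNotifier:
    """Notify registered callbacks when the local audio status changes."""

    VALID = ("speaking", "silent")

    def __init__(self):
        self._callbacks = []
        self._status = "silent"

    def on_status_change(self, callback):
        """Register an application callback taking the new status."""
        self._callbacks.append(callback)

    def report(self, status: str):
        """Called by the detector; fires callbacks only on a transition."""
        if status not in self.VALID:
            raise ValueError("unknown status: %r" % (status,))
        if status == self._status:
            return
        self._status = status
        for callback in self._callbacks:
            callback(status)


if __name__ == "__main__":
    def choose_layers(status):
        # Application-supplied control behavior in response to the event.
        layers = "base+extended" if status == "speaking" else "base"
        print(status, "->", layers)

    notifier = AudioStatusNotifier()
    notifier.on_status_change(choose_layers)
    for observed in ("speaking", "speaking", "silent"):
        notifier.report(observed)
```

Firing only on transitions keeps the callback rate low, which matters if
the application reacts by changing what is sent.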
5. Security Considerations
TBA
6. IANA Considerations
None.
7. References
7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2234] Crocker, D. and Overell, P.(Editors), "Augmented BNF for
Syntax Specifications: ABNF", RFC 2234, Internet Mail
Consortium and Demon Internet Ltd., November 1997.
7.2. Informative References
[use-case-draft] Holmberg, C., Hakansson, S. and Eriksson, G., "Web
                 Real-Time Communication Use-cases and Requirements",
                 draft-ietf-rtcweb-use-cases-and-requirements-09 (work
                 in progress), June 27, 2012.
Authors' Addresses
Lingli Deng
China Mobile
Email: denglingli@chinamobile.com
Jin Peng
China Mobile
Email: pengjin@chinamobile.com