CLUE R. Hansen
Internet-Draft Cisco Systems
Intended status: Standards Track A. Pepperell
Expires: December 2, 2012 Silverflare
A. Romanow
B. Baldino
Cisco Systems
M. Duckworth
Polycom
May 31, 2012
The need for consumer spatial information in CLUE
draft-hansen-clue-consumer-layout-00
Abstract
This draft is for discussion in the CLUE working group. It proposes
adding the ability for the consumer to provide specific information
to the provider.
This document proposes allowing consumers to include spatial
parameters in their consumer requests to providers in order to
improve the provider's ability to assign media to streams in a way
that is helpful for rendering. The solution proposed here is in
partial response to CLUE Task #10, "Does Framework provide sufficient
info for receiver?"
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 2, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
Hansen, et al. Expires December 2, 2012 [Page 1]
Internet-Draft Consumer spatial information in CLUE May 2012
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Terminology
3. Motivation - Conferencing in CLUE
4. Issues associated with subscribing to multiple switched captures
4.1. Provider advertising spatially-related switched captures
5. Consumer includes optional spatial information
5.1. Applicability of consumer spatial information to audio
6. Implications and conclusions
7. Security Considerations
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses
1. Introduction
This draft notes some limitations of CLUE when it comes to correctly
rendering video under certain conditions, and proposes the optional
addition of spatial information by the consumer to resolve these
issues. This does not imply that the authors believe that the
proposed solution is the only option available; rather, this draft is
meant as a starting point for discussion.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119] and
indicate requirement levels for compliant implementations.
3. Motivation - Conferencing in CLUE
The current methodology of the CLUE framework
[I-D.ietf-clue-framework] is well suited to the case of systems with
a relatively static set of capture devices. However, scenarios in
which a much more dynamic set of capture devices is presented to
consumers, such as voice-switched conferencing where multiple
endpoints connect to a middle box such as an MCU, present additional
challenges. An example of such a scenario is shown below, with four
endpoints A, B, C and D in a conference:
+-----+
+---+ / \ +---+
| A |----/ \----| B |
+---+ / \ +---+
+ MCU +
+---+ \ / +---+
| C |----\ /----| D |
+---+ \ / +---+
+-----+
In this scenario endpoint A is not directly connected to any of the
other endpoints and so will not have the capture information
associated with their media streams directly available.
One approach is for the MCU to advertise B, C and D's captures as
separate capture scenes to A - A can then subscribe to any capture
from any of the other endpoints.
However, as the size of the conference increases the number of
captures that must be advertised will quickly become impractical.
Further, in many conferencing scenarios, endpoints do not wish to
specify the endpoints they want to see - instead they wish to see the
video and audio from the 'most relevant' endpoints as determined by
the MCU (where relevance is usually determined by audio activity
level). Finally, advertising all available captures in this fashion
can be problematic when some captures are mutually exclusive (that
is, cannot be sent simultaneously), as one consumer may ask for one
capture and a second consumer for its mutually exclusive partner.
As such, the MCU has the ability in CLUE to advertise switched
captures; these don't directly represent specific real video or audio
captures. Instead, subscribing to one of these captures means that
the provider will switch the stream it sends to the consumer based on
its internal logic. In the example above, the MCU might advertise a
single, switched video capture to A; if A subscribed to this then the
MCU would forward the video stream from B, C or D based on which it
felt was most relevant (often calculated based on the loudness of an
associated audio stream).
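The switching decision described above can be sketched as follows.
This is a minimal illustration only: the function name
select_switched_source and the audio-level values are hypothetical,
and the framework does not define how a provider computes relevance.

```python
def select_switched_source(audio_levels, consumer):
    """Pick the endpoint whose video the MCU forwards to `consumer`.

    audio_levels maps endpoint name -> recent audio level (hypothetical
    units). The consumer's own media is never switched back to it.
    """
    candidates = {ep: lvl for ep, lvl in audio_levels.items() if ep != consumer}
    if not candidates:
        return None
    # Loudest endpoint wins; iterating in sorted order makes ties
    # resolve to the alphabetically first endpoint.
    return max(sorted(candidates), key=lambda ep: candidates[ep])


levels = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.4}
print(select_switched_source(levels, consumer="A"))  # C
```

A real MCU would typically smooth audio levels over time to avoid
switching on brief noises; that is omitted here.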
4. Issues associated with subscribing to multiple switched captures
Consumer A from the previous example can therefore subscribe to
one or more of these switched captures and will receive that many
streams from the MCU, each switched among the originating sources.
However, A does not receive the spatial capture information from the
originating source associated with these streams alongside the RTP
packets. As a result, things become more complicated when A
subscribes to multiple video captures and the other endpoints
provide multiple video streams with correlated spatial information.
For example, suppose A is a three-screen system and hence requests
three streams. If all the streams it receives are independent, it can
render them as it wishes, as shown below where it receives one stream
from each of B, C and D:
+------+ +------+ +------+
| | | | | |
| B | | C | | D |
| | | | | |
+------+ +------+ +------+
However, if A receives more than one stream from a particular
endpoint and these streams are spatially related, then it is
possible for A to lay them out erroneously. This is illustrated
below, where A is receiving three streams of video that originated at
B, which should correctly be ordered (L)eft, (C)enter, (R)ight:
+------+ +------+ +------+
| | | | | |
| B(L) | | B(C) | | B(R) | Correct
| | | | | |
+------+ +------+ +------+
+------+ +------+ +------+
| | | | | |
| B(C) | | B(R) | | B(L) | Incorrect
| | | | | |
+------+ +------+ +------+
An incorrect layout leads to objects (such as a person being viewed)
being split into sections displayed in disparate,
non-contiguous locations.
This problem could be solved if A had the spatial capture information
from B. In a small conference it may be possible for the middle box
to pre-send all the capture information from all other endpoints to A
(and to every other endpoint), but as the number of captures per
endpoint and the number of endpoints in a conference rise, caching all
the data becomes impractical.
An alternative would be for A to request the originating capture
information for streams it is receiving, or for the MCU to send it
whenever it switches streams. However, because the RTP packets and
the CLUE capture information will be sent in separate channels, there
will be cases where A is receiving RTP packets but has not yet
received the corresponding capture data, and the same problem occurs.
The endpoint must then choose between displaying nothing and risking
incorrect layout choices.
4.1. Provider advertising spatially-related switched captures
One tool that already exists within the CLUE framework and can be
used to partially solve this problem is for the MCU to include
spatial information for the switched captures it advertises. For
example, it would advertise three captures, each with area of capture
information that portrays them as the left, center and right
captures of a single hypothetical room. When the MCU has unrelated
one-screen streams to send to A it can associate them with whichever
switched capture it chooses. But when sending a two- or three-screen
set of streams it can ensure that they are correctly
laid out adjacent to each other and in the correct order. A could
then request these three captures and render the streams
appropriately on its left, center and right screen, needing to take
no action to ensure that the streams are correctly laid out.
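The provider-side placement described above can be sketched as
follows. The slot names, the function place_source and the
first_slot parameter are hypothetical and chosen only for this
illustration; the sketch assumes each source lists its streams in
left-to-right order.

```python
# The three advertised switched captures, ordered left to right.
SLOTS = ["left", "center", "right"]


def place_source(streams, first_slot=0):
    """Map a source's left-to-right ordered streams onto adjacent slots.

    Raises ValueError if the streams would not fit in the advertised
    captures starting at `first_slot`.
    """
    if first_slot < 0 or first_slot + len(streams) > len(SLOTS):
        raise ValueError("source does not fit in the advertised captures")
    return {SLOTS[first_slot + i]: s for i, s in enumerate(streams)}


three_screen = place_source(["B(L)", "B(C)", "B(R)"])
two_screen = place_source(["C(L)", "C(R)"], first_slot=1)
print(three_screen)
print(two_screen)
```

Because adjacency and order are fixed by the provider, the consumer
only needs to render each capture on the matching screen.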
However, this solution is not sufficient for all use-cases. The
issue is that the MCU will need to advertise a suitable separate
group of switched captures for each endpoint configuration that could
connect to it. If the possible endpoint configurations are limited,
this may still represent a plausible number; for instance, an MCU
that wanted to support endpoints with one, two, three or four screens
laid out contiguously left-to-right could advertise a capture set with
the following entries:
{
[VC0]
[VC1, VC2]
[VC3, VC4, VC5]
[VC6, VC7, VC8, VC9]
}
where VC0 was a single switched capture, VC1 and VC2 were two
switched captures each representing half the scene, and so on.
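As a rough illustration of how such a capture set grows, the sketch
below generates one entry per supported screen count. The function
name build_capture_set is hypothetical, and only contiguous
left-to-right rows are considered, as in the example above.

```python
def build_capture_set(max_screens):
    """Return one entry per screen count from 1 to max_screens.

    Each entry is a list of switched capture names, numbered
    consecutively across entries (VC0, VC1, ...).
    """
    entries, next_id = [], 0
    for width in range(1, max_screens + 1):
        entries.append([f"VC{next_id + i}" for i in range(width)])
        next_id += width
    return entries


capture_set = build_capture_set(4)
for entry in capture_set:
    print(entry)
print(sum(len(e) for e in capture_set))  # 10
```

Supporting rows of up to n screens this way needs n(n+1)/2 switched
captures, before any non-linear arrangements are considered.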
But this means that the MCU is only able to support certain pre-
defined layouts - supporting additional configurations of screens
(such as a 2x2 array) requires a new entry for each, and designing a
new endpoint configuration means updating all the MCUs it
interoperates with. This problem becomes particularly acute if the
endpoint has many screens, or wants to perform local composition
(subscribing to multiple streams per screen and rendering them
locally for display) - this both substantially increases the number
of streams that the endpoint would wish to subscribe to, and
increases the complexity of layouts possible. For instance, an
endpoint with two screens that wanted to show a 2x2 grid of
participants on each would need to subscribe to eight captures with
appropriate spatial information.
5. Consumer includes optional spatial information
We can address these issues and allow an endpoint more complex stream
rendering configurations, while substantially reducing the number and
complexity of switched captures the MCU must advertise. The approach
is for the consumer to optionally include some information on the
spatial relationships of its rendering as part of its request.
This allows the MCU to advertise a single collection of switched
captures with no spatial information for the consumer to subscribe
to, rather than attempting to anticipate every layout an endpoint
might desire, and having to advertise an entry for each with suitable
spatial information.
There are a number of forms this consumer information could take.
However, the form most consistent with the existing CLUE data model,
and offering most flexibility for the future, is for the consumer to
be able to describe the spatial relationship of its screens in the
same fashion and using the same system as in the provider's capture
attributes. 'Area of Display' would be an optional attribute of a
consumer request, and would have the same properties as the
provider's 'Area of Capture' (i.e. four co-planar {X,Y,Z}
coordinates). If the consumer includes information on the area of
display the provider may then choose to use that information to
inform its choice when switching video. Alternatively, in the cases
where there were no spatial constraints on the video the provider was
switching, or where fixed streams were being sent, the area of
display information could be ignored.
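As an illustration only, the optional attribute might be modeled as
below. The class names AreaOfDisplay and ConsumerChoice, the corner
ordering and the coplanarity tolerance are assumptions made for this
sketch; neither this draft nor the framework defines them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z)


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass(frozen=True)
class AreaOfDisplay:
    # Corner ordering is an assumption of this sketch.
    bottom_left: Point
    bottom_right: Point
    top_left: Point
    top_right: Point

    def is_coplanar(self, tol: float = 1e-9) -> bool:
        """True if the four corners lie in one plane (triple product ~ 0)."""
        u = _sub(self.bottom_right, self.bottom_left)
        v = _sub(self.top_left, self.bottom_left)
        w = _sub(self.top_right, self.bottom_left)
        return abs(_dot(_cross(u, v), w)) <= tol


@dataclass(frozen=True)
class ConsumerChoice:
    capture_id: str
    # Optional: a consumer with no spatial needs simply omits it.
    area_of_display: Optional[AreaOfDisplay] = None


center = AreaOfDisplay((0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1))
choice = ConsumerChoice("VC1", center)
print(center.is_coplanar())  # True
```

Making the field optional mirrors the proposal: a provider that does
not need the information can ignore it, and a consumer can leave it
out.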
A straightforward example of this would be where consumer A is a
three-screen system wishing to join a large conference including
one-, two- and three-screen systems. The MCU offers a capture scene
including three switched captures, to which A wishes to subscribe. A
then sends a choice for each of those captures, and for each choice
includes an area of display attribute giving the position of each of
its screens. The MCU can then use that information to ensure that,
when switching in the video streams from multi-screen systems, it
does so in a way that they will be rendered correctly on A.
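One way a provider might use the area of display in this example is
sketched below. It assumes, purely for illustration, that X increases
from left to right as seen by the viewer and that the source lists
its streams in left-to-right order; the function names center_x and
assign_by_display are hypothetical.

```python
def center_x(area):
    """Mean X coordinate of an area's four (x, y, z) corner points."""
    return sum(p[0] for p in area) / len(area)


def assign_by_display(choices, ordered_streams):
    """Assign left-to-right ordered streams to the consumer's choices.

    choices maps capture id -> four (x, y, z) corner points.
    If there are fewer streams than choices, the rightmost choices
    are left unassigned.
    """
    left_to_right = sorted(choices, key=lambda cid: center_x(choices[cid]))
    return dict(zip(left_to_right, ordered_streams))


left = ((-1.5, 0, 0), (-0.5, 0, 0), (-1.5, 0, 1), (-0.5, 0, 1))
middle = ((-0.5, 0, 0), (0.5, 0, 0), (-0.5, 0, 1), (0.5, 0, 1))
right = ((0.5, 0, 0), (1.5, 0, 0), (0.5, 0, 1), (1.5, 0, 1))

# A happens to use VC0 for its right screen and VC1 for its left.
choices = {"VC0": right, "VC1": left, "VC2": middle}
assignment = assign_by_display(choices, ["B(L)", "B(C)", "B(R)"])
print(assignment)  # {'VC1': 'B(L)', 'VC2': 'B(C)', 'VC0': 'B(R)'}
```

The consumer is free to pair any capture with any screen; the
provider recovers the physical order from the coordinates alone.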
A more complicated example is where A is still a three-screen system
wishing to join a large conference including one-, two- and three-
screen systems, but now wishes to receive more than one video stream
per screen, composing them locally. The layout A wishes to achieve
is (three large screens, each with one main video displayed full-
screen and three picture-in-picture views):
+-------------+ +-------------+ +-------------+
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| +-+ +-+ +-+ | | +-+ +-+ +-+ | | +-+ +-+ +-+ |
| +-+ +-+ +-+ | | +-+ +-+ +-+ | | +-+ +-+ +-+ |
+-------------+ +-------------+ +-------------+
The MCU advertises that it can send at least 12 switched video
streams to A simultaneously. A makes 12 choices, including a
suitable area of display for each one. This information allows the
MCU not just to ensure that multi-screen systems are laid out
correctly, but potentially also to optimize other choices, such as
not splitting multi-screen systems being rendered in the smaller PiP
panes across bezels, showing presentation and full-motion video
received from the same participant on the same screen, and so on.
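The sketch below builds 12 areas of display matching the layout
above and shows how a provider could work out which PiP panes share a
screen, which is what it would need to avoid splitting a multi-screen
system across bezels. All dimensions, the choice of Y as depth, and
the function names are assumptions made for this illustration.

```python
def rect(x0, x1, z0, z1, y=0.0):
    """Four co-planar corners: bottom-left, bottom-right, top-left, top-right."""
    return ((x0, y, z0), (x1, y, z0), (x0, y, z1), (x1, y, z1))


def build_layout(screens=3, pips=3, width=1.0, height=0.6, gap=0.05):
    """One full-screen area plus `pips` small panes per screen."""
    areas = []
    pip_w = width / (2 * pips + 1)
    for s in range(screens):
        left = s * (width + gap)
        areas.append(rect(left, left + width, 0.0, height))
        for p in range(pips):
            x0 = left + (2 * p + 1) * pip_w
            areas.append(rect(x0, x0 + pip_w, 0.05, 0.05 + height / 5))
    return areas


def _span(area, axis):
    values = [pt[axis] for pt in area]
    return min(values), max(values)


def contains(outer, inner):
    """True if `inner` lies within `outer` in both X and Z."""
    if outer == inner:
        return False
    (ox0, ox1), (ix0, ix1) = _span(outer, 0), _span(inner, 0)
    (oz0, oz1), (iz0, iz1) = _span(outer, 2), _span(inner, 2)
    return ox0 <= ix0 and ix1 <= ox1 and oz0 <= iz0 and iz1 <= oz1


def group_by_screen(areas):
    """Map each top-level area to the smaller areas it contains."""
    mains = [a for a in areas if not any(contains(b, a) for b in areas)]
    return {m: [a for a in areas if contains(m, a)] for m in mains}


areas = build_layout()
groups = group_by_screen(areas)
print(len(areas), len(groups))  # 12 3
```

With the groups known, a provider could, for example, place both
streams of a two-screen system in adjacent panes of a single group.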
5.1. Applicability of consumer spatial information to audio
The text above is primarily concerned with resolving issues for
video, but it may still be relevant for audio; the consumer may wish
to provide spatial information about the locations at which they will
be playing out their audio. However, the authors believe that for
the most part this is less relevant: audio does not have the same
rigid requirements for playout that were described above for video,
and the problem can largely be solved with the provider-specified
spatial coordinates already defined in the specification.
6. Implications and conclusions
CLUE has been designed as a provider-oriented protocol, with the
provider giving a list of the resources it can supply and the
consumer selecting from these. This proposal fits into that pattern;
spatial information included in a consumer request forms part of that
request, insofar as it does not limit the provider but instead gives
additional information for the provider to use as it sees fit.
Consumers that have no need for the spatial information need not
include it, and providers can choose to ignore the spatial
information if it is not relevant to their selection process.
Allowing the optional reuse of spatial information that is currently
sent only by the provider in the consumer request increases the range
of problems for which CLUE can provide a solution, while placing no
additional burden on systems which do not have these concerns, as
they can safely ignore the information.
7. Security Considerations
The proposal herein has no security implications; the new information
from the consumer is optional and sent at their discretion, and
reveals nothing that can compromise their system.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2. Informative References
[I-D.ietf-clue-framework]
Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
"Framework for Telepresence Multi-Streams",
draft-ietf-clue-framework-05 (work in progress), May 2012.
Authors' Addresses
Robert Hansen
Cisco Systems
San Jose, CA 95134
USA
Email: rohanse2@cisco.com
Andy Pepperell
Silverflare
Email: andy.pepperell@silverflare.com
Allyn Romanow
Cisco Systems
San Jose, CA 95134
USA
Email: allyn@cisco.com
Brian Baldino
Cisco Systems
San Jose, CA 95134
USA
Email: bbaldino@cisco.com
Mark Duckworth
Polycom
Andover, MA 01810
USA
Email: mark.duckworth@polycom.com