CLUE                                                      C. Groves, Ed.
Internet-Draft                                                   W. Yang
Intended status: Informational                                   R. Even
Expires: February 12, 2014                                        Huawei
                                                         August 11, 2013
Describing Captures in CLUE and relation to multipoint conferencing
draft-groves-clue-multi-content-00
In a multipoint Telepresence conference, there are more than two sites participating. Additional complexity is required to enable media streams from each participant to show up on the displays of the other participants. Common policies to address the multipoint case include "site-switch" and "segment-switch". The document will discuss these policies as well as the "composed" policy and how they work in the multipoint case.
The current CLUE framework document contains the "composed" and "switched" attributes to describe situations where a capture is a mix or composition of streams, or where the capture represents a dynamic subset of streams. "Composed" and "switched" are capture-level attributes. In addition to these attributes the framework defines a "scene-switch-policy" attribute at the capture scene entry (CSE) level, which indicates how the captures are switched.
This draft discusses composition/switching in CLUE and makes a number of proposals to better define and support these capabilities.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 12, 2014.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
One major objective for Telepresence is to preserve the "being there" user experience. However, in multi-site conferences it is often (in fact usually) not possible to simultaneously provide full-size video, eye contact, and a common perception of gestures and gaze to all participants. Several policies can be used for stream distribution and display: all provide good results, but each makes different compromises.
The policies are described in [I-D.ietf-clue-telepresence-use-cases]. [RFC6501] has the following requirement:
The policies described in the use case draft include the site-switch, segment-switch and composed policies.
Site switching is described in the CLUE use cases: "One common policy is called site switching. Let's say the speaker is at site A and everyone else is at a "remote" site. When the room at site A is shown, all the camera images from site A are forwarded to the remote sites. Therefore at each receiving remote site, all the screens display camera images from site A. This can be used to preserve full size image display, and also provide full visual context of the displayed far end, site A. In site switching, there is a fixed relation between the cameras in each room and the displays in remote rooms. The room or participants being shown is switched from time to time based on who is speaking or by manual control, e.g., from site A to site B."
These policies are mirrored in the framework document through a number of attributes.
Currently in the CLUE framework document [I-D.ietf-clue-framework] there are two media capture attributes: Composed and Switched.
Composed is defined as:
Switched is defined as:
There is also a Capture Scene Entry (CSE) attribute "scene switch policy" defined as:
This section discusses a number of issues in the current framework around the support of switched/composed captures and media streams when considering multipoint conferencing. Some issues concern functions that are required but missing; others relate to the current descriptions in the framework document.
In a multipoint conference there is a central control point (an MCU). The MCU will have the CLUE advertisements from all the conference participants and will prepare and send advertisements to all the conference participants. The MCU will also have more information about the conference, participants and media, which it receives at conference creation and via call signalling. This data is not stable, since each user who joins or leaves the conference causes a change in conference state. An MCU supporting SIP may utilise the conference event package, XCON and CCMP to maintain and distribute conference state.
[RFC4575] defines a conference event package. Using this event framework, notifications are sent about changes in the membership of a conference and, optionally, about changes in the state of additional conference components. The conference information is composed of the conference description, host information, conference state and users; each user has endpoints, and each endpoint includes a media description.
[RFC6501] extends the conference event package and aims to be signalling-protocol agnostic. It adds new elements but also provides values for some of the elements defined in [RFC4575]; for example, it defines roles such as "administrator", "moderator", "user", "participant", "observer" and "none".
The Centralized Conferencing Manipulation Protocol (CCMP) [RFC6503] allows authenticated and authorized users to create, manipulate and delete conference objects. Operations on conferences include adding and removing participants, changing their roles, and adding and removing media streams and associated endpoints.
CCMP implements the client-server model within the XCON framework, with the conferencing client and conference server acting as client and server, respectively. CCMP uses HTTP as the protocol to transfer requests and responses, which contain the domain-specific XML-encoded data objects defined in [RFC6501] "Conference Information Data Model for Centralized Conferencing (XCON)".
The XCON data model and CCMP provide a generic way to create and control conferences. CCMP is not SIP specific, but a SIP endpoint will subscribe to the conference event package to receive information about changes in the conference state.
Therefore, when an MCU implements the above protocols there will be an interaction between any CLUE state and the state within the conferencing framework. For example, if an endpoint leaves a conference, an MCU may need to indicate via CLUE to the other endpoints that the corresponding captures are no longer available, and it would also need to indicate via the conferencing framework that the endpoint is no longer part of the conference.
The question is how these concepts relate, as the conferencing framework does not have the concept of captures or scenes. Other aspects overlap, for example:
It is noted that point-to-point calls may not implement the conferencing framework. It is desirable that CLUE procedures be the same whether an endpoint is communicating with a peer endpoint or with an MCU.
One of the early justifications for switching/composition was the ability to switch between sites. However, there is no concept of a "site" in the CLUE hierarchy. The closest concept is an "endpoint", but this has no identity within the CLUE syntax. The highest level is the "clueInfo", which includes captureScenes; an endpoint may have multiple capture scenes.
If the switched and composed attributes are specified only at a capture level, it is not clear what the correlation is between the capture and the endpoint/scenes, particularly when the attributes are described in the context of sites. A scene may be composed of multiple captures, and where an MCU is involved in a conference with multiple endpoints, multiple capture scenes are involved. It becomes difficult to map all the scene and capture information from the source endpoints into one capture scene sent to an endpoint. Discussion of switching, composition, etc. needs to be described in terms of CLUE concepts.
When considering the SIP conferencing framework it can be seen that there are complications in interworking with the scene concept. There may be multiple media of the same type, e.g. room view and presentation, but these are easily identified. This also needs to be considered.
While switching and composition may be represented by one capture and one resulting media stream, there may be multiple original source captures, each of which would have had its own set of attributes. A media capture with the composed attribute allows a description of the capture as a whole but not a description of its constituent parts. In the case of an MCU taking multiple input media captures and compositing them into one output capture, the CLUE-related characteristics of these inputs are lost in the current solution. Alternate methods such as the [RFC6501] layout field may need to be investigated.
Consider the case where an MCU receives CLUE advertisements from various endpoints. Having a single capture with a switched attribute makes it difficult to fully express what the content is when it comes from multiple endpoints. It may be possible to specify lists of capture attribute values when sending an advertisement from the MCU, i.e. role=speaker,audience, but it becomes difficult to relate multiple attributes, i.e. (role=speaker,language=English),(role=audience,language=French).
One capture could represent source captures from multiple locations. A consumer may wish to examine the inputs to a switched capture, i.e. choose which of the original endpoints it wants to see/hear. In order to do this the original capture information would need to be conveyed in a manner that minimises overhead for the MCU.
Being able to link multiple source captures to one mixed (switched/composed) capture in a CLUE advertisement allows a fuller description of the content of the capture.
Today the "composed" and "switched" attributes appear at the media capture level. If "switched" is specified for multiple captures in a capture scene, it is not clear from the framework what the switching policy is. For example, if a CSE contains three VCs each marked "switched", does the switch occur between these captures, or internal to each capture?
The "scene-switch-policy" CSE attribute has been defined to indicate switch policy but there doesn't appear to be a description of whether this only relates to captures marked with "switch" and/or "composed"? If a CSE marked with "scene-switch-policy" contains non-switched, non-composed captures what does this mean?
What are the interactions between the two attributes? E.g. are "switched" and "composed" mutually exclusive or not? Is a switched capture with a scene switch policy of "segment-switched" a "composed" capture?
These issues need to be clarified in the framework.
The "Scene-switch-policy" attribute allows the indication of whether switched captures are "site" or "segment" switched. However there is no indication of what the switch or the composition "trigger" policy is. Content could be provided based on a round robin view, loudest speaker etc. Where an advertising endpoint supports different algorithms it would be advantageous for a consumer to know and select an applicable policy.
Whether single or multiple streams are used for switched captures is not clear from the capture description. For example:
There are 3 endpoints (A, B, C), each with 3 video captures (VCa1, VCa2, VCa3, etc.). An MCU wants to indicate to endpoint C that it can offer a switched view of endpoints A and B.
It could send an Advertisement with CSE (VCa1,VCa2,VCa3, VCb1,VCb2,VCb3),scene-switch-policy=site-switch.
Normally such a configuration (without the switch policy) would relate to 6 different media streams. Switching introduces several possibilities.
For site switching:
For segment switching this is further complicated because the MCU may choose to send media related to endpoint A or B. There is no text describing any limitation, so the MCU may send 1 VC or 5.
Utilising CLUE "encodings" may be a way to describe how the switch takes place in terms of the media provided, but such a description is missing from the framework. One could assume that an individual encoding could be assigned to multiple media captures (i.e. multiple VCs, to indicate they are encoded in the same stream), but this is problematic as the framework indicates that "An Individual Encoding can be assigned to at most one Capture Encoding at any given time."
This could do with further clarification in the framework.
A Simultaneous Transmission Set is defined as "a set of Media Captures that can be transmitted simultaneously from a Media Provider." It is not clear how this definition relates to switched or composed streams. The captures may not be able to be sent at the same time but may form a timeslot on a particular stream; they may be provided together but not at precisely the same time.
Section 6.3 of the current version of the framework indicates that:
If switching or composition is specified at a capture level only, it is evident that simultaneity constraints do not come into play. However, if multiple captures are used in a single media stream, i.e. associated with the CSE, then these may be subject to a simultaneous transmission set description.
It is also noted that there is a similar issue for encoding groups. See section 8/[I-D.ietf-clue-framework]:
If "switching" is used then there is no need to send the encodings at the same time.
This needs to be clarified.
CLUE currently allows spatial information related to a media capture to be signalled. It is unclear in the current draft how this would work with switching/composition. Section 6.1/[I-D.ietf-clue-telepresence-use-cases] does say:
This describes a single capture, not the case where there are multiple switched captures. It appears to focus on segment switching rather than site switching and does not appear to cover "composed" captures (if it is related).
An advertiser may or may not want to use common spatial attributes for the captures associated with a switched capture. For example, it may be beneficial for the Advertiser of a composed image to indicate that different captures have different capture areas in a virtual space.
This should be given consideration in the framework.
In section 6.2.2 of version 9 [I-D.ietf-clue-framework] it indicates that an Advertiser may provide multiple values for the "scene-switch-policy" and that the Consumer may choose and return the value it requires.
In version 9 of the framework there was no mechanism in CLUE for a Consumer to choose and return individual values from capture scene, CSE or media capture attributes.
In version 10 of the framework the text was updated to indicate that the consumer could choose values from a list. It is not clear that this capability is needed as the procedure only relates to the "scene-switch-policy". The switching policy may be better specified by other means.
As has been discussed above there are a number of issues with regards to the support of switched/composed captures/streams in CLUE particularly when considering MCUs. The authors believe that there is no single action that can address the above issues. Several options are discussed below. The options are not mutually exclusive.
The authors believe that there are a number of requirements for this:
The authors also believe that whether media is composed, segment switched or site switched, the common element is that the media stream contains multiple captures from potentially multiple sources.
[I-D.ietf-clue-framework] does have the "scene-switch-policy" attribute at a CSE level, but as described in section 2 it is not sufficient for several reasons: it is not possible to assign an encoding to a CSE, a CSE cannot reference captures from multiple scenes, and there is a relationship with STSs that needs to be considered.
In order to be able to fully express and support a media stream with multiple captures, the authors propose a new type of capture, the "multiple content capture" (MCC). The MCC is essentially the same as an audio or video capture in that it may have its own attributes; the main difference is that it can also include other captures, i.e. it indicates that the MCC is composed of other captures. This composition may be positional (i.e. segments/tiling) or temporal (switched), and is specified by a policy attribute. The MCC can be assigned an encoding. For example:
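MCC1(VC1,VC2,VC3){policy}{encodinggroup1}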
This would indicate that MCC1 is composed of 3 video captures according to the policy.
One further difference is that an MCC may reference individual captures from multiple scenes. For example:
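CaptureScene1 [VC1, VC2]
CaptureScene2 [VC3]
CaptureScene3 [MCC1(VC1,VC3){encodinggroup1}]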
This would indicate that scene #3 contains an MCC that is composed of the individual captures VC1 and VC3. This allows the consumer to associate any capture scene properties from the original scenes with the multiple content capture.
The MCC could be utilised by both normal endpoints and MCUs. For example, it would allow an endpoint to construct a mixed video stream that is a virtual scene composed of presentation video and individual captures.
This proposal does not consider any relation to the SIP conferencing framework.
The sections below provide more detail on the proposal.
Multiple content capture: Media capture for audio or video that indicates the capture contains multiple audio or video captures. Individual media captures may or may not be present in the resultant capture encoding depending on time or space. Denoted as MCCn in the example cases in this document.
The MCC indicates that multiple captures are contained in one media capture by referencing the applicable individual media captures. Only one capture type (i.e. audio, video, etc.) is allowed in each MCC instance. The MCC contains a reference to the media captures as well as attributes associated with the MCC itself. The MCC may reference individual captures from other capture scenes. If an MCC is used in a CSE, that CSE may also reference captures from other capture scenes.
Note: Different Capture Scenes are not spatially related.
Each instance of the MCC has its own captureID i.e. MCC1. This allows all the individual captures contained in the MCC to be referenced by a single ID.
The example below shows the use of a multiple content capture:
CaptureScene1 [VC1 {attributes}, VC2 {attributes}, VC3 {attributes}, MCC1(VC1,VC2,VC3){attributes}]
This indicates that MCC1 is a single capture that contains the captures VC1, VC2 and VC3 according to any MCC1 attributes.
One or more MCCs may also be specified in a CSE. This allows an Advertiser to indicate that several MCC captures are used to represent a capture scene.
Note: Section 6.1/[I-D.ietf-clue-framework] indicates that "A Media Capture is associated with exactly one Capture Scene". For MCCs this could be further clarified to indicate that "A Media Capture is defined in a capture scene and is given an advertisement-unique identity. The identity may be referenced outside the Capture Scene that defines it through a multiple content capture (MCC)."
Attributes may be associated with the MCC instance and the individual captures that the MCC references. A provider should avoid providing conflicting attribute values between the MCC and individual captures. Where there is conflict the attributes of the MCC override any that may be present in the individual captures.
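For example (an illustrative sketch; the language attribute values are assumptions used only to show the override rule):

MCC1(VC1{language=English},VC2{language=French}){language=English}{encodinggroup1}

Here the MCC-level language value would override the value on VC2 for content delivered via MCC1's capture encoding.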
There are two MCC-specific attributes, "MaxCaptures" and "Policy", which are used to give more information regarding when the individual captures appear and what policy is used to determine this.
The spatially related attributes can further be used to determine how the individual captures "appear" within a stream. For example, a virtual scene could be constructed for an MCC containing two video captures by setting a "MaxCaptures" attribute of 2 and providing an "area of capture" attribute with the overall area. Each of the individual captures could then also include an "area of capture" attribute with a subset of the overall area; the consumer would then know the relative position of the content in the composed stream. For example, the above capture scene may indicate that VC1 has an x-axis capture area of 1-5, VC2 6-10 and VC3 11-15, while the MCC indicates an x-axis capture area of 1-15.
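As an illustrative sketch (the "area:x" notation below is shorthand for the framework's "area of capture" attribute and is not otherwise defined in this document):

CaptureScene1 [VC1{area:x=1-5}, VC2{area:x=6-10}, VC3{area:x=11-15},
               MCC1(VC1,VC2,VC3){area:x=1-15}{encodinggroup1}]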
MaxCaptures:{integer}
This field is only associated with MCCs and indicates the maximum number of individual captures that may appear in a capture encoding at any one time. It may be used to derive how the individual captures within the MCC are composed with regard to space and time. The individual content in the capture may be switched in time so that only one of the individual captures/CSEs is shown (MaxCaptures:1), or the individual captures may be composed so that they are all shown in the MCC (MaxCaptures:n).
For example:
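MCC1(VC1,VC2,VC3){MaxCaptures:1}{encodinggroup1}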
This would indicate that, in the capture encoding, the Advertiser would switch (or compose, depending on policy) between VC1, VC2 and VC3, as a maximum of one capture may appear at a time.
Policy: TBD - this attribute is to address what algorithm the endpoint/MCU uses to determine what appears in the MCC capture encoding, e.g. loudest speaker, round robin.
Note: The "scene-switch-policy" attribute has values that indicate "site-switch" or "segment-switch". The distinction is that "site-switch" indicates that, where there is mixed content, captures related to one endpoint appear together, whereas "segment-switch" indicates that captures from different endpoints could appear together. An issue is that a Consumer has no concept of "endpoints", only "capture scenes". Also, as highlighted, a Consumer has no method to return parameters for CSEs.
The use of MCCs enables the Advertiser to communicate to the Consumer that captures originate from different capture scenes. In cases where multiple MCCs represent a scene (i.e. multiple MCCs in a CSE), an Advertiser may wish to indicate that captures from one capture scene are present in the capture encodings of the specified MCCs at the same time. Having an attribute at capture level removes the need for CSE-level attributes, which are problematic for consumers.
Synch-id: {integer}
This MCC attribute indicates how the individual captures in multiple MCC captures are synchronised. To indicate that the capture encodings associated with several MCCs contain captures from the same source at the same time, the Advertiser should set the same Synch-id on each of the concerned MCCs. It is the provider that determines what the source for the captures is; for example, when the provider is an MCU it may determine that each separate CLUE endpoint is a remote source of media.
For example:
CaptureScene1[Description=AustralianConfRoom,
              VC1(left),VC2(middle),VC3(right),
              CSE1(VC1,VC2,VC3)]
CaptureScene2[Description=ChinaConfRoom,
              VC4(left),VC5(middle),VC6(right),
              CSE2(VC4,VC5,VC6)]
CaptureScene3[MCC1(VC1,VC4){Synch-id:1}{encodinggroup1},
              MCC2(VC2,VC5){Synch-id:1}{encodinggroup2},
              MCC3(VC3,VC6){encodinggroup3},
              CSE3(MCC1,MCC2,MCC3)]
Figure 1: Synchronisation Example
The above advertisement indicates that MCC1, MCC2 and MCC3 make up a capture scene, resulting in three capture encodings. Because MCC1 and MCC2 have the same Synch-id, encoding1 and encoding2 would together contain content from only capture scene 1 or only capture scene 2 at any particular point in time. Encoding3 would not be synchronised with encoding1 or encoding2.
Without this attribute it is assumed that multiple MCCs may provide different sources at any particular point in time.
MCCs shall be assigned an encoding group and thus become a capture encoding. The captures referenced by the MCC do not need to be assigned to an encoding group. This means that all the individual captures referenced by the MCC will appear in the capture encoding according to any MCC attributes. This allows an Advertiser to specify capture attributes associated with the individual captures without the need to provide an individual capture encoding for each of the inputs.
If an encoding group is assigned to an individual capture referenced by the MCC it indicates that this capture may also have an individual capture encoding.
For example:
CaptureScene1 [VC1{encodinggroup1}, VC2,
               MCC1(VC1,VC2){encodinggroup3}]
This would indicate that VC1 may be sent as its own capture encoding from encodinggroup1, or may be sent as part of a capture encoding from encodinggroup3 along with VC2.
Note: Section 8/[I-D.ietf-clue-framework] indicates that every capture is associated with an encoding group. To utilise MCCs this requirement has to be relaxed.
The MCC can be used in simultaneous transmission sets, therefore providing a means to indicate whether several multiple content captures can be provided at the same time. Captures within an MCC may be provided together but not necessarily at the same time; therefore specifying an MCC in an STS does not indicate that all the referenced individual captures are present at any one time. The MaxCaptures attribute indicates the maximum number of captures that may be present.
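For example (an illustrative sketch; the "SimultaneousSet" notation below is not otherwise defined in this document):

CaptureScene1 [VC1, VC2, VC3,
               MCC1(VC1,VC2,VC3){MaxCaptures:1}{encodinggroup1}]
SimultaneousSet1(MCC1)

Here the STS indicates that MCC1 may be transmitted, while MaxCaptures:1 indicates that only one of VC1, VC2 or VC3 appears in MCC1's capture encoding at any one time.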
An MCC instance is limited to one media type, e.g. video, audio or text.
Note: This avoids the problem whereby the framework says that all captures (even switched ones) within a CSE have to be allowed in an STS to be sent at the same time.
On receipt of an advertisement with an MCC, the Consumer treats the MCC like any other individual capture, with the following differences:
For example if the Consumer receives:
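MCC1(VC1,VC2,VC3){encodinggroup1}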
A Consumer should choose all the captures within an MCC; however, if the Consumer determines that it does not want VC3 it can return MCC1(VC1,VC2). If it wants all the individual captures it returns just a reference to the MCC (i.e. MCC1).
Note: The ability to return a subset of captures is for consistency with the current framework, which says that a Consumer should choose all the captures from a CSE but allows it to select a subset (if the STS is provided). The intent is to provide equivalent functionality for an MCC.
The use of MCCs allows the MCU to easily construct outgoing Advertisements. The following sections provide several examples.
Four endpoints are involved in a CLUE session. To formulate an Advertisement to endpoint 4, the following Advertisements received from endpoints 1 to 3 are used by the MCU. Note: The IDs overlap in the incoming advertisements; the MCU is responsible for making these unique in the outgoing advertisement.
Endpoint 1
CaptureScene1[Description=AustralianConfRoom,
              VC1(role=audience)]
Endpoint 2
CaptureScene1[Description=ChinaConfRoom,
              VC1(role=speaker),VC2(role=audience),
              CSE1(VC1,VC2)]
Endpoint 3
CaptureScene1[Description=USAConfRoom,
              VC1(role=audience)]
Figure 2: MCU case: Received advertisements
Note: Endpoint 2 above indicates that it sends two streams.
If the MCU wanted to provide a multiple content capture containing the audience of the 3 endpoints and the speaker it could construct the following advertisement:
CaptureScene1[Description=AustralianConfRoom,
              VC1(role=audience)]
CaptureScene2[Description=ChinaConfRoom,
              VC2(role=speaker),VC3(role=audience),
              CSE1(VC2,VC3)]
CaptureScene3[Description=USAConfRoom,
              VC4(role=audience)]
CaptureScene4[MCC1(VC1,VC2,VC3,VC4){encodinggroup1}]
Figure 3: MCU case: MCC with multiple audience and speaker
Alternatively if the MCU wanted to provide the speaker as one stream and the audiences as another it could assign an encoding group to VC2 in Capture Scene 2 and provide a CSE in Capture Scene 4:
CaptureScene1[Description=AustralianConfRoom,
              VC1(role=audience)]
CaptureScene2[Description=ChinaConfRoom,
              VC2(role=speaker){encodinggroup2},
              VC3(role=audience),
              CSE1(VC2,VC3)]
CaptureScene3[Description=USAConfRoom,
              VC4(role=audience)]
CaptureScene4[MCC1(VC1,VC3,VC4){encodinggroup1},
              CSE2(MCC1,VC2)]
Figure 4: MCU case: MCC with audience and separate speaker
Therefore a Consumer could choose whether or not to have a separate "role=speaker" stream and could choose which endpoints to see. If it wanted the second stream but not the Australian conference room it could indicate the following captures in the Configure message:
MCC1(VC3,VC4),VC2
Figure 5: MCU case: Consumer Response
Multiple MCCs can be used where multiple streams are used to carry media from multiple endpoints. For example:
A conference has three endpoints, D, E and F; each endpoint has three video captures covering the left, middle and right regions of its conference room. The MCU receives the following advertisements from D and E:
Endpoint D
CaptureScene1[Description=AustralianConfRoom,
              VC1(left){encodinggroup1},
              VC2(middle){encodinggroup2},
              VC3(right){encodinggroup3},
              CSE1(VC1,VC2,VC3)]
Endpoint E
CaptureScene1[Description=ChinaConfRoom,
              VC1(left){encodinggroup1},
              VC2(middle){encodinggroup2},
              VC3(right){encodinggroup3},
              CSE1(VC1,VC2,VC3)]
Figure 6: MCU case: Multiple captures from multiple endpoints
Note: The Advertisements use the same identities. There is no co-ordination between endpoints, so identity overlap between received advertisements is likely.
The MCU wants to offer Endpoint F three capture encodings. Each capture encoding would contain a capture from either Endpoint D or Endpoint E depending on the policy. The MCU would send the following:
CaptureScene1[Description=AustralianConfRoom,
              VC1(left),VC2(middle),VC3(right),
              CSE1(VC1,VC2,VC3)]
CaptureScene2[Description=ChinaConfRoom,
              VC4(left),VC5(middle),VC6(right),
              CSE2(VC4,VC5,VC6)]
CaptureScene3[MCC1(VC1,VC4){encodinggroup1},
              MCC2(VC2,VC5){encodinggroup2},
              MCC3(VC3,VC6){encodinggroup3},
              CSE3(MCC1,MCC2,MCC3)]
Figure 7: MCU case: Multiple MCCs for multiple captures
Note: The identities from Endpoint E have been renumbered so that they are unique in the outgoing advertisement.
The CLUE protocol extends the endpoint description defined in the signalling protocol (SDP for SIP) by providing more information about the available media. XCON uses the information available from the signalling protocol, but instead of using SDP to distribute the participants' information and to control the multipoint conference, it uses a data structure defined in XML, conveyed using the CCMP protocol over HTTP (note that CCMP could also be used over the CLUE channel if required). XCON provides a hierarchy that starts from conference information, which includes users having endpoints that have media.
The role is part of the user structure, while the mixing mode is part of the conference-level information, specifying the mixing mode for each of the media types available in the conference.
CLUE, on the other hand, does not have such a structure; it starts from what is, in XCON terms, probably an endpoint that has media structured by scenes. There is no user or conference-level information, though the "role" proposal tries to add user information (note that user information is different from the role in the call or the conference).
The XCON structure is better suited to a multipoint conference, yet it does not make sense to use such a data model for point-to-point calls. Therefore going with this option alone means that capture attribute information would not be available for point-to-point calls.
As discussed in section 2, the existing CLUE attributes surrounding switching and composition have a number of open issues. This section proposes changes to the text describing these attributes to better describe their usage and interaction. It is also assumed that when using these attributes there is no attempt to describe any component source capture information.
The current CLUE framework describes the "Composed" attribute as:
It is proposed to update the description:
The current CLUE framework describes the "Switched" attribute as:
It is proposed to update the description:
The current CLUE framework describes the "Scene Switch Policy" attribute as:
Firstly, it is proposed to rename this attribute to "Capture Source Synchronisation" in order to remove any confusion with the switched attribute, and also to remove the association with a scene, as any information regarding source scenes is lost (the CSE represents the current scene). No change in functionality is intended by the renaming. It is proposed to describe it as follows:
Furthermore, it is assumed that if the current set of parameters is maintained, indicating the mechanism that triggers the switching of sources (e.g. loudest source, round robin) is not possible, because the Consumer only chooses captures and not sources. If the choice is purely up to the provider then this information would be superfluous. It is proposed to capture this:
When a CLUE endpoint acts as an MCU it implies the need for an advertisement aggregation function. That is, the endpoint receives CLUE advertisements from multiple endpoints and uses this information, its media processing capabilities and any policy information to form advertisements to the other endpoints.
Contributor's note: TBD I think there needs to be a discussion here about that source information is lost. How individual attributes are affected. i.e. it may be possible to simply aggregate language information but not so simple when there's different spatial information. Also need to consider capture encodings.
It is not expected that the proposed changes present the need for any IANA registrations.
It is not expected that the proposed changes present any additional security issues beyond those of the current framework.