Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Standards Track                                Ericsson
Expires: April 24, 2014                                    S. Nandakumar
                                                                   Cisco
                                                        October 21, 2013
Using Simulcast in RTP Sessions
draft-westerlund-avtcore-rtp-simulcast-03
In some application scenarios it may be desirable to send multiple differently encoded versions of the same Media Source in independent Source Packet Streams. This is called Simulcast. This document discusses the best way of accomplishing Simulcast in RTP and how to signal it in SDP. A solution is defined by making three extensions to SDP, and using RTP/RTCP identification methods to relate RTP Source Packet Streams. The first SDP extension consists of two new session level SDP attributes that express capability to send or receive Simulcast Source Packet Streams, respectively. The second SDP extension introduces an SDP media level attribute that groups and identifies a selected set of media level parameters for a specific direction, called a media configuration. The third SDP extension describes how to group such media configurations on SDP session or media level for Simulcast purposes.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 24, 2014.
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Most of today's multiparty video conference solutions make use of centralized servers to reduce the bandwidth and CPU consumption in the endpoints. Those servers receive Source Packet Streams from each participant and send some suitable set of possibly modified streams to the rest of the participants, which usually have heterogeneous capabilities (screen size, CPU, bandwidth, codec, etc.). One of the biggest issues is how to adapt streams to the constraints of different participants with the minimum possible impact on video quality and server performance.
Simulcast is the act of simultaneously sending multiple different versions of the same media content, e.g. the same video source encoded with different video encoder types or image resolutions. This can be done in several ways and for different purposes. This document focuses on the case where it is desirable to provide a Media Source as multiple Source Packet Streams over RTP [RFC3550] towards an intermediary, so that the intermediary can provide the desired functionality by selecting which Source Packet Stream to forward to other participants in the session, and more specifically on how the involved Source Packet Streams are identified and grouped. From an RTP perspective, Simulcast is a specific application of the aspects discussed in RTP Multiplexing Guidelines [I-D.ietf-avtcore-multiplex-guidelines].
The purpose of this document is to describe a few scenarios that motivate the use of Simulcast, and to propose a suitable solution for signaling and performing RTP Simulcast.
This document makes use of the terminology defined in RTP Taxonomy [I-D.lennox-raiarea-rtp-grouping-taxonomy]. In addition, the following terms are used:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Many use cases of Simulcast as described in this document relate to a multi-party Communication Session where one or more central nodes are used to adapt the view of the Communication Session towards individual Participants, and to facilitate the Media Transport between Participants. Thus, these cases target the RTP Mixer topology defined in [RFC5117] (Section 3.4: Topo-Mixer), further elaborated and extended with other topologies in [I-D.ietf-avtcore-rtp-topologies-update] (Sections 3.6 to 3.9).
There are two principal approaches for an RTP Mixer to provide this adapted view of the Communication Session to each receiving Participant:
The use of Simulcast relates to the latter approach, where it is more important to reduce the load on the RTP Mixer and/or minimize QoE impact than to achieve an optimal adaptation of resource usage.
A multicast/broadcast case, where the receivers themselves select the most appropriate Simulcast version and tune in to the transport carrying that version, is also considered [sec-multicast]. This enables serving large receiver populations that are heterogeneous in terms of capabilities and available network path bandwidth.
In this section, an "RTP switch" is used as a common short term for the terms "switching RTP mixer", "source projecting middlebox", and "video switching MCU" as discussed in [I-D.ietf-avtcore-rtp-topologies-update].
The Media Sources provided by a sending Participant potentially need to reach several receiving Participants that differ in terms of available resources. A discussion on that topic is included in Appendix A. The receiver resources that typically differ include, but are not limited to:
Letting the sending Participant create a Simulcast of a few differently configured Source Packet Streams per Media Source can be a good trade-off when using an RTP switch as middlebox, instead of sending a single Source Packet Stream and using an RTP Mixer to create individual transcodings to each receiving Participant.
This requires that the receiving Participants can be categorized in terms of available resources and that the sending Participant can choose a matching configuration for a single Source Packet Stream per category and Media Source.
For example, assume for simplicity a set of receiving Participants that differ only in that some have support to receive Codec A, and the others have support to receive Codec B. Further assume that the sending participant can send both Codec A and B. It can then reach all receivers by creating two Simulcasted Source Packet Streams from each Media Source; one for Codec A and one for Codec B.
In another simple example, a set of receiving Participants differ only in screen resolution; some are able to display video with at most 360p resolution and some support 720p resolution. A sending Participant can then reach all receivers by creating a Simulcast of Source Packet Streams with 360p and 720p resolution for each sent video Media Source.
In more elaborate cases, the receiving Participants differ both in available Sampling and Bitrate, and maybe also Codec, and it is up to the RTP switch to find a good trade-off in which Simulcasted stream to choose for each intended receiver. It is also the responsibility of the RTP switch to negotiate a good fit of Simulcast streams with the sending Participant.
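The trade-off an RTP switch makes when choosing a Simulcast version per receiver can be sketched as follows. This is a minimal illustration under simplifying assumptions, not a defined algorithm: each version is described only by a codec name, a vertical resolution and a bitrate, each receiver by its supported codecs and its maximum resolution and bitrate, and the switch prefers the highest resolution, then the highest bitrate. The names `Version`, `Receiver` and `pick_version` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One Simulcast version (Source Packet Stream) of a Media Source."""
    codec: str
    height: int  # vertical resolution in pixels, e.g. 360 or 720
    kbps: int    # bitrate of this version


@dataclass(frozen=True)
class Receiver:
    """Resources of one receiving Participant, as known to the RTP switch."""
    codecs: frozenset
    max_height: int
    max_kbps: int


def pick_version(versions, rx):
    """Return the best version the receiver can handle, or None.

    "Best" here means highest resolution first, then highest bitrate.
    """
    usable = [v for v in versions
              if v.codec in rx.codecs
              and v.height <= rx.max_height
              and v.kbps <= rx.max_kbps]
    return max(usable, key=lambda v: (v.height, v.kbps), default=None)


# Two codecs and two resolutions, as in the examples above.
versions = [Version("A", 720, 1200), Version("A", 360, 400),
            Version("B", 360, 350)]
print(pick_version(versions, Receiver(frozenset({"A"}), 360, 2000)))
# Version(codec='A', height=360, kbps=400)
```

A receiver for which no version fits (for example, one limited to 300 kbps that only supports Codec A) gets None, which signals that the set of Simulcast versions negotiated with the sender does not cover that receiver category.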
The maximum number of Simulcasted Source Packet Streams that can be sent is mainly limited by the amount of processing and uplink network resources available to the sending Participant.
The application logic that controls the Communication Session may include special handling of some Media Sources. It is for example commonly the case that the media from a sending Participant is not sent back to itself.
It is also common that a currently active speaker Participant is shown in larger size or higher quality than other Participants (the Sampling or Bitrate aspects of Section 3.1). Since the active speaker's media is not sent back to itself, some other Participant's media, typically that of the previous active speaker, instead receives special handling towards the active speaker. The previous active speaker is thus needed both in larger size (towards the current active speaker) and in smaller size (towards the rest of the Participants), which can be solved with a Simulcast from the previous active speaker to the RTP switch.
When using Broadcast or Multicast technology to distribute real-time media streams to large populations of receivers there can still be significant heterogeneity among the receiver population. This can depend on several factors:
To handle these variations, a transmitter of real-time media may want to apply Simulcast to its Source Packet Streams and provide a set of media configurations, enabling the receivers to select the best fit from these sets themselves. The endpoint capabilities will usually result in a single initial choice. However, the network bandwidth can vary over time, which requires a client to continuously monitor its reception to determine if the received media streams still fit within the available bandwidth. If not, another Simulcast media configuration containing a thinner set of Source Packet Streams will have to be chosen.
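The receiver-side adaptation described above can be sketched as a function that is re-evaluated whenever a new reception measurement is available. This is a minimal sketch, assuming (as is typical for IP multicast) that each Simulcast media configuration is reachable on its own multicast group and is characterized only by its total bitrate. The function name, the `headroom` factor and the documentation-range multicast addresses are hypothetical.

```python
def select_multicast_config(configs, measured_kbps, headroom=0.9):
    """Pick the richest Simulcast media configuration that fits.

    configs: (multicast_group, total_kbps) pairs, one per configuration.
    measured_kbps: currently observed reception capacity.
    headroom: fraction of the measured rate that may be used, leaving a
        margin so that small fluctuations do not force a switch.
    Falls back to the thinnest configuration when none fits.
    """
    ordered = sorted(configs, key=lambda c: c[1])
    fitting = [c for c in ordered if c[1] <= measured_kbps * headroom]
    return fitting[-1] if fitting else ordered[0]


configs = [("233.252.0.1", 2500), ("233.252.0.2", 800), ("233.252.0.3", 250)]
# Re-evaluated whenever a new reception measurement is available.
for measured in (3000, 1000, 200):
    print(measured, select_multicast_config(configs, measured))
```

Switching to a different configuration means leaving one multicast group and joining another, which is why the mapping between Source Packet Streams across configurations matters.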
When IP multicast is used, a receiver selects among the available Simulcast versions by choosing which multicast address to join. Thus, different Simulcast versions need to be put on different Media Transports using different multicast addresses. If these Simulcast versions are described using SDP, they need to be part of different SDP media descriptions, as SDP binds to transport on media description level. To make switching beyond the initial choice work well, each Source Packet Stream in one Simulcast media configuration needs to be correctly mapped to its corresponding Source Packet Stream in another Simulcast media configuration on another multicast group.
The application logic that controls the Communication Session may allow receiving Participants to apply preferences to the characteristics of the Source Packet Stream they receive, for example in terms of the aspects listed in Section 3.1. Sending a Simulcast of Source Packet Streams is one way of accommodating receivers with conflicting or otherwise incompatible preferences.
The following requirements need to be met to support the use cases in previous sections:
Signaling Simulcast is about negotiating between media sender and receiver what the different Simulcast versions should be, how to identify them in terms of Source Packet Streams, and how to inter-relate those Source Packet Streams.
The proposed solution consists of:
This section further details the signaling solution outlined above [sec-solution].
There are numerous media properties that can be varied to construct a set of Simulcast versions. A Simulcast-enabled endpoint could also support Simulcast based on several of those properties. If those properties are relatively independent and each Simulcast version needs an explicit definition in the SDP, the number of Simulcast version candidates grows exponentially with the number of properties, leading to a very long SDP that is likely also hard to interpret. There is thus a need to limit the Simulcast version candidates included in the SDP to cover as small a set of properties as possible.
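The growth can be made concrete with a short sketch. Assuming the properties are fully independent, every candidate picks one value per property, so the candidate count is the product of the per-property value counts and each additional property multiplies it. The property names and values below are hypothetical.

```python
from itertools import product


def simulcast_candidates(property_values):
    """Enumerate all Simulcast version candidates.

    property_values maps each media property to the values the endpoint
    could use for it; every candidate picks one value per property.
    """
    names = sorted(property_values)
    return [dict(zip(names, combo))
            for combo in product(*(property_values[n] for n in names))]


# Hypothetical value sets for one video Media Source.
props = {
    "codec": ["H264", "VP8"],
    "resolution": ["1280x720", "640x360", "320x180"],
    "framerate": [30, 15],
}
candidates = simulcast_candidates(props)
print(len(candidates))  # 2 * 3 * 2 = 12
```

With four properties of three values each, the count is already 81, each of which would need its own explicit description.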
If a legacy endpoint not supporting Simulcast were to be presented with an SDP including media descriptions for a set of Simulcast versions, it may not know how to correctly handle or interpret these "surplus" media descriptions.
Based on the functionality that Simulcast is intended to achieve, it should be clear that the reasons to send Simulcast versions are not the same as to receive Simulcast versions, seen from a single endpoint.
For these reasons, it is proposed to define two new SDP session level attributes, "a=sim-send-cap" and "a=sim-recv-cap", which explicitly signal support for Simulcast media transmission and Simulcast media reception, respectively. "a=sim-send-cap" and "a=sim-recv-cap" MAY be used independently and simultaneously. These attributes are also proposed to have parameters indicating the media properties used to create the Simulcast versions, and their preferred ranking. The meaning of the attributes on SDP media level is undefined, and they MUST NOT be used there.
simulcast-cap  = "a=" ( "sim-send-cap:" / "sim-recv-cap:" ) cap-prop-list
cap-prop-list  = cap-prop-entry *(WSP cap-prop-entry)
cap-prop-entry = cap-prop ["=" q-value]
cap-prop       = "rtpmap" / "fmtp" / "imageattr" / "framerate"
               / token ; for future extensions
q-value        = ( "0" "." 1*2DIGIT )
               / ( "1" "." 1*2("0") ) ; Values between 0.00 and 1.00

; WSP and DIGIT defined in [RFC5234]
; token defined in [RFC4566]
Figure 1: ABNF for Simulcast Capability
The media property values are taken from existing (and could be extended to cover other or future) SDP attributes that express media properties that can be varied to create different Simulcast versions:
The optional q-value expresses the relative preference to base a Simulcast version on that media property, with 1.00 meaning maximum (100%) preference and 0.00 meaning no (0%) preference. Several media properties can share the same q-value, in which case they are equally preferred. Not including any q-value for a media property value SHALL default to a q-value of 1.00.
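A receiving implementation could parse the capability attribute according to Figure 1, including the default q-value, roughly as follows. This is a hedged sketch rather than a reference parser: the function name is hypothetical, whitespace handling is more lenient than the single WSP in the ABNF, and the token character set is taken from the SDP token definition in RFC 4566.

```python
import re

# q-value per Figure 1: "0." followed by one or two digits, or "1.0"/"1.00".
Q_VALUE = re.compile(r"^(?:0\.\d{1,2}|1\.0{1,2})$")
# token characters as defined for SDP in RFC 4566.
TOKEN = re.compile(r"^[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+$")


def parse_simulcast_cap(line):
    """Parse an "a=sim-send-cap" or "a=sim-recv-cap" line.

    Returns (direction, {media_property: q_value}). A media property
    without an explicit q-value gets the default of 1.00.
    """
    m = re.match(r"^a=sim-(send|recv)-cap:(.+)$", line)
    if not m:
        raise ValueError("not a Simulcast capability attribute: %r" % line)
    direction, rest = m.groups()
    props = {}
    for entry in rest.split():  # lenient: any run of whitespace separates
        name, sep, q = entry.partition("=")
        if not TOKEN.match(name):
            raise ValueError("invalid media property: %r" % name)
        if sep and not Q_VALUE.match(q):
            raise ValueError("invalid q-value: %r" % q)
        props[name] = float(q) if sep else 1.0
    if not props:
        raise ValueError("at least one media property is required")
    return direction, props


print(parse_simulcast_cap("a=sim-recv-cap:imageattr=1.0 framerate=0.8"))
```

Unknown property tokens are accepted on purpose, since the list of media properties is extensible.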
The list of media properties is made extensible, to allow introducing additional dimensions for Simulcast versions.
When used as a declarative media description, sim-recv-cap indicates the configured end-point's required capability to recognize and receive a specified set of Source Packet Streams as Simulcast streams. In the same fashion, sim-send-cap requests the end-point to send a specified set of Source Packet Streams as Simulcast streams. sim-recv-cap and sim-send-cap MAY be used independently and at the same time and they need not specify the same capability properties.
An offerer wanting to use Simulcast SHALL include either one or both of those attributes, depending on the direction(s) in which Simulcast is both supported and desirable. An offerer that receives an answer without "a=sim-send-cap" or "a=sim-recv-cap" MUST NOT define or use any Simulcast alternatives in that direction to the answerer.
An answerer that does not understand the concept of Simulcast will not recognize those attributes either and will therefore omit them from the SDP answer, as defined in existing SDP Offer/Answer procedures. An answerer that does understand the attributes and that wants to support Simulcast in the indicated direction SHALL reverse the directionality of the attribute ("sim-send-cap" becomes "sim-recv-cap" and vice versa) and include it in the answer.
An offerer that intends to send Simulcast alternatives and thus includes "a=sim-send-cap", MUST also include at least one media property parameter that it intends to use to construct the Simulcast alternatives, but it MAY include more media property parameters. Including multiple media property parameters in "a=sim-send-cap" SHALL be interpreted as an offer to send Simulcast versions covering all combinations thereof, but MAY be further restricted by other information in the SDP such as for example the number of simulcast-related media descriptions in the SDP or use of max-ssrc signaling [I-D.westerlund-mmusic-max-ssrc].
An offerer that is capable of receiving Simulcast alternatives and thus includes "a=sim-recv-cap", MUST also include at least one media property parameter that it is willing to use as discriminator between received Simulcast alternatives, but MAY include more media property parameters. Including multiple media property parameters in "a=sim-recv-cap" SHALL be interpreted as an offer to receive Simulcast versions covering all combinations thereof, but MAY be further restricted by other information in the SDP such as for example the number of simulcast-related media descriptions in the SDP or use of max-ssrc signaling [I-D.westerlund-mmusic-max-ssrc].
An answerer that either lacks the capability or does not desire to use Simulcast versions based on a certain media property parameter in a specific direction MUST remove such media property parameter from "a=sim-send-cap" or "a=sim-recv-cap". The answerer MUST NOT add any media property parameters that were not included in the offer.
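The answer rules for the capability attributes (reverse the direction, remove unwanted media properties, never add new ones) can be sketched as below. The `local_prefs` policy and the function name are hypothetical. The answerer states its own q-values, in the same spirit as the example answer "a=sim-recv-cap:imageattr=1.0 framerate=0.8". Returning None when no property remains, so that the attribute is omitted, is an assumption of this sketch.

```python
def answer_simulcast_cap(offer_direction, offer_props, local_prefs):
    """Derive the answerer's Simulcast capability attribute.

    offer_direction: "send" for a=sim-send-cap, "recv" for a=sim-recv-cap.
    offer_props: media properties listed in the offer, in offer order.
    local_prefs: hypothetical answerer policy mapping each media property
        it can and wants to use in the reversed direction to its q-value.
    Returns the answer line, or None when no property is left, in which
    case the attribute is omitted (no Simulcast in that direction).
    """
    reverse = {"send": "recv", "recv": "send"}[offer_direction]
    # Keep offered properties the answerer supports; never add new ones.
    kept = [p for p in offer_props if p in local_prefs]
    if not kept:
        return None
    entries = ["%s=%.2f" % (p, local_prefs[p]) for p in kept]
    return "a=sim-%s-cap:%s" % (reverse, " ".join(entries))


print(answer_simulcast_cap("send", ["imageattr", "framerate"],
                           {"imageattr": 1.0, "framerate": 0.8}))
# a=sim-recv-cap:imageattr=1.00 framerate=0.80
```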
An answerer SHOULD take the offerer's q-values into account when choosing which media configurations [sec-media-config] to include in the answer and how to group them [sec-group-config] into the resulting Simulcast(s).
Media that constitutes a Simulcast version has certain desirable characteristics that are meant to suit one category of diverse receivers [sec-diverse-receivers]. A receiver that is willing to receive Simulcast streams must be given sufficient means to express what it is capable of and desires to receive. A sender that is willing to send Simulcast streams must similarly be given sufficient means to express what it is capable of and desires to send.
An obvious candidate to express those characteristics is the media format in an SDP media description, defined by the rtpmap and fmtp attributes, which is typically mapped to an RTP Payload Type. Some of the most interesting characteristics for Simulcast purposes are however not included in rtpmap or fmtp, but are instead defined as separate attributes. Some of those individual attributes are possible to directly relate to a defined media format and could form a configuration together with the media format, but some attributes cannot be related to a specific media format and using the existing media format as a common identifier for a media configuration is not fully sufficient.
Simulcast tries to handle senders and receivers belonging to the vast multi-dimensional parameter space of "media configuration" by sub-dividing that parameter space into manageable and meaningful sub-sets. Communication between a sender and a receiver can be established successfully only when the actually sent media configuration (sub-set) fits within the receiver's available media configuration sub-set. At the same time, practical and implementation aspects often limit the size of those sub-sets. When that receiver or sender sub-set is either too small or is not known, the probability of successful communication decreases significantly. To increase the probability of finding a match between sender and receiver media configurations, it is essential that a media configuration can be a set instead of a single point in the parameter space, i.e. include parameter listings and/or ranges instead of single values.
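The benefit of describing a configuration as a set rather than a point can be illustrated with a small overlap check. This is a sketch under a modeling assumption: each parameter constraint is either a set of discrete values or an inclusive (low, high) range, and a parameter constrained on only one side is treated as unconstrained on the other. The parameter names "framerate" and "height" are hypothetical.

```python
def _overlap(a, b):
    """True if two parameter constraints share at least one value.

    A constraint is either a set of discrete values or an inclusive
    (low, high) tuple range.
    """
    if isinstance(a, tuple) and isinstance(b, tuple):
        return max(a[0], b[0]) <= min(a[1], b[1])
    if isinstance(a, tuple):
        a, b = b, a  # make a the discrete set and b the range
    if isinstance(b, tuple):
        return any(b[0] <= v <= b[1] for v in a)
    return bool(set(a) & set(b))


def configs_match(sender, receiver):
    """True if some point lies in both media configuration sets."""
    common = sender.keys() & receiver.keys()
    return all(_overlap(sender[p], receiver[p]) for p in common)


receiver = {"framerate": (1, 30), "height": {360}}
# A single-point sender configuration misses the receiver's sub-set...
print(configs_match({"framerate": {30}, "height": {720}}, receiver))
# ...while a sender configuration that is a set finds a match.
print(configs_match({"framerate": (15, 30), "height": {360, 720}}, receiver))
```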
Therefore, it is proposed to define a new media level SDP attribute, "a=config-id", which relates the needed parameter types and the corresponding value ranges that together constitute a Simulcast media configuration. Each SDP media description MAY contain zero or more config-id attributes. The meaning of the attribute on SDP session level is undefined, and it MUST NOT be used there.
configuration    = "a=config-id:" config-id WSP config-dir WSP config-list
config-id        = token
config-dir       = "send" / "recv"
config-list      = config-entry *(WSP config-entry)
config-entry     = "pt" "=" pt-value *("," pt-value)
                 / image-attr
                 / "framerate" "=" fr-param
                 / "b" "=" bw-mod ":" bw-value *1("-" bw-value)
                 / ext-config-id [ "=" ext-config-value ] ; for future ext
image-attr       = "imageattr" "=" resolution-list
resolution-list  = resolution-set *("," resolution-set)
ext-config-id    = token
ext-config-value = non-ws-string
pt-value         = 1*3DIGIT ; could be made more strict
resolution-set   = "[" "x=" xyrange "," "y=" xyrange *key-values "]"
key-values       = ( "," key-value )
key-value        = ( "sar=" srange ) / ( "par=" prange ) / ( "q=" qvalue )
onetonine        = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
xyvalue          = onetonine *5DIGIT
step             = xyvalue
xyrange          = ( "[" xyvalue ":" [ step ":" ] xyvalue "]" )
                 / ( "[" xyvalue 1*( "," xyvalue ) "]" )
                 / ( xyvalue )
spvalue          = ( "0" "." onetonine *3DIGIT )
                 / ( onetonine "." 1*4DIGIT )
srange           = ( "[" spvalue 1*( "," spvalue ) "]" )
                 / ( "[" spvalue "-" spvalue "]" )
                 / ( spvalue )
prange           = ( "[" spvalue "-" spvalue "]" )
qvalue           = ( "0" "." 1*2DIGIT ) / ( "1" "." 1*2("0") )
fr-param         = fr-value *("," fr-value) / fr-value "-" fr-value
fr-value         = 1*3DIGIT [ "." 1*2DIGIT ]
bw-mod           = "AS" / "TIAS" / token ; for future extensions
bw-value         = 1*DIGIT

; WSP, DQUOTE and DIGIT defined in [RFC5234]
; token and non-ws-string defined in [RFC4566]
Figure 2: ABNF for Media Configuration
A media configuration is thus identified by:
The media configuration MUST contain at least one and MAY contain more of the media configuration entries below. Each entry type MUST NOT appear more than once in any media configuration.
Media configuration entry types "pt" and "b" MUST be supported by all implementations of this specification. Otherwise, an implementation MAY ignore any media configuration entry types that are not understood. A media configuration MAY be re-used to describe more than a single Source Packet Stream.
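A minimal parser for the "a=config-id" attribute of Figure 2 could decode the two mandatory entry types and keep everything else opaque, as sketched below. The function name is hypothetical, the other entry types are deliberately left undecoded (an implementation MAY ignore types it does not understand), and whitespace handling is more lenient than the ABNF.

```python
import re


def parse_config_id(line):
    """Parse an "a=config-id" line into (config_id, direction, entries).

    The mandatory entry types "pt" and "b" are decoded; any other entry
    type (e.g. "imageattr", "framerate" or a future extension) is kept as
    its raw string so that an implementation may ignore it if unknown.
    """
    m = re.match(r"^a=config-id:(\S+)\s+(send|recv)\s+(.+)$", line)
    if not m:
        raise ValueError("not a config-id attribute: %r" % line)
    config_id, direction, rest = m.groups()
    entries = {}
    for item in rest.split():
        etype, _, value = item.partition("=")
        if etype in entries:
            raise ValueError("entry type %r appears more than once" % etype)
        if etype == "pt":
            # Comma-separated list of RTP payload type numbers.
            entries[etype] = [int(v) for v in value.split(",")]
        elif etype == "b":
            # bw-mod ":" bw-value, optionally followed by "-" bw-value.
            modifier, _, rates = value.partition(":")
            low, _, high = rates.partition("-")
            entries[etype] = (modifier, int(low), int(high or low))
        else:
            entries[etype] = value
    if not entries:
        raise ValueError("a media configuration needs at least one entry")
    return config_id, direction, entries


print(parse_config_id(
    "a=config-id:hi send pt=97,98 b=AS:300-500 imageattr=[x=1280,y=720]"))
```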
The Session and Media level attributes and parameters outside of individual media configurations (a=config-id) provide limitations on the set of media configurations in simultaneous use. For example, a media description bandwidth limitation using b=AS applies to all the Packet Streams sent within the scope of that media description, thus forcing the media configurations in use to share that available bandwidth. Other Packet Streams, such as RTP retransmission or FEC flows, also need to be included in that sum.
There exist a number of different limitations, and this section does not attempt to be exhaustive. The payload formats and their configurations can impose limitations; for example, video profiles and levels impose a joint limit on bit-rate, frame-rate and resolution. The bandwidth parameters on session and media description level apply according to their semantics and their level. Packetization limitations, e.g. maxptime, as well as recommendations, e.g. ptime, apply to all the configurations within the scope where the parameter is defined.
Note that limits expressed within a media configuration, such as bandwidth, are not bounded by the media description values. First, the sum of bit-rates across all media configurations in a media description can be greater than the media description limit, as not all configurations may be in simultaneous use. For example, if only a single configuration is enabled, it is allowed to consume the full outer limit. Second, the directionality of each media configuration needs to be taken into account; for example, SDP receiver limitations do not apply to a sender configuration.
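The bandwidth reasoning above can be sketched as a simple check where only configurations in simultaneous use, plus any accompanying retransmission or FEC flows, count against a media description's b=AS limit (expressed in kbps). The function name, the per-configuration rates and the 520 kbps limit are hypothetical example values.

```python
def fits_media_bandwidth(config_kbps, active_ids, media_limit_kbps,
                         repair_kbps=0):
    """Check a media description's b=AS limit against the active configs.

    config_kbps: {config_id: kbps} per media configuration in the media
        description (e.g. the upper value of its "b=AS" entry).
    active_ids: configurations actually in simultaneous use.
    repair_kbps: rate of accompanying retransmission or FEC flows.
    Inactive configurations do not count, so the sum over all of them may
    exceed the limit without any violation.
    """
    used = sum(config_kbps[c] for c in active_ids) + repair_kbps
    return used <= media_limit_kbps


configs = {"hi": 500, "lo": 120}   # 620 kbps in total
print(fits_media_bandwidth(configs, ["hi"], 520))         # True
print(fits_media_bandwidth(configs, ["hi", "lo"], 520))   # False
```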
When used as a declarative media description, config-id with recv parameter indicates the configured end-point's required media configuration to receive a specified set of Source Packet Streams as Simulcast streams. In the same fashion, config-id with send parameter requests the end-point to use the specified media configuration when sending a specified set of Source Packet Streams as Simulcast streams.
An offerer wanting to use Simulcast in a specific direction SHALL use config-id to describe the media configurations to use in that direction in the Offer.
An answerer that receives a config-id media configuration for a specific direction and accepts to use it SHALL include a corresponding media configuration with the reverse direction in the Answer. The config-id identification value MUST be the same in the Offer and the Answer. An answerer not accepting to use a specific media configuration SHALL remove it from the Answer.
The Answer MUST keep exactly the same media configuration types in a media configuration as were present in the corresponding media configuration in the Offer.
The answerer MAY remove values from enumerations and MAY reduce ranges of media configuration entries in the Answer. If the reduced media configuration entry relates to the answerer's send direction, negotiation is complete and no further action is needed. If the reduced media configuration relates to the answerer's receive direction, the offerer SHOULD send another Offer where that related, send direction media configuration is reduced at least to the level in the previous Answer, but MAY be reduced even more, and MAY be removed entirely.
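The reduction rules above can be checked mechanically. This sketch assumes each entry is modeled as a set of enumerated values or an inclusive (low, high) range; it also assumes that an answerer wanting to remove every value of an entry removes the whole media configuration instead, so an empty enumeration is treated as invalid. The function names are hypothetical.

```python
def is_valid_reduction(offered, answered):
    """Check that an answered media configuration only narrows the offer.

    Both arguments map entry type -> constraint, where a constraint is
    either a set of enumerated values or an inclusive (low, high) range.
    """
    # The Answer must keep exactly the same entry types as the Offer.
    if offered.keys() != answered.keys():
        return False
    for etype, off in offered.items():
        ans = answered[etype]
        if isinstance(off, tuple) != isinstance(ans, tuple):
            return False
        if isinstance(off, tuple):
            # A range may only shrink.
            if not off[0] <= ans[0] <= ans[1] <= off[1]:
                return False
        elif not ans or not set(ans) <= set(off):
            # Enumerated values may only be removed, and not all of them.
            return False
    return True


def offerer_should_reoffer(answer_direction, offered, answered):
    """True if the offerer SHOULD send a new, narrowed Offer.

    answer_direction is "send" or "recv" as seen by the answerer; only a
    reduced answerer receive configuration calls for a new Offer.
    """
    return answer_direction == "recv" and offered != answered


offer = {"pt": {96, 97}, "b": (100, 500)}
answer = {"pt": {97}, "b": (100, 300)}
print(is_valid_reduction(offer, answer), offerer_should_reoffer("recv", offer, answer))
```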
A set of media configurations [sec-media-config] is needed to describe a Simulcast. Each Source Packet Stream in the Simulcast shares the same Media Source, but has a different media configuration. Thus, the actual grouping of media configurations is what defines a specific Simulcast. It is proposed to define two new SDP attributes, usable on both media and session level, "a=sim-send" and "a=sim-recv", which use config-id values to group media configurations for the purpose of Simulcast transmission and reception, respectively. "a=sim-send" and "a=sim-recv" MAY be used independently and simultaneously. They MAY be used on session level to group media configurations when different Simulcast encodings of a Media Source are to be sent in different Media Transports and RTP sessions. They MAY also be used on media level to group media configurations when different Simulcast encodings of a Media Source are to be sent based on the same media description and thus use the same Media Transport and RTP session. When used on media level, the Simulcast direction MAY conflict with the general media description direction, but a conflict MUST be interpreted as the Simulcast being effectively inhibited. For example, sim-send in a recvonly media description means that no Simulcast Source Packet Streams are sent.
simulcast         = "a=" ( "sim-send:" / "sim-recv:" ) config-id-list
config-id-list    = config-item *(WSP config-item)
config-item       = config-id [":" config-param-list]
config-id         = token
config-param-list = config-param *("," config-param)
config-param      = "inactive"
                  / token ["=" param-value] ; for future extension
param-value       = 1*(value-char)
                  / DQUOTE non-ws-string DQUOTE
value-char        = token-char / %x28 / %x29 / %x2F / %x3A-3C
                  / %x3E-40 / %x5B-5D ; VCHAR except "=" and ","

; WSP and VCHAR defined in [RFC5234]
; token, token-char and non-ws-string defined in [RFC4566]
Figure 3: ABNF for Simulcast Configuration Grouping
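A grouping attribute following Figure 3 could be parsed as sketched below. The function name is hypothetical, and the sketch simplifies quoted parameter values by assuming they contain no ",". Splitting each item on its first ":" is safe because a config-id is an SDP token, which cannot contain ":".

```python
import re


def parse_simulcast_group(line):
    """Parse "a=sim-send" / "a=sim-recv" into (direction, items).

    items is a list of (config_id, params) in attribute order. A parameter
    given without "=" (such as "inactive") maps to True; a double-quoted
    value is unquoted. Quoted values are assumed to contain no ",".
    """
    m = re.match(r"^a=sim-(send|recv):(.+)$", line)
    if not m:
        raise ValueError("not a Simulcast grouping attribute: %r" % line)
    direction, rest = m.groups()
    items = []
    for item in rest.split():
        # config-id is a token and cannot contain ":", so split on the first.
        config_id, _, plist = item.partition(":")
        params = {}
        for param in filter(None, plist.split(",")):
            name, sep, value = param.partition("=")
            if sep and len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            params[name] = value if sep else True
        items.append((config_id, params))
    return direction, items


print(parse_simulcast_group("a=sim-send:hi lo:inactive"))
# ('send', [('hi', {}), ('lo', {'inactive': True})])
```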
The config-id identification of a media configuration MUST be defined by a "config-id" attribute in any of the media descriptions that are part of the SDP.
When used as a declarative media description, sim-recv indicates the configured end-point's required ability to receive Source Packet Streams with the specified set of media configurations as Simulcast streams. In the same fashion, sim-send requests the end-point to send Source Packet Streams with the specified set of media configurations as Simulcast streams.
The configuration parameter "inactive" SHALL be interpreted as meaning that the related Source Packet Stream is in the PAUSED state [I-D.westerlund-avtext-rtp-stream-pause] at the start of the session, and applicable RTP level procedures from that specification SHALL be applied.
An offerer wanting to send a set of Source Packet Streams as Simulcast streams includes sim-send in the Offer to describe which media configurations to use for that Simulcast. Similarly, an offerer wanting to receive a set of Source Packet Streams as Simulcast streams includes sim-recv in the Offer to describe which media configurations to use for that Simulcast.
An answerer that receives sim-send and accepts to receive those media configurations as Simulcasted Source Packet Streams SHALL include sim-recv with the accepted media configurations in the Answer. Similarly, an answerer that receives sim-recv and accepts to send those media configurations as Simulcasted Source Packet Streams SHALL include sim-send with the accepted media configurations in the Answer. An answerer MAY remove media configurations from the sim-send or sim-recv in the Answer, compared to the ones included in the corresponding attribute in the Offer. The answerer MUST NOT add any media configurations to sim-send or sim-recv in the Answer that were not in the corresponding ones in the Offer.
An "inactive" parameter present in the Offer MUST be kept in the Answer. The Answer MAY add an "inactive" parameter to any of the media configurations. An "inactive" parameter on a media configuration in "sim-recv" is equivalent to a PAUSE (or in some cases, an equivalent TMMBR 0) message [I-D.westerlund-avtext-rtp-stream-pause] being sent for the received Source Packet Stream at the start of the session, and applicable RTP level procedures from that specification SHALL be applied. An "inactive" parameter on a media configuration in "sim-send" is equivalent to the related Source Packet Stream being in PAUSED state at the start of the session, and applicable RTP level procedures SHALL be applied.
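The answerer side of the grouping rules, including the handling of "inactive", can be sketched as follows. The `accepted` and `pause` sets stand for hypothetical local policy, and the function name is illustrative only. Future parameters other than "inactive" are not carried over in this sketch, and returning None (omitting the attribute) when nothing is accepted is an assumption.

```python
def answer_simulcast_group(direction, items, accepted, pause=frozenset()):
    """Build the answer's grouping attribute from an offered one.

    direction: "send" or "recv" of the offered a=sim-send / a=sim-recv.
    items: offered [(config_id, params)] pairs, params being a dict.
    accepted: config-ids the answerer accepts (hypothetical local policy).
    pause: accepted config-ids the answerer wants to start as "inactive".
    Returns the answer line, or None if nothing was accepted.
    """
    reverse = {"send": "recv", "recv": "send"}[direction]
    out = []
    for config_id, params in items:  # offer order; nothing is ever added
        if config_id not in accepted:
            continue  # removing a media configuration is allowed
        # An offered "inactive" MUST be kept; the answer MAY add one.
        if "inactive" in params or config_id in pause:
            out.append(config_id + ":inactive")
        else:
            out.append(config_id)
    return "a=sim-%s:%s" % (reverse, " ".join(out)) if out else None


offer = [("hi", {}), ("mid", {"inactive": True}), ("lo", {})]
print(answer_simulcast_group("send", offer, {"mid", "lo"}, pause={"lo"}))
# a=sim-recv:mid:inactive lo:inactive
```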
The number of different Source Packet Streams used for a Simulcast related to a single media description MUST NOT exceed the number of listed media configurations in the corresponding sim-recv in that media description sent by the media receiver.
To ensure that Simulcast Source Packet Streams can be related correctly on RTP level, SDES SRCNAME [I-D.westerlund-avtext-rtcp-sdes-srcname] MUST be used to label Simulcast versions belonging to the same Media Source. The RTP Header Extension option of that specification MAY be used with Simulcast.
The SRCNAME identifier for Simulcast MUST contain a first part that uniquely identifies the Media Source within a given CNAME, followed by a single "." (period) and the config-id as defined above [sec-media-config].
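The SRCNAME structure lets a receiver or RTP switch group incoming SSRCs into Simulcasts by (CNAME, Media Source). The sketch below covers only string construction and grouping, not how SRCNAME is carried in RTCP or in a header extension. It adds one assumption of its own: the Media Source part contains no ".", so that the first period unambiguously separates it from the config-id, which as a token may itself contain ".". All function names and the CNAME values are hypothetical.

```python
from collections import defaultdict


def make_srcname(source_id, config_id):
    """Build a Simulcast SRCNAME: media source identifier, ".", config-id."""
    if not source_id or "." in source_id:
        raise ValueError("invalid media source identifier: %r" % source_id)
    return "%s.%s" % (source_id, config_id)


def split_srcname(srcname):
    """Inverse of make_srcname: return (source_id, config_id)."""
    source_id, sep, config_id = srcname.partition(".")
    if not sep or not source_id or not config_id:
        raise ValueError("not a Simulcast SRCNAME: %r" % srcname)
    return source_id, config_id


def group_simulcast(streams):
    """Group (ssrc, cname, srcname) triples into Simulcasts.

    Returns {(cname, source_id): {config_id: ssrc}}.
    """
    groups = defaultdict(dict)
    for ssrc, cname, srcname in streams:
        source_id, config_id = split_srcname(srcname)
        groups[(cname, source_id)][config_id] = ssrc
    return dict(groups)


streams = [(0x1111, "alice@example.com", make_srcname("cam", "hi")),
           (0x2222, "alice@example.com", make_srcname("cam", "lo"))]
print(group_simulcast(streams))
```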
The SRCNAME parameter to source-specific signaling [RFC5576] ("a=ssrc") MAY be used for Source Packet Streams in the send direction to relate SRCNAME to SSRC already in the SDP.
The new "a=sim-send-cap" and "a=sim-recv-cap" attributes MAY be included in the SDP as an optional pre-stage in a two-phase approach, where the pre-stage involves a first SDP Offer/Answer procedure that only establishes Simulcast capability at both the offerer and the answerer. This has the additional advantage of avoiding sending media descriptions related to Simulcast to an endpoint that does not support Simulcast. In case two Offer/Answer procedures are already used for other reasons, the pre-stage will not incur any significant extra signaling round-trips. Such other two-phase techniques include use of SIP OPTIONS, SIP UPDATE [RFC3311] with reliable provisional responses, and BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation].
Thus, when the pre-stage Offer/Answer is used, it SHOULD NOT include any Simulcast-grouped media descriptions; these SHOULD instead be added in a main Offer/Answer phase. When using the pre-stage Offer/Answer, half a signaling round-trip time can sometimes be saved if the main phase is initiated by the Simulcast receiver, meaning that the endpoint that included "a=sim-recv-cap" in the pre-stage SDP is the offerer in the main phase. If both endpoints are Simulcast receivers, it does not matter which endpoint sends the main Offer; regular Offer/Answer rules handle any race conditions.
It is not possible to use any pre-stage to establish capability with declarative SDP, in which case the pre-stage SHALL be bypassed and only the main phase used directly.
These examples show a client connecting to a video conference service that uses a centralized media topology with an RTP mixer.
   +---+      +-----------+      +---+
   | A |<---->|           |<---->| B |
   +---+      |           |      +---+
              |   Mixer   |
   +---+      |           |      +---+
   | F |<---->|           |<---->| J |
   +---+      +-----------+      +---+
Figure 4: Four-party Mixer-based Conference
Alice is calling in to the mixer with a Simulcast-enabled Unified Plan client capable of a single Media Source per media type. The only difference from a non-Simulcast client is the capability to send Simulcast based on video resolution [RFC6236] ("imageattr") and framerate. Alice uses a pre-stage Offer, which looks like:
v=0 o=alice 2362969037 2362969040 IN IP4 192.0.2.156 s=Simulcast Enabled Unified Plan Client t=0 0 c=IN IP4 192.0.2.156 b=AS:665 a=sim-send-cap:imageattr framerate m=audio 49200 RTP/AVP 96 8 b=AS:145 a=rtpmap:96 G719/48000/2 a=rtpmap:8 PCMA/8000 m=video 49300 RTP/AVP 97 b=AS:520 a=rtpmap:97 H264/90000 a=fmtp:97 profile-level-id=42c01e a=imageattr:97 send [x=640,y=360] [x=320,y=180] \ recv [x=640,y=360] [x=320,y=180]
Figure 5: Unified Plan Simulcast Pre-Stage Offer
In this pre-stage, the only thing in the SDP that indicates Simulcast capability is the session-level "sim-send-cap" attribute, which also indicates that the sent Simulcast versions can differ in video resolution and/or framerate.
The Answer from the server indicates both that it too is Simulcast capable and that it would prefer to use video resolution ("imageattr") based Simulcast, but that it supports both video resolution and framerate. Should it not have been Simulcast capable, the "a=sim-recv-cap" line would not have been present and communication would have started with the media negotiated in the SDP.
    v=0
    o=server 823479283 1209384938 IN IP4 192.0.2.2
    s=Answer to Simulcast Enabled Unified Plan Client
    t=0 0
    c=IN IP4 192.0.2.43
    b=AS:665
    a=sim-recv-cap:imageattr=1.0 framerate=0.8
    m=audio 49200 RTP/AVP 96
    b=AS:145
    a=rtpmap:96 G719/48000/2
    m=video 49300 RTP/AVP 97
    b=AS:520
    a=rtpmap:97 H264/90000
    a=fmtp:97 profile-level-id=42c01e
    a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
        recv [x=640,y=360] [x=320,y=180]
Figure 6: Unified Plan Simulcast Pre-Stage Answer
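The capability attributes in Figures 5 and 6 carry a list of Simulcast parameters, optionally weighted. Since the formal syntax is still to be written, the following Python sketch assumes a grammar inferred from the examples only: whitespace-separated parameter names, each optionally followed by "=<number>", with a larger number meaning a stronger preference (the server prefers imageattr=1.0 over framerate=0.8). `parse_sim_cap` and `by_preference` are hypothetical names.

```python
from typing import Dict, List, Optional


def parse_sim_cap(attr_line: str) -> Dict[str, Optional[float]]:
    """Parse 'a=sim-send-cap:...' or 'a=sim-recv-cap:...' into {param: preference}."""
    name, sep, value = attr_line.partition(":")
    if not sep or name not in ("a=sim-send-cap", "a=sim-recv-cap"):
        raise ValueError(f"not a Simulcast capability attribute: {attr_line!r}")
    caps: Dict[str, Optional[float]] = {}
    for token in value.split():
        param, has_pref, pref = token.partition("=")
        # A bare parameter name carries no explicit preference.
        caps[param] = float(pref) if has_pref else None
    return caps


def by_preference(caps: Dict[str, Optional[float]]) -> List[str]:
    """Order parameters by preference, highest first; unweighted ones last."""
    return sorted(caps, key=lambda p: (caps[p] is None, -(caps[p] or 0.0)))
```

Applied to the server's "a=sim-recv-cap:imageattr=1.0 framerate=0.8", this gives imageattr before framerate, matching the stated preference for resolution-based Simulcast.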
Since the server is the Simulcast media receiver, it immediately initiates another Offer/Answer including details on the Simulcast versions. The server also keeps "sim-recv-cap" as an explicit Simulcast capability indication in this main Offer/Answer. Note that the non-Simulcast media can already be started at this point, before the main Offer/Answer; the only restriction is that the Simulcast functionality is not yet established.
    v=0
    o=server 823479283 1209384938 IN IP4 192.0.2.2
    s=Server Inviting Simulcast Enabled Unified Plan Client
    t=0 0
    c=IN IP4 192.0.2.43
    b=AS:825
    a=sim-recv-cap:imageattr=1.0 framerate=0.8
    m=audio 49200 RTP/AVP 96
    b=AS:145
    a=rtpmap:96 G719/48000/2
    m=video 49300 RTP/AVP 97
    b=AS:2200
    a=rtpmap:97 H264/90000
    a=fmtp:97 profile-level-id=42c01e
    a=config-id:a recv pt=97 imageattr=[x=640,y=360],[x=1280,y=720] \
        framerate=25-60 b=AS:500-2500
    a=config-id:b recv pt=97 imageattr=[x=320,y=180],[x=640,y=360] \
        framerate=25-60 b=AS:150-500
    a=config-id:c recv pt=97 imageattr=[x=256,y=144],[x=320,y=180] \
        framerate=10-30 b=AS:100-250
    a=sim-recv:a b c
Figure 7: Unified Plan Simulcast Main Offer
The server structures the Offer according to Unified Plan and has added three config-id lines in the video media description, one for each Simulcast media configuration that it is prepared to receive. Each media configuration refers to a defined media format, lists a set of preferred video resolutions and a range of acceptable framerates, and ends with a bandwidth range. The Offer also includes the sim-recv attribute for those three media configurations, indicating that the Simulcast the server is prepared to receive in this media description can include one or more of them.
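Each config-id line in Figure 7 bundles a payload type, a list of preferred resolutions, a framerate range and a bandwidth range. A minimal Python sketch of a parser for such a line follows; `MediaConfig` and `parse_config_id` are hypothetical names, the token layout (identifier, direction, then key=value pairs) is inferred from the examples rather than from a formal grammar, and the b=AS range is assumed to be in kbps, as for the ordinary SDP b=AS line.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Matches one "[x=<width>,y=<height>]" resolution entry.
_RESOLUTION = re.compile(r"\[x=(\d+),y=(\d+)\]")


@dataclass
class MediaConfig:
    """One parsed a=config-id line (hypothetical representation)."""
    config_id: str
    direction: str                       # "send" or "recv"
    pt: int = -1                         # -1 means "no pt= parameter seen"
    resolutions: List[Tuple[int, int]] = field(default_factory=list)
    framerate: Tuple[float, float] = (0.0, 0.0)
    bandwidth_kbps: Tuple[int, int] = (0, 0)


def _parse_range(text: str, conv: Callable):
    """'25-60' -> (25, 60); a single value '30' -> (30, 30)."""
    low, _, high = text.partition("-")
    return conv(low), conv(high or low)


def parse_config_id(line: str) -> MediaConfig:
    # Undo the "\" line continuations used in the figures.
    body = line.replace("\\\n", " ").replace("\\", " ").strip()
    prefix = "a=config-id:"
    if not body.startswith(prefix):
        raise ValueError(f"not a config-id attribute: {line!r}")
    tokens = body[len(prefix):].split()
    cfg = MediaConfig(config_id=tokens[0], direction=tokens[1])
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        if key == "pt":
            cfg.pt = int(value)
        elif key == "imageattr":
            cfg.resolutions = [(int(x), int(y))
                               for x, y in _RESOLUTION.findall(value)]
        elif key == "framerate":
            cfg.framerate = _parse_range(value, float)
        elif key == "b":
            # "AS:500-2500" -> (500, 2500)
            modifier, _, rng = value.partition(":")
            if modifier == "AS":
                cfg.bandwidth_kbps = _parse_range(rng, int)
    return cfg
```

Keeping the parsed ranges as tuples makes it straightforward to compare an Offer's configuration with the narrower one returned in an Answer.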
Alice's Answer is:
    v=0
    o=alice 2362969037 2362969040 IN IP4 192.0.2.156
    s=Final answer from Simulcast Enabled Unified Plan Client
    t=0 0
    c=IN IP4 192.0.2.156
    b=AS:825
    a=sim-send-cap:imageattr framerate
    m=audio 49200 RTP/AVP 96
    b=AS:145
    a=rtpmap:96 G719/48000/2
    m=video 49300 RTP/AVP 97
    b=AS:520
    a=rtpmap:97 H264/90000
    a=fmtp:97 profile-level-id=42c01e
    a=config-id:b send pt=97 imageattr=[x=640,y=360] \
        framerate=25-30 b=AS:150-400
    a=config-id:c send pt=97 imageattr=[x=320,y=180] \
        framerate=10-12.5 b=AS:100-150
    a=sim-send:b c:inactive
    a=ssrc:31053821 cname=SDIe93850aQFid9P srcname=1.b
    a=ssrc:43298172 cname=SDIe93850aQFid9P srcname=1.c
    a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
        recv [x=640,y=360] [x=320,y=180]
Figure 8: Unified Plan Simulcast Main Answer
The Simulcast capability, sim-send-cap, is kept from Alice's previous Offer. One of the media configurations from the server Offer, config-id:a, is not acceptable to Alice's client for some reason and is removed from the Answer. The resulting Simulcast, described by sim-send, thus contains two media configurations, b and c, where c is initially set to "inactive", which effectively means that it is paused from the start of the session. In some cases the media configuration parameter value ranges are reduced, giving a more precise definition of what will actually be sent. This Answer SDP also specifies, through the srcname parameter, which SSRC values will be sent and which media configuration each SSRC will carry. The first part of srcname, before the ".", is the Media Source identification; both SSRCs share the same Media Source identification, since they are part of the same Simulcast. The second part, after the ".", is the config-id of the media configuration sent with that SSRC.
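The srcname convention lets a receiver regroup SSRCs into the Simulcast versions of one Media Source. The Python sketch below assumes the "<source>.<config-id>" form shown in Figure 8, with a Media Source identification that contains no "."; `parse_ssrc`, `group_simulcast` and `sim_send_ids` are hypothetical helpers. It also extracts the config-ids from a sim-send value, dropping a ":inactive"-style suffix, so that the answered set can be compared with the offered one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def parse_ssrc(line: str) -> Tuple[int, str, str]:
    """Split 'a=ssrc:<ssrc> cname=... srcname=<source>.<config-id>'
    into (ssrc, media source id, config-id)."""
    head, *params = line.split()
    ssrc = int(head[len("a=ssrc:"):])
    attrs = dict(p.split("=", 1) for p in params)
    # Before the first "." is the Media Source identification,
    # after it the config-id of the media configuration.
    source, _, config_id = attrs["srcname"].partition(".")
    return ssrc, source, config_id


def group_simulcast(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Group SSRCs by Media Source: {source: {config-id: ssrc}}."""
    groups: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        ssrc, source, config_id = parse_ssrc(line)
        groups[source][config_id] = ssrc
    return dict(groups)


def sim_send_ids(sim_send_value: str) -> Set[str]:
    """Config-ids in a sim-send value such as 'b c:inactive' (state suffix dropped)."""
    return {tok.partition(":")[0] for tok in sim_send_value.split()}
```

Applied to Figure 8, this yields a single Media Source "1" with one SSRC each for configurations b and c, and comparing against the server's "a b c" shows that configuration a was dropped.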
Bob is calling in to the mixer with a Simulcast-enabled client that, like Alice's, is capable of a single Media Source per media type, but is also capable of sending Source Packet Streams as Simulcast versions on separate Media Transports. In this example, Bob's client already knows that the server is Simulcast capable, so it skips the pre-stage Offer and goes straight to the main Offer.
    v=0
    o=bob 94572932847 3429478298 IN IP4 192.0.2.93
    s=Offer from Simulcast Enabled Multi-Transport Client
    t=0 0
    c=IN IP4 192.0.2.93
    b=AS:825
    a=sim-send-cap:imageattr=1.0 framerate=0.9
    a=sim-send:x y
    m=audio 50138 RTP/AVP 101
    b=AS:145
    a=rtpmap:101 G719/48000/2
    m=video 50226 RTP/AVP 118
    b=AS:500
    a=rtpmap:118 H264/90000
    a=fmtp:118 profile-level-id=42c01e
    a=config-id:x send pt=118 imageattr=[x=320,y=180],[x=640,y=360] \
        framerate=25-50 b=AS:200-500
    a=ssrc:3929384298 cname=Nsdko39Oen828FKn srcname=M.x
    a=imageattr:118 send [x=640,y=360] [x=320,y=180] \
        recv [x=640,y=360] [x=320,y=180]
    m=video 50228 RTP/AVP 119
    b=AS:150
    a=config-id:y send pt=119 imageattr=[x=256,y=144],[x=320,y=180] \
        framerate=12.5-25 b=AS:100-200
    a=ssrc:1923419284 cname=Nsdko39Oen828FKn srcname=M.y
    a=imageattr:119 send [x=320,y=180] [x=256,y=144]
    a=sendonly
Figure 9: Multi-Transport Simulcast Main Offer
As can be seen above, this Offer uses sim-send at session level and splits the Simulcast media configurations across two media descriptions, so that separate Media Transports can be used and the two Simulcast streams can receive differentiated treatment.
The server accepts this structure in its Answer:
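When sim-send is placed at session level, its config-ids have to be resolved to the media descriptions that define them. A minimal Python sketch follows, assuming one SDP line per text line apart from the "\" continuations used in these figures; `split_sdp` and `locate_sim_send` are hypothetical helpers, and an undefined config-id simply raises KeyError rather than being reported gracefully.

```python
from typing import Dict, List, Tuple


def split_sdp(sdp: str) -> Tuple[List[str], List[List[str]]]:
    """Return (session-level lines, list of media-description line lists)."""
    session: List[str] = []
    media: List[List[str]] = []
    # Join the "\"-continued lines used for readability in the figures.
    for line in sdp.replace("\\\n", " ").splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])
        elif media:
            media[-1].append(line)
        else:
            session.append(line)
    return session, media


def locate_sim_send(sdp: str) -> Dict[str, int]:
    """Map each config-id in the session-level a=sim-send group to the
    0-based index (in m-line order) of the media description defining it."""
    session, media = split_sdp(sdp)
    group = next(ln[len("a=sim-send:"):].split()
                 for ln in session if ln.startswith("a=sim-send:"))
    defined: Dict[str, int] = {}
    for index, lines in enumerate(media):
        for ln in lines:
            if ln.startswith("a=config-id:"):
                defined[ln[len("a=config-id:"):].split()[0]] = index
    # Drop any ":<state>" suffix such as ":inactive" before the lookup.
    return {cid: defined[cid] for cid in (t.partition(":")[0] for t in group)}
```

For Bob's Offer in Figure 9, x resolves to the first video media description (index 1, after audio) and y to the second (index 2).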
    v=0
    o=server 283479882 9384298374 IN IP4 192.0.2.2
    s=Server Answering Simulcast Enabled Multi-Transport Client
    t=0 0
    c=IN IP4 192.0.2.45
    b=AS:825
    a=sim-recv-cap:imageattr framerate
    a=sim-recv:x y
    m=audio 49200 RTP/AVP 96
    b=AS:145
    a=rtpmap:96 G719/48000/2
    m=video 49300 RTP/AVP 118
    b=AS:500
    a=rtpmap:118 H264/90000
    a=fmtp:118 profile-level-id=42c01e
    a=config-id:x recv pt=118 imageattr=[x=640,y=360] \
        framerate=25-50 b=AS:350-500
    a=imageattr:118 send [x=640,y=360] [x=320,y=180] \
        recv [x=640,y=360] [x=320,y=180]
    m=video 49300 RTP/AVP 119
    b=AS:150
    a=rtpmap:119 H264/90000
    a=fmtp:119 profile-level-id=42c01e
    a=config-id:y recv pt=119 imageattr=[x=256,y=144] \
        framerate=12.5-25 b=AS:120-150
    a=imageattr:119 recv [x=320,y=180] [x=256,y=144]
    a=recvonly
Figure 10: Multi-Transport Simulcast Main Answer
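Comparing Figures 9 and 10 shows that the server's Answer only narrows Bob's offered values: configuration x keeps just the 640x360 resolution and raises the minimum bandwidth from 200 to 350 kbps, while y keeps just 256x144 and narrows the bandwidth to 120-150 kbps. Assuming that an Answer may narrow but never widen an offered range, a check on the offering side could look like the following Python sketch; `ConfigRanges` and `widened_params` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ConfigRanges:
    """Values of one media configuration relevant to narrowing."""
    resolutions: FrozenSet[Tuple[int, int]]
    framerate: Tuple[float, float]
    bandwidth_kbps: Tuple[int, int]


def _within(inner, outer) -> bool:
    """True if the closed range `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] <= inner[1] <= outer[1]


def widened_params(offer: ConfigRanges, answer: ConfigRanges) -> List[str]:
    """Names of parameters where the Answer is NOT a subset of the Offer."""
    problems = []
    if not answer.resolutions <= offer.resolutions:
        problems.append("imageattr")
    if not _within(answer.framerate, offer.framerate):
        problems.append("framerate")
    if not _within(answer.bandwidth_kbps, offer.bandwidth_kbps):
        problems.append("b=AS")
    return problems
```

With the values of configurations x and y from Figures 9 and 10, the function returns an empty list for both, i.e. no parameter was widened.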
Fred is calling in to the same conference as in the examples above with a three-camera, three-display system, thus capable of handling three separate Media Sources in each direction, where each Media Source is also Simulcast-enabled in the send direction. Fred's client is a Unified Plan client, restricted to a single Media Source per media description.
    v=0
    o=fred 238947129 823479223 IN IP4 192.0.2.125
    s=Offer from Simulcast Enabled Multi-Source Client
    t=0 0
    c=IN IP4 192.0.2.125
    b=AS:825
    a=sim-send-cap:imageattr=1.0 framerate=0.5
    m=audio 49200 RTP/AVP 98
    b=AS:145
    a=rtpmap:98 G719/48000/2
    m=video 49600 RTP/AVP 100
    b=AS:3500
    a=rtpmap:100 H264/90000
    a=fmtp:100 profile-level-id=42c02a
    a=config-id:1h send pt=100 imageattr=[x=1920,y=1080] \
        framerate=30-60 b=AS:2000-3500
    a=config-id:1m send pt=100 imageattr=[x=1280,y=720] \
        framerate=15-60 b=AS:1000-2000
    a=config-id:1l send pt=100 imageattr=[x=640,y=360] \
        framerate=10-60 b=AS:200-1000
    a=sim-send:1h 1m 1l
    a=ssrc:2397234521 cname=EkeS32892FeO29DK srcname=1.1h
    a=ssrc:1023894789 cname=EkeS32892FeO29DK srcname=1.1m
    a=ssrc:4029284928 cname=EkeS32892FeO29DK srcname=1.1l
    a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
        recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]
    m=video 49600 RTP/AVP 100
    b=AS:3500
    a=rtpmap:100 H264/90000
    a=fmtp:100 profile-level-id=42c02a
    a=config-id:2h send pt=100 imageattr=[x=1920,y=1080] \
        framerate=30-60 b=AS:2000-3500
    a=config-id:2m send pt=100 imageattr=[x=1280,y=720] \
        framerate=15-60 b=AS:1000-2000
    a=config-id:2l send pt=100 imageattr=[x=640,y=360] \
        framerate=10-60 b=AS:200-1000
    a=sim-send:2h 2m 2l
    a=ssrc:2301017618 cname=EkeS32892FeO29DK srcname=2.2h
    a=ssrc:639711316 cname=EkeS32892FeO29DK srcname=2.2m
    a=ssrc:3293473905 cname=EkeS32892FeO29DK srcname=2.2l
    a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
        recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]
    m=video 49600 RTP/AVP 100
    b=AS:3500
    a=rtpmap:100 H264/90000
    a=fmtp:100 profile-level-id=42c02a
    a=config-id:3h send pt=100 imageattr=[x=1920,y=1080] \
        framerate=30-60 b=AS:2000-3500
    a=config-id:3m send pt=100 imageattr=[x=1280,y=720] \
        framerate=15-60 b=AS:1000-2000
    a=config-id:3l send pt=100 imageattr=[x=640,y=360] \
        framerate=10-60 b=AS:200-1000
    a=sim-send:3h 3m 3l
    a=ssrc:4115355057 cname=EkeS32892FeO29DK srcname=3.3h
    a=ssrc:3196538337 cname=EkeS32892FeO29DK srcname=3.3m
    a=ssrc:3757973912 cname=EkeS32892FeO29DK srcname=3.3l
    a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
        recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]
Figure 11: Fred's Multi-Source Simulcast Main Offer
The three video media descriptions are essentially identical, except that values which need to be unique are given unique values. The above also assumes that BUNDLE will be used across these three video media descriptions to create a common RTP session.
Simulcast is defined as the act of sending multiple alternative encodings of the same underlying media source. Transmitting multiple independent streams that originate from the same source could potentially be done in several different ways using RTP. A general discussion on considerations for use of the different RTP multiplexing alternatives can be found in Guidelines for Multiplexing in RTP [I-D.ietf-avtcore-multiplex-guidelines]. Discussion and clarification on how to handle multiple streams in an RTP session can be found in [I-D.ietf-avtcore-rtp-multi-stream].
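Because the three video media descriptions are expected to be bundled into one RTP session, their SSRC values must be unique across all of them, as RTP requires within a session. The Python sketch below also checks config-id values, assuming (as Fred's numbering from 1h to 3l suggests) that they must be unique across the whole SDP; `check_bundle_uniqueness` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def _duplicates(items: Iterable) -> List:
    """Sorted values that occur more than once."""
    return sorted(k for k, n in Counter(items).items() if n > 1)


def check_bundle_uniqueness(sdp: str) -> Dict[str, List]:
    """Report config-id and SSRC values that occur more than once in the SDP."""
    config_ids, ssrcs = [], []
    # Join the "\"-continued lines used in the figures.
    for line in sdp.replace("\\\n", " ").splitlines():
        line = line.strip()
        if line.startswith("a=config-id:"):
            config_ids.append(line[len("a=config-id:"):].split()[0])
        elif line.startswith("a=ssrc:"):
            ssrcs.append(int(line[len("a=ssrc:"):].split()[0]))
    return {"config-id": _duplicates(config_ids), "ssrc": _duplicates(ssrcs)}
```

Run over Fred's Offer in Figure 11, both lists come back empty; copying a media description without renumbering its config-ids and SSRCs would make them show up here.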
The network aspects that are relevant for Simulcast are:
This document requests the registration of five new SDP attributes: sim-send-cap, sim-recv-cap, sim-send, sim-recv, and config-id. It also requests the creation of a new registry of parameters, taken from existing SDP attributes, for use with sim-send-cap, sim-recv-cap, and config-id.
Formal registrations to be written.
The Simulcast capability and configuration attributes and parameters are vulnerable to attacks in signaling.
A false inclusion of Simulcast attributes may result in the generation of a second-phase SDP containing a potentially large number of unsupported media descriptions expressing Simulcast alternatives. However, a correct SDP implementation will be able to reject any unsupported media descriptions, so the effect should be limited.
A hostile removal of the Simulcast attributes will result in any second-phase Offer/Answer being skipped, so Simulcast is not used.
The Simulcast grouping semantics are vulnerable to attacks in the signaling. Changing the set of media configurations used in a Simulcast will change the number of Source Packet Streams.
A hostile removal of Simulcast grouping will prevent streams from being interpreted as Simulcast, which obviously prevents use of the Simulcast functionality. It also risks intended Simulcast streams being presented to a receiver as separate, independent streams instead.
Neither of the above is likely to have any major consequences, and both can be mitigated by signaling that is at least integrity protected and source authenticated, preventing an attacker from changing it.
Morgan Lindqvist and Fredrik Jansson, both from Ericsson, contributed important material to the first versions of this document.
Receiver diversity can be handled in a number of different ways, each with its own advantages and disadvantages. These approaches involve trade-offs between the RTP Mixer processing requirement, the bandwidth used on the uplink from the sending Participant to the RTP Mixer, the bandwidth used on the downlink from the RTP Mixer to the receiving Participant, and the media Quality of Experience at the receiving Participant.
The following is a listing of possible approaches:
A summary of the advantages and disadvantages of the above four principal alternatives is given below [tab-diversity]:
Alternative | Mixer CPU | Uplink bandwidth | Downlink bandwidth | Receiver QoE |
---|---|---|---|---|
1 | Low | Low | Low | Low |
2 | Very high | Optimum | Optimum | Near optimum |
3 | Low | Very high | Optimum | Optimum |
4 | Low | High | Near optimum | Near optimum |
The authors of this document believe that alternative 4, the Grouped Simulcast, can be a good trade-off whenever sufficient uplink resources are available.