Network Working Group | M. Westerlund |
Internet-Draft | B. Burman |
Intended status: Standards Track | L. Hamm |
Expires: September 04, 2012 | Ericsson |
March 5, 2012 |
Codec Operation Point RTCP Extension
draft-westerlund-avtext-codec-operation-point-00
The Audio-Visual Profile with Feedback (AVPF) specification defines a framework and messages for fast feedback and media control over RTCP. The Codec Control Messages (CCM) specification defines an extension to AVPF, by specifying additional messages for codec control and feedback. This specification extends CCM, by specifying messages that let participants dynamically communicate a set of codec configuration parameters, which enables better optimization of resource efficiency and quality of media transmission.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 04, 2012.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Multimedia real-time communication services, such as video telephony and videoconferencing, use the Real-time Transport Protocol (RTP/RTCP) [RFC3550] to transmit media streams, such as audio and video. A session establishment protocol, such as SIP [RFC3261], in combination with a capability negotiation protocol, such as SDP offer/answer [RFC3264], is normally used to establish the session and negotiate media capabilities. In some cases, a set of codec parameters is negotiated that does not express any specific limit or capability, but just describes a certain codec configuration.
During session establishment, the participating endpoints normally have limited knowledge about the session environment, e.g. whether the session will be point-to-point or contain some multi-party scenario, how users will interact with the application, how network conditions will vary during the session, etc. To take those variations into account, the participants can re-negotiate session parameters to better suit the communication environment. When variations or changes are frequent, the reaction time needs to be short, which may make repeated session re-negotiation inefficient and/or too slow. In addition, variations may not even affect negotiated session parameters, if the variations occur within the negotiated boundaries.
The above scenario can become critical especially in cases where a given media stream is transmitted towards, and received by, multiple receivers. In multi-party environments, scalable encoding or simulcast can be used to make the system more efficient and provide better quality to participants that are capable of receiving and utilizing the higher quality. These use cases result in a sending party being requested to deliver multiple encoder operation points.
The Audio-Visual Profile with Feedback (AVPF) specification [RFC4585] defines a framework and messages for fast feedback and media control over RTCP. The Codec Control Messages (CCM) specification [RFC5104] defines an extension to AVPF, by specifying additional messages for codec control and feedback. This specification extends CCM, by specifying messages that let participants dynamically communicate a set of codec configuration parameters, which enables better optimization of resource efficiency and quality of media transmission.
The codec configuration parameters specified in this document focus on some basic audio and video properties, such as video resolution, video frame rate, media stream bit-rate, audio sampling rate, number of audio channels, maximum RTP packet size and rate. Additional parameters can be standardized in the future.
The codec control messages are not meant to replace configuration performed using e.g. SDP. Instead, the messages can be used to communicate dynamic and frequent changes that take place within boundaries that have been negotiated as part of the session establishment.
The following terms and abbreviations are used in this document:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
Networks can contain endpoints with different capabilities, including CPU power, capture and render device fidelity (e.g. image resolution), and codecs. In addition, the characteristics and properties of networks can vary, which endpoints have to cope with. For example, in videoconferencing and telepresence services, a large number of endpoints may participate, and there may be a large number of media streams associated with the session. Such multi-party scenarios typically use entities for media mixing, switching and transcoding. The aim is generally to provide the best possible quality to each endpoint, taking endpoint and network capabilities into consideration.
Many communication services today use codecs that can be configured in a number of different ways. Often, the codecs have multiple properties that can be configured and those properties may also be inter-related, often in complex ways. One example is the H.264 (AVC) [H264] video codec and its scalable (SVC) and multi-view (MVC) versions. Most other video codecs, and codecs for many other types of media, also have multiple configurable properties. Such configurable properties will be referred to as "Codec Configuration Parameters" in this specification.
There can be several reasons to change the media rate or other encoding or packetization properties during an ongoing communication session. One reason can be that the available network bandwidth varies. Another reason can be that other network properties change, such as effective MTU or packet rate limitations. Other reasons can be that the quality or representation of the media rendered to the end user changes, maybe as a direct result of the user manipulating the GUI (e.g. changing window position or size), the relative importance of the received media stream changes (e.g. active or non-active speaker in a conferencing scenario), or the user selects to show some other content source that is available among the advertised media streams.
The codec changes above can be made directly between endpoints in a point-to-point scenario, or they may involve, and be acted upon by, media aware intermediaries (e.g. RTP mixers). An RTP mixer can do transcoding to provide each receiver with media streams of adapted quality, but transcoding has drawbacks as it always consumes processing power, typically impacts media quality in a negative way, and often introduces additional delays.
In order to avoid separate transcoding towards each endpoint, an RTP Mixer can, by taking the capabilities of the endpoints into account, decide to request specific codec configurations from endpoints, which will minimize the need for transcoding. Also, in scenarios where no RTP Mixers are used and transmitted media reaches multiple endpoints, the sender will have to take into account that each endpoint may have different capabilities. The use cases section [sec-usecases] shows different use cases, with and without RTP Mixers.
Resource optimization involving bandwidth is expected to be one of the major reasons for changing encoding properties, since it is in general desirable to avoid using more bandwidth than absolutely necessary, especially considering that
Other resources that may be desirable to optimize include, but are not limited to, endpoint and middle node processing (CPU) utilization, and transport quality (QoS).
A media receiver cannot be assumed to know exactly what codec configuration will be best for the media sender to use, given that the sender needs to take multiple aspects into account, including implementation limitations in the actual encoder. It should be more likely to find a value acceptable to both sender and receiver if the receiver can indicate an acceptable range instead of just a single value.
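The benefit of indicating a range can be illustrated with a small sketch. It is not part of this specification: the function name `pick_value`, the tuple representation of a range, and the policy of preferring the highest common value are assumptions made only for this example.

```python
from typing import Optional, Tuple

# Inclusive (low, high) range; a hypothetical representation for illustration.
Range = Tuple[int, int]


def pick_value(sender: Range, receiver: Range) -> Optional[int]:
    """Return a value acceptable to both sides, or None if the ranges do not overlap.

    Assumed policy: prefer the highest value that both sides accept.
    """
    low = max(sender[0], receiver[0])
    high = min(sender[1], receiver[1])
    return high if low <= high else None


# The encoder can produce 15..60 frames per second; the receiver accepts 10..30.
print(pick_value((15, 60), (10, 30)))  # 30
# A single requested value of 10 leaves the sender no room to comply.
print(pick_value((15, 60), (10, 10)))  # None
```

With a range, the sender has room to account for its own encoder limitations while still staying within what the receiver finds acceptable.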
When an RTP Mixer distributes streams to multiple receivers with different media quality requirements, it is sometimes possible to avoid targeted transcoding for every single receiver. That can be accomplished if the media sender has the ability to produce multiple media versions, such as for example scalable encoding or simulcast. Thus there is a need to both address specific media versions and describe the fact that multiple media versions with different configurations should be used.
The session description protocol (SDP) [RFC4566] is commonly used to negotiate and configure codecs and establish RTP/RTCP session parameters during session establishment, and during sessions, e.g. by using it in conjunction with SIP [RFC3261] and SDP Offer/Answer [RFC3264].
As described in Section 3.1 above, many of the underlying reasons that make media receivers desire certain codec encoding properties are highly dynamic in nature, and using SIP/SDP to re-negotiate the session will in many cases be too slow to be useful. SIP messages containing an SDP may become quite large for sessions containing many media, and since there is no defined way to send a partial SDP, even very small changes require sending the entire SDP. Most of the currently defined properties in SDP are also oriented towards being common for all media streams in the same RTP session, rather than being specific to one media stream.
The mechanism in this specification does not replace SDP, or the SDP Offer/Answer mechanism. It is expected that SDP is used in order to negotiate and configure boundary values for codec properties, and COP can then be used to communicate specific values within those boundaries, as long as there is no impact on the values negotiated using SDP. It is possible to establish communication sessions even if one or more endpoints do not support COP.
As discussed in CCM, regular RTCP reporting or extended reports [RFC3611] can to some extent be used to re-configure an encoder, but the reported measures seldom map directly back to encoding properties and they typically cannot express an unwanted situation in terms of encoding properties and what the receiver would like to receive instead. Communicating codec properties indirectly as a set of network properties will require interpretation by both sender and receiver and will thus risk misinterpretations and ambiguity. Since it is likely that a decoder is able to identify unwanted characteristics of the media stream in terms of encoding properties, the most straightforward approach is to convey those properties directly to the encoder.
Responsive techniques to control encoding are already available, e.g. Codec Control Messages (CCM) [RFC5104]. Although highly applicable, the possibilities to control encoding are however not explicit enough, both in terms of the number of available parameters to control, and the fact that they may be inter-related, alternative, or both.
Some codecs define codec-specific methods to enable receiver control of some encoding aspects, but it should be beneficial for interoperability to use codec agnostic signaling instead.
This section discusses a number of use cases for Codec Operation Points.
The use cases in this set all assume that communication is directly point-to-point between a media sender and a receiver. There is no need for further forwarding of the media streams. Thus, the goal should be to produce a media stream and transport it to the media receiver, where it is consumed as optimally as possible for the application. Thanks to this one-to-one mapping between encoder and decoder, great flexibility exists to produce a media stream tailored to the receiver's needs, given the constraints that exist from media sender, transport network and the receiver.
Some constraints will be static (and thus suitable for session configuration signalling), but a number of these are highly dynamic and thus desirable to adapt to during the session:
This section considers a multiparty session with a centralized media intermediary, like an RTP mixer, where the media receiver uses COP to affect the delivered media.
           +------------+        +---+
           |            |--RTP-->| B |
           |            |<--COP--|   |
           |            |        +---+
           |            |
+---+      |            |        +---+
| A |-RTP->|   Mixer    |--RTP-->| C |
+---+      |            |        +---+
           |            |
           |            |        +---+
           |            |--RTP-->| D |
           +------------+        +---+
In the above Figure 1 we focus on the possible usages of COP by a media receiver, like B. Here the functional role of the intermediary becomes important. An RTP mixer uses its own SSRC(s) to channel selected media streams to B from other participants like A. If the intermediary is instead a translator, receiver B sees A's SSRC(s) directly, instead of A possibly showing up as a CSRC. We will in this section focus on the Mixer case. The RTP translator case is further discussed in Section 4.4.
The RTP mixer's usage of its own SSRC allows particular mixer-to-receiver media flows to be associated with a particular role or purpose in the application rather than with a given media source. When there exist multiple RTP streams from the mixer to a receiver, the receiver can use COP to request an operation point that better suits the receiver's needs for each particular stream, and possibly for the role of the media stream. It also allows the receiver to select its desired trade-off in properties and quality between multiple delivered media streams.
There exist some different reasons why B would need to indicate changes in its capabilities to receive a particular media stream:
In all the above cases the receiver sends a COP request to the mixer for new codec operation points on mixer controlled media stream(s). It then becomes the mixer's responsibility to determine if and how the requested COPs can be supported, for example by requesting new operation points from the media source as discussed in Section 4.3. When the mixer selects another media source to deliver in a media stream, it may have to update the receiver on the properties of the operation point.
This section looks at the usage of COP in multiparty sessions with a centralized media intermediary, like an RTP mixer, that selects and requests the tailored media stream or streams that a media sender delivers to the intermediary for further forwarding or manipulation. This usage can be simplified down to looking at the media streams from one media sender (A), which is currently being delivered to multiple receivers (B-D) as depicted in Figure 2.
           +------------+        +---+
           |            |--RTP-->| B |
           |            |        +---+
+---+      |            |
| A |<-COP-|            |        +---+
|   |-RTP->|   Mixer    |--RTP-->| C |
+---+      |            |        +---+
           |            |
           |            |        +---+
           |            |--RTP-->| D |
           +------------+        +---+
The media paths from the Mixer to B, C and D are different and thus the available resources may vary between them. In addition B, C and D may have different capabilities when it comes to handling media streams. These limitations can be learned by the Mixer through session configuration signalling, media transmission feedback (e.g. RTCP), or usage of COP by the receivers (See Section 4.2). Limitations are also expected to be updated during the session lifetime.
The media sender (A) has certain capabilities and what is possible to do will depend on A's capabilities and what has been configured between A and the Mixer. Let's look at a few different cases of the capabilities A may have and how that influences how the Mixer can use COP to affect the media stream(s) delivered to the Mixer.
The use of COP as described above can be triggered by a multitude of reasons. We will here discuss some of them. We already mentioned that bit-rate adaptation (congestion control) on the Mixer to receiver path can indicate a need to change an operation point. Another reason is when a new session participant joins that has certain receiver capabilities (related to decoding or other hardware as well as to the network path), thus potentially changing the optimal set of operation points. There also exist a number of different cases where the desired application behavior results in changes in desired operation points, like change of active speakers, reconfiguration of the display layout, etc.
It is also important to remember that Figure 2 only presents the view of a single media sender. In most communication sessions there are multiple media senders, and the mixer will need to take the combination of media streams from multiple media senders into account when choosing what is to be sent to a given receiver. Thus changes at one media sender can result in related changes of the operation points at the other media senders.
This section covers usage of COP in multicast transported RTP sessions, as well as when transport translators [RFC5117] are used. Transport translators can be used to emulate Any Source Multicast (ASM) over unicast. Multicast usages also include Source Specific Multicast (SSM) [RFC4607], which according to "RTP Control Protocol (RTCP) Extensions for Single-Source Multicast Sessions with Unicast Feedback" [RFC5760] has two main modes, simple mode and summary feedback mode, which affect the usage of the functionality that COP provides.
+---+      +------------+      +---+
| A |<---->|            |<---->| B |
+---+      |            |      +---+
           | Translator |
+---+      |            |      +---+
| C |<---->|            |<---->| D |
+---+      +------------+      +---+
A transport translator [RFC5117], whose main purpose is to forward any incoming packets to all the other session participants, emulates an ASM session. As anyone can send to all others in both cases, some properties of these sessions require extra consideration when used in large-scale sessions with many participants.
 +-----+  +-----+          +-----+
 | MS1 |  | MS2 |   ....   | MSm |
 +-----+  +-----+          +-----+
    ^        ^                ^
    |        |                |
    V        V                V
+---------------------------------+
|       Distribution Source       |
+--------+                        |
| FT Agg |                        |
+--------+------------------------+
  ^ ^           |
  :  .          |
  :   +...................+
  :             |          .
  :            / \          .
+------+      /   \       +-----+
| FT1  |<----+     +----->| FT2 |
+------+    /       \     +-----+
  ^  ^     /         \     ^  ^
  :  :    /           \    :  :
  :  :   /             \   :  :
  :  :  /               \  :  :
  :   ./\               /\.   :
  :   /. \             / .\   :
  :  V  . V           V .  V  :
 +----+ +----+     +----+ +----+
 | R1 | | R2 | ... |Rn-1| | Rn |
 +----+ +----+     +----+ +----+
In the above Figure 4, the media senders (MS1 .., MSm) send their media streams and RTCP traffic to the distribution source (DS). The DS forwards the RTP and RTCP traffic from the media senders to the SSM group. Using the RTCP extension for unicast RTCP feedback [RFC5760], the receivers (R1...Rn) send their RTCP traffic to their configured feedback target. This sample session has two feedback targets to scale with the number of receivers. RTCP messages that need to go to a media sender are forwarded to the FT aggregator part of the distribution source for further forwarding over the unicast paths between the distribution source and the media senders. The feedback targets and the feedback aggregator also forward all RTCP messages from receivers in simple mode, and aggregate them in summary mode. Some RTCP messages from a receiver may still have to be forwarded over the SSM group.
COP needs to support some reasonable functionality over the different multiparty topologies described above and it is also important that COP does not cause significant issues in any of the environments.
In the basic case, where only a single multicast group exists, there is a well known problem associated with adapting content and bit-rate to the receiver population. The more receivers, the larger the potential for non-matching requirements in requests from the different receivers. One strategy for meeting this is to use the lowest common denominator among the requests from the receiver population. This normally results in sub-optimal quality for a significant part of the session participants, the main benefit being that all participants will be able to receive some content.
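The lowest common denominator strategy can be sketched as follows. This is an illustration only: the parameter names are hypothetical, and the sketch assumes that every requested value is an upper limit for which a smaller value is always acceptable to the receiver.

```python
from typing import Dict, List


def lowest_common_denominator(requests: List[Dict[str, int]]) -> Dict[str, int]:
    """For each parameter, keep the smallest upper limit requested by any receiver.

    Assumes all values are maxima, so a lower value satisfies every receiver.
    """
    result: Dict[str, int] = {}
    for request in requests:
        for name, value in request.items():
            result[name] = min(value, result.get(name, value))
    return result


# Three receivers with different (hypothetical) limits.
requests = [
    {"max_width": 1920, "max_bitrate_kbps": 4000},
    {"max_width": 640, "max_bitrate_kbps": 1500},
    {"max_bitrate_kbps": 300},
]
print(lowest_common_denominator(requests))
# {'max_width': 640, 'max_bitrate_kbps': 300}
```

The first receiver could have handled 1920 pixels and 4000 kbit/s, but receives 640 pixels and 300 kbit/s, which shows why this strategy is sub-optimal for a significant part of the receiver population.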
Because of the above limitations of operation within a single group, usage of COP in larger groups becomes difficult unless the parameters that can be adopted and affected by COP requests are such that only a limited set of participants is expected to request them, and the impact on the others is limited or acceptable. The authors therefore expect the usage of COP in large groups to be limited, and this specification focuses on operation in smaller groups. However, as it is not possible in the generic case to define the threshold at which a group changes from being small to being too large to work well with COP, it is important that COP can operate safely in a large group, although the possibilities to satisfy the request may be severely limited.
There also exist use cases for COP where the media application uses multiple multicast groups to enable multiple operation points and allows each receiver to join the multicast groups that suit the participant's capabilities. An example of such usage would be Scalable Video Coding (SVC) using the Multi-Session Transport (MST) mode of the SVC RTP payload format [RFC6190]. The SVC MST RTP streams that are sent in each group can still contain multiple scalability layers; one could combine coarse-grained control on the operation points by having the receiver join a particular session with a more fine-grained control using COP to adjust the included scalability layers to suit the receiver's needs, such as lower CPU load.
The solution outlined in this specification should fulfill the following requirements:
Guidelines for Extending RTCP [RFC5968] should be followed to the furthest extent possible.
In addition,
The mechanism described in this specification especially targets heterogeneous multi-party scenarios where different endpoints require differently encoded media from the same source, but its use in other situations is not precluded. In fact, point-to-point scenarios are considered to be of equal importance, but no more demanding than the multiparty case. In the targeted scenario, the media stream from one encoder is sent to multiple decoders, and hence the encoder must possibly provide an encoding with multiple operation points, suitable for the receivers. This is typically only possible with so-called scalable codecs, but some codecs may have inherent scalability features without being generally considered as scalable (e.g. H.264/AVC temporal scalability through non-reference frames). Multi-party services often involve a media mixer (Topo-Mixer) [RFC5117] as a central network node.
      +---+
      | S |
      +---+
        |
        v
    +-------+
    | Mixer |
    +-------+
     /  |  \
    v   v   v
+---+ +---+ +---+
| A | | B | | C |
+---+ +---+ +---+
The solution defined in this specification can be used during an active session to quickly adapt to changes in media receiver available bandwidth and/or preferences for one or more other codec properties, while still conforming to the session configuration, like SDP offer/answer negotiated minimum or maximum limits (depending on individual SDP property semantics). Some needed or wanted codec property changes will also require re-negotiating the SDP, but this specification intends to cover only changes that lie within the SDP negotiated set and thus do not impact the SDP.
Three message types are defined to support the solution: a request, a notification, and a status report:
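As a rough illustration of staying within the negotiated limits, the sketch below checks requested values against boundaries assumed to have been derived from session negotiation. The parameter names, units and the `violations` helper are hypothetical and are not defined by this specification.

```python
from typing import Dict, Tuple

# Hypothetical inclusive (min, max) boundaries derived from session
# negotiation, for example SDP offer/answer. Names and units are illustrative.
NEGOTIATED: Dict[str, Tuple[int, int]] = {
    "bitrate_kbps": (64, 2000),
    "frame_rate": (1, 30),
}


def violations(requested: Dict[str, int],
               negotiated: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Return the requested values that fall outside the negotiated boundaries."""
    outside: Dict[str, int] = {}
    for name, value in requested.items():
        if name in negotiated:
            low, high = negotiated[name]
            if not low <= value <= high:
                outside[name] = value
    return outside


print(violations({"bitrate_kbps": 512, "frame_rate": 60}, NEGOTIATED))
# {'frame_rate': 60}
```

In this example the bit-rate change stays within the negotiated set, while the frame rate change does not and would therefore be a matter for session re-negotiation.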
More details about the individual messages, but still on an overview level, can be found in sub-sections below. To do that, some other aspects need to be described first.
A COP message is sent from an RTP session participant in its role either as media receiver or media sender. Each message can contain one or more message items of one or more message types, all originating from a single media source.
The individual message items each relate only to a single operation point, describing part of an atomic notification or request.
The general structure is outlined below:
+--------------------------------------+
|         AVPF PSFB FMT="COP"          |
|        SSRC of Packet Sender         |
|         SSRC of Media Source         |
| +----------------------------------+ |
| |        COP Message Item 0        | |
| +----------------------------------+ |
| | (Codec Configuration Parameters) | |
| +----------------------------------+ |
| +----------------------------------+ |
| |        COP Message Item 1        | |
| +----------------------------------+ |
| | (Codec Configuration Parameters) | |
| +----------------------------------+ |
|                 ...                  |
+--------------------------------------+
Note that the Request is the only COP Message Item defined in this specification that is sent in the media receiver role and makes use of "SSRC of Media Source" as the targeted media stream for the Request. Both the Notification and the Status Report Message Items are sent in the media sender role, reporting on the message sender's own configuration and thus relate only to the "SSRC of Packet Sender", being agnostic to the "SSRC of Media Source" field.
It is thus for example possible to co-locate COPS and COPN messages for the same media source in the same COP FCI. It is also possible to co-locate one or more COPR referring to a single "SSRC of Media Source" with one or more COPN and/or COPS relating to a single "SSRC of Packet Sender" within a single COP message.
Multiple Message Items of the same type in the same COP Message are used to describe a notification, status or request for a media stream containing multiple Operation Points [sec-overview-operation-point].
Multiple COP messages are needed to be able to refer to multiple different "SSRC of Packet Sender" and/or "SSRC of Media Source".
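The containment rules above can be sketched with a simple in-memory model. This is not the wire format: the class names, field names and the `build_messages` helper are assumptions for illustration, and the sketch simplifies matters by keying every item on both SSRC fields, even though Notification and Status Report items only relate to the "SSRC of Packet Sender".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MessageItem:
    kind: str   # "COPR", "COPN" or "COPS"
    opid: int   # Operation Point that this item describes
    parameters: Dict[str, int] = field(default_factory=dict)


@dataclass
class CopMessage:
    packet_sender_ssrc: int
    media_source_ssrc: int
    items: List[MessageItem] = field(default_factory=list)


def build_messages(pending: List[Tuple[int, int, MessageItem]]) -> List[CopMessage]:
    """Group (packet sender SSRC, media source SSRC, item) triples into messages.

    Items sharing both SSRC values go into one message; any other
    combination of SSRC values needs a message of its own.
    """
    grouped: Dict[Tuple[int, int], CopMessage] = {}
    for sender, source, item in pending:
        key = (sender, source)
        if key not in grouped:
            grouped[key] = CopMessage(sender, source)
        grouped[key].items.append(item)
    return list(grouped.values())


pending = [
    (0xA, 0xB, MessageItem("COPR", 1, {"frame_rate": 15})),
    (0xA, 0xB, MessageItem("COPR", 2)),
    (0xA, 0xC, MessageItem("COPR", 1)),
]
messages = build_messages(pending)
print(len(messages))  # 2
```

The two requests for media source 0xB share one message, each describing one Operation Point, while the request for media source 0xC requires a second message.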
The Codec Configuration Parameters that are applicable to a certain codec may be specific to the media type (audio, video, ...), but may also be codec-specific. Some codec properties (described by Codec Configuration Parameters) have to be explicitly enabled by (non-RTCP based) capability signaling to be possible or permitted to use.
An end-point implementing this specification need not support all available Codec Configuration Parameters defined herein or in extensions to this specification. A certain parameter could also be uninteresting for a certain codec or media stream, even if it is generally supported by the end-point. This specification therefore defines capability signaling that allows a COP receiver to declare explicit support per parameter type on a per-codec level. The set of Codec Configuration Parameters that can be used for a certain media stream by a COP sender is thus restricted by the combination of applicability, capability signaling and explicit receiver parameter support signaling.
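One way to read the combined restriction is as a set intersection, sketched below. The parameter names are hypothetical, and treating all three inputs as plain sets is a simplification: in practice only some properties need to be explicitly enabled by capability signaling.

```python
from typing import Iterable, List


def usable_parameters(applicable: Iterable[str],
                      enabled_by_capability: Iterable[str],
                      supported_by_receiver: Iterable[str]) -> List[str]:
    """Return the parameter types a COP sender may use for one media stream.

    Simplifying assumption: a parameter is usable only if it appears in
    all three sets. The result is sorted for stable output.
    """
    usable = (set(applicable)
              & set(enabled_by_capability)
              & set(supported_by_receiver))
    return sorted(usable)


print(usable_parameters(
    applicable={"frame_rate", "bitrate", "resolution"},
    enabled_by_capability={"bitrate", "resolution", "sampling_rate"},
    supported_by_receiver={"bitrate", "frame_rate", "resolution"},
))
# ['bitrate', 'resolution']
```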
Any Codec Configuration Parameter that is applicable and feasible to use, but is not included as part of an Operation Point, has a default value. This default is defined for each Parameter Type, but should preferably whenever possible be taken from capability signaling. It is not necessary to use all defined Parameter Types in a media stream description. Some Parameter Types can, depending on media type or codec, either be un-interesting or not possible to describe or control in detail, in which case they can be left out, meaning that the effective value is "undefined" within the limits set by capability signaling (outside the scope of this specification).
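The precedence described above could be sketched as follows. The function and argument names are hypothetical, and using `None` to model an "undefined" effective value is an assumption of this example.

```python
from typing import Dict, Iterable, Optional


def effective_parameters(
    parameter_types: Iterable[str],
    operation_point: Dict[str, int],
    capability_values: Dict[str, int],
    type_defaults: Dict[str, int],
) -> Dict[str, Optional[int]]:
    """Resolve the effective value of each applicable parameter type.

    Assumed precedence: the value in the Operation Point, then a value
    taken from capability signaling, then the per-type default. None
    models "undefined" within the limits set by capability signaling.
    """
    effective: Dict[str, Optional[int]] = {}
    for name in parameter_types:
        if name in operation_point:
            effective[name] = operation_point[name]
        elif name in capability_values:
            effective[name] = capability_values[name]
        else:
            effective[name] = type_defaults.get(name)
    return effective


print(effective_parameters(
    parameter_types=["frame_rate", "bitrate_kbps", "channels"],
    operation_point={"frame_rate": 15},
    capability_values={"bitrate_kbps": 1000},
    type_defaults={},
))
# {'frame_rate': 15, 'bitrate_kbps': 1000, 'channels': None}
```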
The Codec Configuration Parameters contained in a single Message Item jointly constitute a description of an Operation Point for a specific media stream from a media sender.
For the purpose of COP signaling, each such Operation Point is identified with an ID number, OPID, which is scoped by the media sender's RTP SSRC identification, and can be chosen freely by the media sender. The need for this media sub-stream identification basically only appears with scalable coding or other media encoding methods that introduce separable and configurable sub-streams within the same SSRC. An OPID thus refers to such a configurable sub-stream, described by a set of related Codec Configuration Parameters.
                  +--RTP Session 1 ---------------------+
Media Source 1----+-+-> SSRC1 --> Sub-Stream 1 -> OPID1 |
(MIC, Camera)     |           \-> Sub-Stream 2 -> OPID2 |
                  |                                     |
Media Source 2-+--+---> SSRC2 --> Sub-Stream 1 -> OPID3 |
               |  |           \-> Sub-Stream 2 -> OPID4 |
               |  |           \-> Sub-Stream 3 -> OPID5 |
               |  +-------------------------------------+
               |
               |  +--RTP Session 2 ---------------------+
               +--+---> SSRC3 --> Sub-Stream 1 -> OPID6 |
                  |           \-> Sub-Stream 2 -> OPID7 |
                  +-------------------------------------+
The above Figure 7 depicts the possible relations between media sources, RTP sessions, RTP streams (SSRCs) and their sub-streams and the OPID.
For example, a single video camera may be encoded using SVC for a combined SST and MST transmission configuration. In that case some subset of scalability layers are sent as SST in the first RTP session using SSRC2. Another set of scalability layers are transported in the second RTP session as another SST using SSRC3. The RTP packet stream from each SSRC can thus contain several sub-streams, each identified with its own OPID. As a result, a single media source is present in two RTP sessions, using two different SSRCs (2 and 3) containing a total of five sub-streams (OPID 3 to 7).
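The relations in Figure 7 can be written down as a small data model. The `RtpStream` class and helper functions are hypothetical names used only for this illustration. The figure uses distinct OPID numbers across SSRCs for readability, although OPIDs are scoped by SSRC.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RtpStream:
    media_source: str
    rtp_session: int
    ssrc: int
    opids: Tuple[int, ...]  # one OPID per sub-stream carried by this SSRC


# The configuration depicted in Figure 7.
STREAMS = [
    RtpStream("Media Source 1", rtp_session=1, ssrc=1, opids=(1, 2)),
    RtpStream("Media Source 2", rtp_session=1, ssrc=2, opids=(3, 4, 5)),
    RtpStream("Media Source 2", rtp_session=2, ssrc=3, opids=(6, 7)),
]


def ssrcs_for_source(streams: List[RtpStream], media_source: str) -> List[int]:
    """Return the SSRCs used by one media source across all RTP sessions."""
    return sorted(s.ssrc for s in streams if s.media_source == media_source)


def opids_for_source(streams: List[RtpStream], media_source: str) -> List[int]:
    """Return every OPID (sub-stream) that originates from one media source."""
    return sorted(opid
                  for s in streams if s.media_source == media_source
                  for opid in s.opids)


print(ssrcs_for_source(STREAMS, "Media Source 2"))  # [2, 3]
print(opids_for_source(STREAMS, "Media Source 2"))  # [3, 4, 5, 6, 7]
```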
Since an Operation Point can be expected to change over time, as a result of media receiver requests [sec-overview-request], resulting from local media sender considerations [sec-overview-notification], or both, the Operation Point (OPID) is version-handled. The version is scoped by SSRC and OPID.
It is expected that all encoders dividing a media stream into sub-streams will include some means to identify those sub-streams in the media stream. However, it is also expected that such identification is in general codec-specific. There is thus at times a need to map the codec agnostic COP OPID identification to codec specific identification, and this specification therefore includes a method for such mapping [sec-codec-sub-stream-id].
The request is sent by a media receiver, which can be either an end-point or a middle node such as an RTP Mixer. The receiver of the request may similarly be either the original media sender or an RTP Mixer. Included in the request is a description of the desired codec configuration for a specific media (sub-)stream. The parameter values communicated in a notification [sec-overview-notification] of that (sub-)stream are taken as a starting point when deciding what parameters and parameter values to choose for the request, and only parameters with changed values need to be in the request. The media receiver can of course also use other sources of information when choosing parameters and values, such as for example observation of the received media stream and capability signaling.
It is not an absolute requirement to have received a notification to be able to create a meaningful request. The request can include a set of changed properties for existing streams, but it can also request the addition or removal of one or more media sub-streams having certain properties, in which case there will be no notification to base the request on. A media receiver may also want to send a request prior to having received any notifications for existing streams, and can then base the request on other information such as for example observing the media stream or use information from the capability signaling. In case there is no existing stream and OPID to refer to in the request, a "provisional" OPID MUST be chosen in the request, which will have to be mapped back to an existing (sub-)stream and "real" OPID through methods defined in this specification [sec-codec-sub-stream-id].
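Deriving a request from a notification can be sketched as a simple difference between the notified and the desired parameter values. The function name and parameter names below are hypothetical.

```python
from typing import Dict


def changed_parameters(notified: Dict[str, int],
                       desired: Dict[str, int]) -> Dict[str, int]:
    """Return only the desired parameters whose values differ from the notification."""
    return {name: value
            for name, value in desired.items()
            if notified.get(name) != value}


notified = {"frame_rate": 30, "bitrate_kbps": 1000, "width": 1280}
desired = {"frame_rate": 15, "bitrate_kbps": 1000, "width": 1280}
print(changed_parameters(notified, desired))  # {'frame_rate': 15}
```

Only the frame rate needs to be carried in the request, since the other values are unchanged relative to the notification.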
The media sender receiving a specific request is not required to re-configure the encoder accordingly, even if it should try to do so, but is allowed to take other (previous or concurrent) requests and any local considerations into account, possibly modifying some of the parameter values, or even totally rejecting the request if it is not seen as feasible. It is thus not possible for a media receiver to uniquely see from the media stream or even from a notification if the media sender received the request or if the request was lost and needs to be re-sent.
A request should typically be based on a certain notification, but there may be situations where a request is sent approximately simultaneously with a new notification for the same stream. In that case, there is a risk that the request is based on the wrong set of codec properties compared to the new notification. It is therefore necessary to have the set of codec properties, identified by an OPID, be version controlled. If a notification announces a specific version of the operation point, where the version is updated every time it is changed, the request can refer to that specific version and any mis-reference can be clearly identified and resolved. In addition, it allows for easy identification of repeated notifications and requests, simply by checking the operation point identification and the version, and without having to parse through all of the codec properties to see if any one changed.
The notification is sent by a media sender and describes a media stream or sub-stream in terms of a defined, finite set of codec properties. That same set of codec properties can also be used in a request [sec-overview-request]. The notification and a common set of defined properties are important to a media receiver since it is rarely possible to see from the media stream itself what controllable properties were used to generate the stream. The set of codec properties and their values used to describe a certain media stream at a certain point in time is henceforth called a codec configuration. Each Operation Point in this codec configuration is implemented using a certain RTP Payload Type, defined by capability signaling outside the scope of this specification.
It must be possible for a media sender to change codec configuration not only based on requests from media receivers, but also based on local limitations, considerations or user actions. This implies that the notification must be possible to send standalone and not only as a response to a request. To avoid that media receivers have to guess what codec configuration is used, a media sender should always send notifications whenever codec configuration for a stream changes. Loss of a notification should anyway not be critical since a media receiver could either fall back to infer approximate codec configuration from the media stream itself, or simply wait with a request until the next notification is sent.
A notification can potentially contain a large number of codec properties. However, parameters that are not enabled by codec and COP capability signaling, or that are inherently not part of the used codec, will not be included. The notification only describes the currently used codec configuration, and each parameter in an operation point will thus be described by a single value. To further limit the number of properties that need to be sent, it is possible to rely on parameter defaults (listed by individual parameter type definitions) whenever those values are acceptable.
The media receiver could want to take some local action at the time when the codec configuration in the media stream changes. Using the same reasoning as above, this may not be possible to see from the media stream itself. This functionality is explicitly enabled by inclusion of an RTP Time Stamp in the notification, where the Time Stamp describes a time (possibly in the future) when the media stream codec configuration is (estimated to be) effective.
The status report is sent by a media sender and is needed to confirm reception of a specific request OPID to avoid unnecessary retransmission of requests. Loss of a status report will likely trigger a request retransmission, except when the request sender can infer from the media stream or a notification that the stream is now acceptable.
The status report is not a required acknowledgement of every request, but instead reports on the last received request, identified by a request sequence number in addition to the OPID. That de-coupling of request and status report reduces the needed amount of status reports in case of frequently updated requests and/or lack of resources to send status reports.
If a request is somehow not acceptable to a media sender, the status report can also indicate failure and a reason for that failure.
In case the OPID in the request is a "provisional" OPID [sec-overview-request], the status report responds with that exact OPID, but also includes a reference to a "real" media (sub-)stream identification or OPID that the media sender considers appropriate for the request.
No description of any codec configuration is included in a status report, even if the corresponding request was successful. Used codec configuration is only carried in the notification [sec-overview-notification] message. Multiple status reports targeted for multiple request senders can through media (sub-)stream identification and OPID point to the same notification message, reducing the need to repeat applicable codec configuration parameters with every accepted request.
A media sender can unilaterally create a new Operation Point by simply selecting a free OPID and using COPN to announce it.
To remove an Operation Point, the media sender simply stops announcing it in COPN. This procedure can be used both for entire media streams containing a single Operation Point and to add/remove sub-streams in media streams containing multiple Operation Points.
The media receiver can request a new Operation Point to be created by using a COPR with an unused identifier and by setting a flag to indicate that this requests a new OPID. The media sender then decides if it honors the request or not, and announces the new OPID as described above.
The media receiver can indicate that it is no longer interested in receiving an Operation Point corresponding to a media sub-stream by not including any COPR Message Item for it in a single COP Message. The media receiver can indicate a wish to continue to receive an unmodified Operation Point using a COPR without any codec properties (no change).
This specification specifies a new feedback message, COP, for codec control of real-time media, as an extension to the AVPF [RFC4585] and CCM [RFC5104] specifications. The AVPF specification outlines a mechanism for fast feedback messages over RTCP, which is applicable for IP based real-time media transport and communication services. It defines both transport layer and payload-specific feedback messages. This specification targets the payload-specific type, since a certain codec is typically described by a payload type.
AVPF defines three and CCM defines four payload-specific feedback messages (PSFB). All AVPF and CCM messages are identified by means of the feedback message type (FMT) parameter. This specification specifies one additional payload-specific feedback message.
One new PSFB FMT value is assigned in this specification:
This section defines the feedback message structure, message items and their semantics with the exception of the actual codec configuration parameters which are defined in the next section [sec-parameters].
The COP message is a payload-specific AVPF CCM message identified by the PSFB FMT value listed above. It carries one or more COP Message Items, each with either a request for or a description of a certain "Operation Point" (a set of codec parameters), or a request status indication.
Not all Message Items make use of the "SSRC of media source" in the common packet header. "SSRC of media source" SHALL be set to 0 if no Message Item that makes use of it is included in the FCI.
The COP FCI MUST contain one or more Codec Operation Point Message Items. The maximum number of COP Message Items in a COP message is limited by the [RFC4585] Common Packet Format 'length' field.
The definition of the AVPF feedback message format mandates that the FCI part is a multiple of 32-bit words. The message items defined below are not necessarily 32-bit word aligned. Therefore it is sometimes necessary to insert one to three padding bytes at the end of the FCI. The number of padding bytes is determined by a receiver by comparing the sum of the message item lengths with the feedback message length field. Padding bytes MUST be set to zero (0) and ignored on reception.
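The padding rule can be illustrated with a short, non-normative Python sketch. The function name `pad_fci` is hypothetical, and the message items are assumed to be already serialized as byte strings.

```python
def pad_fci(items: list[bytes]) -> bytes:
    """Concatenate serialized message items and zero-pad the FCI to 32 bits.

    Hypothetical helper for illustration; not defined by this specification.
    """
    fci = b"".join(items)
    # Number of zero bytes (0 to 3) needed to reach a multiple of 4 bytes.
    pad_len = (-len(fci)) % 4
    return fci + b"\x00" * pad_len
```

A receiver would recover the padding length as the difference between the FCI length implied by the feedback message length field and the summed lengths of the message items.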
All Codec Operation Point Message Items share a common header format:
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type | Payload Length | OPID |N| Version |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :                    (Message Item Payload)                     :
The message header fields are:
The Message Types defined in this specification are:
Value | Message Item Type |
---|---|
0 | Codec Operation Point Notification (COPN) |
1 | Codec Operation Point Request (COPR) |
2 | Codec Operation Point Status (COPS) |
3-6 | Unassigned |
7 | Reserved for future extensions |
Each Message Type defined in this specification is described in detail in subsequent sections.
All RTP media streams belonging to the same session can by definition be identified by the SSRC. However, identification of any sub-streams contained in the same RTP media stream (SSRC) needs to use some other identification method, scoped by the SSRC. This is the case for a media stream containing more than one Operation Point, for example SVC [RFC6190] streams being sent using Single Stream Transport (SST) RTP packetization.
The encoding of and restrictions for such sub-stream (Operation Point) identification will in general be codec specific. Therefore, the OPID used in this specification is merely an SSRC-unique identification number. It is however necessary to create a mapping between this generic number and the codec specific sub-stream identification that can be found in the media stream. This mapping is achieved by including the ID Parameter [sec-id] in a Message Item carrying a certain OPID.
In Section 10, codec specific ID Parameter formats are defined for a few of the most common codecs that support scalability.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type | Payload Length | OPID |N| Version |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Transition Time Stamp                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |R|Payload Type |        Codec Configuration Parameters         :
    +-+-+-+-+-+-+-+-+                                               :
    :                                                               :
The COPN-specific message fields are (see also Message Item Format [sec-item-format]):
This message is used to inform the media receiver(s) about used Codec Configuration Parameters at the media sender. The available Codec Parameter Types that can be used to describe the Codec Configuration are defined in Section 8.
Some codecs may have clear inband indications in the encoded media stream of how one or more of the Codec Configuration Parameters are configured. For those codecs and Codec Configuration Parameters, COPN is not strictly necessary. Still, for some codecs and / or for some Codec Configuration Parameters, it is not unambiguously possible to see individual Codec Configuration Parameter Values from the encoded media stream, or even possible to see some Codec Configuration Parameters at all, motivating use of COPN.
COPN SHOULD be scheduled for transmission when it becomes known that there are media receivers in the RTP session that did not yet receive any Codec Configuration Parameters for an active Operation Point, or whenever the effective Codec Configuration Parameters have changed significantly, but MAY be scheduled for transmission at any time. The media sender decides what amount of change is required to be considered significant.
The reason for a Codec Configuration Parameter change can either be local to the sending terminal, for example as a result of user interaction or some algorithmic decision, or resulting from reception of one or more COPR messages [sec-copr].
If a media sender can no longer fulfill the established Codec Configuration Parameter restrictions of an Operation Point that was previously described by a COPN, it MAY change any Codec Configuration Parameter or even remove the entire Operation Point, and SHOULD then signal this at the earliest opportunity by sending an updated COPN to the media receiver(s).
An OPID can implicitly be indicated as no longer being used by omitting that OPID from the set of COPN message items in the COP PSFB message. All OPIDs that the media sender intends to use at the latest time indicated by any transition timestamp value in the set of COPN present in the COP PSFB message, MUST be included in that COP message.
All Operation Points referred to by a COPS [sec-cops] SHOULD also be detailed by a COPN message contained in the same or in a subsequent COP feedback message, even if the Operation Point did not change significantly from the previous COPN.
Note that the OPID Version of that COPN, subsequent to COPS, will be equal to or larger than the Version indicated in the COPS. The Version difference may be larger than one (taking field wraparound into account) depending on the number of updated COPN sent since the COPR that triggered the COPS. See also the description of those messages below.
Note: COPN may be seen as a more explicit and elaborate version of the TSTN message of [RFC5104] and most of the considerations detailed there for TSTN also apply to COPN.
The media sender decides what Codec Configuration Parameters to use in the COPN to describe an Operation Point. It is RECOMMENDED that all Codec Configuration Parameters that were accepted as restrictions based on received COPR messages are included. All Codec Configuration Parameters significantly more restrictive than implicit or explicit restrictions set by capability signaling (outside the scope of this specification) SHOULD also be included. Any Codec Configuration Parameters that are either not applicable to the Payload Type or not enabled by capability signaling MUST NOT be included. All Codec Configuration Parameters not covered by the above restrictions MAY be included.
When the Operation Point depends on other Operation Points (such as in scalable coding), the values to use for Codec Configuration Parameters MUST describe the result when all dependencies are utilized. For example, assume an Operation Point describing a base layer with 15 Hz framerate, and a dependent Operation Point describing an enhancement layer adding another 15 Hz to the base layer, resulting in 30 Hz framerate when both layers are combined. The correct Parameter value to use for that latter, dependent "enhancement" Operation Point is 30 Hz, not the 15 Hz difference.
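The framerate example above can be expressed as a small, non-normative Python sketch. The function name, the dictionaries describing per-layer contributions and dependencies, and the assumption that framerate contributions are simply additive are all illustrative and not defined by this specification.

```python
def described_framerate(contribution_hz: dict[int, float],
                        depends_on: dict[int, list[int]],
                        opid: int) -> float:
    """Return the framerate to signal for `opid` with all dependencies applied.

    Illustrative assumption: each layer adds its own contribution to the
    layers it depends on, as in the base/enhancement example in the text.
    """
    seen = set()
    stack = [opid]
    total = 0.0
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # count every layer only once
        seen.add(current)
        total += contribution_hz[current]
        stack.extend(depends_on.get(current, []))
    return total


# Base layer (OPID 0) at 15 Hz, enhancement layer (OPID 1) adding 15 Hz.
contribution = {0: 15.0, 1: 15.0}
dependencies = {1: [0]}
print(described_framerate(contribution, dependencies, 1))
```

With these inputs the enhancement Operation Point is described with the combined value, while the base Operation Point keeps its own 15 Hz.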
The value of a Codec Configuration Parameter that was not included in a COPN message SHOULD either be inferred from other signaling, e.g. session setup or capability negotiation, outside the scope of this specification, or if such signaling is not available or not applicable, use the default value as defined per Parameter Type [sec-parameters].
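The fallback order described above can be sketched as follows. This is a non-normative illustration in Python; the function name and the three dictionaries (values from the COPN, values known from other signaling, and per-type defaults) are hypothetical.

```python
def resolve_parameter(param_type: int,
                      copn_params: dict[int, int],
                      signaled_params: dict[int, int],
                      defaults: dict[int, int]) -> int:
    """Pick the effective value of one Codec Configuration Parameter.

    Order: explicit COPN value, then other signaling (for example session
    setup), then the default defined for the Parameter Type.
    """
    if param_type in copn_params:
        return copn_params[param_type]
    if param_type in signaled_params:
        return signaled_params[param_type]
    return defaults[param_type]
```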
An Operation Point describes one specific setting of Codec Parameters, and a COPN Message therefore MUST NOT include the ALT Parameter Type [sec-alt] in the Codec Parameters describing the Operation Point.
To limit RTCP bandwidth and avoid bandwidth expansion, COPN is not mandated as response to every received COPR [sec-copr].
A media sender implementing this specification SHOULD take requested Operation Points from COPR messages into account for future encoding, but MAY decide to use other Codec Configuration Parameter Values than those requested, e.g. as a result of multiple (possibly contradicting) COPR messages from different media receivers, or any media sender policies, rules or limitations. Thus, a COPN message Operation Point MAY use other Codec Configuration Parameters and other values than those requested in a COPR.
The media sender SHOULD try to maintain OPIDs between COPR and COPN when the COPR sender suggests a new OPID value (N flag is set) in the COPR, but MAY use another OPID in COPN. Other OPID values have to be chosen, for example, when the suggested OPID conflicts with an already existing OPID, or when the media sender decides that the suggested new OPID can be fulfilled by an already existing OPID.
Even if a COPR references an existing OPID (N flag cleared), the media sender may have to take other aspects than a specific COPR into account when choosing how many Operation Points to use, and the exact contents of those Operation Points. See the description on COPS [sec-cops] on how to achieve mapping between a suggested new OPID and what OPID will actually be used.
When OPID cannot be kept the same between COPN and COPR, the mapping SHALL be done using identical ID Parameters [sec-id] in the COPS and COPN resulting from the COPR. Further details are described in the section on COPS [sec-cops].
Since COPR references a certain COPN OPID and Version, and COPN is sent unreliably and may be lost, COPN senders MUST keep at least the two last COPN Versions for each (SSRC, OPID) tuple and SHOULD keep at least four.
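One possible way for a COPN sender to retain recent Versions is sketched below in Python. The class and method names are hypothetical, and the parameter set is represented as an opaque dictionary. The default depth of four follows the SHOULD above, and depths below two are rejected to respect the MUST.

```python
from collections import OrderedDict, defaultdict


class CopnHistory:
    """Keep the most recent COPN Versions per (SSRC, OPID) tuple (illustrative)."""

    def __init__(self, depth: int = 4):
        if depth < 2:
            raise ValueError("at least the two last Versions must be kept")
        self._depth = depth
        self._store = defaultdict(OrderedDict)

    def remember(self, ssrc: int, opid: int, version: int, params: dict) -> None:
        versions = self._store[(ssrc, opid)]
        versions[version] = params
        versions.move_to_end(version)
        # Drop the oldest entries once the configured depth is exceeded.
        while len(versions) > self._depth:
            versions.popitem(last=False)

    def lookup(self, ssrc: int, opid: int, version: int):
        """Return the stored parameters, or None if that Version is gone."""
        return self._store.get((ssrc, opid), {}).get(version)
```

A COPR that references a Version no longer present in such a history could then be answered with the "Too old Operation Point Version" reason.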
The timing follows the rules outlined in section 3 of AVPF [RFC4585]. This notification message may be time critical and SHOULD be sent using early or immediate feedback RTCP timing, but MAY be sent using regular RTCP timing.
A typical example when regular RTCP timing can be appropriate is when the sent media stream is further restricted from what was described by the most recent COPN, which should not cause any problems in the media receivers. Similarly, it is likely appropriate to use early or immediate timing when effective media stream restrictions urgently need to be removed, which may require media receivers to increase their resource usage.
Any media sender, including Mixers and Translators, that sends RTP media marked with its own SSRC and that implements this specification SHALL also be prepared to send COPN, even if it is not the originating media source. As a result of that, such a media sender may have to send updated COPN whenever the included media sources (CSRC) change, subject to the rules laid out above [sec-copn-semantics]. Note that this can be achieved in different ways, for example by forwarding (possibly cached) COPN from the included CSRC when the Mixer is not performing transcoding.
In cases where a Mixer or Translator needs to forward a COPR from one side (A) to the other (B) (as described in Section 7.4.4), the COPN sent to the A side MAY need to be delayed until the Mixer or Translator has received a corresponding COPN from the B side, as indicated in Figure 10 below.
    +-------+ 1. COPR +-------+ 2. COPR +-------+
    |       |-------->|       |-------->|       |
    |   A   | 4. COPN | Mixer | 3. COPN |   B   |
    |       |<--------|       |<--------|       |
    +-------+         +-------+         +-------+
If a Mixer or Translator has decided to act partially (modify the media stream with respect to some Parameter Types, but not all) on a received COPR from the A side, and a COPN is received from the B side indicating that the current media modifications are no longer necessary, the Mixer or Translator SHOULD cease its own actions that are no longer needed. It SHOULD then also issue a COPN describing the new situation to the A side, as indicated in Figure 11 below.
    +-------+ 1. COPR +-------+         +-------+
    |       |-------->|       | 2. COPR |       |
    |       | 3. COPN |       |-------->|       |
    |   A   |<--------| Mixer | 4. COPN |   B   |
    |       | 5. COPN |       |<--------|       |
    |       |<--------|       |         |       |
    +-------+         +-------+         +-------+
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type | Payload Length | OPID |N| Version |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Sequence No   |        Codec Configuration Parameters         :
    +-+-+-+-+-+-+-+-+                                               :
    :                                                               :
The COPR-specific message fields are:
This Message Item is sent by a media receiver wanting to control one or more Codec Configuration Parameters of the targeted media sender. The requested values MUST stay within the media capability negotiated by other means than this specification. The available Codec Configuration Parameters that can be controlled are listed in Section 8.
Note: COPR may be seen as a more explicit and elaborate version of the TSTR message of [RFC5104] and most of the considerations detailed there for TSTR also apply to COPR.
If at least one COPN [sec-copn] is received for the targeted stream, the Codec Configuration Parameters for that stream (SSRC) with defined OPID and Version are known to the COPR sender. The COPR MUST refer to the OPID and Version of the most recently received COPN (if any) for the targeted stream. Since it references a defined set of Codec Configuration Parameters from a COPN, the COPR SHOULD only include the Codec Configuration Parameters it wishes to change in the message, but it MAY include also unchanged Codec Configuration Parameters.
If no COPN is received for the targeted stream, the COPR sender MUST choose an arbitrary OPID and set the N flag to indicate that the OPID does not refer to any existing Operation Point. In this case the Version field is not used and MUST be set to 0. The OPID value SHALL NOT be identical to any OPID from the same media source that the media receiver is aware of and has received COPN for. Since in this case no COPN reference exists, the COPR sender SHOULD include all Codec Configuration Parameters that it wishes to include a specific restriction for (other than the default). Note that for some codecs, some Codec Configuration Parameters may be possible to infer from the media stream, but if the wanted restriction also includes those and a describing COPN is lacking, they SHOULD anyway be included explicitly in the COPR.
Any Codec Configuration Parameters that are not enabled by capability signaling MUST NOT be included.
A COPR sender MUST increment the SN field modulo 2^8 with every new COPR that includes any update to the Codec Configuration Parameters (referring to a specific version of an OPID) compared to the previously sent SN, as long as it does not receive any COPS [sec-cops] with the same OPID, Version, and SN as was used in the most recently sent COPR. A COPR having a later SN MUST be interpreted as replacing any COPR with identical OPID and Version but with lower SN, taking field wrap into account.
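The SN handling can be sketched in Python as follows. The increment modulo 2^8 is taken from the text above. The wrap-aware comparison is an assumption: this specification only says that field wrap is taken into account, and the sketch uses the common half-range serial number convention. Function names are hypothetical.

```python
SN_MODULUS = 1 << 8  # SN is incremented modulo 2^8


def next_sn(sn: int) -> int:
    """Return the SN to use for the next updated COPR."""
    return (sn + 1) % SN_MODULUS


def sn_is_later(a: int, b: int) -> bool:
    """Return True if SN `a` is later than SN `b`, allowing for wrap.

    Assumption: `a` is later when it is ahead of `b` by less than half
    of the SN space.
    """
    diff = (a - b) % SN_MODULUS
    return 0 < diff < SN_MODULUS // 2
```

Under this convention an SN of 0 sent after 255 is treated as later, so the newer COPR replaces the older one.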
A COPR sender that did not receive any corresponding COPS, but did receive a COPN with the same OPID and with a higher Version than was used in the last COPR SHALL re-consider the COPR and MAY send an updated COPR referencing the new Version.
If the capability negotiation has established that a codec supporting scalable operation is used, and if the media receiver wishes to request that scalability is used, it MAY do so by sending multiple COPR with different OPID to the same media sender. The OPID and Version used in such request MAY be based on an existing Operation Point, but it MAY also indicate a desire to introduce scalability into a previously non-scalable stream by choosing a new OPID (indicated by setting the N flag). In any case, the resulting OPIDs and sub-streams are identified through use of the ID Parameter [sec-id] in subsequent COPS and COPN. See also the description of COPS [sec-cops].
An Operation Point without any Codec Configuration Parameters MAY be used and MUST be interpreted as a request to keep the Operation Point unchanged. This is especially useful when modifying some but not all in a set of sub-streams.
When a COPR sender is receiving multiple Operation Points and wants to continue to do so, it MUST include all Operation Points it still wishes to receive in the COPR, also those that can be left unchanged.
A COPR MAY also describe alternative Operation Points that the media sender can choose from, through use of one or more ALT Parameters [sec-alt].
Since COPR references a specific COPN using SSRC, OPID and Version, a COPR sender typically needs to keep the latest Version of received COPN for each SSRC and OPID, also including the Codec Configuration Parameters.
A media sender receiving a COPR SHOULD take the request into account for future encoding, but MAY also take COPR from other media receivers and other information available to the media sender into account when deciding how to change encoding properties.
A media receiver sending COPR thus cannot always expect that all Parameter Values of the request are fully honored, or even honored at all. It can only know that the COPR was taken into account when receiving a COPS [sec-cops] from the media sender with a matching OPID, Version and SN.
To what extent a COPR is honored is described by the chosen Codec Configuration Parameter values contained in a subsequent COPN message [sec-copn] with a later (taking wraparound into account) Version than the one referred to by the COPR.
The timing follows the rules outlined in section 3 of [RFC4585]. This request message MAY be sent using Immediate, Early or Regular timing depending on the application's needs.
A COPR sender that did not receive a corresponding COPS MAY choose to re-transmit the COPR, without increasing the SN.
When an RTP media receiver (SSRC) times out or leaves the RTP session (BYE received), all COPR restrictions put by that media receiver SHALL implicitly be considered removed.
A Mixer or media Translator that implements this specification and encodes content sent to the media receiver issuing the COPR SHALL consider the request to determine if it can fulfill it by changing its own encoding parameters. A Mixer encoding for multiple session participants will need to consider the joint needs of all participants when generating a COPR on its own behalf towards the media sender.
A Mixer or Translator able to fulfill the COPR partially MAY act on the parts it can fulfill (and SHALL then send COPS and COPN accordingly), but SHOULD anyway forward the unaltered COPR towards the media sender, since it is likely most efficient to make the necessary Codec Configuration Parameter changes directly at the original media source.
A media Translator that does not act on COP messages will forward them unaltered, according to normal Translator rules.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type | Payload Length | OPID |N| Version |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      SSRC of COPR sender                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Sequence No   | RC  | Reason  |Codec Configuration Parameters :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
    :                                                               :
The COPS-specific message fields are:
The COPS Message Item indicates the request status related to a certain (SSRC, OPID) tuple by listing the latest received COPR [sec-copr] SN. It effectively informs the COPR sender that it no longer needs to re-send that COPR SN (or any previous SN).
COPS indicates that the specified COPR was successfully received by the media sender targeted in the request. If the COPR suggested Codec Configuration Parameters could be understood [tab-retcode], they may be taken into account, possibly together with COPR messages from other receivers and other aspects applicable to the specific media sender. The Return Code carries an indication to which extent the COPR could be honored.
Value | Meaning |
---|---|
0 | Success |
1 | Partial success |
2 | Failure |
3-6 | Unassigned |
7 | Reserved for future extension |
A Success Return Code indicates that the resulting media configuration is fully in line with the COPR.
A Partial Success Return Code indicates that the resulting media configuration is not fully in line with the COPR, but that the media sender regards the COPR to be sufficiently well represented by one or more of the existing Operation Points.
A Failure Return code indicates that the media sender failed to take the COPR into account, either due to some error condition or because no media stream could be created or changed to comply.
The Reason Values defined below are independent of Return Code, but all reasons may not be meaningful with all return codes. More reasons MAY be defined in extensions to this specification.
Value | Meaning |
---|---|
0 | Success |
1 | Unknown OPID |
2 | Too many Operation Points |
3 | Request violates capability limits |
4 | Too old Operation Point Version |
5 | Unknown Parameter Type |
6 | Parameter Value too long |
7 | Invalid Comparison Type |
8 | One or more parameter values in the request were changed |
9-31 | Unassigned |
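The COPS status fields can be packed as in the following non-normative Python sketch. The field widths used here (8-bit Sequence No, 3-bit Return Code, 5-bit Reason) are an assumption inferred from the COPS diagram and from the value ranges in the two tables above. The enum and function names are hypothetical.

```python
from enum import IntEnum


class ReturnCode(IntEnum):
    SUCCESS = 0
    PARTIAL_SUCCESS = 1
    FAILURE = 2


def pack_status(sn: int, rc: int, reason: int) -> bytes:
    """Pack Sequence No, Return Code and Reason into two bytes.

    Assumed layout: SN in the first byte, RC in the top 3 bits and
    Reason in the low 5 bits of the second byte.
    """
    if not 0 <= sn <= 0xFF:
        raise ValueError("SN out of range")
    if not 0 <= rc <= 0x07:
        raise ValueError("Return Code out of range")
    if not 0 <= reason <= 0x1F:
        raise ValueError("Reason out of range")
    return bytes([sn, (rc << 5) | reason])


def unpack_status(data: bytes) -> tuple[int, int, int]:
    """Inverse of pack_status; returns (sn, rc, reason)."""
    return data[0], data[1] >> 5, data[1] & 0x1F
```

For example, a Failure with Reason 4 (Too old Operation Point Version) for SN 7 would be packed with `pack_status(7, ReturnCode.FAILURE, 4)`.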
COPS is typically sent without any Codec Configuration Parameters. When the N flag was set in the related COPR, a non-failing COPS MUST include an ID Parameter [sec-id] identifying the actual sub-stream that the media sender considers applicable to the COPR. The OPID used by that sub-stream can be found through examining ID Parameters of subsequent COPN from the same media source for ID values matching the one in COPS.
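The lookup from a COPS ID Parameter to the OPID actually used can be sketched as below. This is a non-normative Python illustration; the function name is hypothetical, and each received COPN is reduced to a pair of its OPID and the raw value of its ID Parameter.

```python
from typing import Optional


def resolve_real_opid(cops_id: bytes,
                      copn_items: list[tuple[int, bytes]]) -> Optional[int]:
    """Return the OPID of the COPN whose ID Parameter matches the COPS.

    `copn_items` holds (opid, id_value) pairs taken from COPN received
    from the same media source after the COPS. Returns None if no
    matching COPN has been seen yet.
    """
    for opid, id_value in copn_items:
        if id_value == cops_id:
            return opid
    return None
```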
Senders implementing this specification MUST NOT use any other Codec Configuration Parameter Types than ID in a COPS message. The contained ID Parameter points to the specific media (sub-)stream that the media sender regards as applicable to the COPR.
When a COPR receiver has received multiple COPR messages from a single COPR source with the same OPID but with several different values of Version and/or SN, and for which it has not yet sent a COPS, it SHALL only send COPS for the COPR with the highest SN, taking field wrap of those two fields into account.
COPS SHALL be sent at the earliest opportunity after having received a COPR, with the following exception:
The exception is introduced to avoid unnecessary COPS transmission when there is a chance that already sent COPS or COPN may satisfy or invalidate the COPR.
A Mixer or media Translator that implements this specification, encoding content sent to media receivers and that acts on COPR SHALL also report using COPS, just like any other media sender. An RTP Translator not knowing or acting on COPR will forward all COP messages unaltered, according to normal RTP Translator rules.
This section defines the general Codec Configuration Parameter (CCP) TLV format. Then a number of different parameter formats are defined. It is expected that a number of additional CCPs will be defined in the future as the needs of different codecs are explored or developed.
COP Message Items MAY contain one or more Codec Configuration Parameters, encoded in TLV (Type-Length-Value) format, which SHOULD then be interpreted as simultaneously applicable to the defined Operation Point. Parameter Values MUST be byte-aligned.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   ParamType   | C |  Length   |                               |
    +---------------+---+-----------+                               |
    |                                                               |
    /                        Parameter Value                        /
    /                                                +--------------+
    |                                                |
    +------------------------------------------------+
The meaning of multiple Codec Configuration Parameters with the same ParamType and the same Comparison Type included as part of the same Operation Point is undefined and SHALL NOT be used.
A Codec Configuration Parameter that is encoded in a way (including incorrectly) that cannot be interpreted by the receiver SHALL be ignored.
The parameters below that are encoded as signed or unsigned integers use a variable size representation in the value field. It is RECOMMENDED to only include the minimal number of bytes necessary to represent the value that is to be included in the parameter TLV. The length field in the parameter TLV will explicitly indicate how many bytes are present in the value field. All parameters using a variable size representation of their value MUST define the maximum number of bytes possible to include in the value field.
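The TLV encoding with a minimal-length unsigned integer value can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the field widths (8-bit ParamType, 2-bit Comparison Type, 6-bit Length) are inferred from the TLV diagram, a value of zero is assumed to occupy one byte, and the function names and the `max_len` argument are hypothetical. The numeric values in the usage note are illustrative only.

```python
def encode_uint_param(param_type: int, comparison: int,
                      value: int, max_len: int) -> bytes:
    """Encode one unsigned-integer parameter as a TLV, MSB first.

    Assumed layout: ParamType (8 bits), Comparison Type (2 bits),
    Length (6 bits, counting value bytes only), then the value.
    """
    if not 0 <= param_type <= 0xFF:
        raise ValueError("ParamType out of range")
    if not 0 <= comparison <= 3:
        raise ValueError("Comparison Type out of range")
    if value < 0:
        raise ValueError("value must be unsigned")
    # Minimal number of bytes; zero is assumed to take one byte.
    length = max(1, (value.bit_length() + 7) // 8)
    if length > max_len or length > 0x3F:
        raise ValueError("value too long for this parameter")
    header = bytes([param_type, (comparison << 6) | length])
    return header + value.to_bytes(length, "big")


def decode_params(data: bytes) -> list[tuple[int, int, bytes]]:
    """Split a parameter area into (param_type, comparison, raw_value)."""
    params = []
    pos = 0
    while pos + 2 <= len(data):
        param_type = data[pos]
        comparison = data[pos + 1] >> 6
        length = data[pos + 1] & 0x3F
        value = data[pos + 2:pos + 2 + length]
        if len(value) < length:
            break  # truncated parameter: stop parsing
        params.append((param_type, comparison, value))
        pos += 2 + length
    return params
```

For instance, `encode_uint_param(5, 2, 30, 4)` produces a three-byte TLV for ParamType 5 with Comparison Type 2 (Maximum). The decoder is meant to run on the parameter area of a single message item, bounded by its Payload Length, so that trailing FCI padding is not parsed as parameters.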
The ParamType values and the SDP tags (see Section 9) for the Codec Configuration Parameter Types defined in this specification are listed below.
Value | Meaning | Tag |
---|---|---|
0 | ALT | alt |
1 | ID | id |
2 | Payload Type | pt |
3 | Bitrate | bitrate |
4 | Token Bucket Size | token-bucket |
5 | Framerate | framerate |
6 | Horizontal Pixels | hor-size |
7 | Vertical Pixels | ver-size |
8 | Channels | channels |
9 | Sampling Rate | sampling |
10 | Maximum RTP Packet Size | max-rtp-size |
11 | Maximum RTP Packet Rate | max-rtp-rate |
12 | Frame Aggregation | aggregate |
13-254 | Undefined | |
255 | Reserved for future extension | |
The values of the defined Parameter Value Comparison Type are listed below.
Value | Meaning |
---|---|
0 | Exact |
1 | Minimum |
2 | Maximum |
3 | Target |
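The two registries above can be captured as simple lookup tables. The sketch below is only one possible representation; the Python identifiers are made up for illustration, while the numeric values and SDP tags are copied from the tables.

```python
from enum import IntEnum


class ParamType(IntEnum):
    ALT = 0
    ID = 1
    PAYLOAD_TYPE = 2
    BITRATE = 3
    TOKEN_BUCKET_SIZE = 4
    FRAMERATE = 5
    HORIZONTAL_PIXELS = 6
    VERTICAL_PIXELS = 7
    CHANNELS = 8
    SAMPLING_RATE = 9
    MAX_RTP_PACKET_SIZE = 10
    MAX_RTP_PACKET_RATE = 11
    FRAME_AGGREGATION = 12


class Comparison(IntEnum):
    EXACT = 0
    MINIMUM = 1
    MAXIMUM = 2
    TARGET = 3


# SDP tag for each defined ParamType, in table order.
SDP_TAG = dict(zip(ParamType, [
    "alt", "id", "pt", "bitrate", "token-bucket", "framerate",
    "hor-size", "ver-size", "channels", "sampling",
    "max-rtp-size", "max-rtp-rate", "aggregate",
]))


def param_type_for_tag(tag: str):
    """Return the ParamType for an SDP tag, or None if the tag is not defined here."""
    for param_type, known_tag in SDP_TAG.items():
        if known_tag == tag:
            return param_type
    return None
```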
The following sub-sections describe the syntax and semantics of the different Codec Configuration Parameter Types defined in this specification.
Unless explicitly specified in the sub-sections below, or in extensions to this specification, all Parameter Type values are binary encoded unsigned integers, most significant byte first (for multi-byte values).
This Codec Parameter Type is a special parameter, separating the Codec Configuration Parameters preceding it from the ones that follow into two separate, alternative Operation Points.
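The separating role of ALT can be sketched as a split of a flat parameter list into alternatives. The tuple representation `(param_type, comparison, value)` and the function name are assumptions made for this example only.

```python
ALT = 0  # ParamType value of the ALT parameter


def split_alternatives(params):
    """Split a flat list of (param_type, comparison, value) tuples at each ALT."""
    alternatives = [[]]
    for param in params:
        if param[0] == ALT:
            # ALT only separates; it is not kept in either alternative.
            alternatives.append([])
        else:
            alternatives[-1].append(param)
    return alternatives


# Two alternative Operation Points: 320x240 or 640x480 (types 6 and 7, Exact).
params = [(6, 0, 320), (7, 0, 240), (ALT, 0, 0), (6, 0, 640), (7, 0, 480)]
alternatives = split_alternatives(params)
```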
This Codec Parameter Type is a special parameter that enables codec specific identification of sub-streams, for example when there are multiple sub-streams in a single SSRC. It can also be used to reference OPID, when the used codec does not support or use sub-streams. When used, it SHALL be listed first among the Codec Parameters used to describe the sub-stream.
As described in [RFC4585] and [RFC5104], the rtcp-fb attribute may be used to negotiate capability to handle specific AVPF commands and indications, and specifically the "ccm" feedback value is used for codec control. All rules defined there related to use of "rtcp-fb" and "ccm" also apply to the new feedback message defined in this specification.
In this document, a new "ccm" rtcp-fb-ccm-param is defined, according to the method of extension described in [RFC5104]. The ABNF [RFC5234] for the new rtcp-fb-ccm-param is:

   rtcp-fb-ccm-param =/ SP "cop" 1*rtcp-fb-ccm-cop-param
                     ; rtcp-fb-ccm-param defined in [RFC5104]

   rtcp-fb-ccm-cop-param = SP "alt"
                         / SP "id"
                         / SP "pt"
                         / SP "bitrate"
                         / SP "token-bucket"
                         / SP "framerate"
                         / SP "hor-size"
                         / SP "ver-size"
                         / SP "channels"
                         / SP "sampling"
                         / SP "max-rtp-size"
                         / SP "max-rtp-rate"
                         / SP "aggregate"
                         / SP token ; for future extensions
                         ; token defined in [RFC4566]
Token values for rtcp-fb-ccm-cop-param are defined in Table 4. Their semantics are described in Section 8.
Supported Parameter Types are indicated by including one or more rtcp-fb-ccm-cop-param.
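To make the attribute syntax concrete, the sketch below extracts the listed Parameter Type tags from a single rtcp-fb attribute line. It is a simplified illustration that splits on whitespace rather than implementing the full ABNF, and the function names are hypothetical.

```python
KNOWN_COP_TAGS = {
    "alt", "id", "pt", "bitrate", "token-bucket", "framerate", "hor-size",
    "ver-size", "channels", "sampling", "max-rtp-size", "max-rtp-rate",
    "aggregate",
}


def parse_cop_rtcp_fb(line: str):
    """Return (payload_type, tags) for an 'a=rtcp-fb:<pt> ccm cop ...' line, else None."""
    prefix = "a=rtcp-fb:"
    if not line.startswith(prefix):
        return None
    fields = line[len(prefix):].split()
    # At least one parameter tag is required after "cop".
    if len(fields) < 4 or fields[1] != "ccm" or fields[2] != "cop":
        return None
    return fields[0], fields[3:]


def unknown_tags(tags):
    """Tags that are syntactically valid tokens but not defined in this document."""
    return [tag for tag in tags if tag not in KNOWN_COP_TAGS]


parsed = parse_cop_rtcp_fb("a=rtcp-fb:32 ccm cop hor-size ver-size framerate bitrate")
```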
The usage of Offer/Answer [RFC3264] in this specification inherits all applicable usage defined in [RFC5104].
In order to announce support, and willingness to use, the CCM "cop" feedback message, an offerer or answerer SHALL indicate that capability through the extended SDP rtcp-fb attribute, defined in Section 9.1. The offerer or answerer MUST include a list of the Parameter Types that it is willing to receive.
If an SDP offer does not indicate support of the CCM "cop" feedback message, the answerer MUST NOT indicate support in the associated SDP answer.
The answerer MAY add Parameter Types that were not present in the associated SDP offer, and MAY remove Parameter Types that were present. If the answerer adds Parameter Types to the SDP answer, it MUST be able to receive such messages, but the answerer MUST NOT send such messages towards the offerer.
If an SDP answer does not indicate support of the CCM "cop" feedback message, the offerer MUST NOT send such messages towards the answerer.
The offerer and the answerer SHOULD NOT send any Parameter Types that the remote party did not indicate receive support for. As described in Section 8, a parameter with an unknown ParamType SHALL be ignored on reception in a COPN and SHALL either be reported as unknown in COPS or be ignored when received in COPR.
Entities MUST list all supported Parameter Types in every subsequent SDP offer or answer associated with the session. If a Parameter Type is not listed, it is an indication that the offerer or answerer is no longer willing to receive such messages within the session.
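The Offer/Answer rules above amount to two checks: "cop" can appear in an answer only if it appeared in the offer, and each side sends only Parameter Types that the other side listed as receivable. The sketch below illustrates this with hypothetical helper names and tag lists; it is not a complete Offer/Answer implementation.

```python
def cop_allowed_in_answer(offer_tags) -> bool:
    """An answer may indicate "cop" support only if the offer did."""
    return len(offer_tags) > 0


def may_send(local_sendable, remote_listed):
    """Parameter Types that can be sent: those the remote party listed as receivable."""
    return sorted(set(local_sendable) & set(remote_listed))


offer = ["hor-size", "ver-size", "framerate", "bitrate", "token-bucket"]
answer = offer + ["max-rtp-size"]  # the answerer adds one receive capability

# The answerer can receive max-rtp-size but must not send it to the offerer.
answerer_sends = may_send(answer, offer)
# The offerer may use max-rtp-size towards the answerer if it is able to send it.
offerer_sends = may_send(offer + ["max-rtp-size"], answer)
```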
Declarative use of the CCM "cop" does not differ from the Offer/Answer usage.
The defined mechanism is not bound to a specific codec. It uses the main characteristics of a chosen set of media types, including audio and video. To what extent this mechanism can be applied depends on which specific codec is used.
When using a codec that can produce separate sub-streams within a single SSRC, those sub-streams can only be referred to with a COP OPID if there is a defined relation to the codec-specific sub-stream identification. This is accomplished in this specification by defining an ID Parameter format using codec-specific sub-stream identification for each such codec.
If such sub-streams have dependencies, the OPID describes the characteristics of the sub-stream including all its dependencies, but excluding any sub-streams that are dependent on this sub-stream. The sub-stream identification describes a single, payload-specific node in a dependency tree, and does not in general include any identification of the sub-streams it depends on, or the dependency structure between sub-streams. Any dependency structure must thus be described by the media stream payload format and is out of scope for this specification.
This section contains ID Parameter format definitions for a few selected codecs. The format definitions MUST use an integer number of bytes and MUST define all bits in those bytes. Note, the ID parameter is interpreted in the context of a given SSRC and a specific RTP payload type.
Extensions to this specification MAY add more codec-specific definitions than the ones described in the sub-sections below. Such definitions made in extensions to this specification SHOULD be considered as an integrated part of this section, with respect to usage with other mechanisms defined in this specification.
Some non-scalable video codecs such as H.264 AVC [H264], with its corresponding RTP payload format [RFC6184], can accomplish simultaneous encoding of multiple operation points. H.264 AVC can encode a video stream using limited-reference and non-reference frames such that it enables limited temporal scalability, by use of the nal_ref_idc syntax element.
The ID Parameter Type is defined below:
    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | Reserved  | N |
   +-+-+-+-+-+-+-+-+
This document specifies the usage of multiple, simultaneous codec operation points and therefore maps well to scalable video coding. Scalable video coding such as H.264 SVC (Annex G) [H264] uses three scalability dimensions: temporal, spatial, and quality. It also includes the possibility to use redundant encodings and priority among sub-streams.
The ID SHALL be considered to describe an SVC sub-bitstream, which is defined in G.3.59 of H.264 [H264] and the corresponding RTP payload format [RFC6190]. For use with H.264 SVC, ID SHALL be constructed as defined below:
    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|    PID    |     RPC     | DID |  QID  | TID |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
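A possible way to pack and unpack the 24-bit SVC ID is sketched below. The field widths are an assumption: they follow the sizes of the corresponding H.264 SVC syntax elements (priority_id 6 bits, redundant_pic_cnt 7 bits, dependency_id 3 bits, quality_id 4 bits, temporal_id 3 bits), which together with one reserved bit fill the three bytes shown. The function names are illustrative.

```python
# (field name, width in bits), most significant field first; widths are assumed.
SVC_ID_FIELDS = [("R", 1), ("PID", 6), ("RPC", 7), ("DID", 3), ("QID", 4), ("TID", 3)]


def pack_svc_id(**fields) -> bytes:
    """Pack the named fields (missing ones default to 0) into 3 bytes."""
    word = 0
    for name, width in SVC_ID_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(3, "big")


def unpack_svc_id(data: bytes) -> dict:
    """Split a 3-byte SVC ID back into its fields."""
    if len(data) != 3:
        raise ValueError("the SVC ID is exactly 3 bytes in this sketch")
    word = int.from_bytes(data, "big")
    fields = {}
    shift = 24
    for name, width in SVC_ID_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


svc_id = pack_svc_id(DID=1, QID=0, TID=2)
```

All bits are defined and the result is a whole number of bytes, in line with the requirements on ID Parameter formats stated earlier in this section.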
COP messages are binary encoded. However, in the following examples, all COP messages are for clarity listed in symbolic, pseudo-code form, where only COP message fields of interest to the example are included, along with the COP Parameters.
The SDP capabilities for COP are defined as receiver capabilities, meaning that there is no explicit indication of what COP messages an end-point will use in the send direction. It is however reasonable to expect that an end-point can also send the same messages that it can understand and act on when received. This is assumed in all the SDP examples below, but note that symmetric COP capabilities are not a requirement.
The example below shows an SDP Offer, where support of CCM "cop" message is announced for the video codecs.
   v=0
   o=alice 2890844526 2890844526 IN IP4 host.atlanta.example
   s=-
   c=IN IP4 host.atlanta.example
   t=0 0
   m=audio 50000 RTP/AVP 0 8 97
   b=AS:80
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   m=video 50010 RTP/AVPF 31 32
   b=AS:600
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   a=rtcp-fb:31 ccm cop framerate bitrate token-bucket
   a=rtcp-fb:32 ccm cop hor-size ver-size framerate bitrate \
      token-bucket
Note that the offer contains two different video payload types, and that the COP Parameters differ between them, meaning that the possibility for codec configuration also differs. In this case, the MPEG-1 codec can control both framerate and image size, but for H.261 only the framerate can be controlled.
In the SDP Answer below, responding to the above offer, the answerer supports CCM "cop" messages.
   v=0
   o=bob 2808844564 2808844564 IN IP4 host.biloxi.example
   s=-
   c=IN IP4 host.biloxi.example
   t=0 0
   m=audio 52000 RTP/AVP 0
   b=AS:80
   a=rtpmap:0 PCMU/8000
   m=video 52100 RTP/AVPF 32
   b=AS:600
   a=rtpmap:32 MPV/90000
   a=rtcp-fb:32 ccm cop hor-size ver-size framerate bitrate \
      token-bucket max-rtp-size
Note that the answerer indicates support for more parameter types than the offerer.
Below is another SDP Answer, also responding to the same offer above, where the answerer does not support "cop".
   v=0
   o=bob 2808844564 2808844564 IN IP4 host.biloxi.example
   s=-
   c=IN IP4 host.biloxi.example
   t=0 0
   m=audio 52000 RTP/AVP 0
   b=AS:80
   a=rtpmap:0 PCMU/8000
   m=video 52100 RTP/AVPF 32
   b=AS:600
   a=rtpmap:32 MPV/90000
In this example, two COP-enabled end-points communicate in an audio/video session. The receiving end-point has a graphical user interface that can be dynamically changed by the user. This user interaction includes the ability to change the size of the receiving video window, which is also indicated in the previous SDP Offer/Answer example.
At some point during the established communication, a notification about current video stream Codec Operation Point is sent to the re-sizable window end-point that receives the video stream.
COPN {SSRC:123456, OPID:123, Version:5, bitrate(max):325000, token-bucket(exact):1000, framerate(exact):15, hor-size(exact):320, ver-size(exact):240}
Some time later the user of the re-sizable window end-point reduces the size of the video window. As a result of the re-size operation, the video window can no longer make full use of the received video resolution, wasting bandwidth and decoder processing resources. The re-sizable window end-point thus decides to notify the video stream sender about the changed conditions by sending a request for a video stream of smaller size:
COPR {SSRC:123456, OPID:123, Version:5, hor-size(target):243, ver-size(target):185}
The COPR refers to the previously received COPN with the same OPID and Version, and thus need only list the parameters that are to be changed. The request could arguably also contain other parameters that are potentially affected by the spatial resolution, such as the bitrate, but these can be omitted since the media sender is not slaved to the request and is allowed to make its own decisions based on the request.
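One way to picture how a COPR that lists only changed parameters relates to the COPN it references is as an overlay of the request on the notified Operation Point. The dictionary representation and the function name below are assumptions for illustration; the values are those of the COPN and COPR in this example.

```python
def effective_request(copn_params: dict, copr_params: dict) -> dict:
    """Overlay the parameters listed in a COPR on the COPN it refers to."""
    merged = dict(copn_params)   # start from the notified Operation Point
    merged.update(copr_params)   # parameters listed in the request take precedence
    return merged


copn = {
    "bitrate": ("max", 325000),
    "token-bucket": ("exact", 1000),
    "framerate": ("exact", 15),
    "hor-size": ("exact", 320),
    "ver-size": ("exact", 240),
}
copr = {"hor-size": ("target", 243), "ver-size": ("target", 185)}

wanted = effective_request(copn, copr)
```

The media sender remains free to adjust parameters that were not listed, as the response in this example shows.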
The request sender has chosen to use target type values instead of exact values for the horizontal and vertical sizes, which can be interpreted as "anything sufficiently similar is acceptable". The target values are in this example chosen to correspond exactly to the re-sized video display area. Many video coding algorithms operate most efficiently when the image size is an integer multiple of some block size, and this way of expressing the request explicitly leaves room for the media sender to take such aspects into account.
The media sender (COPR receiver) responds with the following:
   COPS {SSRC:123456, OPID:123, Version:5, Partial Success,
         One or more parameter values in the request were changed}

   COPN {SSRC:123456, OPID:123, Version:6, bitrate(max):240000,
         token-bucket(exact):1000, framerate(exact):15,
         hor-size(exact):240, ver-size(exact):176}
It can be noted that the updated COPN (version 6) indicates that the media sender has, in addition to reducing the video horizontal and vertical size, chosen to also reduce the bitrate. This bitrate reduction was not in the request, but is a reasonable decision taken by the media sender. It can also be seen that the horizontal and vertical sizes are not identical to the requested values, but are in fact adjusted to be integer multiples of 16, which is a local restriction of the fictitious video encoder in this example. To handle the mismatch between the request and the resulting video stream, the video receiver can perform some local action, for example automatic re-adjustment of the re-sized window, image scaling (possibly combined with cropping), or padding.
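The sizes chosen in this example are consistent with rounding each requested dimension down to the nearest multiple of 16. A minimal sketch, assuming that rounding rule (the fictitious encoder's actual rule is not specified here) and a hypothetical helper name:

```python
def snap_down(value: int, multiple: int = 16) -> int:
    """Round value down to the nearest multiple of `multiple`."""
    return (value // multiple) * multiple


requested = (243, 185)                           # target sizes from the COPR
granted = tuple(snap_down(v) for v in requested)  # sizes announced in the COPN
```

Rounding down rather than to the nearest multiple keeps the delivered image within the requested display area.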
In this example, the sent request asks the media sender to go beyond what is negotiated in the SDP. The SDP Offer below indicates the use of video with H.264 Constrained Baseline Profile at level 1.1.
   v=0
   o=alice 2893746526 2893746526 IN IP4 host.atlanta.example
   s=-
   c=IN IP4 host.atlanta.example
   t=0 0
   m=audio 49160 RTP/AVP 96
   b=AS:80
   a=rtpmap:96 G722/16000
   m=video 51920 RTP/AVPF 97
   b=AS:200
   a=rtpmap:97 H264/90000
   a=fmtp:97 profile-level-id=42e00b
   a=rtcp-fb:97 ccm cop framerate bitrate token-bucket
Assuming this offer is accepted and that the answerer also supports COP, further assume that this COP message exchange occurs at some time during the established communication:
   Media Sender                      Media Receiver
   ------------                      --------------
   COPN {SSRC:9876, OPID:67,      ->
         Version:2,
         bitrate(exact):190000,
         token-bucket(exact):500,
         framerate(exact):10,
         hor-size(exact):320,
         ver-size(exact):240}

                                  <- COPR {SSRC:9876, OPID:67,
                                           Version:2,
                                           framerate(exact):10,
                                           hor-size(exact):352,
                                           ver-size(exact):288}

   COPS {SSRC:9876, OPID:67,      ->
         Version:2, Failure,
         Request violates capability limits}
The failure above is due to a combination of frame size and frame rate that exceeds H.264 level 1.1, which would thus exceed the limits established by SDP Offer/Answer. The maximum permitted framerate for 352x288 pixels (CIF) is 7.6 Hz for H.264 level 1.1, as defined in Annex A of [H264].
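The 7.6 Hz figure can be reproduced from the level's macroblock processing limit. The sketch below assumes the level 1.1 limit of 3000 macroblocks per second (MaxMBPS in Annex A of [H264]) and 16x16 macroblocks; the constant and function names are illustrative.

```python
LEVEL_1_1_MAX_MBPS = 3000  # macroblocks per second for H.264 level 1.1 (assumed from Annex A)


def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame."""
    return -(-width // 16) * -(-height // 16)  # ceiling division per dimension


def max_framerate(width: int, height: int, max_mbps: int = LEVEL_1_1_MAX_MBPS) -> float:
    """Highest frame rate the macroblock rate limit allows for this frame size."""
    return max_mbps / macroblocks(width, height)


cif_limit = max_framerate(352, 288)    # 396 macroblocks per frame
qvga_limit = max_framerate(320, 240)   # 300 macroblocks per frame
```

Under these assumptions the announced 320x240 at 10 Hz sits exactly at the level limit, while the requested 352x288 at 10 Hz exceeds it.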
When scalable coding is used, each layer corresponds to a Codec Operation Point. A media receiver can thus target a request towards a single layer. Assume a video encoding with three framerate layers, announced in a (multiple operation point) notification as:
   COPN {SSRC:9876, OPID:67, Version:2, ID:2,
         bitrate(exact):190000, token-bucket(exact):500,
         framerate(exact):10, hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:9876, OPID:73, Version:1, ID:1,
         bitrate(exact):350000, token-bucket(exact):600,
         framerate(exact):30, hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:9876, OPID:95, Version:5, ID:0,
         bitrate(exact):400000, token-bucket(exact):800,
         framerate(exact):60, hor-size(exact):320,
         ver-size(exact):240}
Assume further that the media receiver is not pleased with the low framerate of OPID 67, wanting to increase it from 10 Hz to 25-30 Hz. Note that the media receiver still wants to receive the other layers unchanged, not remove them, and thus has to explicitly indicate this by including them without parameters.
   COPR {SSRC:9876, OPID:67, Version:2,
         framerate(min):25, framerate(max):30}

   COPR {SSRC:9876, OPID:73, Version:1}

   COPR {SSRC:9876, OPID:95, Version:5}
The media sender decides it cannot meet the request for OPID 67, but instead considers (an unmodified) OPID 73 (with ID 1) to be a sufficiently good match:
   COPS {SSRC:9876, OPID:67, Version:2, Partial Success,
         One or more parameter values in the request were changed,
         ID:1}

   (COPN for the other two OPIDs omitted here for brevity)

   COPN {SSRC:9876, OPID:73, Version:1, ID:1,
         bitrate(exact):350000, token-bucket(exact):600,
         framerate(exact):30, hor-size(exact):320,
         ver-size(exact):240}
The COPS indicates partial success and uses the ID number to refer to another OPID, describing the best compromise that can currently be used to meet the request. The COPS does not contain the referred OPID, but the ID should be defined in a codec-specific way that makes it possible to identify the layer directly in the media stream. If the corresponding OPID is needed, for example to attempt another request targeting it, it can be found by searching the active set of COPN for matching ID values.
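The search for a matching ID among the active COPN can be sketched as follows. The dictionary keys and the function name are assumptions for illustration; the values are those of the three notifications in this example.

```python
def find_opid_by_id(active_copn, ssrc: int, sub_stream_id: int):
    """Return the OPID of the active COPN with this SSRC and ID, or None."""
    for copn in active_copn:
        if copn["ssrc"] == ssrc and copn.get("id") == sub_stream_id:
            return copn["opid"]
    return None


active_copn = [
    {"ssrc": 9876, "opid": 67, "version": 2, "id": 2},
    {"ssrc": 9876, "opid": 73, "version": 1, "id": 1},
    {"ssrc": 9876, "opid": 95, "version": 5, "id": 0},
]

# The COPS above refers to ID 1, which corresponds to OPID 73.
referred_opid = find_opid_by_id(active_copn, 9876, 1)
```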
In this example, the media receiver is receiving a non-scalable stream from a codec that can support scalability, and wishes to add a scalability layer. Assume the existing OPID from the media sender is announced as:
COPN {SSRC:3492, OPID:4, Version:2, bitrate(exact):350000, token-bucket(exact):600, framerate(exact):30, hor-size(exact):320, ver-size(exact):240}
The media receiver constructs a request for multiple streams by including multiple requests for different OPIDs. Since the new stream does not exist, it has no OPID from the media sender, so the receiver chooses a random value as reference and indicates that it is a new, temporary OPID. The request for the new stream includes all parameters that the media receiver has an opinion on, and leaves the other parameters to be chosen by the media sender. In this case it is a request for identical frame size and doubled framerate.
   COPR {SSRC:3492, OPID:4, Version:2}

   COPR {SSRC:3492, OPID:237, New, Version:0,
         framerate(exact):60, hor-size(exact):320,
         ver-size(exact):240}
The media sender decides it can start layered encoding with the requested parameters. The status response to the new OPID contains a reference to an ID that is included as part of the matching, subsequent COPN. Note that since both the original and the new streams are now part of a scalable set, they must both be identified with ID parameters to be able to distinguish between them. The media sender has chosen an OPID for the new stream in the COPN, which need not be identical to the temporary one in the request, but the new stream can anyway be uniquely identified through the ID that is announced in both the COPS and COPN.
Note that since the ID has a defined relation to the media sub-stream identification, decoding of that new sub-stream can start immediately after receiving the COPS. It may however not be possible to describe the new stream in COP parameter terms until the COPN is received (depending on COP parameter visibility directly in the media stream).
   COPS {SSRC:3492, OPID:4, Version:2, Success, Success, ID:1}

   COPS {SSRC:3492, OPID:237, New, Version:0, Success, Success,
         ID:0}

   COPN {SSRC:3492, OPID:4, Version:2, ID:1,
         bitrate(exact):350000, token-bucket(exact):600,
         framerate(exact):30, hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:3492, OPID:9, Version:0, ID:0,
         bitrate(exact):390000, token-bucket(exact):600,
         framerate(exact):60, hor-size(exact):320,
         ver-size(exact):240}
Following the guidelines in [RFC4566], in [RFC4585], and in [RFC3550], the IANA is requested to register:
Editor's Note: Security considerations must be added.
There is currently no defined way for a media receiver to indicate that it wants to release the restrictions it previously had on an Operation Point, if the media stream contains only a single Operation Point.
The authors would like to thank Prof. Dr.-Ing. Markus Kampmann at Fachhochschule Koblenz University of Applied Sciences and Prof. Dr.-Ing. Frank Hartung at Multimediatechnik, Audio- und Videotechnik at Fachhochschule Aachen for fruitful contributions and discussions during the initial stages of writing this specification. The authors would also like to thank Christer Holmberg for feedback on the specification.