MMUSIC Working Group | T. Frankkila |
Internet-Draft | M. Westerlund |
Intended status: Standards Track | B. Burman |
Expires: April 26, 2012 | Ericsson |
October 24, 2011 |
Extensible Bandwidth Attribute for SDP
draft-westerlund-mmusic-sdp-bw-attribute-00
Knowledge of what bandwidths the end-points intend to use is important both for the other end-point and for resource allocation in various types of networks. This is especially important for wireless access networks, which typically have quite limited resources. The bandwidth attribute in the Session Description Protocol (SDP), ‘b=AS’, is today widely used to define the bandwidth that the end-points intend to use in various types of sessions. This document shows that the existing bandwidth attribute, such as ‘b=AS’, although widely used in today's scenarios, has limitations that make it hard or even impossible for the end-points to express their intentions accurately when it comes to bandwidth usage. To solve the identified problems, this document defines a new extensible SDP bandwidth attribute ‘a=bw’, which enables more detailed control over bandwidth declarations, requests, and allocations. With the new bandwidth attribute it is possible to define different scopes in the session setup and then negotiate the bandwidth individually for each scope.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 26, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document looks at the issues of non-basic usage of RTP [RFC3550] and analyzes how well the existing SDP [RFC4566] attribute ‘b=AS’ for bandwidth negotiation performs in different scenarios.
This analysis is done by defining a number of use cases, containing sessions with:
It is shown that the existing bandwidth attributes ‘b=AS’ [RFC4566] and ‘b=TIAS’ [RFC3890] have limitations that make it unclear or even impossible for end-points and for resource allocation functions in the network to determine how much bandwidth the service will use. The analysis also provides the design rationale for the new bandwidth attribute.
This document then proposes a general and extensible mechanism for bandwidth negotiation that can be used for any type of session. Interoperability with the existing mechanisms for bandwidth negotiation is especially important since the existing bandwidth attribute is in widespread use.
This document also presents several examples for how the new bandwidth attribute can be used in the session setup phase for various types of sessions. The examples are derived for IP/UDP/RTP transport although nothing should prevent using the new bandwidth attribute also for other transport protocols.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
The following terms and abbreviations are used in this document:
This section describes a number of use cases where the existing bandwidth attribute ‘b=AS’ is used for bandwidth definition. It also discusses why the limitations of the existing bandwidth attribute make it hard for other end-points and resource allocation functions to know or estimate how much bandwidth will be used in the ongoing session.
The analysis is made by defining a set of use cases. The first use cases cover fairly simple session types, i.e. point-to-point sessions with or without asymmetry. A few more complex use cases are then analyzed. The last set of use cases reflects fairly advanced session types, e.g. various variants of multiplexing and usage of multiple media streams.
The discussion is then summarized and the design rationales for the new bandwidth attribute are outlined.
The existing bandwidth modifier ‘b=’ defined in RFC 4566 [RFC4566] is reviewed in this section.
The existing bandwidth attribute ‘b=’ is defined in Section 5.8 of RFC 4566 [RFC4566]. The syntax is:
b=<bwtype>:<bandwidth>
where:
Bandwidth types have been defined for the negotiation of the RTCP bandwidth using ‘b=RS’ and ‘b=RR’, RFC 3556 [RFC3556].
There is also a bandwidth type for negotiating the transport independent application specific maximum bandwidth, ‘b=TIAS’, RFC 3890 [RFC3890]. This bandwidth type is similar to the ‘b=AS’ bandwidth type, except that the overhead caused by the transport protocol headers is not included.
One issue with the existing bandwidth attribute is that the syntax is very limited since it only allows for defining new bandwidth types (<bwtype>) and their respective single numerical value. This limitation needs to be considered in the discussion below.
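To make this limitation concrete, the following Python sketch parses a legacy ‘b=’ line. It is a simplified illustration, not a complete SDP parser: the bandwidth type is restricted here to letters, digits and hyphens (an assumption; RFC 4566 allows any token), and the function name parse_b_line is hypothetical. Whatever the bandwidth type, the line carries exactly one integer.

```python
import re

# Legacy SDP bandwidth line (RFC 4566): b=<bwtype>:<bandwidth>
# Simplification for this sketch: bwtype limited to letters, digits and '-'.
B_LINE = re.compile(r"^b=([A-Za-z0-9-]+):([0-9]+)$")


def parse_b_line(line: str) -> tuple[str, int]:
    """Return (bwtype, value) for a legacy 'b=' line.

    The syntax has room for exactly one integer per bandwidth type, so
    there is nowhere to put a direction, a scope or a second parameter.
    """
    match = B_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid b= line: {line!r}")
    return match.group(1), int(match.group(2))


print(parse_b_line("b=AS:80"))       # ('AS', 80): kbps for b=AS
print(parse_b_line("b=TIAS:64000"))  # ('TIAS', 64000): bps for b=TIAS
```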
"An Offer/Answer Model with the Session Description Protocol (SDP)" [RFC3264] describes the offer/answer procedures for the existing bandwidth attribute. For the SDP offer, it describes that the bandwidth attribute indicates the desired bandwidth that the offerer would like to receive. For the SDP answer, it describes that the bandwidth attribute indicates the bandwidth that the answerer would like the offerer to use when sending media. Thus, for offer/answer negotiations, the bandwidth attribute indicates the bandwidth for the receive direction of each end-point.
The solution presented in this document focuses primarily on clarifying and assisting the Application Specific (AS) bandwidth.
[It is an open question to decide if and how to handle the RTCP bandwidth negotiation, e.g. corresponding to b=RS and b=RR.]
[It is an open question to develop semantics for the transport independent bandwidth negotiation, e.g. corresponding to b=TIAS.]
An end-point can send media in many different ways, depending on the choices the implementers have made.
Some end-points may send their data in a fairly “nice and smooth” media stream, which means that both the packet sizes and the packet rates are more or less constant all the time. An example of a smooth stream is when the end-point is encoding speech and sends one packet every 20 ms, with all packets of equal size.
Other end-points may generate bursty streams, which have a large peak-to-average ratio. An example of a bursty stream is when an end-point is encoding video. Most of the time, the end-point sends packets of almost the same size at a constant packet rate. However, the encoder occasionally generates much more data for a frame, which may give a very large packet size. The sender may even have to segment the data into several packets, which may be transmitted in a burst, thereby causing a very high peak rate.
Whether the stream is smooth or bursty makes a big difference for the network and the policy control that usually applies in QoS controlled networks. If the stream is too bursty, then a policy control function may decide to drop packets that exceed the granted rate. This will lead to degraded quality and reduced user satisfaction.
The existing bandwidth attribute offers no mechanism to negotiate which temporal variations can be allowed for a stream. The only available mechanism is to negotiate the maximum bandwidth; nothing defines an averaging window (or anything similar) that could be used to control the bandwidth variations of the transmitted stream.
It is therefore proposed to use a Token Bucket model to describe the bandwidth with two parameters, the token bucket rate and the bucket size, see RFC 2212 [RFC2212].
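A token bucket with rate r and size b admits a packet only if enough tokens have accumulated, so a smooth stream passes with a bucket of about one packet while a burst needs a deeper bucket. The Python sketch below is a minimal meter in the spirit of RFC 2212, assuming a bucket that starts full, a rate in bits per second and a size in bytes (the units this document uses), and packet send times in integer milliseconds. It omits the peak rate and the other TSpec parameters of RFC 2212; the class and method names are illustrative only.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class TokenBucket:
    rate_bps: int    # token rate, bits per second
    size_bytes: int  # bucket depth, bytes

    def conforms(self, packets: list[tuple[int, int]]) -> bool:
        """Check (send_time_ms, packet_bytes) pairs, sorted by time.

        The bucket starts full. Tokens (in bytes) accrue at rate_bps / 8
        per second, capped at size_bytes. Each packet must find enough
        tokens when it is sent; its size is then removed from the bucket.
        Fractions keep the arithmetic exact.
        """
        depth = Fraction(self.size_bytes)
        tokens = depth
        last_ms = None
        for t_ms, size in packets:
            if last_ms is not None:
                earned = Fraction(t_ms - last_ms, 1000) * self.rate_bps / 8
                tokens = min(depth, tokens + earned)
            last_ms = t_ms
            if size > tokens:
                return False
            tokens -= size
        return True


# Smooth: 200-byte packets every 20 ms (80 kbps) fit a one-packet bucket.
smooth = [(20 * i, 200) for i in range(50)]
print(TokenBucket(80_000, 200).conforms(smooth))  # True

# Bursty: a large frame split into three 1200-byte packets sent back to back.
burst = [(0, 1200), (0, 1200), (0, 1200)]
print(TokenBucket(500_000, 1500).conforms(burst))  # False
print(TokenBucket(500_000, 3600).conforms(burst))  # True
```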
The existing modifier for the application specific bandwidth ‘b=AS’ is frequently used in the SDP offer/answer negotiation RFC 3264 [RFC3264] for setting up point-to-point sessions, for example for bi-directional point-to-point VoIP or video telephony sessions. In this section, the use of the legacy bandwidth modifier is reviewed for the use in point-to-point sessions using SDP offer/answer.
The example below shows the SDP offer from end-point A for several fixed-rate codecs: mu-law and A-law PCM/G.711 [G.711], ADPCM/G.726 [G.726] and CS-ACELP/G.729 [G.729]. The codecs have different bit rates. PCM encodes speech at 64 kbps. G.726 can encode speech at four different rates, 40, 32, 24 and 16 kbps, but in this case it is assumed that the 32 kbps variant is used. G.729 encodes speech at 8 kbps. The IP/UDP/RTP overhead with 20 ms packetization and IPv4 is 16 kbps in all cases, giving 80, 48 and 24 kbps, respectively.
m=audio 49200 RTP/AVP 8 0 96 18
b=AS:80
a=rtpmap:96 G726-32/8000/1
a=ptime:20
a=maxptime:80
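The bandwidths quoted for this offer follow from simple arithmetic. The Python sketch below reproduces them, assuming 20 bytes of IPv4 header, 8 bytes of UDP header and 12 bytes of RTP header per packet (no IP options, RTP header extensions or CSRCs) and one packet every 20 ms; the function name is illustrative.

```python
# Per-packet header bytes: IPv4 + UDP + RTP, no options or extensions.
HEADER_BYTES = 20 + 8 + 12


def ip_bitrate_kbps(codec_kbps: float, ptime_ms: int = 20) -> float:
    """Codec rate plus IP/UDP/RTP header overhead, in kbps."""
    packets_per_second = 1000 / ptime_ms
    overhead_kbps = HEADER_BYTES * 8 * packets_per_second / 1000
    return codec_kbps + overhead_kbps


for codec, rate in [("G.711", 64), ("G.726-32", 32), ("G.729", 8)]:
    print(codec, ip_bitrate_kbps(rate))
# G.711 80.0
# G.726-32 48.0
# G.729 24.0
```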
If end-point B accepts the offered codecs, then a likely SDP answer would be:
m=audio 49400 RTP/AVP 8 0 96 18
b=AS:80
a=rtpmap:96 G726-32/8000/1
a=ptime:20
a=maxptime:80
In this case, both end-points offer to receive 80 kbps. A resource allocation function would thereby allocate 80 kbps in each direction.
However, if end-point B accepts only one of the lower-rate codecs, for example G.729, but not the PCM codecs, then a likely SDP answer would be:
m=audio 49400 RTP/AVP 18
b=AS:24
a=ptime:20
a=maxptime:80
This means that the offerer has offered to receive 80 kbps while the answerer has offered to receive 24 kbps. In the direction A to B, it is clear that a resource allocation function should allocate 24 kbps. However, the direction B to A is less clear. On one hand, end-point A has offered to receive 80 kbps. On the other hand, end-point B has only indicated support for the G.729 codec, and it is unknown whether B can send anything in addition to G.729 from A's offered set.
A resource allocation function may also (incorrectly) conclude that end-point B will send at most 24 kbps, since b=AS indicates 24 kbps. However, since maxptime is 80 ms, end-point B could very well use application layer redundancy and encapsulate redundant frames together with non-redundant frames, which would result in a bandwidth exceeding 24 kbps. Even if maxptime were 20 ms, end-point B could still use application layer redundancy by transmitting the non-redundant and redundant frames in different packets. This is possible since end-point A has indicated that it is capable of receiving 80 kbps. Hence, if the resource allocation function uses the codec information and assumes that end-point B will send with only 24 kbps, then this may cause packet losses and/or long delays.
This example should make clear that the current bandwidth attribute, b=AS, can create ambiguities about what bandwidth will be used in each direction. If the end-points and the resource allocation functions make different interpretations, then there is a risk of either poor quality or wasted resources.
To solve this, a new bandwidth negotiation method should enable negotiating different bandwidths for different codecs. If a codec can be configured in several different ways (e.g. G.726 can use four different static bit rates), then this would typically be negotiated using different RTP Payload Types. This means that the solution needs to be capable of negotiating different bandwidths for different Payload Types.
This use case describes what might happen when using rate-adaptive codecs in a session, for example AMR [AMR]. The rate adaptation should adapt to a high bitrate when the operating conditions are good, but should adapt to a low bitrate when the operating conditions are degraded, e.g. due to congestion or bad coverage.
One example of the SDP offer/answer negotiation for a rate-adaptive codec is shown below.
m=audio 49200 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100
The bandwidth attribute in the SDP offer indicates the bandwidth that the offerer would like to receive, RFC 3264 [RFC3264].
m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100
The bandwidth attribute in the SDP answer indicates the maximum bandwidth that the answerer would like the offerer to use when sending media, RFC 3264 [RFC3264].
In this case, it is clear that both end-points are prepared to receive up to 29 kbps of media. Since AMR can adapt the encoding rate, the bandwidth can be reduced, e.g. to the 5.9 kbps mode, if congestion is detected. The existing bandwidth attribute ‘b=AS’ is, however, only used to negotiate the maximum rate. This means that nothing in the SDPs describes how the rate will be adapted. In some cases, usually for speech codecs, it might be possible to derive the lowest rate from the codec information. However, there is no guarantee that the end-points will adapt down to this rate rather than stay at some higher rate. For video codecs, there is usually no codec information at all that could be used to determine how low a rate the end-points will use. The lowest usable rate for a video codec is generally not a codec limitation, but rather an end-user or service consideration about the lowest video quality that is still useful or acceptable in the actual scenario.
This means that a resource allocation function has no information which could be used to determine how the end-points will adapt during periods of congestion. Hence the network does not know what to assume from the end-points.
To solve this, a new bandwidth negotiation method should allow for negotiating not only the highest rate but also the minimum rate that is still useful.
Another example is when the originating end-point offers several rate-adaptive codecs with different bandwidths, and the answerer only supports one or several of the lower-rate configurations but not the configuration that uses the highest bandwidth. With the legacy bandwidth modifier ‘b=AS’ it is only possible to indicate one bandwidth for the whole RTP session, which means that the end-point needs to indicate the highest bandwidth since this is the worst case. An offer/answer for this case is shown below. The offerer supports both AMR and AMR-WB [AMR-WB] and therefore indicates the bandwidth needed for the AMR-WB configuration, since it is higher than for AMR. If the answerer does not support the AMR-WB codec, then it has to remove this configuration from the SDP when creating the SDP answer. This means that the answerer calculates the bandwidth required for AMR instead of AMR-WB.
m=audio 49200 RTP/AVP 96 97
b=AS:41
a=rtpmap:96 AMR-WB/16000/1
a=fmtp:96 mode-change-capability=2; max-red=80
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100
m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100
Since the indicated bandwidth is for the receiving direction in this example this means that:
This gives the same problem with an ambiguous maximum rate as shown in Section 3.2.1. In addition, since AMR and AMR-WB are rate-adaptive codecs with different bit rates, they also have different minimum rates. This means that a resource allocation function would be unaware of both the maximum bandwidth and the minimum (required) bandwidth.
To solve this, a new bandwidth attribute should allow for negotiating both maximum and minimum bitrates individually for each payload type.
For speech codecs, it is usually possible to derive the minimum rate from the codec information. However, this is typically not possible for video codecs since they only indicate the maximum encoding level. For example, if end-point A offers to use H.264 [H.264] level 3.0 but end-point B is only capable of using level 1.2, then this only limits the maximum bandwidth in the direction from A to B. In the other direction, end-point A is still capable of receiving level 3.0.
The session setup for asymmetric streams is not always straightforward. Let's say that one wants to set up a session with 600 kbps in the sending direction and 200 kbps in the receiving direction.
m=video 49200 RTP/AVP 96
b=AS:200
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=sendrecv
From this SDP, it can be determined that the end-point wants to receive 200 kbps. There is some implicit information in the level part of the profile-level-id for the H.264 example above, indicating that the end-point can send using a higher bandwidth (up to 768 kbps), but codec-specific knowledge is required to extract that implicit information. In this example, let's assume that the sender does not even want to utilize the maximum allowed bandwidth for the signaled level, but a somewhat lower one, say 600 kbps. So how is the answerer supposed to know that the offerer really wants to send up to 600 kbps, especially since not even the implicit level-related information can be used? There could be many reasons to use a lower video bandwidth than the level maximum: limited terminal performance in the send direction, a known network bandwidth limitation, a bandwidth charging model that makes the user prefer a lower bandwidth, etc.
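The codec-specific knowledge mentioned above can be illustrated with a short Python sketch that decodes the three bytes of an H.264 profile-level-id (profile_idc, constraint flags and level_idc, as carried in the RFC 6184 media type parameter) and looks up the maximum bit rate for the level. The table is a partial copy of MaxBR from H.264 Table A-1 in kbit/s; treat it as illustrative and check the Recommendation before relying on it. Level 1b signalling (level_idc 11 with constraint_set3_flag) is not handled, and the function names are hypothetical. Note that the level matters: 42c00c is level 1.2 (384 kbit/s) while 42c00d is level 1.3 (768 kbit/s).

```python
# Partial MaxBR table (kbit/s) keyed by level_idc, from H.264 Table A-1.
MAX_BR_KBPS = {
    10: 64, 11: 192, 12: 384, 13: 768,
    20: 2000, 21: 4000, 22: 4000,
    30: 10000, 31: 14000, 32: 20000,
}


def decode_profile_level_id(plid: str) -> tuple[int, int, int]:
    """Split a 6-hex-digit profile-level-id into its three bytes."""
    if len(plid) != 6:
        raise ValueError(f"expected 6 hex digits, got {plid!r}")
    value = int(plid, 16)
    return value >> 16, (value >> 8) & 0xFF, value & 0xFF


def level_max_kbps(plid: str) -> int:
    """Maximum bit rate implied by the level (level 1b not handled)."""
    _profile_idc, _constraints, level_idc = decode_profile_level_id(plid)
    return MAX_BR_KBPS[level_idc]


print(decode_profile_level_id("42c00d"))  # (66, 192, 13)
print(level_max_kbps("42c00c"))           # 384  (level 1.2)
print(level_max_kbps("42c00d"))           # 768  (level 1.3)
print(level_max_kbps("42c016"))           # 4000 (level 2.2)
```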
One way to express the asymmetry is to set up different RTP sessions for sending and receiving directions. An SDP offer for this might be:
m=video 49200 RTP/AVP 96
b=AS:600
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=sendonly
m=video 49202 RTP/AVP 97
b=AS:200
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=recvonly
If the answerer decides to accept this then the SDP answer might be:
m=video 49200 RTP/AVP 96
b=AS:600
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=recvonly
m=video 49202 RTP/AVP 97
b=AS:200
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=sendonly
In this example, it is clear that the offerer can send video at up to 600 kbps and receive video at up to 200 kbps. However, if different codecs using different bandwidths are offered, then the same problem arises as described in Section 3.2.3.
Specifically for video, but possibly also for other media, different implementations may send the media in different ways. Some implementations may try to provide a fairly “smooth” stream in terms of bandwidth variation over time, while other implementations may produce a very “bursty” stream.
There are also cases where opening additional RTP sessions just for expressing asymmetric transmission bandwidths is not desirable.
In this part of the analysis, it is assumed that an RTP session is set up for multiple streams. This can be done in several ways and for several reasons, as discussed in RTP Multiplexing Architecture [I-D.westerlund-avtcore-multiplex-architecture].
The assumed usage here is a multi-party session, for example a video conference using an RTP mixer. Some of the attendees are active, and their audio and video are distributed to the other users. Some attendees are inactive and thus only receive media. In this example, each end-point sends one video stream but can receive up to four simultaneous video streams, multiplexed as different SSRCs in the same RTP session. One or more central nodes (RTP mixers) help facilitate the media transport between the participants and are involved in choosing the streams to be forwarded. Assume that there is an aggregate bandwidth limit of 3 Mbps in the receive direction, and that each received video stream should be limited to at most 1 Mbps.
An SDP offer for setting up a session with one video stream in the sending direction and four video streams in the receiving direction is shown below, using [I-D.westerlund-avtcore-max-ssrc] to explicitly declare the capability to handle multiple streams. In this case, only the legacy ‘b=AS’ bandwidth attribute is used, which is valid only for the aggregate.
m=video 49300 RTP/AVP 96
b=AS:3000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c016
a=max-recv-ssrc:* 4
This example again highlights the asymmetry problem with the existing bandwidth attribute, but it also highlights the lack of a per-stream bandwidth specification. It is not possible to declare the 1 Mbps bandwidth limit that should be used for each of the four streams in the receiving direction; such a per-stream declaration is thus a desirable property of the new bandwidth attribute. Note also that in this example, the 1 Mbps per-stream limit cannot be fully utilized if all four streams are used simultaneously, since four times 1 Mbps exceeds the 3 Mbps aggregate limit.
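The interaction between the per-stream and aggregate limits in this example is simple arithmetic. The Python sketch below assumes, purely for illustration, that the aggregate is shared equally among the active received streams; the constant and function names are hypothetical.

```python
AGGREGATE_LIMIT_KBPS = 3000   # b=AS:3000, receive direction
PER_STREAM_LIMIT_KBPS = 1000  # desired per-stream cap (not expressible with b=AS)


def per_stream_rate_kbps(active_streams: int) -> float:
    """Rate each stream can get under an equal split of the aggregate."""
    if not 1 <= active_streams <= 4:  # a=max-recv-ssrc:* 4
        raise ValueError("between 1 and 4 received streams in this example")
    return float(min(PER_STREAM_LIMIT_KBPS, AGGREGATE_LIMIT_KBPS / active_streams))


for n in range(1, 5):
    print(n, per_stream_rate_kbps(n))
# 1 1000.0
# 2 1000.0
# 3 1000.0
# 4 750.0
```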
Resource allocation is typically a compromise between perceived quality and network utilization. From an end-user perspective, the bandwidth for a service should be as high as possible, since this should give the best user experience. However, from a network perspective, one would like to minimize the rate, since this should maximize the number of sessions that can be supported.
For some services, like conversational voice and/or video telephony, one needs to ensure that the network is capable of delivering at least a certain required rate, even when the network load is high. This is needed to ensure user satisfaction, both in terms of quality and end-to-end delay. This means that the end-points and the network need to agree on the maximum bandwidth that can be used for the session as well as on some lowest useful, "least required", bandwidth.
The current bandwidth modifier, ‘b=AS’, is used to negotiate the maximum bandwidth. However, since it only allows a single bandwidth value to be negotiated, it cannot also be used to negotiate a lower bandwidth limit.
To solve this, a new bandwidth negotiation method should allow for negotiating not only the highest rate but also the “at least required” rate. To enable a negotiation between the end-point and the network, a reasonable approach is that the end-point requests a lower bandwidth limit and the network then indicates which “least required” rate was granted.
It should be clear from the above discussion that the current bandwidth attribute is too limited to be used for all use cases and that some extensions are needed.
The current bandwidth attribute, ‘b=AS’, is sufficient for simple sessions but gives ambiguities when negotiating more advanced session types. One of the drawbacks is that ‘b=AS’ only indicates the desired bandwidth for the receiving direction; if the answering end-point wants to use a lower rate than what is offered, then there is often no way for the resource allocation function to know what bandwidth will be used in the offerer's sending direction.
Implementers of end-points and resource allocation functions may try to resolve this ambiguity by using other information available in the SDP, e.g. codec-specific information. However, such information is not always easily available, e.g. for video codecs.
End-points may have to perform a second offer/answer negotiation to resolve the ambiguity. This, obviously, has the drawbacks that the SIP traffic is increased and that this takes some extra time. It is also not guaranteed that the end-points will actually initiate a second offer/answer negotiation.
The analysis above has also shown that the current bandwidth attribute is insufficient to properly describe the session for multi-stream scenarios.
The analysis above has also shown that the current bandwidth modifier can be used to negotiate the maximum bit rate in bearers allocated in some wireless networks, but it is insufficient for also negotiating a lower, "least required", bandwidth limit.
Another problem with the existing bandwidth attribute is that the syntax is very limited and does not allow for introducing extensions, only additional identifiers with a single value each.
It is therefore proposed to define a new bandwidth attribute, including a new syntax. The new bandwidth attribute should support:
The existing bandwidth modifier, ‘b=AS’, is widely used today. The existing SDP attributes for directionality, ‘a=sendrecv’, ‘a=recvonly’, ‘a=sendonly’ and ‘a=inactive’, are also widely used. It is therefore important to ensure interworking between the new bandwidth attribute and the mechanisms already existing in SDP.
This section proposes a new bandwidth attribute ‘a=bw’ that can be used either as an extension to the existing bandwidth attribute ‘b=AS’ or as a replacement for it; the new attribute includes semantics that allow it to fully replace the existing one.
The syntax for the new bandwidth attribute is:
a=bw:<direction> <scope> <semantic>:<value>
where:
The new attribute is designed to allow for future extensibility.
The ABNF RFC 5234 [RFC5234] for this attribute is the following:
bw-attrib      = "a=bw:" direction SP [req] scope SP [req] semantics ":" values
direction      = "send" / "recv" / "sendrecv" / direction-ext
scope          = payloadType / scope-ext
payloadType    = "pt=" ("*" / (PT-spec) *("," PT-spec))
PT-spec        = PT-value / PT-value-range
PT-value       = 1*3DIGIT
PT-value-range = PT-value "-" PT-value
req            = "!"
semantics      = "SMT" / "AMT" / "SLT" / "SLTR" / "ALT" / "ALTR" / semantics-ext
values         = token-bucket / value-ext
token-bucket   = "tb=" br-value ":" bs-value
br-value       = "*" / 1*15DIGIT   ; Bucket Rate [bps]
bs-value       = "*" / 1*15DIGIT   ; Bucket Size [bytes]
direction-ext  = token             ; As defined in RFC 4566
scope-ext      = 1*VCHAR           ; As defined in RFC 5234
semantics-ext  = token             ; As defined in RFC 4566
value-ext      = 0*(WSP / VCHAR)   ; As defined in RFC 5234
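As a sketch of how the ABNF above maps to code, the following Python parser handles the ‘a=bw’ line and the token-bucket value. It is a simplified, non-normative reading of the grammar: direction and scope are taken as any non-space text, the semantics as any text up to the first colon, and a wildcard ("*") is returned as None. The names BwLine and parse_bw are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

BW_LINE = re.compile(
    r"^a=bw:(?P<direction>\S+) "
    r"(?P<scope_req>!?)(?P<scope>\S+) "
    r"(?P<sem_req>!?)(?P<semantics>[^:\s]+):(?P<values>.*)$"
)
TOKEN_BUCKET = re.compile(r"^tb=(\*|[0-9]{1,15}):(\*|[0-9]{1,15})$")


@dataclass
class BwLine:
    direction: str
    scope: str
    scope_required: bool
    semantics: str
    semantics_required: bool
    # (rate_bps, size_bytes) with None for "*"; None if not a token bucket
    token_bucket: Optional[tuple[Optional[int], Optional[int]]]
    values: str


def _num(field: str) -> Optional[int]:
    return None if field == "*" else int(field)


def parse_bw(line: str) -> BwLine:
    m = BW_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not an a=bw line: {line!r}")
    tb = TOKEN_BUCKET.match(m["values"])
    return BwLine(
        direction=m["direction"],
        scope=m["scope"],
        scope_required=m["scope_req"] == "!",
        semantics=m["semantics"],
        semantics_required=m["sem_req"] == "!",
        token_bucket=(_num(tb[1]), _num(tb[2])) if tb else None,
        values=m["values"],
    )


print(parse_bw("a=bw:sendrecv pt=96 SMT:tb=64000:200").token_bucket)  # (64000, 200)
print(parse_bw("a=bw:recv !pt=* AMT:tb=*:1500").scope_required)       # True
```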
The ‘a=bw’ attribute defines three possible directionalities for the bandwidth:
The directionality must be specified when the ‘a=bw’ attribute is used. Only one directionality can be specified on each ‘a=bw’ line. Special care must be taken to avoid conflicting definitions. For example, if ‘sendrecv’ has been specified on one ‘a=bw’ line for a scope, e.g. payload type 96, then the direction cannot be set to ‘send’ or ‘recv’ on another ‘a=bw’ line for the same scope. However, it is allowed to specify directionality ‘send’ on one ‘a=bw’ line for a scope and directionality ‘recv’ on another ‘a=bw’ line for the same scope. This is useful when the bandwidth differs between directions. Using ‘sendrecv’ as directionality on an ‘a=bw’ line is a shortcut: it is equivalent to two separate ‘a=bw’ lines, one using ‘send’ and the other ‘recv’, that are otherwise semantically identical.
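The equivalence between ‘sendrecv’ and a ‘send’ plus ‘recv’ pair suggests a simple consistency check: expand every line into per-direction entries and flag duplicates. The Python sketch below does this on (direction, scope, semantics) tuples. It assumes, as an interpretation rather than a rule stated here, that the restriction applies per semantics, so that different semantics may still be declared for the same scope; extension directions are simply rejected, and the function name is hypothetical.

```python
def expand_directions(lines: list[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Expand (direction, scope, semantics) tuples into send/recv entries.

    'sendrecv' becomes one 'send' and one 'recv' entry. A ValueError is
    raised if two lines end up defining the same direction, scope and
    semantics, e.g. 'sendrecv' plus 'send' for the same payload type.
    """
    entries: set[tuple[str, str, str]] = set()
    for direction, scope, semantics in lines:
        if direction == "sendrecv":
            directions = ("send", "recv")
        elif direction in ("send", "recv"):
            directions = (direction,)
        else:
            raise ValueError(f"direction not handled in this sketch: {direction!r}")
        for d in directions:
            key = (d, scope, semantics)
            if key in entries:
                raise ValueError(f"conflicting a=bw definitions for {key}")
            entries.add(key)
    return entries


# Allowed: different parameters per direction on two lines.
print(sorted(expand_directions([("send", "pt=96", "SMT"), ("recv", "pt=96", "SMT")])))

# Not allowed: 'sendrecv' plus 'send' for the same scope and semantics.
try:
    expand_directions([("sendrecv", "pt=96", "SMT"), ("send", "pt=96", "SMT")])
except ValueError as err:
    print(err)
```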
The scope indicates what is being configured by the bandwidth semantics on this attribute line. Two different scopes are defined based on payload type:
The scope parameter is extensible to allow for adding other scope definitions in the future.
This specification defines six related semantics. All semantics represent the bandwidth consumption of either a single stream or an aggregate of streams as a token bucket, defining a transmission profile that the media sender must stay within. The token bucket values are the token rate in bits per second and the bucket size in bytes, both provided as integers, see RFC 2212 [RFC2212]. The semantics below include the whole IP packet, e.g. IP, UDP and RTP headers and RTP payload, in what shall be metered when determining whether the send pattern is within the profile. The token bucket definition allows wildcards ("*"), which indicate that a token bucket value is wanted but that no specific value is proposed.
The definitions of the semantics in more detail are:
The required prefix (“!”) is used when the direction, scope and semantics are required to be supported and understood by the SDP-consuming end-point.
In declarative usage, the SDP attribute is interpreted from the perspective of the end-point being configured by the particular SDP. An interpreter MAY ignore ‘a=bw’ attribute lines that contain an unknown scope or semantics without the required ("!") prefix. If a "required" prefix is present on an unknown scope or semantics, the interpreter SHALL NOT use this SDP to configure the end-point.
The offer/answer negotiation is performed for each ‘a=bw’ attribute line individually, with the scope and semantics immutable.
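The declarative rule can be sketched as a filter over the ‘a=bw’ lines of a media description. The Python below is illustrative only: it assumes that the known scopes are those starting with "pt=" and that the known semantics are the six defined in this document, and the function name is hypothetical. It returns the lines to apply, or None when the SDP must not be used.

```python
from typing import Optional

KNOWN_SEMANTICS = {"SMT", "AMT", "SLT", "SLTR", "ALT", "ALTR"}


def declarative_bw_lines(lines: list[str]) -> Optional[list[str]]:
    """Return the a=bw lines to apply, or None if the SDP must not be used."""
    usable = []
    for line in lines:
        # "a=bw:<direction> <scope> <semantics>:<values>"
        _direction, scope, rest = line[len("a=bw:"):].split(" ", 2)
        semantics = rest.split(":", 1)[0]
        scope_known = scope.lstrip("!").startswith("pt=")
        semantics_known = semantics.lstrip("!") in KNOWN_SEMANTICS
        if (scope.startswith("!") and not scope_known) or (
            semantics.startswith("!") and not semantics_known
        ):
            return None  # required but unknown: SHALL NOT use this SDP
        if scope_known and semantics_known:
            usable.append(line)
        # otherwise: unknown and not required, so the line is ignored (MAY)
    return usable
```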
An offerer may use the ‘a=bw’ attribute(s) for some or all of the offered media types. An answerer may remove the ‘a=bw’ attribute(s) for the media types where it was used in the SDP offer.
The SDP may include an offer for an Aggregated Maximum Token bucket (AMT) without specifying any Stream Token Buckets (SMTs) for any individual streams.
When using the ‘a=bw’ attribute to define the token bucket for a certain scope, the offerer should define token buckets for all scopes of the same type. For example, if the SDP offer includes three payload types, e.g. 96, 97 and 98, and a token bucket is defined for payload type 96, then the offerer should also define token buckets for the other payload types. This can be done either by defining one token bucket each for payload types 97 and 98 or by defining a common token bucket for payload types 97 and 98.
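A check of whether every offered payload type is covered by some token bucket can be sketched by expanding the payload-type scopes of the ABNF (single values, comma lists, ranges and "*"). The Python below is a hypothetical helper that treats "pt=*" as covering all offered payload types and, for simplicity, ignores direction and semantics.

```python
def expand_pt_scope(scope: str, offered: list[int]) -> set[int]:
    """Expand 'pt=96', 'pt=97,98', 'pt=96-98' or 'pt=*' into payload types."""
    spec = scope.lstrip("!")
    if not spec.startswith("pt="):
        raise ValueError(f"not a payload-type scope: {scope!r}")
    body = spec[len("pt="):]
    if body == "*":
        return set(offered)
    pts: set[int] = set()
    for item in body.split(","):
        if "-" in item:
            low, high = (int(x) for x in item.split("-"))
            pts.update(range(low, high + 1))
        else:
            pts.add(int(item))
    return pts


def uncovered_pts(offered: list[int], scopes: list[str]) -> list[int]:
    """Offered payload types not covered by any a=bw scope."""
    covered: set[int] = set()
    for scope in scopes:
        covered |= expand_pt_scope(scope, offered)
    return sorted(set(offered) - covered)


print(uncovered_pts([96, 97, 98], ["pt=96"]))              # [97, 98]
print(uncovered_pts([96, 97, 98], ["pt=96", "pt=97,98"]))  # []
```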
When the token bucket rate and size are declared in an offer for directionality ‘sendrecv’ then this indicates the token bucket rate and the token bucket sizes are the same in both directions. For example, if the offered bandwidth is 1 Mbps, then the end-point declares that it is capable of sending with a bandwidth up to 1 Mbps and that it is capable of receiving with a bandwidth up to 1 Mbps.
If either the token bucket rate(s) or the token bucket sizes are different in sending and receiving direction then ‘sendrecv’ cannot be used. One should instead include two or more ‘a=bw’ lines with the respective directionality, bandwidths and sizes.
When the token bucket parameters are declared in an SDP offer for directionality ‘send’ then this indicates the token bucket parameters the sender intends to use. The answerer may change this value, both to increase it and to reduce it, see below.
When the token bucket parameters are declared in an SDP offer for directionality ‘recv’, this indicates the largest token bucket envelope that the offerer thinks the media sender shall use.
An agent understanding the ‘a=bw’ attribute and answering to an offer including the ‘a=bw’ attribute SHOULD include the attribute in the answer for all media types for which it was offered.
An answerer SHOULD ignore ‘a=bw’ attribute lines that contains unknown scope or semantics that does not contain the required ("!") prefix. If a "required" prefix is present at an unknown scope or semantics, then the answerer SHALL reject the media description by setting the port to 0 and copy the 'a=bw' attributes not understood in the answer. In this case, 'a=bw' attributes that are understood SHALL NOT be included in the answer.
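The answerer rule can be sketched in the same way. The hypothetical Python function below classifies each offered ‘a=bw’ line as understood or not, using assumed sets of known scopes ("pt=") and semantics (the six defined here), and decides whether the media description must be rejected. It returns the port to use and the ‘a=bw’ lines to place in the answer; the understood lines are only a starting point, since their values are still negotiated as described in this section.

```python
KNOWN_SEMANTICS = {"SMT", "AMT", "SLT", "SLTR", "ALT", "ALTR"}


def classify(line: str) -> tuple[bool, bool]:
    """Return (understood, required_but_unknown) for one a=bw line."""
    _direction, scope, rest = line[len("a=bw:"):].split(" ", 2)
    semantics = rest.split(":", 1)[0]
    scope_known = scope.lstrip("!").startswith("pt=")
    semantics_known = semantics.lstrip("!") in KNOWN_SEMANTICS
    required_unknown = (scope.startswith("!") and not scope_known) or (
        semantics.startswith("!") and not semantics_known
    )
    return scope_known and semantics_known, required_unknown


def answer_bw(offered_lines: list[str], port: int) -> tuple[int, list[str]]:
    """Return (answer_port, a=bw lines for the answer)."""
    results = [(line, *classify(line)) for line in offered_lines]
    if any(required for _, _, required in results):
        # Reject with port 0 and echo only the lines that were not understood.
        return 0, [line for line, understood, _ in results if not understood]
    # Ignore unknown, non-required lines; keep the understood ones.
    return port, [line for line, understood, _ in results if understood]
```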
If an answerer would like to add additional bandwidth configurations using other directionality, scope, and semantics combination, then it MAY do so by adding such definitions in the SDP answer.
An agent may also divide an ‘a=bw’ offer into several ‘a=bw’ offers. One example is when the SDP offer included an ‘a=bw’ offer with directionality ‘sendrecv’, which indicates that the token bucket parameters are the same in sending and receiving direction. If the answerer would like to change the parameters for one or both directions, so that the parameters are no longer the same for both directions, then the answerer can include two ‘a=bw’ lines in the SDP answer, one for sending direction and another for receiving direction. In case an offered sendrecv media becomes a single direction media then the sendrecv can be modified to that single direction.
An agent responding to an offer needs to consider the directionality and reverse it in the answer when responding to media streams using unicast.
For media stream offers over unicast with directionality ‘send’, the answerer SHALL reverse the directionality and indicate its reception bandwidth capability, which may be lower or higher than what the sender has indicated as its intended maximum.
For media stream offers over unicast with directionality ‘recv’, the token bucket parameters indicate the upper limits. The answerer SHALL reverse the directionality and may reduce the bandwidth when producing the answer, indicating the answerer's intended maximum transmission rate.
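A minimal Python sketch of the unicast reversal rules, assuming one offered ‘send’ or ‘recv’ line at a time; ‘sendrecv’ and multicast are deliberately out of scope. `answer_unicast` and its parameters are hypothetical, and capping both rate and size at the offered values is one way to honor "upper limits".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    rate_bps: int    # bits per second
    size_bytes: int  # bytes


def answer_unicast(offer_direction: str, offered: TokenBucket,
                   local_recv_capability: TokenBucket,
                   local_send_intent: TokenBucket) -> tuple[str, TokenBucket]:
    """Answer one unicast 'a=bw' offer line by reversing its directionality."""
    if offer_direction == "send":
        # The offerer will send: answer 'recv' with the local reception
        # capability, which may be lower or higher than the offered value.
        return "recv", local_recv_capability
    if offer_direction == "recv":
        # Offered values are upper limits: answer 'send' without exceeding them.
        return "send", TokenBucket(
            min(offered.rate_bps, local_send_intent.rate_bps),
            min(offered.size_bytes, local_send_intent.size_bytes),
        )
    raise ValueError(f"not handled in this sketch: {offer_direction!r}")
```

For example, an offered `recv` limit of (3 Mbps, 16384) combined with a local sending intent of (1 Mbps, 32768) gives an answered `send` line of (1 Mbps, 16384).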
If the answerer removes one or several RTP Payload Types from the SDP when creating the SDP answer then the corresponding ‘a=bw’ lines SHOULD be removed as well. The answerer MAY however keep an ‘a=bw’ line when the removed RTP Payload Type number is included within an identified range or list of Payload Type numbers.
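The pruning rule can be sketched in Python. The sketch handles only comma-separated payload type lists and `pt=*`, because the range syntax is not shown in this section. It keeps a list-scoped line whenever at least one listed payload type survives, which uses the MAY rather than rewriting the list. `prune_bw_lines` is a hypothetical name.

```python
def prune_bw_lines(lines: list[str], remaining_pts: set[int]) -> list[str]:
    """Drop 'a=bw' lines whose payload types were all removed from the answer."""
    kept = []
    for line in lines:
        _direction, scope, _semantics = line.removeprefix("a=bw:").split(maxsplit=2)
        if not scope.startswith("pt=") or scope == "pt=*":
            kept.append(line)  # not tied to specific payload type numbers
            continue
        listed = {int(pt) for pt in scope.removeprefix("pt=").split(",")}
        if listed & remaining_pts:
            # At least one listed payload type survives, so the line MAY be kept.
            kept.append(line)
    return kept


offer = [
    "a=bw:sendrecv pt=0,8 SMT:tb=80000:1000",
    "a=bw:sendrecv pt=96 SMT:tb=48000:1000",
    "a=bw:sendrecv pt=18 SMT:tb=24000:1000",
]
print(prune_bw_lines(offer, {0, 96}))  # the pt=18 line is removed
```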
In SDP bandwidth terms, the bucket size is a new parameter, and implementers of this specification may find it hard to choose a value for it. This section therefore gives some guidelines on how to set bucket size values.
A token bucket specifies an envelope for a transmission profile, and individual measurements determine whether the media stream or aggregate is considered to be within the specified profile. The semantics defined in this document only require that the media stream is within the token bucket specification at the point where it is emitted into the network. The network may add jitter, causing the media stream or aggregate to no longer be within the specified token bucket profile.
A sender SHOULD base the choice of token bucket size on how it plans to send data. That can in turn be decided from e.g. codec configuration, intended number of encoded frames per packet (ptime), network interface, maximum transmission unit (MTU), etc. In practice, for the simplified case where the sender is designed to send all packets with precisely even time spacing, the token bucket size can be set to the maximum packet size and the bit-rate to the long term highest bit-rate intended to be used.
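A token bucket conformance check at the emitting point can be sketched in Python. The sketch assumes the bucket starts full and refills continuously at rate/8 bytes per second up to its size. `conforms` is a hypothetical name, and the small tolerance only absorbs floating-point rounding.

```python
def conforms(packets: list[tuple[float, int]], rate_bps: float,
             size_bytes: float) -> bool:
    """Check (send_time_s, packet_bytes) pairs, in send order, against a token bucket."""
    tokens = size_bytes  # the bucket starts full
    last_time = None
    for time_s, length in packets:
        if last_time is not None:
            # Refill for the elapsed time, capped at the bucket size.
            tokens = min(size_bytes, tokens + (time_s - last_time) * rate_bps / 8)
        last_time = time_s
        if length > tokens + 1e-6:  # tolerance for float rounding only
            return False
        tokens -= length
    return True


# Simplified case from the text: evenly spaced maximum-size packets,
# bucket size = maximum packet size, rate = long-term highest bit-rate.
even = [(i * 0.012, 1500) for i in range(100)]
print(conforms(even, 1_000_000, 1500))
```

The same function also shows why a back-to-back burst fails with a one-packet bucket but passes when the bucket holds the whole burst.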
However, for media streams that are more variable, the bucket parameters should be chosen so that the emitted traffic is not too bursty when measured over a shorter interval. Until the bucket is drained, the media sender is able to emit packets at or close to the interface's maximum bit-rate. Long bursts of packets at interface speed are more sensitive to loss due to cross-traffic in switching fabrics with small buffers. A sender can therefore consider scheduling transmissions at a rate lower than the interface rate but higher than the token bucket average rate.
Consider the example of a large video intra frame consisting of 10 full-MTU packets (assume 1500 bytes each), which is 5 times the median frame size of two full-MTU packets. The average bit-rate may be 1 Mbps. If the token bucket were configured as (1 Mbps, 1500), then a new full-MTU packet could be emitted no more often than once every 12 ms. Transmitting the intra frame would then take 120 ms, which for 25 frames per second video is 3 frame intervals, potentially inducing significant playout jitter at a receiver. A token bucket specification of (1 Mbps, 15000) would allow all 10 packets to be sent at up to line speed. Over a 100 Mbps interface with no competing traffic, this could result in them being emitted every 0.12 ms. To ensure that a 10-packet burst can be transmitted within one frame interval of 40 ms, the bucket depth needed is the burst size in bits, minus the time interval times the bucket fill rate, converted back into bytes: (15000*8 - 0.04*1M) / 8 = 10000 bytes. The average bit-rate for this intra frame over a single frame period then becomes 3 Mbps. So the question is whether bursts of up to 3 Mbps should be allowed now and then as long as the average stays within 1 Mbps, or whether the sender has to transmit the intra frame over several frame intervals, skipping the next frame(s) and hoping that the receiver does not drop the intra frame as being too late. The sender could also consider reducing the quality of the intra frame, thereby reducing the number of MTU-sized packets required to transmit it.
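The arithmetic of the intra-frame example can be reproduced in a few lines of Python. The depth formula treats the burst as a continuous (fluid) flow, as the text does. All constants are taken from the example.

```python
MTU_BYTES = 1500
INTRA_PACKETS = 10
AVG_RATE_BPS = 1_000_000        # token bucket rate, 1 Mbps
FRAME_INTERVAL_S = 1 / 25       # 25 frames per second
LINK_BPS = 100_000_000          # 100 Mbps interface

burst_bits = INTRA_PACKETS * MTU_BYTES * 8

# Bucket (1 Mbps, 1500): one full packet per 1500 bytes of refill.
packet_spacing_s = MTU_BYTES * 8 / AVG_RATE_BPS
intra_duration_s = INTRA_PACKETS * packet_spacing_s
frame_intervals = intra_duration_s / FRAME_INTERVAL_S

# Bucket (1 Mbps, 15000): the burst can leave back-to-back at line rate.
line_rate_spacing_s = MTU_BYTES * 8 / LINK_BPS

# Depth letting the burst finish within one frame interval (fluid model).
depth_bytes = (burst_bits - FRAME_INTERVAL_S * AVG_RATE_BPS) / 8
burst_rate_bps = burst_bits / FRAME_INTERVAL_S

print(f"{packet_spacing_s * 1e3:.1f} ms spacing, {intra_duration_s * 1e3:.0f} ms intra, "
      f"{frame_intervals:.1f} frame intervals")
print(f"{line_rate_spacing_s * 1e3:.2f} ms spacing at line rate")
print(f"{depth_bytes:.0f} bytes depth, {burst_rate_bps / 1e6:.1f} Mbps over one frame")
```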
A sender SHOULD avoid adding excessive safety margins to the sending bucket size. A sender MAY add bucket size margins if it has knowledge of internal transmission timing variations, or if it knows about packet handling outside the sender itself that will affect the effective bucket size (as seen from a receiver) that is otherwise not reflected in the conveyed bucket size figure.
With the semantics specified in this document, the intended media receiver provides token bucket parameters that specify how the sender should behave. The traffic arriving at the receiver (or at intermediate nodes) may no longer conform to the token bucket due to jitter introduced by the network path between the sender and the receiver. This document assumes that the receiver has de-jittering buffers that are significantly larger than the token bucket parameters. This is because a media unit such as a video frame may require more data than the bucket depth provides; the sender then spreads it over time, transmitting each fragment when the bucket has refilled enough for the next fragment to be sent.
A receiver's input to the sender's bit-rate limitation should be based on known limitations, such as network and decoding capabilities. The bucket depth controls how bursty the traffic can be beyond the long-term average specified by the bucket refill rate.
When there are media aware middle nodes on the media path between the sender and receiver, those middle nodes may have to or want to apply similar considerations as the original media sender and receiver. If those middle nodes are aware of SDP and the new bandwidth attribute from this specification, and have in-path SDP adjustment capabilities, they could benefit from modifying the values to better fit the actually available end-to-end media path capabilities. For example, an RTP Media Translator can express what it actually is going to deliver of the far end-point's media to an end-point instead of that far end-point's provided values.
As the token bucket specified for the semantics in this document is based on what the sender emits into the network, a policer should allow some margin for network-introduced jitter. The amount will depend on the policer's location relative to the media sender.
If the media uses RTP, then after the media has been transmitted for some time the sender should have received a fair number of RTCP receiver reports from the receiver. From RTCP, the sender can estimate the network jitter observed at the receiver and may be able to adjust its behavior dynamically, so that the combination of the sender behavior and the reported network jitter fulfills the sender's token bucket profile.
These SDP examples show how the new bandwidth attribute can be used. The benefits, compared to the legacy bandwidth attribute, are also highlighted.
The SDP examples included below are intentionally not complete. Only the parts that are relevant for this description are included.
This example shows the SDP offer for several fixed-rate codecs: mu-law and A-law PCM, G.726 and G.729.
m=audio 49200 RTP/AVP 8 0 96 18
b=AS:80
a=rtpmap:96 G726-32/8000/1
a=bw:sendrecv pt=0,8 SMT:tb=80000:1000
a=bw:sendrecv pt=96 SMT:tb=48000:1000
a=bw:sendrecv pt=18 SMT:tb=24000:1000
a=ptime:20
a=maxptime:20
The new bandwidth attribute offers the possibility to negotiate the bandwidth individually for each codec. If the answerer removes a codec when creating the answer then it is still known how much bandwidth the other codecs will use. This means that the ambiguities listed in Section 3.2.1 can be avoided.
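The per-codec rates in the example above are consistent with IPv4 (20 bytes), UDP (8 bytes) and RTP (12 bytes) header overhead and one 20 ms frame per packet. The draft does not state these assumptions; they are chosen here because they reproduce the figures. Payload type 18 is G.729 in the RFC 3551 static assignments. `ip_rate_bps` is a hypothetical helper.

```python
IP_UDP_RTP_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers (assumed)
PTIME_S = 0.020                  # a=ptime:20


def ip_rate_bps(codec_bps: int) -> int:
    """IP-level bit rate of a constant-rate codec, one frame per packet."""
    packets_per_s = round(1 / PTIME_S)               # 50 packets per second
    payload_bytes = round(codec_bps * PTIME_S / 8)   # codec bytes per packet
    return (payload_bytes + IP_UDP_RTP_BYTES) * 8 * packets_per_s


CODECS = {"PCMU/PCMA (pt=0,8)": 64_000, "G726-32 (pt=96)": 32_000, "G.729 (pt=18)": 8_000}
for name, bps in CODECS.items():
    print(name, ip_rate_bps(bps))
# b=AS covers the most demanding alternative, in kbps.
print("b=AS:", max(ip_rate_bps(bps) for bps in CODECS.values()) // 1000)
```

Because each payload type has its own ‘a=bw’ line, removing one codec from the answer leaves the remaining rates intact, whereas the single ‘b=AS:80’ value would become ambiguous.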
This example shows an SDP offer/answer negotiation using the AMR codec [AMR].
SDP offer:
m=audio 49200 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLTR:tb=22400:200
a=ptime:20
a=maxptime:100
SDP answer:
m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLT:tb=22400:200
a=ptime:20
a=maxptime:100
Since the new bandwidth attribute makes it possible to negotiate both the maximum and the least required bandwidth, both the other end-point and any resource allocation function can know how the end-points will adapt when congestion is detected.
This example shows how the new bandwidth attribute, ‘a=bw’, can be used to negotiate the maximum and the least required bandwidths for multiple rate-adaptive codecs, in this case AMR and AMR-WB [AMR-WB]. For AMR, the highest codec mode is 12.2 kbps, giving a maximum bandwidth of 28.8 kbps, and the least required mode is selected to be 5.9 kbps, giving a least required bandwidth of 22.4 kbps. For AMR-WB, the highest codec mode is 23.85 kbps, giving a maximum bandwidth of 40.4 kbps, and the least required mode is 8.85 kbps, giving a least required bandwidth of 25.6 kbps.
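The four bandwidth figures above can be reproduced under assumptions the draft does not spell out:

- IPv4/UDP/RTP overhead of 40 bytes.
- One 20 ms speech frame per packet.
- The bandwidth-efficient AMR payload format, which is the default when ‘octet-align’ is absent. Its header consists of a 4-bit CMR plus one 6-bit ToC entry, padded to whole octets.

The speech bits per frame are the standard AMR and AMR-WB mode sizes. The helper names are hypothetical.

```python
import math

IP_UDP_RTP_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers (assumed)
PAYLOAD_HEADER_BITS = 4 + 6      # CMR + one ToC entry, bandwidth-efficient mode (assumed)
PACKETS_PER_S = 50               # one 20 ms speech frame per packet

# Speech bits per 20 ms frame for the modes named in the text.
SPEECH_BITS = {"AMR 12.2": 244, "AMR 5.9": 118, "AMR-WB 23.85": 477, "AMR-WB 8.85": 177}


def ip_rate_bps(speech_bits: int) -> int:
    """IP-level bit rate for one frame per packet, payload padded to octets."""
    payload_bytes = math.ceil((PAYLOAD_HEADER_BITS + speech_bits) / 8)
    return (payload_bytes + IP_UDP_RTP_BYTES) * 8 * PACKETS_PER_S


def b_as_kbps(rate_bps: int) -> int:
    """b=AS is expressed in kbps; round up so the rate is covered."""
    return math.ceil(rate_bps / 1000)


for mode, bits in SPEECH_BITS.items():
    print(mode, ip_rate_bps(bits), "bps")
```

Rounding the maxima up to whole kbps also gives the ‘b=AS:29’ (AMR) and ‘b=AS:41’ (AMR-WB) values used in the SDP. The bucket sizes of 200 and 350 bytes are not derived here.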
SDP offer:
m=audio 49200 RTP/AVP 96 97
b=AS:41
a=rtpmap:96 AMR-WB/16000/1
a=fmtp:96 mode-change-capability=2; max-red=80
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=96 SMT:tb=40400:350
a=bw:sendrecv pt=96 SLTR:tb=25600:350
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLTR:tb=22400:200
a=ptime:20
a=maxptime:100
SDP answer:
m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLT:tb=22400:200
a=ptime:20
a=maxptime:100
In this case, it is clear when the answer is received that the bandwidth needed for AMR applies to both directions. There is no need for a second offer/answer negotiation to clarify that the bandwidth also applies to end-point A’s receiving direction. Thereby, the issues listed in Section 3.2.3 are resolved.
The following SDP example shows how to use the new bandwidth attribute to offer asymmetric streams. In this case, the end-point offers to send H.264 video at 1 Mbps while it is capable of receiving H.264 at up to 3 Mbps. Note that this example does not make use of the codec-specific H.264 level asymmetry signaling defined in RFC 6184 [RFC6184].
m=video 50324 RTP/AVP 96
b=AS:3000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c016
a=bw:send pt=96 SMT:tb=1000000:8192
a=bw:recv pt=96 SMT:tb=3000000:16384
It should be clear from this example that the new bandwidth attribute is useful when negotiating asymmetric sessions since it offers the possibility to define the token bucket parameters for both sending and receiving directions separately.
This SDP example shows how the new bandwidth attribute, ‘a=bw’, can be used for negotiating the bandwidth when the RTP Retransmission Payload Format RFC 4588 [RFC4588] is used.
m=video 49170 RTP/AVPF 96 97
b=AS:500
a=rtpmap:96 MP4V-ES/90000
a=rtcp-fb:96 nack
a=fmtp:96 profile-level-id=8; config=01010000012000884006682C2090A21F
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96;rtx-time=3000
a=bw:send pt=* AMT:tb=500000:4096
a=bw:recv pt=* AMT:tb=500000:8192
In this case, it is beneficial to use the Aggregate Maximum Token bucket semantics to allow the end-points to adapt the bandwidths used for the original stream and for the retransmission stream during the session. The end-point can send more original packets when the packet loss rate is low. When the packet loss rate is high then the end-point can use less bandwidth for the original packets and instead allow for more retransmissions. It would also be possible to specify separate limits for the original stream and the retransmission stream by using a separate set of ‘a=bw’-lines for pt=96 and pt=97.
The example below is based on the use case described in Section 3.3.1. Only the negotiation for video is shown here.
m=video 49300 RTP/AVP 96
b=AS:3000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=bw:send pt=* SMT:tb=1000000:1000
a=bw:recv pt=* SMT:tb=1000000:2000
a=bw:send pt=* AMT:tb=1000000:1000
a=bw:recv pt=* AMT:tb=3000000:6000
a=max-recv-ssrc:* 4
With the new bandwidth attribute, it is possible to define the bandwidth for each received stream independently of the others. In this case, the SDP shows that the end-point is prepared to send at most 1 Mbps and to receive at most 1 Mbps per stream. The SDP also shows that the end-point is prepared to receive at most 3 Mbps, aggregated over the up to four streams in the receiving direction. Note that this implies that to receive more than three streams, each stream’s bandwidth must be reduced to comply with the maximum aggregate.
This example shows a declarative usage of the new bandwidth attribute.
m=video 50324 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=rtpmap:98 MP4V-ES/90000
a=max-recv-ssrc:96 2
a=max-recv-ssrc:* 5
a=bw:send pt=* SMT:tb=1200000:16384
a=bw:recv pt=96 SMT:tb=1500000:16384
a=bw:recv pt=97,98 SMT:tb=2500000:16384
a=bw:recv pt=* AMT:tb=8000000:65535
In the above example, the single outgoing stream is limited to a bucket rate of 1.2 Mbps and a bucket size of 16384 bytes. The up to 5 incoming streams can in total use a maximum bucket rate of 8 Mbps with a bucket size of 65535 bytes. However, each individual stream's maximum rate depends on its payload type. Payload type 96 (H.264) is limited to 1.5 Mbps with a bucket size of 16384 bytes, while payload types 97 (H.263) and 98 (MPEG-4) may use up to 2.5 Mbps with a bucket size of 16384 bytes.
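A receiver could check a set of incoming streams against such declarations with a sketch like the one below. It checks rates only and ignores bucket sizes and timing, which is a deliberate simplification. `RecvLimits` and `violations` are hypothetical names, and the mapping from SDP lines to fields is done by hand.

```python
from dataclasses import dataclass, field


@dataclass
class RecvLimits:
    per_pt_bps: dict[int, int]        # 'a=bw:recv ... SMT' rate per payload type
    aggregate_bps: int                # 'a=bw:recv pt=* AMT' rate
    max_streams: int                  # 'a=max-recv-ssrc:*' value
    max_streams_per_pt: dict[int, int] = field(default_factory=dict)


def violations(streams: list[tuple[int, int]], limits: RecvLimits) -> list[str]:
    """streams holds (payload_type, rate_bps) for each incoming SSRC."""
    problems = []
    if len(streams) > limits.max_streams:
        problems.append(f"{len(streams)} streams > {limits.max_streams}")
    for pt, cap in limits.max_streams_per_pt.items():
        count = sum(1 for p, _ in streams if p == pt)
        if count > cap:
            problems.append(f"{count} streams of pt {pt} > {cap}")
    for pt, rate in streams:
        if rate > limits.per_pt_bps[pt]:
            problems.append(f"pt {pt} at {rate} bps > {limits.per_pt_bps[pt]}")
    total = sum(rate for _, rate in streams)
    if total > limits.aggregate_bps:
        problems.append(f"aggregate {total} bps > {limits.aggregate_bps}")
    return problems


declarative = RecvLimits({96: 1_500_000, 97: 2_500_000, 98: 2_500_000},
                         8_000_000, 5, {96: 2})
print(violations([(96, 1_500_000), (96, 1_500_000), (97, 2_500_000), (98, 2_500_000)],
                 declarative))
```

Applied to the earlier multi-stream example (1 Mbps per stream, 3 Mbps aggregate, up to four streams), the same check shows that a fourth stream fits only if every stream is reduced, for instance to 750 kbps each.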
The proposed new bandwidth attribute is related to the bandwidth modifier ‘b=AS’ and to the directionality attributes (‘a=sendrecv’, ‘a=sendonly’, ‘a=recvonly’ and ‘a=inactive’) defined in RFC 4566 [RFC4566]. It is therefore important to analyze these relationships properly so that interoperability issues can be avoided.
If the SDP includes both the ‘b=AS’ bandwidth modifier and the ‘a=bw’ bandwidth attribute, then alignment may be necessary to avoid confusion. This section gives some guidelines for such alignment. Some usages may, however, need other alignments than those discussed below. If so, those alignments need to be considered on a case-by-case basis. The discussion below should therefore not be seen as exhaustive.
In general, the bandwidths offered with ‘b=AS’ and ‘a=bw’ should be aligned for the direction to which the ‘b=AS’ bandwidth modifier applies. For ‘sendrecv’ and ‘recvonly’ sessions, ‘b=AS’ indicates the bandwidth for the receiving direction. ‘b=AS’ is closest in interpretation to the AMT semantics. If the stream maximum semantics (SMT) is used, then the sum of the bandwidths in the receiving direction may exceed the ‘b=AS’ bandwidth, but the AMT value should not exceed the ‘b=AS’ value.
If the session includes multiple streams, but not all of them will be active simultaneously, then ‘b=AS’ should indicate the maximum bandwidth used by any combination of streams that are active simultaneously, in the same way that AMT could be used in such a session. This also means that the bandwidths offered with ‘a=bw’ are accumulated over the combination of streams that are active, and this aggregated bandwidth should not exceed the bandwidth defined with ‘b=AS’. Note, however, that it is possible and feasible to specify an aggregate that is less than the sum of the maximum bandwidths for the maximum number of available streams. It may then be possible to use the maximum number of active streams, each at less than its maximum bandwidth, or to reduce the number of active streams to stay within the bandwidth limit.
The SDP below gives an example of how this is done. In this example, the intention is to use either the payload type pair (96, 97) or the payload type pair (98, 99), but not, for example, to pair payload types 96 and 98.
m=video 50000 RTP/AVP 96 97 98 99 100
b=AS:1000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42c00d
a=rtpmap:99 H264/90000
a=fmtp:99 profile-level-id=42c00c
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c00c
a=bw:sendrecv pt=96 SMT:tb=700000:4000
a=bw:recv pt=97 SMT:tb=300000:3000
a=bw:sendrecv pt=98 SMT:tb=500000:3000
a=bw:recv pt=99 SMT:tb=200000:2000
a=bw:send pt=100 SMT:tb=300000:1400
a=sendrecv
This session is bi-directional, as shown with the ‘a=sendrecv’ attribute. The bandwidth offered with ‘b=AS’ therefore applies to the receive direction. The ‘b=AS’ is then set based on the combination of streams that gives the highest bandwidth, i.e. the payload type pair (96, 97).
This means that the bandwidths offered with ‘a=bw’ are aligned with the bandwidth offered with ‘b=AS’.
If, on the other hand, the intention would be to use another combination of payload types, for example (96, 98), then this would add up to 1200 kbps, which would mean that the stream bandwidths would not be aligned with the ‘b=AS’ bandwidth.
This shows that bandwidths for ‘sendrecv’ and ‘recv’ directions are added together when determining the bandwidth for the combined streams.
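The alignment check can be sketched in Python. The sketch collects receive-direction rates from ‘recv’ and ‘sendrecv’ lines, sums them for each intended payload type combination, and compares the worst case with ‘b=AS’ (in kbps). It assumes one payload type per line and accepts the scope with or without the `pt=` prefix. `recv_rates_bps` and `aligned_with_b_as` are hypothetical names.

```python
def recv_rates_bps(bw_lines: list[str]) -> dict[int, int]:
    """Receive-direction token bucket rates per payload type."""
    rates = {}
    for line in bw_lines:
        direction, scope, semantics = line.removeprefix("a=bw:").split(maxsplit=2)
        if direction not in ("recv", "sendrecv"):
            continue  # send-only declarations do not count toward b=AS here
        pt = int(scope.removeprefix("pt="))
        rates[pt] = int(semantics.split("tb=", 1)[1].split(":")[0])
    return rates


def aligned_with_b_as(bw_lines: list[str], combos: list[tuple[int, ...]],
                      b_as_kbps: int) -> bool:
    """True if no intended combination exceeds the b=AS bandwidth."""
    rates = recv_rates_bps(bw_lines)
    worst_bps = max(sum(rates[pt] for pt in combo) for combo in combos)
    return worst_bps <= b_as_kbps * 1000


LINES = [
    "a=bw:sendrecv pt=96 SMT:tb=700000:4000",
    "a=bw:recv pt=97 SMT:tb=300000:3000",
    "a=bw:sendrecv pt=98 SMT:tb=500000:3000",
    "a=bw:recv pt=99 SMT:tb=200000:2000",
    "a=bw:send pt=100 SMT:tb=300000:1400",
]
print(aligned_with_b_as(LINES, [(96, 97), (98, 99)], 1000))  # intended pairs
print(aligned_with_b_as(LINES, [(96, 98)], 1000))            # 1200 kbps combination
```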
If the offer is “complex”, for example offering multiple streams for both speech and video, possibly with many different codecs (and therefore uses ‘a=bw’ together with the ‘b=AS’ bandwidth modifier), and the answerer wants to change this into a “simple” session (e.g. plain VoIP with only one RTP payload type for codec X), then the answerer may remove the ‘a=bw’ lines when creating the answer. It may therefore happen that the SDP answer includes only the ‘b=AS’ bandwidth modifier. However, if the offer does not include any ‘b=AS’ line, then it is recommended to keep the ‘a=bw’ lines in the answer as well, even for “simple” sessions. This means that the offerer cannot rely on the presence of ‘a=bw’ in the answer.
Since the ‘a=bw’ attribute includes a parameter for directionality it is important to clarify the relationship to the already existing directional attributes in SDP (‘sendrecv’, ‘sendonly’, ‘recvonly’ and ‘inactive’). In general, one can say that:
At session setup time, it is therefore acceptable to define streams with other directionality than what is shown with the SDP attribute for directionality. However, when media is transmitted, then the SDP attribute for directionality has to be followed. An example of this is shown below.
m=video 5000 RTP/AVP 96 97 98
b=AS:1000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=bw:sendrecv pt=96 SMT:tb=700000:4000
a=bw:recv pt=97 SMT:tb=200000:3000
a=bw:send pt=97 SMT:tb=300000:1400
a=recvonly
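This means that three bandwidths are defined at session setup: 700 kbps with a 4000-byte bucket for payload type 96 in both directions, 200 kbps with a 3000-byte bucket for payload type 97 in the receiving direction, and 300 kbps with a 1400-byte bucket for payload type 97 in the sending direction.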
However, since ‘a=recvonly’ is defined then this means that the end-point is, at the session setup time, only willing to receive media even though the SDP contains bandwidth declarations also for the sending direction. This allows for setting up streams that are effectively inactive in one or both directions from the beginning of the session and then enabling them later in the session.
This is comparable to defining one or more codecs even though the session starts as ‘inactive’.
The ‘a=bw’ attribute is defined to be extensible, and this section discusses the available extension points.
The current specification defines ‘send’, ‘recv’ and ‘sendrecv’. If some new directionality behavior is needed that does not match the existing ones, a new one could be defined. This should be avoided unless a clear need for a new directionality is found.
It is expected that there will be a need to extend the bandwidth scope. This document only defines two scope types, session and payload type, and there are very likely other desirable scopes that will be defined in the future. Possible examples are scopes applying to a specific SSRC, a particular end-point, or a class of end-points.
This is the extension point that is expected to be frequently used in the future. A major proliferation of semantics is not good for interoperability, but it is likely that bandwidth shortcomings or missing functionalities will be discovered in the future. Thus defining new semantics gives maximum flexibility to define the meaning of the provided value(s), the format of the values and how to interpret the directionality and scope values.
This document only defines token buckets as values. If fewer or more parameters are needed to express a particular semantics, new value formats can be defined. New value formats should be defined with some consideration of generality and reuse, so that future semantics can also use them, with the aim of minimizing the number of different formats.
This document contains a few open issues:
Following the guidelines in RFC 4566 [RFC4566] and in RFC 3550 [RFC3550], the IANA is requested to register:
This section will be filled out in future versions of this document.
Excessive bandwidth allocation can consume all the resources, much more than what the end-point(s) intend to use. So, if a session allocates an unnecessarily high bandwidth then this will likely mean that some other users cannot be admitted, or that they cannot get QoS guaranteed resources that they requested and have to use best effort. It can also happen that the session itself is rejected, if the end-points try to allocate resources that are not available. Allocating too little bandwidth is likely to negatively impact the perceived media quality or entirely prevent reception of requested media.
The above shows that the bandwidth attribute is a potential attack vector, either for malicious end-points or for third-party attackers that attempt to modify the attribute in order to make the system allocate unnecessary resources, deny end-points service, reduce quality for end-points, or incur costs on users.
To prevent third-party attacks, the signalling should be source authenticated and integrity protected so that no on-path or off-path attacker can inject or modify the SDP. Malicious end-points cannot as easily be guarded against using cryptography; instead, behavior analysis and measures that prevent such an end-point from seriously impacting other end-points are needed.