MMUSIC Working Group T. Frankkila
Internet-Draft M. Westerlund
Intended status: Standards Track B. Burman
Expires: October 24, 2012 Ericsson
April 24, 2012

Extensible Bandwidth Attribute for SDP
draft-westerlund-mmusic-sdp-bw-attribute-01

Abstract

Knowledge of what bandwidths the end-points intend to use is important both for the other end-point and for resource allocation in various types of networks. This is especially important for wireless access networks, which typically have quite limited resources. The bandwidth attribute in the Session Description Protocol (SDP), ‘b=AS’, is today widely used to define the bandwidth that the end-points intend to use in various types of sessions. This document shows that the existing bandwidth attributes, such as ‘b=AS’, although widely used in today's scenarios, have limitations that make it hard or even impossible for the end-points to express their intentions accurately when it comes to bandwidth usage. To solve the identified problems, this document defines a new extensible SDP bandwidth attribute ‘a=bw’, which enables more detailed control over bandwidth declarations, requests, and allocations. With the new bandwidth attribute it is possible to define different scopes in the session setup and then negotiate the bandwidth individually for each scope.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on October 24, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



1. Introduction

This document examines issues arising from non-basic usage of RTP [RFC3550] and analyzes how well the existing SDP [RFC4566] attribute ‘b=AS’ performs for bandwidth negotiation in different scenarios.

This analysis is done by defining a number of use cases, ranging from simple point-to-point sessions to sessions with asymmetric bandwidth and sessions with multiple streams.

It is shown that the existing bandwidth attributes ‘b=AS’ [RFC4566] and ‘b=TIAS’ [RFC3890] have limitations that make it difficult or even impossible for end-points and for resource allocation functions in the network to determine how much bandwidth the service will use. The analysis also provides the design rationale for the new bandwidth attribute.

This document then proposes a general and extensible mechanism for bandwidth negotiation that can be used for any type of session. Interoperability with the existing mechanisms for bandwidth negotiation is especially important since the existing bandwidth attribute is widely used.

This document also presents several examples of how the new bandwidth attribute can be used in the session setup phase for various types of sessions. The examples are derived for IP/UDP/RTP transport, although nothing should prevent the new bandwidth attribute from also being used with other transport protocols.

2. Definitions

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.2. Terminology

The following terms and abbreviations are used in this document:

Bandwidth:
In this document, bandwidth is defined as the IP-level bandwidth, i.e. including the network protocol (IPv4 or IPv6) and transport protocol (TCP, UDP, RTP, etc.) overhead. When RTP is used, the RTCP bandwidth is handled separately from the bandwidth used for RTP packets. Bandwidth in this context is expressed in bits per second, not Hz.
Encoding:
A particular encoding is the choice of the media encoder (codec) that has been used to compress the media. Different encodings give different fidelity, depending on the choice of sampling, bit-rate and other configuration parameters.
End-point:
A single entity sending and/or receiving RTP packets. It may be decomposed into several functional blocks, but as long as it behaves as a single RTP stack entity it is classified as a single end-point.
Media stream:
A sequence of RTP packets using a single SSRC that together carry part or all of the content of a specific Media Type from a specific sender source within a given RTP session.
RTP session:
An RTP session consists of one or more media streams that have the same purpose. The typical example is to have one RTP session per media type, i.e. voice and video use different RTP sessions (different ports) since they have different purposes. It is, however, possible to have multiple streams in an RTP session, for example one stream for non-redundant audio and another stream for retransmissions of audio packets. The fundamental definition of an RTP session is a single SSRC space.

3. Use Cases and Design Rationale

This section describes a number of use cases where the existing bandwidth attribute ‘b=AS’ is used for bandwidth definition. It also discusses why the limitations of the existing bandwidth attribute make it hard for other end-points and resource allocation functions to know or estimate how much bandwidth will be used in the ongoing session.

The analysis is made by defining a set of use cases. The first use cases include fairly simple session types, i.e. point-to-point sessions with or without asymmetry. A few more complex use cases are then analyzed. The last set of use cases reflects fairly advanced session types, e.g. various forms of multiplexing and usage of multiple media streams.

The discussion is then summarized and the design rationales for the new bandwidth attribute are outlined.

3.1. Existing Bandwidth Attribute

The existing bandwidth modifier ‘b=’ defined in RFC 4566 [RFC4566] is reviewed in this section.

3.1.1. Attribute Definition

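The existing bandwidth attribute ‘b=’ is defined in Section 5.8 of RFC 4566 [RFC4566]. The syntax is:

b=<bwtype>:<bandwidth>

where:

<bwtype>:
An alphanumeric modifier giving the meaning of the <bandwidth> figure. RFC 4566 defines the types ‘CT’ (Conference Total) and ‘AS’ (Application Specific); further types may be registered.
<bandwidth>:
The bandwidth value, interpreted as kilobits per second by default.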

Bandwidth types for negotiating the RTCP bandwidth, ‘b=RS’ and ‘b=RR’, have been defined in RFC 3556 [RFC3556].

There is also a bandwidth type for negotiating the transport independent application specific maximum bandwidth, ‘b=TIAS’, RFC 3890 [RFC3890]. This bandwidth type is similar to the ‘b=AS’ bandwidth type, except that the overhead caused by the transport protocol headers is not included.

One issue with the existing bandwidth attribute is that its syntax is very limited: it only allows new bandwidth types (<bwtype>) to be defined, each with a single numerical value. This limitation needs to be considered in the discussion below.

3.1.2. Offer/answer Procedure for the Existing Bandwidth Attribute

"An Offer/Answer Model with the Session Description Protocol (SDP)" [RFC3264] describes the offer/answer procedures for the existing bandwidth attribute. For the SDP offer, it describes that the bandwidth attribute indicates the desired bandwidth that the offerer would like to receive. For the SDP answer, it describes that the bandwidth attribute indicates the bandwidth that the answerer would like the offerer to use when sending media. Thus, for offer/answer negotiations, the bandwidth attribute indicates the bandwidth for the receive direction of each end-point.

The solution presented in this document focuses primarily on clarifying and complementing the Application Specific (AS) bandwidth.

[It is an open question to decide if and how to handle the RTCP bandwidth negotiation, e.g. corresponding to b=RS and b=RR.]

[It is an open question to develop semantics for the transport independent bandwidth negotiation, e.g. corresponding to b=TIAS.]

3.1.3. End-point Behavior when Generating Traffic

An end-point can send media in many different ways, depending on the choices its implementers have made.

Some end-points may send their data as a fairly “nice and smooth” media stream, which means that both the packet sizes and the packet rate are more or less constant all the time. An example of a smooth stream is when the end-point encodes speech and sends one packet every 20 ms, with all packets of equal size.

Other end-points may generate bursty streams, which have a large peak-to-average ratio. An example of a bursty stream is when an end-point is encoding video. Most of the time, the end-point is sending packets with almost the same size and with constant packet rate. However, it happens occasionally that the encoder generates much more data for a frame, which may give a very large packet size. It may even happen that the sender has to segment the data into several packets, which may be transmitted in a burst, thereby causing a very high peak rate.

Whether the stream is smooth or bursty makes a big difference for the network and the policy control that usually applies in QoS controlled networks. If the stream is too bursty, then a policy control function may decide to drop packets that exceed the granted rate. This will lead to degraded quality and reduced user satisfaction.

The existing bandwidth attribute offers no mechanism to negotiate which temporal variations are allowed for a stream. The only available mechanism is to negotiate the maximum bandwidth, but nothing defines an averaging window (or something similar) that can be used to control the bandwidth variations of the transmitted stream.

It is therefore proposed to use a Token Bucket model to describe the bandwidth with two parameters, the token bucket rate and the bucket size, see RFC 2212 [RFC2212].
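As an illustration of how a token bucket distinguishes a smooth stream from a bursty one, the sketch below implements a simple conformance meter in the spirit of RFC 2212. It assumes that the bucket starts full and uses the units that the ‘a=bw’ token bucket values use later in this document (rate in bits per second, bucket size in bytes). The class name TokenBucket and the method conforms() are hypothetical and do not describe any particular policer.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token bucket meter (rate in bit/s, depth in bytes).

    The bucket starts full; each conforming packet removes its size in bytes.
    """
    rate_bps: int
    size_bytes: int

    def __post_init__(self) -> None:
        self.tokens = float(self.size_bytes)  # bytes currently available
        self.last_time = 0.0                   # time of the last update, seconds

    def conforms(self, time_s: float, packet_bytes: int) -> bool:
        # Add tokens for the elapsed time, never exceeding the bucket depth.
        elapsed = time_s - self.last_time
        self.tokens = min(self.size_bytes,
                          self.tokens + elapsed * self.rate_bps / 8.0)
        self.last_time = time_s
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile: a policer might drop this packet


# Smooth stream: 100-byte packets every 20 ms (40 kbps) within a 40 kbps bucket.
smooth = TokenBucket(rate_bps=40_000, size_bytes=200)
print(all(smooth.conforms(i * 0.020, 100) for i in range(50)))  # True

# Bursty stream: three 1000-byte packets sent back to back.
bursty = TokenBucket(rate_bps=40_000, size_bytes=1500)
print([bursty.conforms(1.0, 1000) for _ in range(3)])  # [True, False, False]
```

Both streams have a modest average rate, but only the bucket depth reveals whether the burst fits; a maximum rate alone cannot express this.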

3.2. Point-to-point Sessions using SDP offer/answer

The existing modifier for the application specific bandwidth ‘b=AS’ is frequently used in the SDP offer/answer negotiation RFC 3264 [RFC3264] for setting up point-to-point sessions, for example for bi-directional point-to-point VoIP or video telephony sessions. In this section, the use of the legacy bandwidth modifier in point-to-point sessions using SDP offer/answer is reviewed.

3.2.1. Symmetric Point-to-point Sessions, Fixed-rate Codecs

The example below shows the SDP offer from end-point A for several fixed-rate codecs: mu-law and A-law PCM/G.711 [G.711], ADPCM/G.726 [G.726] and CS-ACELP/G.729 [G.729]. The codecs have different bit rates. PCM encodes speech at 64 kbps. G.726 can encode speech at four different rates, 64, 32, 24 and 16 kbps, but in this case it is assumed that the 32 kbps variant is used. G.729 encodes speech at 8 kbps. The IP/UDP/RTP overhead with 20 ms packetization and IPv4 is 16 kbps in all cases, giving 80, 48 and 24 kbps, respectively.

m=audio 49200 RTP/AVP 8 0 96 18
b=AS:80
a=rtpmap:96 G726-32/8000/1
a=ptime:20
a=maxptime:80
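The IP-level figures behind the ‘b=AS:80’ value in the offer above (80, 48 and 24 kbps) can be reproduced with the following sketch. It assumes IPv4 without options, UDP, a 12-byte RTP header without CSRCs or header extensions, and one packet per packetization interval; the function name is hypothetical.

```python
# IPv4 (20) + UDP (8) + RTP (12) bytes, assuming no options, CSRCs or extensions.
IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def ip_bandwidth_kbps(codec_kbps: float, ptime_ms: float,
                      header_bytes: int = IPV4_UDP_RTP_HEADER_BYTES) -> float:
    """IP-level bandwidth of a constant-rate codec sending one packet per ptime."""
    packets_per_second = 1000.0 / ptime_ms
    overhead_kbps = header_bytes * 8 * packets_per_second / 1000.0
    return codec_kbps + overhead_kbps


for name, rate in [("G.711", 64), ("G.726-32", 32), ("G.729", 8)]:
    print(name, ip_bandwidth_kbps(rate, 20))  # 80.0, 48.0 and 24.0
```

With IPv6 the header grows to 60 bytes, so ip_bandwidth_kbps(8, 20, header_bytes=60) gives 32.0 for G.729; the overhead, not the codec, dominates at low rates.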

If end-point B accepts all the offered codecs, then a likely SDP answer would be:

m=audio 49400 RTP/AVP 8 0 96 18
b=AS:80
a=rtpmap:96 G726-32/8000/1
a=ptime:20
a=maxptime:80

In this case, both end-points offer to receive 80 kbps. A resource allocation function would thereby allocate 80 kbps in each direction.

However, if end-point B accepts only one of the lower-rate codecs, for example G.729, and not the PCM codecs, then a likely SDP answer would be:

m=audio 49400 RTP/AVP 18
b=AS:24
a=ptime:20
a=maxptime:80

This means that the offerer has offered to receive 80 kbps while the answerer has offered to receive 24 kbps. In the direction from A to B, it is clear that a resource allocation function should allocate 24 kbps. In the direction from B to A, however, it is less clear. On one hand, end-point A has offered to receive 80 kbps. On the other hand, end-point B has only indicated support for the G.729 codec, and it is unknown whether B can send anything other than G.729 from A's offered set.

A resource allocation function may also (incorrectly) conclude that end-point B will send at most 24 kbps, since its ‘b=AS’ indicates 24 kbps. However, since maxptime is 80 ms, end-point B could very well use application-layer redundancy and encapsulate redundant frames together with non-redundant frames, which would result in a bandwidth exceeding 24 kbps. Even if maxptime were 20 ms, end-point B could still use application-layer redundancy by transmitting the non-redundant and redundant frames in different packets. This is possible since end-point A has indicated that it is capable of receiving 80 kbps. Hence, if the resource allocation function uses the codec information and assumes that end-point B will send only 24 kbps, this may cause packet losses and/or long delays.
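To illustrate why 24 kbps is not a safe assumption for the direction from B to A, the sketch below computes the IP-level bandwidth of G.729 at 20 ms packetization for two hypothetical redundancy schemes, each carrying one redundant copy of every frame. It assumes 40 bytes of IPv4/UDP/RTP headers and ignores any redundancy payload header (such as the one defined in RFC 2198), so the redundant figures are lower bounds for those schemes.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, assuming no options or extensions


def kbps(payload_bytes_per_packet: int, packets_per_second: int) -> float:
    """IP-level bandwidth in kbps for a fixed payload size and packet rate."""
    return (payload_bytes_per_packet + HEADER_BYTES) * 8 * packets_per_second / 1000.0


# G.729 at 8 kbps produces 20 bytes of speech per 20 ms.
G729_BYTES_PER_20MS = 8000 * 20 // (1000 * 8)

plain = kbps(G729_BYTES_PER_20MS, 50)          # no redundancy
bundled = kbps(2 * G729_BYTES_PER_20MS, 50)    # redundant copy in the same packet
separate = 2 * kbps(G729_BYTES_PER_20MS, 50)   # redundant copy in its own packet

print(plain, bundled, separate)  # 24.0 32.0 48.0
```

Both redundancy variants stay well within the 80 kbps that end-point A offered to receive, yet exceed the 24 kbps that a codec-based estimate would suggest.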

It should be clear from this example that the current bandwidth attribute, b=AS, can create ambiguities about what bandwidth will be used in each direction. If the end-points and the resource allocation functions make different interpretations, there is a risk of either poor quality or wasted resources.

To solve this, a new bandwidth negotiation method should enable negotiating different bandwidths for different codecs. If a codec can be configured in several different ways, e.g. G.726 offers four different static bit rates, then this would typically be negotiated using different RTP Payload Types. This means that the solution needs to be capable of negotiating different bandwidths for different Payload Types.
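The benefit of per-payload-type bandwidths can be sketched as follows, assuming that end-point A could declare a receive limit for each payload type (which ‘b=AS’ cannot express). The dictionary A_RECV_KBPS_PER_PT and the function allocation_kbps() are hypothetical; the values are the IP-level figures from this section. A resource allocation function could then take the worst case over only the payload types that survive the offer/answer exchange.

```python
# Hypothetical per-payload-type receive limits declared by end-point A (kbps).
A_RECV_KBPS_PER_PT = {8: 80, 0: 80, 96: 48, 18: 24}


def allocation_kbps(limits_per_pt: dict[int, int], negotiated_pts: list[int]) -> int:
    """Worst case over the payload types remaining after offer/answer."""
    return max(limits_per_pt[pt] for pt in negotiated_pts)


print(allocation_kbps(A_RECV_KBPS_PER_PT, [8, 0, 96, 18]))  # 80
print(allocation_kbps(A_RECV_KBPS_PER_PT, [18]))            # 24
```

With only a single session-level value, the second case would still have to assume 80 kbps in the direction from B to A.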

3.2.2. Symmetric Point-to-Point Sessions with Rate-Adaptive Codec

This use case describes what might happen when rate-adaptive codecs, for example AMR [AMR], are used in a session. The rate adaptation should select a high bitrate when the operating conditions are good, but a low bitrate when the operating conditions are degraded, e.g. due to congestion or bad coverage.

One example of an SDP offer/answer negotiation for a rate-adaptive codec is shown below.

m=audio 49200 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100

The bandwidth attribute in the SDP offer indicates the bandwidth that the offerer would like to receive, RFC 3264 [RFC3264].

m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100

The bandwidth attribute in the SDP answer indicates the maximum bandwidth that the answerer would like the offerer to use when sending media, RFC 3264 [RFC3264].

In this case, it is clear that both end-points are prepared to receive up to 29 kbps of media. Since AMR can adapt the encoding rate, the bandwidth can be reduced, e.g. to the 5.9 kbps mode, if congestion is detected. The existing bandwidth attribute ‘b=AS’ is, however, only used to negotiate the maximum rate, so nothing in the SDPs describes how the rate will be adapted. In some cases, usually for speech codecs, it might be possible to derive the lowest rate from the codec information. However, there is no guarantee that the end-points will adapt down to this rate rather than stay at some higher rate. For video codecs, there is usually no codec information at all that could be used to determine how low a rate the end-points will use. The lowest usable rate for a video codec is generally not a codec limitation, but rather an end-user or service consideration of the lowest video quality that is still useful or acceptable in the actual scenario.

This means that a resource allocation function has no information that could be used to determine how the end-points will adapt during periods of congestion. Hence, the network does not know what to assume from the end-points.

To solve this, a new bandwidth negotiation method should allow for negotiating not only the highest rate but also the minimum rate that is still useful.

3.2.3. Symmetric Point-to-Point Sessions with Several Rate-Adaptive Codecs

Another example is when the originating end-point offers several rate-adaptive codecs with different bandwidths, and the answerer supports only one or several of the lower-rate configurations but not the configuration that uses the highest bandwidth. With the legacy bandwidth modifier ‘b=AS’ it is only possible to indicate one bandwidth for the whole RTP session, which means that the end-point needs to indicate the highest bandwidth since this is the worst case. An offer/answer for this case is shown below. The offerer supports both AMR and AMR-WB [AMR-WB] and therefore indicates the bandwidth needed for the AMR-WB configuration, since it is higher than for AMR. If the answerer does not support the AMR-WB codec, it has to remove this configuration from the SDP when creating the SDP answer. The answerer therefore calculates the bandwidth required for AMR instead of AMR-WB.

m=audio 49200 RTP/AVP 96 97
b=AS:41
a=rtpmap:96 AMR-WB/16000/1
a=fmtp:96 mode-change-capability=2; max-red=80
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100

m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=ptime:20
a=maxptime:100

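Since the indicated bandwidth is for the receiving direction in this example, end-point A indicates that it is prepared to receive up to 41 kbps, while end-point B indicates that it is prepared to receive up to 29 kbps. A resource allocation function can therefore allocate 29 kbps in the direction from A to B, but cannot tell whether 41 kbps or only 29 kbps will be needed in the direction from B to A.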

This gives the same ambiguity about the maximum rate as shown in Section 3.2.1. In addition, since AMR and AMR-WB are rate-adaptive codecs with different bit rates, they also have different minimum rates. This means that a resource allocation function would be unaware of both the maximum bandwidth and the minimum (required) bandwidth.

To solve this, a new bandwidth attribute should allow for negotiating both maximum and minimum bitrates individually for each payload type.

For speech codecs, it is usually possible to derive the minimum rate from the codec information. However, this is typically not possible for video codecs since they only indicate the maximum encoding level. For example, if end-point A offers to use H.264 [H.264] level 3.0 but end-point B is only capable of using level 1.2, then this only limits the maximum bandwidth in the direction from A to B. In the other direction, end-point A is still capable of receiving level 3.0.

3.2.4. Asymmetric Point-to-Point Sessions

The session setup for asymmetric streams is not always straightforward. Assume that one wants to set up a session with 600 kbps in the sending direction and 200 kbps in the receiving direction.

m=video 49200 RTP/AVP 96
b=AS:200
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=sendrecv

From this SDP, it can be determined that the end-point wants to receive 200 kbps. There is some implicit information in the level part of the profile-level-id in the H.264 example above, indicating that the end-point can send using a higher bandwidth (up to 768 kbps), but extracting that implicit information requires codec-specific knowledge. In this example, assume that the sender does not even want to use the maximum bandwidth allowed for the signaled level, but a slightly lower one, say 600 kbps. How is the answerer supposed to know that the offerer really wants to send up to 600 kbps, especially since not even the implicit level-related information can be used? There could be many reasons to use a lower video bandwidth than the level maximum: limited terminal performance in the send direction, a known network bandwidth limitation, a bandwidth charging model that makes the user prefer a lower bandwidth, etc.

One way to express the asymmetry is to set up different RTP sessions for sending and receiving directions. An SDP offer for this might be:

m=video 49200 RTP/AVP 96
b=AS:600
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=sendonly
m=video 49202 RTP/AVP 97
b=AS:200
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=recvonly

If the answerer decides to accept this then the SDP answer might be:

m=video 49200 RTP/AVP 96
b=AS:600
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=recvonly
m=video 49202 RTP/AVP 97
b=AS:200
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=sendonly

In this example, it is clear that the offerer can send video at up to 600 kbps and receive video at up to 200 kbps. However, if the offer includes different codecs using different bandwidths, then the same problem as described in Section 3.2.3 arises.

Specifically for video, but possibly also for other media, different implementations may send the media in different ways. Some implementations may try to provide a fairly “smooth” stream in terms of bandwidth variation over time, while other implementations may produce a very “bursty” stream.

There are also cases where opening additional RTP sessions just to express asymmetric transmission bandwidths is not desirable.

3.3. Sessions with Multiple Streams

In this part of the analysis, it is assumed that an RTP session is set up for multiple streams. This can be done in several ways and for several reasons, as discussed in RTP Multiplexing Architecture [I-D.westerlund-avtcore-multiplex-architecture].

3.3.1. Multiple Streams

The assumed usage here is a multi-party session, for example a video conference using an RTP mixer. Some of the attendees are active and their audio and video is distributed to the other users. Some attendees are inactive and thus only receive media. In this example, each end-point sends one video stream, but can receive up to four simultaneous video streams, multiplexed as different SSRCs in the same RTP session. One or more central nodes (RTP Mixers) are used to facilitate the media transport between the participants and are involved in choosing the streams to be forwarded. Assume that there is an aggregate bandwidth limit of 3 Mbps in the receive direction, and that each received video stream should be limited to at most 1 Mbps.

An SDP offer for setting up a session with one video stream in the sending direction and four video streams in the receiving direction is shown below, using [I-D.westerlund-avtcore-max-ssrc] to explicitly declare the capability to handle multiple streams. In this case, only the legacy ‘b=AS’ bandwidth attribute is used, which is valid only for the aggregate.

m=video 49300 RTP/AVP 96
b=AS:3000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c016
a=max-recv-ssrc:* 4

This example again highlights the asymmetry problem with the existing bandwidth attribute, but it also highlights the lack of per-stream bandwidth specification. It is not possible to declare the 1 Mbps bandwidth limit that should apply to each of the four streams in the receiving direction; such a per-stream declaration is thus a desirable property of the new bandwidth attribute. Note also that in this example the 1 Mbps per-stream limit cannot be fully utilized if all four streams are used simultaneously, since 4 x 1 Mbps exceeds the 3 Mbps aggregate limit.
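The interaction between the aggregate and per-stream limits in this example can be sketched as follows. The constants come from the example above; the assumption that the mixer shares the aggregate equally among active streams is made only for illustration, and the function name is hypothetical.

```python
AGGREGATE_LIMIT_KBPS = 3000   # from b=AS:3000
PER_STREAM_LIMIT_KBPS = 1000  # desired per-stream limit, not expressible with b=AS
MAX_STREAMS = 4               # from a=max-recv-ssrc:* 4


def per_stream_share_kbps(active_streams: int) -> float:
    """Largest equal per-stream rate that respects both limits (equal sharing assumed)."""
    if not 1 <= active_streams <= MAX_STREAMS:
        raise ValueError("number of active streams out of range")
    return min(PER_STREAM_LIMIT_KBPS, AGGREGATE_LIMIT_KBPS / active_streams)


for n in range(1, MAX_STREAMS + 1):
    print(n, per_stream_share_kbps(n))
```

Up to three active streams can each use the full 1 Mbps; with four, the aggregate limit caps each at 750 kbps.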

3.4. User Experience and Bandwidth Negotiation

Resource allocation is typically a compromise between perceived quality and network utilization. From an end-user perspective, the bandwidth for a service should be as high as possible, since this should give the best user experience. However, from a network perspective, one would like to minimize the rate, since this should maximize the number of sessions that can be supported.

For some services, like conversational voice and/or video telephony, one needs to ensure that the network is capable of delivering a certain minimum, "at least required", rate even when the network load is high. This is needed to ensure user satisfaction, both in terms of quality and end-to-end delay. This means that the end-points and the network need to agree on the maximum bandwidth that can be used for the session as well as on some lowest useful, "least required", bandwidth.

The current bandwidth modifier, ‘b=AS’, is used to negotiate the maximum bandwidth. However, since it only allows a single bandwidth value to be negotiated, it cannot also be used to negotiate a lower bandwidth limit.

To solve this, a new bandwidth negotiation method should allow for negotiating not only the highest rate but also the “at least required” rate. To enable a negotiation between the end-point and the network, a reasonable approach is that the end-point requests a lower bandwidth limit and the network then indicates which “least required” rate was granted.

3.5. Summary of Findings

It should be clear from the above discussion that the current bandwidth attribute is too limited to be used for all use cases and that some extensions are needed.

The current bandwidth attribute, ‘b=AS’, is sufficient for simple sessions but gives ambiguities when negotiating more advanced session types. One of the drawbacks is that ‘b=AS’ only indicates the desired bandwidth for the receiving direction; if the answering end-point wants to use a lower rate than what is offered, there is often no way for the resource allocation function to know what bandwidth will be used in the offerer's sending direction.

Implementers of end-points and resource allocation functions may try to resolve this ambiguity by using other information available in the SDP, e.g. codec-specific information. However, such information is not always easily available, e.g. for video codecs.

End-points may have to perform a second offer/answer negotiation to resolve the ambiguity. This obviously has the drawbacks of increased SIP traffic and extra setup time. It is also not guaranteed that the end-points will actually initiate a second offer/answer negotiation.

The analysis above has also shown that the current bandwidth attribute is insufficient to properly describe the session for multi-stream scenarios.

The analysis above has also shown that the current bandwidth modifier can be used to negotiate the maximum bit rate in bearers allocated in some wireless networks, but it is insufficient for also negotiating a lower, "least required", bandwidth limit.

Another problem with the existing bandwidth attribute is that the syntax is very limited and does not allow for introducing extensions, only additional identifiers with a single value each.

It is therefore proposed to define a new bandwidth attribute, including a new syntax. The new bandwidth attribute should support:

Directionality:
It must be possible to have different sets of attribute values depending on direction.
Payload specific:
With the new bandwidth attribute it should be possible to specify different bandwidth values for different RTP Payload types. This is because some codecs have different characteristics and one may want to limit a specific codec and payload configuration to a particular bandwidth. Especially combined with codec negotiation there is a need to express intentions and limitations on usage for that particular codec. In addition, payload agnostic information is also needed.
Multiple streams:
The new bandwidth attribute should support bandwidth negotiation both for single streams and for multiple streams. When multiple streams are used, the new bandwidth attribute should allow for declaring both the bandwidth per stream and the aggregated bandwidth.
Bandwidth specification method:
To have a clear specification of what any bit-rate value means, we propose that token bucket parameters, i.e. bucket depth and bucket fill rate, are used where appropriate for the semantics. If single values are to be specified, a clear definition of how to derive that value must be given, including averaging intervals etc.
Bandwidth semantics:
It should be possible to negotiate different types of bandwidths for each scope, including several bandwidth properties in the same negotiation. It should, at least, be possible to negotiate the highest bandwidth and a lower bandwidth limit that indicates the lowest useful bandwidth to use the related media. The least required bandwidth limit should ideally, but need not necessarily, be guaranteed by the network and the remote end-point(s).
Extensibility:
The semantics need to be extensible, so that new semantics can be defined in the future.

The existing bandwidth modifier, ‘b=AS’, is widely used today. The existing SDP attributes for directionality, ‘a=sendrecv’, ‘a=recvonly’, ‘a=sendonly’ and ‘a=inactive’, are also widely used. It is therefore important to ensure interworking between the new bandwidth attribute and the mechanisms already existing in SDP.

4. Attribute Specification

This section proposes a new bandwidth attribute ‘a=bw’ that can be used either as an extension to the already existing bandwidth attribute ‘b=AS’ or as a replacement for it. The new bandwidth attribute includes semantics that allow it to replace the existing bandwidth attribute.

The syntax for the new bandwidth attribute is:

a=bw:<direction> <scope> <semantic>:<value>

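where:

<direction>:
The direction(s) to which the bandwidth values apply: ‘send’, ‘recv’ or ‘sendrecv’.
<scope>:
What the bandwidth values apply to, e.g. one or more RTP payload types.
<semantic>:
The meaning of the value, e.g. SMT or AMT as defined below.
<value>:
The value(s) for the semantic, e.g. token bucket parameters.

The new attribute is designed to allow for future extensibility.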

4.1. SDP Grammar

The ABNF RFC 5234 [RFC5234] for this attribute is the following:

bw-attrib      = "a=bw:" direction SP [req] scope SP
                 [req] semantics ":" values
direction      = "send" / "recv" / "sendrecv" / direction-ext
scope          = payloadType / scope-ext
payloadType    = "pt=" ("*" / (PT-spec) *("," PT-spec))
PT-spec        = PT-value / PT-value-range
PT-value       = 1*3DIGIT
PT-value-range = PT-value "-" PT-value
req            = "!"
semantics      = "SMT" / "AMT" / "SLT" / "SLTR" / "ALT" / "ALTR" /
                 semantics-ext
values         = token-bucket / value-ext
token-bucket   = "tb=" br-value ":" bs-value
br-value       = "*" / 1*15DIGIT   ; Bucket Rate [bps]
bs-value       = "*" / 1*15DIGIT   ; Bucket Size [bytes]

direction-ext  = token             ; As defined in RFC 4566
scope-ext      = 1*VCHAR           ; As defined in RFC 5234
semantics-ext  = token             ; As defined in RFC 4566
value-ext      = 0*(WSP / VCHAR)   ; As defined in RFC 5234
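The grammar above can be exercised with a small parser. The sketch below is a non-normative illustration that accepts only the elements this document defines (directions send/recv/sendrecv, ‘pt=’ scopes, the six semantics and ‘tb=’ values) and rejects the extension points. The optional ‘!’ (the ABNF's req rule) is recorded as a flag without interpretation. The names BwAttribute and parse_bw() are hypothetical, and the example values 600000 and 4000 are made up.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Only the grammar elements defined in this document are accepted; the extension
# points (direction-ext, scope-ext, semantics-ext, value-ext) are rejected.
_PT_SPEC = r"\d{1,3}(?:-\d{1,3})?"
BW_RE = re.compile(
    r"^a=bw:(?P<direction>sendrecv|send|recv) "
    rf"(?P<scope_req>!?)pt=(?P<pts>\*|{_PT_SPEC}(?:,{_PT_SPEC})*) "
    r"(?P<sem_req>!?)(?P<semantics>SMT|AMT|SLTR|SLT|ALTR|ALT)"
    r":tb=(?P<rate>\*|\d{1,15}):(?P<size>\*|\d{1,15})$"
)


@dataclass
class BwAttribute:
    direction: str
    scope_required: bool         # '!' before the scope
    payload_types: str           # "*" or a comma-separated PT-spec list
    semantics_required: bool     # '!' before the semantics
    semantics: str
    rate_bps: Optional[int]      # None means wildcard "*"
    size_bytes: Optional[int]    # None means wildcard "*"


def _num(text: str) -> Optional[int]:
    return None if text == "*" else int(text)


def parse_bw(line: str) -> BwAttribute:
    m = BW_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported or malformed a=bw line: {line!r}")
    return BwAttribute(
        direction=m["direction"],
        scope_required=m["scope_req"] == "!",
        payload_types=m["pts"],
        semantics_required=m["sem_req"] == "!",
        semantics=m["semantics"],
        rate_bps=_num(m["rate"]),
        size_bytes=_num(m["size"]),
    )


print(parse_bw("a=bw:send pt=96 SMT:tb=600000:4000"))
```

Note that the ABNF permits PT-values up to 999, while valid RTP payload types are 0 to 127; a stricter parser could check this range as well.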

The ‘a=bw’ attribute defines three possible directionalities for the bandwidth:

send:
The send direction of the SDP Offer/Answer agent providing the SDP or, in case of declarative use, of the device that is being configured by the SDP.
recv:
The receive direction of the SDP Offer/Answer agent providing the SDP or, in case of declarative use, of the device that is being configured by the SDP.
sendrecv:
The provided bandwidth values apply equally in the send and receive directions, i.e. the values configure both directions symmetrically.

The directionality must be specified when the ‘a=bw’ attribute is used, and only one directionality can be specified on each ‘a=bw’ line. Special care must be taken to avoid conflicting definitions. For example, if ‘sendrecv’ has been specified on one ‘a=bw’ line for a scope, e.g. payload type 96, then the direction cannot be set to ‘send’ or ‘recv’ on another ‘a=bw’ line for the same scope. However, it is allowed to specify directionality ‘send’ on one ‘a=bw’ line for a scope and directionality ‘recv’ on another ‘a=bw’ line for the same scope. This is useful when the bandwidth differs between directions. Using ‘sendrecv’ as directionality on an ‘a=bw’ line is a shortcut: it is equivalent to using two separate ‘a=bw’ lines, one with ‘send’ and the other with ‘recv’, that are otherwise semantically identical.
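The directionality rules above can be sketched as a consistency check that expands ‘sendrecv’ into ‘send’ plus ‘recv’ and rejects a scope that mixes ‘sendrecv’ with ‘send’ or ‘recv’. This sketch assumes that scopes are compared as written (e.g. "pt=96"); overlaps between "pt=*", ranges and single payload types are not resolved. The function name is hypothetical.

```python
from collections import defaultdict


def expand_directions(lines: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each scope to the concrete directions ('send'/'recv') it configures.

    `lines` holds (direction, scope) pairs taken from a=bw lines. Raises
    ValueError when 'sendrecv' is mixed with 'send' or 'recv' for one scope.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for direction, scope in lines:
        seen[scope].add(direction)

    result: dict[str, set[str]] = {}
    for scope, directions in seen.items():
        if "sendrecv" in directions and directions & {"send", "recv"}:
            raise ValueError(f"conflicting directionality for scope {scope}")
        # 'sendrecv' is shorthand for one 'send' line plus one 'recv' line.
        result[scope] = {"send", "recv"} if "sendrecv" in directions else set(directions)
    return result


print(expand_directions([("send", "pt=96"), ("recv", "pt=96"), ("sendrecv", "pt=97")]))
```

Both scopes end up configured in both directions; only the ‘pt=96’ lines can carry different values per direction.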

The scope indicates what is being configured by the bandwidth semantics on this attribute line. Two different scopes are defined based on payload type:

Payload Type:
The bandwidth configuration applies to the specific payload type value(s).
pt=*:
Applies to all payload types being used.

The scope parameter is extensible to allow for adding other scope definitions in the future.

This specification defines six related semantics. Each semantics represents either the bandwidth consumption of a single stream or that of an aggregate of streams, expressed as a token bucket that defines a transmission profile the media sender must stay within. The token bucket values are the token rate in bits per second and the bucket size in bytes, both provided as integers; see RFC 2212 [RFC2212]. For the semantics below, the whole IP packet (for example the IP, UDP and RTP headers and the RTP payload) is what shall be metered when determining whether the send pattern is within the profile. The token bucket definition allows wildcards (“*”) so that an end-point can indicate that it wants a token bucket value but has no proposed value itself.
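Metering against the token bucket profile can be sketched as a conformance check over the emitted packets. This sketch assumes the conventional RFC 2212 style bucket: it starts full, fills at the token rate, is capped at the bucket size, and is charged the full IP packet size per packet. The input format of (send time in seconds, IP packet size in bytes) pairs is an assumption for illustration.

```python
def conforms(packets, rate_bps, bucket_bytes):
    """Return True if the (send_time_s, ip_packet_bytes) pairs, given in
    time order, stay within a token bucket of rate_bps bits per second and
    bucket_bytes bytes that starts full."""
    tokens = float(bucket_bytes)  # available tokens, in bytes
    last_t = None
    for t, size in packets:
        if last_t is not None:
            # Refill for the elapsed time, never beyond the bucket size.
            tokens = min(bucket_bytes, tokens + (t - last_t) * rate_bps / 8)
        last_t = t
        if size > tokens:         # packet would exceed the profile
            return False
        tokens -= size
    return True


# 200-byte IP packets every 20 ms correspond to 80 kbps of IP traffic.
pcm_like = [(i * 0.02, 200) for i in range(50)]
print(conforms(pcm_like, 80000, 1000))  # within the profile
print(conforms(pcm_like, 64000, 1000))  # bucket drains after about 20 packets
```

A packet larger than the bucket size can never conform, which is one reason the bucket size has to be at least the largest packet the sender emits.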

The definitions of the semantics in more detail are:

SMT (Stream Maximum Token bucket):
The maximum intended or allowed bandwidth usage, including protocol overhead, for each individual source (each SSRC) in an RTP session at the sender side, specified by a token bucket. The token bucket wildcards (“*”) should not be used for the SMT semantics, since it should always be possible to estimate the maximum bandwidth. This semantics can be used with the scope for any payload type (pt=*), where it applies independently of encoding and packetization, or with a specific payload type or a set of payload types.
AMT (Aggregate Maximum Token bucket):
The maximum intended or allowed bandwidth usage for the sum of all sources (SSRCs) in an RTP session according to the specified directionality at the media sender specified by a token bucket. The ‘sendrecv’ directionality parameter indicates equal token buckets in both directions, i.e. the aggregate of streams sent to an end-point shall be within the token bucket defined transmission profile, and the aggregate of streams sent from that end-point shall also be within the same token bucket profile at the sender. It can be used either to express the maximum for one particular payload type, for a set of payload types or for any payload type (pt=*). The token bucket wild card (“*”) should not be used for the AMT semantics since it should always be possible to estimate the maximum bandwidth.
SLT (Stream Least required Token bucket):
The least required bandwidth, including IP protocol overhead, needed for the stream for each individual source (each SSRC) in an RTP session as specified by a token bucket at the sender. When using the SLT semantic, the SMT semantic SHOULD also be specified for the same direction and scope. If the SLT semantics is not defined then this means that the least required bandwidth limit is zero. The least required bandwidth is the minimum bandwidth that is necessary for the service to work with usable quality.
SLTR (Stream Least required Token bucket Request):
The request for establishing the least required bandwidth, including protocol overhead, needed for the stream for each individual source (each SSRC) in an RTP session, as specified by a token bucket at the stream sender. An end-point may use the SLTR semantics to request to establish a least required bandwidth. An end-point using the SLTR semantics may set the token bucket rate and/or the token bucket size to “*” to indicate that the end-point has no preference, but that it expects some network node or the answering end-point to define the value(s). A network node answering to the SLTR SHALL replace this with the SLT semantics to indicate the least required bandwidth it sees necessary and which it has attempted to guarantee. If the request is for certain specified payload types, a network node that cannot grant bandwidth based on payload types MAY replace those requested payload types with “*” in the SLT response to indicate a payload type agnostic grant. An end-point receiving an SDP with SLTR, i.e. where the network has not replaced the SLTR semantics with any SLT semantics, SHOULD NOT assume that the requested bandwidth is guaranteed.
ALT (Aggregated Least required Token bucket):
The least required bandwidth, including protocol overhead, needed for the sum of all sources (all SSRCs) in an RTP session as specified by a token bucket at the stream sender. When using the ALT semantic the AMT semantic SHOULD also be specified for the same direction and scope. The directionality and payload type considerations for ALT are the same as for AMT. If the ALT semantics is not defined then this means that the least required bandwidth is zero.
ALTR (Aggregated Least required Token bucket Request):
The request for establishing a least required bandwidth, including protocol overhead, needed for the sum of all sources (all SSRCs) in an RTP session as specified by a token bucket at the media sender side. The directionality and payload type considerations for ALTR are the same as for SLTR. The ALTR semantics MUST only be used together with AMT.

The required prefix (“!”) is used when the direction, scope and semantics are required to be supported and understood by the SDP-consuming end-point.

4.2. Declarative Use

In declarative usage the SDP attribute is interpreted from the perspective of the end-point being configured by the particular SDP. An interpreter MAY ignore ‘a=bw’ attribute lines that contain an unknown scope or semantics not marked with the required ("!") prefix. If the required prefix is present on an unknown scope or semantics, the interpreter SHALL NOT use this SDP to configure the end-point.
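The declarative processing rule can be sketched as a filter over already-parsed lines. The sketch assumes that "known" means a payload-type scope plus one of the six semantics defined in this document, and that lines arrive as (scope, semantics, required) tuples with the "!" already stripped. The function and constant names are hypothetical.

```python
# Only the scopes and semantics defined in this document are treated as known.
KNOWN_SCOPE_PREFIXES = ("pt=",)
KNOWN_SEMANTICS = {"SMT", "AMT", "SLT", "SLTR", "ALT", "ALTR"}


def declarative_filter(lines):
    """lines: (scope, semantics, required) tuples.

    Returns the lines the interpreter will apply. Unknown optional lines
    are ignored; an unknown required line makes the whole SDP unusable."""
    usable = []
    for scope, semantics, required in lines:
        known = (scope.startswith(KNOWN_SCOPE_PREFIXES)
                 and semantics in KNOWN_SEMANTICS)
        if known:
            usable.append((scope, semantics, required))
        elif required:
            raise ValueError(f"required but unsupported: {scope} {semantics}")
        # Unknown and not required: may be ignored.
    return usable
```

Raising an exception, instead of silently dropping the line, reflects the SHALL NOT: the caller has to abandon the whole configuration.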

4.3. Usage in Offer/Answer

The offer/answer negotiation is performed for each ‘a=bw’ attribute line individually, with the scope and semantics immutable.

An offerer may use the ‘a=bw’ attribute(s) for some or all of the offered media types. An answerer may remove the ‘a=bw’ attribute(s) for the media types where it was used in the SDP offer.

The SDP may include an offer for an Aggregate Maximum Token bucket (AMT) without specifying any Stream Maximum Token buckets (SMTs) for the individual streams.

When using the ‘a=bw’ attribute to define the token bucket for a certain scope then the offerer should define token buckets for all scopes of the same type. For example, if the SDP offer includes three payload types, e.g. 96, 97 and 98, and if a token bucket is defined for payload type 96, then the offerer should also define token buckets for the other payload types. This can be done either by defining one token bucket each for payload type 97 and 98 or by defining a common token bucket for payload type 97 and 98.

When the token bucket rate and size are declared in an offer for directionality ‘sendrecv’ then this indicates the token bucket rate and the token bucket sizes are the same in both directions. For example, if the offered bandwidth is 1 Mbps, then the end-point declares that it is capable of sending with a bandwidth up to 1 Mbps and that it is capable of receiving with a bandwidth up to 1 Mbps.

If either the token bucket rate(s) or the token bucket sizes are different in sending and receiving direction then ‘sendrecv’ cannot be used. One should instead include two or more ‘a=bw’ lines with the respective directionality, bandwidths and sizes.

When the token bucket parameters are declared in an SDP offer for directionality ‘send’ then this indicates the token bucket parameters the sender intends to use. The answerer may change this value, both to increase it and to reduce it, see below.

When the token bucket parameters are declared in an SDP offer for directionality ‘recv’, this indicates the largest token bucket envelope that the offerer considers the media sender shall use.

An agent understanding the ‘a=bw’ attribute and answering to an offer including the ‘a=bw’ attribute SHOULD include the attribute in the answer for all media types for which it was offered.

An answerer SHOULD ignore ‘a=bw’ attribute lines that contain an unknown scope or semantics not marked with the required ("!") prefix. If the required prefix is present on an unknown scope or semantics, the answerer SHALL reject the media description by setting the port to 0 and copying the ‘a=bw’ attributes that were not understood into the answer. In this case, ‘a=bw’ attributes that are understood SHALL NOT be included in the answer.

If an answerer would like to add additional bandwidth configurations using other directionality, scope, and semantics combination, then it MAY do so by adding such definitions in the SDP answer.

An agent may also divide an ‘a=bw’ offer into several ‘a=bw’ offers. One example is when the SDP offer included an ‘a=bw’ offer with directionality ‘sendrecv’, which indicates that the token bucket parameters are the same in sending and receiving direction. If the answerer would like to change the parameters for one or both directions, so that the parameters are no longer the same for both directions, then the answerer can include two ‘a=bw’ lines in the SDP answer, one for sending direction and another for receiving direction. In case an offered sendrecv media becomes a single direction media then the sendrecv can be modified to that single direction.

An agent responding to an offer needs to consider the directionality and reverse it in the answer when responding to unicast media streams.

For media stream offers over unicast with directionality send, the answerer SHALL reverse the directionality and indicate its reception bandwidth capability, which may be lower or higher than what the sender has indicated as its intended maximum.

For media stream offers over unicast with directionality receive, the token bucket parameters indicate the upper limits. The answerer SHALL reverse the directionality and may reduce the bandwidth when producing the answer, indicating the answerer's intended maximum transmission rate.
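The two unicast reversal rules above can be sketched for the token rate. The helper and its parameters are hypothetical: `my_recv_cap_bps` is the answerer's own reception capability and `my_send_max_bps` is its own intended sending maximum. The sketch covers only the rate and only ‘send’ and ‘recv’ offers. Bucket sizes and the splitting of ‘sendrecv’ lines are out of scope.

```python
def answer_line(offer_dir, offered_rate_bps, my_recv_cap_bps, my_send_max_bps):
    """Return (answer_direction, answer_rate_bps) for a unicast answer to
    an 'a=bw' line with the same scope and semantics."""
    if offer_dir == "send":
        # The offerer states its intended sending maximum. The answerer
        # reverses the direction and states its own reception capability,
        # which may be lower or higher than the offered value.
        return "recv", my_recv_cap_bps
    if offer_dir == "recv":
        # The offered value is an upper limit for what the answerer sends,
        # so the answerer may only keep or reduce it.
        return "send", min(offered_rate_bps, my_send_max_bps)
    raise ValueError(f"not handled in this sketch: {offer_dir!r}")
```

The asymmetry is deliberate. A ‘send’ offer can be answered with a higher value, but a ‘recv’ offer can only be kept or reduced.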

If the answerer removes one or several RTP Payload Types from the SDP when creating the SDP answer then the corresponding ‘a=bw’ lines SHOULD be removed as well. The answerer MAY however keep an ‘a=bw’ line when the removed RTP Payload Type number is included within an identified range or list of Payload Type numbers.

4.4. Bucket Size Estimation

In SDP bandwidth terms, the bucket size is a new parameter and what value to use for it may be hard to understand for implementers of this specification. This section therefore gives some guidelines on how to set bucket size values.

A token bucket specifies an envelope for a transmission profile, and the size and timing of each individual packet affect whether the media stream or aggregate is considered within the specified profile. The semantics defined in this document only require that the media stream is within the token bucket specification at the point where it is emitted into the network. The network may add jitter, causing the media stream or aggregate to no longer be within the specified token bucket profile.

4.4.1. Sender Specified Token Bucket

A sender SHOULD base the choice of token bucket size on how it plans to send data. That can in turn be decided from e.g. codec configuration, intended number of encoded frames per packet (ptime), network interface, maximum transmission unit (MTU), etc. In practice, for the simplified case where the sender is designed to send all packets with precisely even time spacing, the token bucket size can be set to the maximum packet size and the bit-rate to the long term highest bit-rate intended to be used.

However, for more variable media streams, the bucket parameters should be chosen so that the emitted traffic is not too bursty when measured over shorter intervals. Until the bucket is drained, the media sender is able to emit packets at or close to the interface's maximum bit-rate. Long bursts of packets at interface speed are more sensitive to loss due to cross-traffic in switching fabrics with small buffers. A sender can therefore consider pacing its transmissions at a rate lower than the interface rate but higher than the token bucket average rate.

Consider the example of a large video intra frame consisting of 10 full-MTU packets (assume 1500 bytes each), which is 5 times the median frame size of two full-MTU packets. The average bit-rate may be 1 Mbps. If the token bucket were configured to (1 Mbps, 1500 bytes), a new full-MTU packet could be emitted no more often than once every 12 ms (1500 * 8 bits at 1 Mbps). Transmitting the intra frame would then take about 120 ms, which for 25 frames per second video is 3 frame intervals, potentially inducing significant playout jitter at a receiver. A token bucket specification of (1 Mbps, 15000 bytes) would allow all 10 packets to be sent at up to line speed. Over a 100 Mbps interface with no competing traffic, the whole burst could then be emitted within 1.2 ms, i.e. one packet every 0.12 ms. To ensure that a 10-packet burst can be transmitted within one frame interval of 40 ms, the bucket depth needed is the burst size in bits minus the time interval times the bucket fill rate, converted back into bytes: (15000*8 - 0.04*1M) / 8 = 10000 bytes. The average bit-rate for this intra frame over a single frame period then becomes 3 Mbps (15000 * 8 bits / 0.04 s). So the question is whether bursts of up to 3 Mbps should be allowed now and then, as long as the average stays within 1 Mbps, or whether the sender has to transmit the intra frame over several frame intervals, skipping the next frame(s) and hoping that the receiver does not drop the intra frame as being too late. The sender could also consider reducing the quality of the intra frame, which reduces the number of MTU-sized packets needed to transmit it.
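The arithmetic in the intra-frame example can be checked with a few helper functions. This is a sketch that uses integer milliseconds to avoid floating-point rounding. The function names are hypothetical, and the depth formula is the one stated in the text (burst bits minus the bits refilled during the interval, converted to bytes).

```python
def required_bucket_depth(burst_bytes: int, interval_ms: int, rate_bps: int) -> int:
    """Bucket depth (bytes) that lets burst_bytes leave within interval_ms
    while the bucket refills at rate_bps."""
    deficit_bits_x1000 = burst_bytes * 8 * 1000 - rate_bps * interval_ms
    return max(0, -(-deficit_bits_x1000 // 8000))  # ceiling division


def mean_rate_bps(burst_bytes: int, interval_ms: int) -> int:
    """Average bit-rate of the burst over the interval."""
    return burst_bytes * 8 * 1000 // interval_ms


def spacing_ms(packet_bytes: int, rate_bps: int) -> float:
    """Minimum packet interval when sending at rate_bps."""
    return packet_bytes * 8 * 1000 / rate_bps


print(required_bucket_depth(15000, 40, 1_000_000))  # bytes for 10 x 1500 in 40 ms
print(mean_rate_bps(15000, 40))                     # bit/s over one frame period
print(spacing_ms(1500, 1_000_000))                  # ms between packets at 1 Mbps
print(spacing_ms(1500, 100_000_000))                # ms between packets at 100 Mbps
```
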

A sender SHOULD avoid adding excessive safety margins to the sending bucket size. A sender MAY add bucket size margins if it has knowledge of internal transmission timing variations, or if it knows about packet handling outside the sender itself that will affect the effective bucket size (as seen from a receiver) that is otherwise not reflected in the conveyed bucket size figure.

4.4.2. Receiver Specified Token Bucket

With the semantics specified in this document, the intended media receiver can provide token bucket parameters that specify how the sender should behave. The traffic arriving at the receiver (or at intermediate nodes) may no longer conform to the token bucket due to jitter introduced by the network path between the sender and the receiver. This document assumes that the receiver has de-jitter buffers that are significantly larger than the token bucket parameters. This is because a media unit such as a video frame may require more data than the bucket depth provides. It is then spread out in time, with each fragment transmitted when the bucket has refilled enough for that fragment to be sent.

A receiver's input to the sender's bit-rate limitation should be based on known limitations, such as network and decoding capabilities. The bucket depth controls how bursty the traffic can be beyond the long-term average specified by the bucket refill rate.

4.4.3. Bucket Adjustment in Middle Nodes

When there are media aware middle nodes on the media path between the sender and receiver, those middle nodes may have to or want to apply similar considerations as the original media sender and receiver. If those middle nodes are aware of SDP and the new bandwidth attribute from this specification, and have in-path SDP adjustment capabilities, they could benefit from modifying the values to better fit the actually available end-to-end media path capabilities. For example, an RTP Media Translator can express what it actually is going to deliver of the far end-point's media to an end-point instead of that far end-point's provided values.

4.4.4. Network Policing

As the token bucket specified for the semantics in this document is based on what the sender emits into the network, a policer should allow some margin for network-introduced jitter. The size of this margin depends on the policer's location in relation to the media sender.
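One way to size the policer margin is sketched below. This is an assumption for illustration, not a rule from this document. Suppose each packet's network delay varies by at most a known jitter bound D between the sender and the policer. Traffic observed in any window of length T was then sent within a window of at most T + D. A stream that conforms to (r, b) at the sender therefore conforms to (r, b + r*D) at the policer. The function name and parameters are hypothetical.

```python
def policer_bucket_bytes(sender_bucket_bytes: int, rate_bps: int,
                         jitter_tolerance_ms: int) -> int:
    """Hypothetical sizing rule: widen the sender's bucket by the bytes the
    rate can accumulate during the tolerated delay variation."""
    extra_bytes = rate_bps * jitter_tolerance_ms // 8000
    return sender_bucket_bytes + extra_bytes


# Sender profile (1 Mbps, 10000 bytes), 20 ms of tolerated jitter:
print(policer_bucket_bytes(10000, 1_000_000, 20))
```

A policer close to the sender can use a small jitter bound, while one further along the path needs a larger one. This matches the dependence on location noted above.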

4.4.5. Utilizing Network Feedback

If the media uses RTP, then once media has been transmitted for some time, the sender should have received a fair number of RTCP receiver reports from the receiver. From these reports the sender can estimate the network jitter observed at the receiver. It may then be able to adjust its sending behavior dynamically, so that its sending behavior combined with the reported network jitter still fulfills the sender's token bucket profile.

4.5. SDP Examples for Point-to-point Sessions

These SDP examples show how the new bandwidth attribute can be used. The benefits, compared to the legacy bandwidth attribute, are also highlighted.

The SDP examples included below are intentionally not complete. Only the parts that are relevant for this description are included.

4.5.1. Symmetric Fixed-rate Codecs

This example shows the SDP offer for several fixed-rate codecs: mu-law and A-law PCM, G.726 and G.729 (static payload type 18).

m=audio 49200 RTP/AVP 8 0 96 18
b=AS:80
a=rtpmap:96 G726-32/8000/1
a=bw:sendrecv pt=0,8 SMT:tb=80000:1000
a=bw:sendrecv pt=96 SMT:tb=48000:1000
a=bw:sendrecv pt=18 SMT:tb=24000:1000
a=ptime:20
a=maxptime:20

The new bandwidth attribute offers the possibility to negotiate the bandwidth individually for each codec. If the answerer removes a codec when creating the answer then it is still known how much bandwidth the other codecs will use. This means that the ambiguities listed in Section 3.2.1 can be avoided.
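The token bucket rates in this example can be reproduced from the codec bit-rates. The sketch assumes IPv4 (20 bytes), UDP (8 bytes) and a fixed 12-byte RTP header with no CSRCs or header extensions, together with the offered ptime of 20 ms. It also assumes the static payload type assignments of RFC 3551, where payload type 18 is G.729 at 8 kbps.

```python
IP_UDP_RTP_BYTES = 20 + 8 + 12  # IPv4 + UDP + fixed RTP header (assumption)


def ip_rate_bps(codec_bps: int, ptime_ms: int) -> int:
    """IP-level bit-rate of a fixed-rate codec with one packet per ptime."""
    payload_bytes = codec_bps * ptime_ms // 8000
    packet_bytes = payload_bytes + IP_UDP_RTP_BYTES
    return packet_bytes * 8 * 1000 // ptime_ms


for name, codec_bps in [("PCMU/PCMA", 64000), ("G726-32", 32000), ("G729", 8000)]:
    print(name, ip_rate_bps(codec_bps, 20))
```

With 20 ms packets the 40 bytes of headers add a constant 16 kbps. This is why the token bucket rates are much higher than the nominal codec rates for the low-rate codecs.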

4.5.2. Symmetric Rate-Adaptive Codec

This example shows an SDP offer/answer negotiation using the AMR codec [AMR]. The first media description is from the SDP offer and the second is from the SDP answer.

m=audio 49200 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLTR:tb=22400:200
a=ptime:20
a=maxptime:100
m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLT:tb=22400:200
a=ptime:20
a=maxptime:100

Since the new bandwidth attribute makes it possible to negotiate both the maximum and the least required bandwidth, both the other end-point and any resource allocation function can know how the end-points will adapt when congestion is detected.

4.5.3. Symmetric Several Rate-Adaptive Codecs

This example shows how the new bandwidth attribute, ‘a=bw’, can be used to negotiate the maximum and the least required bandwidths for multiple rate-adaptive codecs, in this case AMR [AMR] and AMR-WB [AMR-WB]. For AMR, the highest codec mode is 12.2 kbps, giving a maximum bandwidth of 28.8 kbps, and the least required mode is selected to be 5.9 kbps, giving a least required bandwidth of 22.4 kbps. For AMR-WB, the highest codec mode is 23.85 kbps, giving a maximum bandwidth of 40.4 kbps, and the least required mode is 8.85 kbps, giving a least required bandwidth of 25.6 kbps.
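The bandwidth figures in the preceding paragraph can be reproduced under stated assumptions. The sketch assumes the RFC 4867 bandwidth-efficient payload format with one speech frame per 20 ms packet and no redundancy. Each payload then carries a 4-bit CMR, a 6-bit ToC entry and the speech bits, padded to a whole octet, plus 40 bytes of IPv4/UDP/RTP headers. Only the four modes used here are listed.

```python
# Speech bits per 20 ms frame for the modes used in this example.
SPEECH_BITS = {
    ("AMR", "12.2"): 244, ("AMR", "5.9"): 118,
    ("AMR-WB", "23.85"): 477, ("AMR-WB", "8.85"): 177,
}


def amr_ip_rate_bps(codec: str, mode_kbps: str, ptime_ms: int = 20) -> int:
    """IP-level bit-rate for one frame per packet, bandwidth-efficient mode."""
    payload_bits = 4 + 6 + SPEECH_BITS[(codec, mode_kbps)]  # CMR + ToC + speech
    payload_bytes = -(-payload_bits // 8)                   # pad to octets
    packet_bytes = payload_bytes + 40                       # IPv4 + UDP + RTP
    return packet_bytes * 8 * 1000 // ptime_ms


for key in SPEECH_BITS:
    print(key, amr_ip_rate_bps(*key))
```
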

m=audio 49200 RTP/AVP 96 97
b=AS:41
a=rtpmap:96 AMR-WB/16000/1
a=fmtp:96 mode-change-capability=2; max-red=80
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=96 SMT:tb=40400:350
a=bw:sendrecv pt=96 SLTR:tb=25600:350
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLTR:tb=22400:200
a=ptime:20
a=maxptime:100
m=audio 49100 RTP/AVP 97
b=AS:29
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=80
a=bw:sendrecv pt=97 SMT:tb=28800:200
a=bw:sendrecv pt=97 SLT:tb=22400:200
a=ptime:20
a=maxptime:100

In this case, when the answer is received it is clear that the bandwidth needed for AMR applies to both directions. There is no need for a second offer/answer negotiation to clarify that the bandwidth also applies to end-point A’s receiving direction. Thereby, the issues listed in Section 3.2.3 are resolved.

4.5.4. Asymmetric Session

The following SDP example shows how to use the new bandwidth attribute to offer asymmetric streams. In this case, the end-point offers to send H.264 video at 1 Mbps while it is capable of receiving H.264 at up to 3 Mbps. Note that this example does not make use of the codec-specific H.264 level asymmetry signaling as defined in RFC 6184 [RFC6184].

m=video 50324 RTP/AVP 96
b=AS:3000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c016
a=bw:send pt=96 SMT:tb=1000000:8192
a=bw:recv pt=96 SMT:tb=3000000:16384

It should be clear from this example that the new bandwidth attribute is useful when negotiating asymmetric sessions since it offers the possibility to define the token bucket parameters for both sending and receiving directions separately.

4.5.5. Session with Retransmission

This SDP example shows how the new bandwidth attribute, ‘a=bw’, can be used for negotiating the bandwidth when the RTP Retransmission Payload Format RFC 4588 [RFC4588] is used.

m=video 49170 RTP/AVPF 96 97
b=AS:500
a=rtpmap:96 MP4V-ES/90000
a=rtcp-fb:96 nack
a=fmtp:96 profile-level-id=8; config=01010000012000884006682C2090A21F
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96;rtx-time=3000
a=bw:send pt=* AMT:tb=500000:4096
a=bw:recv pt=* AMT:tb=500000:8192

In this case, it is beneficial to use the Aggregate Maximum Token bucket semantics to allow the end-points to adapt the bandwidths used for the original stream and for the retransmission stream during the session. The end-point can send more original packets when the packet loss rate is low. When the packet loss rate is high then the end-point can use less bandwidth for the original packets and instead allow for more retransmissions. It would also be possible to specify separate limits for the original stream and the retransmission stream by using a separate set of ‘a=bw’-lines for pt=96 and pt=97.

4.6. SDP Examples with Sessions with Multiple Streams

4.6.1. Multiple Streams

The example below is based on the use case described in Section 3.3.1. Only the negotiation for video is shown here.

m=video 49300 RTP/AVP 96
b=AS:3000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=bw:send pt=* SMT:tb=1000000:1000
a=bw:recv pt=* SMT:tb=1000000:2000
a=bw:send pt=* AMT:tb=1000000:1000
a=bw:recv pt=* AMT:tb=3000000:6000
a=max-recv-ssrc:* 4

With the new bandwidth attribute, it is possible to define the bandwidth for each received stream independently. In this case, the SDP shows that the end-point is prepared to send at most 1 Mbps and to receive at most 1 Mbps per stream. The SDP also shows that the end-point is prepared to receive at most 3 Mbps in aggregate for the up to four streams in the receiving direction. Note that this implies that to receive more than three streams, each stream’s bandwidth must be reduced to comply with the maximum aggregate.
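The interaction between the per-stream maximum (SMT) and the aggregate maximum (AMT) can be sketched as follows. The sketch assumes that the aggregate is shared equally among the active streams. That is only one possible policy, since the attribute itself does not mandate how the aggregate is divided. The function name is hypothetical.

```python
def per_stream_limit_bps(smt_bps: int, amt_bps: int, active_streams: int) -> int:
    """Highest equal per-stream rate that satisfies both the per-stream
    maximum (SMT) and the aggregate maximum (AMT)."""
    if active_streams <= 0:
        return 0
    return min(smt_bps, amt_bps // active_streams)


# recv SMT of 1 Mbps and recv AMT of 3 Mbps, as in the SDP above:
for n in (1, 3, 4):
    print(n, per_stream_limit_bps(1_000_000, 3_000_000, n))
```

With four active streams the equal share drops to 750 kbps, which illustrates the reduction described in the text.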

4.6.2. Declarative Example with Stream Asymmetry

This example shows a declarative usage of the new bandwidth attribute.

m=video 50324 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=rtpmap:98 MP4V-ES/90000
a=max-recv-ssrc:96 2
a=max-recv-ssrc:* 5
a=bw:send pt=* SMT:tb=1200000:16384
a=bw:recv pt=96 SMT:tb=1500000:16384
a=bw:recv pt=97,98 SMT:tb=2500000:16384
a=bw:recv pt=* AMT:tb=8000000:65535

In the above example, the single outgoing stream is limited to a bucket rate of 1.2 Mbps and a bucket size of 16384 bytes. The up to 5 incoming streams can in total use a maximum bucket rate of 8 Mbps with a bucket size of 65535 bytes. However, the maximum rate of each individual stream depends on its payload type. Payload type 96 (H.264) is limited to 1.5 Mbps with a bucket size of 16384 bytes, while payload types 97 (H.263) and 98 (MPEG-4) may use up to 2.5 Mbps with a bucket size of 16384 bytes.
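A configured receiver could check a set of concurrently received streams against the receive-direction limits in this declarative example. This is a sketch under one assumption: the ‘a=max-recv-ssrc’ lines are read as at most 2 simultaneous payload type 96 streams and at most 5 streams in total. Only the token rates are checked, not the bucket sizes, and the names are hypothetical.

```python
SMT_RECV_BPS = {96: 1_500_000, 97: 2_500_000, 98: 2_500_000}
AMT_RECV_BPS = 8_000_000
MAX_STREAMS_TOTAL = 5   # a=max-recv-ssrc:* 5 (assumed interpretation)
MAX_STREAMS_PT96 = 2    # a=max-recv-ssrc:96 2 (assumed interpretation)


def check_incoming(streams):
    """streams: (payload_type, rate_bps) pairs received at the same time.
    Returns the violated limits; an empty list means all limits are met."""
    problems = []
    if len(streams) > MAX_STREAMS_TOTAL:
        problems.append("too many streams")
    if sum(1 for pt, _ in streams if pt == 96) > MAX_STREAMS_PT96:
        problems.append("too many PT 96 streams")
    for pt, rate in streams:
        if rate > SMT_RECV_BPS[pt]:
            problems.append(f"PT {pt} stream above SMT")
    if sum(rate for _, rate in streams) > AMT_RECV_BPS:
        problems.append("aggregate above AMT")
    return problems
```

Five streams that each run at their own per-payload-type maximum (2 x 1.5 Mbps + 3 x 2.5 Mbps = 10.5 Mbps) would exceed the 8 Mbps aggregate. This shows that the aggregate, and not only the per-stream limits, constrains the session.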

4.7. Interoperability Issues

The proposed new bandwidth attribute is clearly related to the bandwidth modifier ‘b=AS’ and to the directionality attributes (‘a=sendrecv’, ‘a=sendonly’, ‘a=recvonly’ and ‘a=inactive’) defined in RFC 4566 [RFC4566]. It is therefore important to analyze these relationships properly so that interoperability issues can be avoided.

4.7.1. Interoperability with Existing Bandwidth Attribute

If the SDP includes both the ‘b=AS’ bandwidth modifier and the ‘a=bw’ bandwidth attribute, then alignment may be necessary to avoid confusion. This section gives some guidelines for such alignment. However, some usages may need other alignments than those discussed below. If so, those alignments need to be considered on a case-by-case basis. The discussion below should therefore not be seen as an exhaustive list.

In general, the bandwidths offered with ‘b=AS’ and ‘a=bw’ should be aligned for the direction that applies to the ‘b=AS’ bandwidth modifier. For ‘sendrecv’ and ‘recvonly’ sessions, ‘b=AS’ indicates the bandwidth for the receiving direction. The ‘b=AS’ modifier is closest in interpretation to the AMT semantics. If the stream maximum semantics (SMT) is used, then the sum of the bandwidths in the receive direction may exceed the ‘b=AS’ bandwidth, but the AMT should not exceed the ‘b=AS’ value.

If the session includes multiple streams but not all of them will be active simultaneously, then ‘b=AS’ should indicate the maximum bandwidth used by any combination of streams that are active simultaneously, in the same way as AMT could be used in such a session. This also means that the bandwidths offered with ‘a=bw’ are accumulated for the combination of streams that are active, and this aggregated bandwidth should not exceed the bandwidth defined with ‘b=AS’. Note, however, that it is possible and feasible to specify an aggregate that is less than the sum of the maximum bandwidths for the maximum number of available streams. It may be possible to use the maximum number of active streams with a lower bandwidth than the maximum, or it may be possible to reduce the number of active streams to stay within the bandwidth limit.

The SDP below gives an example of how this is done. In this example, the intention is to use either the payload type pair (96, 97) or the payload type pair (98, 99). The intention is however to, for example, not pair payload types 96 and 98.

m=video 50000 RTP/AVP 96 97 98 99 100
b=AS:1000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42c00d
a=rtpmap:99 H264/90000
a=fmtp:99 profile-level-id=42c00c
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c00c
a=bw:sendrecv pt=96 SMT:tb=700000:4000
a=bw:recv pt=97 SMT:tb=300000:3000
a=bw:sendrecv pt=98 SMT:tb=500000:3000
a=bw:recv pt=99 SMT:tb=200000:2000
a=bw:send pt=100 SMT:tb=300000:1400
a=sendrecv

This session is bi-directional, as shown with the ‘a=sendrecv’ attribute. The bandwidth offered with ‘b=AS’ therefore applies to the receive direction. The ‘b=AS’ is then set based on the combination of streams that gives the highest bandwidth, i.e. the payload type pair (96, 97).

This means that the bandwidths offered with ‘a=bw’ are aligned with the bandwidth offered with ‘b=AS’.

If, on the other hand, the intention were to use another combination of payload types, for example (96, 98), then the bandwidths would add up to 1200 kbps, which would mean that the stream bandwidths would not be aligned with the ‘b=AS’ bandwidth.

This shows that bandwidths for ‘sendrecv’ and ‘recv’ directions are added together when determining the bandwidth for the combined streams.
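The receive-direction sums for the payload type pairings in the example offer above can be computed by adding the SMT rates of the ‘recv’ and ‘sendrecv’ lines for the payload types that are active together. This is a minimal sketch and the helper name is hypothetical. Bucket sizes are ignored and only the token rates are summed.

```python
# (direction, payload type, SMT rate in bps) from the example offer
LINES = [("sendrecv", 96, 700_000), ("recv", 97, 300_000),
         ("sendrecv", 98, 500_000), ("recv", 99, 200_000),
         ("send", 100, 300_000)]


def recv_bandwidth_kbps(active_pts):
    """Sum the receive-direction SMT rates ('recv' and 'sendrecv' lines)
    for the payload types assumed to be active at the same time."""
    return sum(rate for direction, pt, rate in LINES
               if pt in active_pts and direction in ("recv", "sendrecv")) // 1000


for pair in [(96, 97), (98, 99), (96, 98)]:
    print(pair, recv_bandwidth_kbps(set(pair)), "kbps vs b=AS:1000")
```

Payload type 100 contributes nothing to the receive direction because it is declared for ‘send’ only.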

If the offer is “complex”, for example offering multiple streams for both speech and video, possibly with many different codecs, (and therefore uses ‘a=bw’ together with the ‘b=AS’ bandwidth modifier) and if the answerer wants to change this into a “simple” session (e.g. plain simple VoIP with only one RTP payload type for codec X) then the answerer may remove the ‘a=bw’ lines when creating the answer. It may therefore happen that the answer includes only ‘b=AS’ bandwidth modifier in the SDP answer. However, if the offer does not include any ‘b=AS’ line then it is recommended to maintain the ‘a=bw’ lines also in the answer, even for “simple” sessions. This means that the offerer cannot rely on the existence of ‘a=bw’ in the answer.

4.7.2. Interoperability with Existing Directional Attribute

Since the ‘a=bw’ attribute includes a parameter for directionality it is important to clarify the relationship to the already existing directional attributes in SDP (‘sendrecv’, ‘sendonly’, ‘recvonly’ and ‘inactive’). In general, one can say that:

At session setup time, it is therefore acceptable to define streams with other directionality than what is shown with the SDP attribute for directionality. However, when media is transmitted, then the SDP attribute for directionality has to be followed. An example of this is shown below.

m=video 5000 RTP/AVP 96 97 98
b=AS:1000
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c00c
a=bw:sendrecv pt=96 SMT:tb=700000:4000
a=bw:recv pt=97 SMT:tb=200000:3000
a=bw:send pt=97 SMT:tb=300000:1400
a=recvonly

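This means that three bandwidths are defined at session setup:

  1. Payload type 96: 700 kbps with a bucket size of 4000 bytes, in both the sending and the receiving direction.
  2. Payload type 97: 200 kbps with a bucket size of 3000 bytes, in the receiving direction.
  3. Payload type 97: 300 kbps with a bucket size of 1400 bytes, in the sending direction.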

However, since ‘a=recvonly’ is defined then this means that the end-point is, at the session setup time, only willing to receive media even though the SDP contains bandwidth declarations also for the sending direction. This allows for setting up streams that are effectively inactive in one or both directions from the beginning of the session and then enabling them later in the session.

This can be compared with the case when one defines one or more codecs, even if the session starts up as ‘inactive’.

5. Rules and Recommendations for Extensions

The ‘a=bw’ attribute is defined to be extensible, and this section discusses the extension points that are available.

5.1. Directionality

The current specification defines send, recv and sendrecv. In case some new directionality behavior is needed that doesn't match the existing, a new one could be defined. This should be avoided unless a clear need for a new directionality is found.

5.2. Scope

It is expected that there will be a need to extend the bandwidth scope. This document only defines two scope types, specific payload type(s) and all payload types (pt=*), and there are very likely other desirable scopes that will be defined in the future. Possible examples are scopes applying to a specific SSRC, a particular end-point, or a class of end-points.

5.3. Semantics

This is the extension point that is expected to be frequently used in the future. A major proliferation of semantics is not good for interoperability, but it is likely that bandwidth shortcomings or missing functionalities will be discovered in the future. Thus defining new semantics gives maximum flexibility to define the meaning of the provided value(s), the format of the values and how to interpret the directionality and scope values.

5.4. Values

This document only defines token buckets as values. In case fewer or more parameters are needed to express a particular semantics, new value formats can be defined. Defining new value formats should be done with some consideration of generality and reuse so that future semantics can also use the new value format, with the target to try to minimize the number of different formats.

6. Open Issues

This document contains a few open issues:

  1. Multicast behavior needs to be specified.
  2. It is an open question to decide if and how to handle the RTCP bandwidth negotiation, e.g. corresponding to b=RS and b=RR.
  3. It is an open question to develop semantics for the transport independent bandwidth negotiation, e.g. corresponding to b=TIAS.
  4. It is an open question what rules and recommendations there should be for extensions to this memo.

7. IANA Considerations

Following the guidelines in RFC 4566 [RFC4566] and in RFC 3550 [RFC3550], the IANA is requested to register:

  1. The bw attribute as defined in Section 4.1.
  2. The bw attribute directionality registry rules.
  3. The bw attribute scope registry rules.
  4. The bw attribute semantics registry rules.
  5. The bw attribute values registry rules.

This section will be filled out in future versions of this document.

8. Security Considerations

Excessive bandwidth allocation can consume all available resources, far more than what the end-point(s) intend to use. If a session allocates an unnecessarily high bandwidth, other users will likely either not be admitted or not get the QoS-guaranteed resources they requested and have to fall back to best effort. The session itself may also be rejected if the end-points try to allocate resources that are not available. Allocating too little bandwidth is likely to degrade the perceived media quality or entirely prevent reception of requested media.

The above shows that the bandwidth attribute is a potential attack vector, both for malicious end-points and for third-party attackers that attempt to modify the attribute in order to make the system allocate unnecessary resources, deny end-points service, reduce quality for end-points, or incur cost on users.

To prevent third-party attacks, the signalling should be source-authenticated and integrity-protected, so that no on-path or off-path attacker can inject or modify the SDP. Malicious end-points cannot as easily be protected against using cryptography; instead, behavior analysis and measures that prevent such an end-point from seriously impacting other end-points are needed.

9. Acknowledgements

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2212] Shenker, S., Partridge, C. and R. Guerin, "Specification of Guaranteed Quality of Service", RFC 2212, September 1997.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.
[RFC3556] Casner, S., "Session Description Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556, July 2003.
[RFC3890] Westerlund, M., "A Transport Independent Bandwidth Modifier for the Session Description Protocol (SDP)", RFC 3890, September 2004.
[RFC4566] Handley, M., Jacobson, V. and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.

10.2. Informative References

[RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V. and R. Hakenberg, "RTP Retransmission Payload Format", RFC 4588, July 2006.
[RFC6184] Wang, Y.-K., Even, R., Kristensen, T. and R. Jesup, "RTP Payload Format for H.264 Video", RFC 6184, May 2011.
[I-D.westerlund-avtcore-max-ssrc] Westerlund, M., Burman, B. and F. Jansson, "Multiple Synchronization sources (SSRC) in RTP Session Signaling", Internet-Draft draft-westerlund-avtcore-max-ssrc-00, October 2011.
[I-D.westerlund-avtcore-multiplex-architecture] Westerlund, M., Burman, B. and C. Perkins, "RTP Multiplexing Architecture", Internet-Draft draft-westerlund-avtcore-multiplex-architecture-00, October 2011.
[G.711] ITU-T Recommendation G.711, "Pulse Code Modulation (PCM) of Voice Frequencies", November 1988.
[G.726] ITU-T Recommendation G.726, "40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)", December 1990.
[G.729] ITU-T Recommendation G.729, "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)", March 1996.
[AMR] 3GPP TS 26.090, "Adaptive Multi-Rate (AMR) speech codec; Transcoding functions", June 1999.
[AMR-WB] 3GPP TS 26.190, "Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions", April 2001.
[H.264] ITU-T Recommendation H.264, "Advanced video coding for generic audiovisual services", May 2003.

Authors' Addresses

Tomas Frankkila Ericsson Laboratoriegrand 11 SE-971 28 Lulea, Sweden Phone: +46 10 714 30 20 EMail: tomas.frankkila@ericsson.com
Magnus Westerlund Ericsson Farogatan 6 SE-164 80 Kista, Sweden Phone: +46 10 714 82 87 EMail: magnus.westerlund@ericsson.com
Bo Burman Ericsson Farogatan 6 SE-164 80 Kista, Sweden Phone: +46 10 714 13 11 EMail: bo.burman@ericsson.com