AVTCORE WG | M. Westerlund |
Internet-Draft | Ericsson |
Updates: 3550 (if approved) | C. Perkins |
Intended status: Standards Track | University of Glasgow |
Expires: April 09, 2013 | J. Lennox |
| Vidyo |
| October 8, 2012 |
Multiple Media Types in an RTP Session
draft-ietf-avtcore-multi-media-rtp-session-00
This document specifies how an RTP session can contain media streams with media from multiple media types such as audio, video, and text. This has been restricted by the RTP Specification, and thus this document updates RFC 3550 to enable this behavior for applications that satisfy the applicability for using multiple media types in a single RTP session.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 09, 2013.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
When the Real-time Transport Protocol (RTP) [RFC3550] was designed, close to 20 years ago, IP networks were very different from those of 2012, when this is written. The almost ubiquitous deployment of Network Address Translators (NATs) and Firewalls has increased the cost and likelihood of communication failure when using many different transport flows. Thus there is pressure to reduce the number of concurrent transport flows.
RTP [RFC3550] as defined recommends against having multiple media types, like audio and video, in the same RTP session. The motivation for this depends on particular usages or on dependencies on lower-layer Quality of Service (QoS). When these aren't present, there are no strong RTP reasons for not allowing multiple media types in one RTP session. However, the Session Description Protocol (SDP) [RFC4566], as one of the dominant signalling methods for establishing RTP sessions, has enforced this rule by not allowing multiple media types for a given receiver destination or set of ICE candidates, which is the most common method to determine which RTP session the packets are intended for.
These limitations have been in place for a long time, and RFC 3550 was written without fully considering multiple media types in an RTP session. As a result, a number of considerations are needed. This document provides such considerations regarding applicability as well as functionality, including normative specification of behavior.
First, some basic definitions are provided. This is followed by a background that discusses the motivation in more detail. An overview of the solution of how to provide multiple media types in one RTP session is then presented. Next is the formal applicability of this specification, followed by the normative specification. This is followed by a discussion of how some RTP/RTCP extensions should function in the case of multiple media types in one RTP session. Then comes a specification of the signalling requirements of this specification and a look at how these are realized in SDP using Bundle [I-D.ietf-mmusic-sdp-bundle-negotiation]. The document ends with the security considerations.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The following terms are used with supplied definitions:
This section discusses in more detail the main motivations why allowing multiple media types in the same RTP session is suitable.
The existence of NATs and Firewalls at almost all Internet access has had implications on protocols like RTP that were designed to use multiple transport flows. First of all, the NAT/FW traversal solution one uses needs to ensure that all these transport flows are established. This has three different impacts:
Using fewer transport flows reduces the risk of communication failure, improves establishment behavior, and places less load on NATs and Firewalls.
Many RTP-using applications don't utilize any network level Quality of Service functions. Nor do they expect or desire any separation in network treatment of their media packets, independent of whether they are audio, video or text. When an application has no such desire, it doesn't need to provide a transport flow structure that simplifies flow based QoS.
For applications that don't desire any type of different treatment, neither on the transport level nor in RTP or RTCP reporting, using the same RTP session for the different media types appears a reasonable choice. The architecture should be neutral to media type and should instead be judged by what it provides for the application user's choice. Therefore this bias should be removed, letting the application designer choose whether multiple RTP sessions are needed based on other aspects.
The goal of the solution is to enable having one or more RTP sessions, where each RTP session may contain two or more media types. This includes having multiple RTP sessions containing a given media type, for example having three sessions containing video and audio.
The solution is quite straightforward. The first step is to override the SHOULD and SHOULD NOT language of the RTP specification [RFC3550]. This is done by appropriate exception clauses given that this specification is followed.
Within an RTP session where multiple media types have been configured for use, an SSRC may only send one media type during its lifetime. Different SSRCs must be used for the different media sources, the same way multiple media sources of the same media type already have to do. The payload type will inform a receiver which media type the SSRC is being used for. Thus the payload type must be unique across all of the payload configurations, independent of media type, that may be used in the RTP session.
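The payload type rule above can be illustrated with a minimal Python sketch. The function name `build_payload_type_map` and the example payload type numbers and encoding names are hypothetical, chosen only for illustration; they are not defined by this specification.

```python
from typing import Dict, Iterable, Tuple

# (payload_type, media_type, encoding_name); all values are illustrative.
PayloadConfig = Tuple[int, str, str]


def build_payload_type_map(
    configs: Iterable[PayloadConfig],
) -> Dict[int, Tuple[str, str]]:
    """Map each payload type to (media_type, encoding), rejecting any reuse.

    A payload type number may appear only once in the RTP session,
    regardless of which media type it belongs to.
    """
    pt_map: Dict[int, Tuple[str, str]] = {}
    for pt, media_type, encoding in configs:
        if not 0 <= pt <= 127:
            raise ValueError(f"payload type {pt} does not fit in 7 bits")
        if pt in pt_map:
            raise ValueError(
                f"payload type {pt} is already used for {pt_map[pt][0]}"
            )
        pt_map[pt] = (media_type, encoding)
    return pt_map


# Hypothetical configuration for one session carrying three media types.
example_map = build_payload_type_map([
    (96, "audio", "opus"),
    (97, "video", "H264"),
    (98, "text", "t140"),
])
```

With such a table, a receiver could look up the payload type of an incoming packet to learn which media type the sending SSRC carries.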
A few additional considerations within the RTP session also need attention. Configuring the RTCP bandwidth and regular reporting suppression (AVPF and SAVPF) should be considered. Certain payload types like FEC also need additional rules.
The final important part of the solution is to use signalling to ensure that agreement on using multiple media types in an RTP session exists, and to determine how the session is then configured. Thus this document documents some existing requirements, while an external reference defines how this is accomplished in SDP.
This specification has limited applicability, and anyone intending to use it must ensure that their application and usage meet the criteria below.
Before choosing to use this specification, an application implementer needs to ensure that they don't have a need for different RTP sessions between the media types for some reason. The main rule is that if one expects to have equal treatment of all media packets, then this specification might be suitable. The equal treatment includes anything from network level up to RTCP reporting and feedback. The document Guidance on RTP Multiplexing Architecture [I-D.westerlund-avtcore-multiplex-architecture] gives more detailed guidance on aspects to consider when choosing how to use RTP and specifically sessions. RTP-using applications that need or would prefer multiple RTP sessions, but do not require the functionalities or behaviors that multiple transport flows give, can consider using Multiple RTP Sessions on a Single Lower-Layer Transport [I-D.westerlund-avtcore-transport-multiplexing].
Usage of this specification is not compatible with anyone following RFC 3550 and intending to have different RTP sessions for each media type. Therefore there must be mutual agreement to use multiple media types in one RTP session by all participants within an RTP session. This agreement must in most cases be determined using signalling.
This requirement can be a problem for signalling solutions that can't negotiate with all participants. For declarative signalling solutions, mandating that the session is using multiple media types in one RTP session can be a way of attempting to ensure that all participants in the RTP session follow the requirement. However, for signalling solutions that lack methods for enforcing that a receiver supports a specific feature, this can still cause issues.
In multiparty communication scenarios it is important to separate two different cases. One case is where the RTP session contains multiple participants in a common RTP session. This occurs for example in Any Source Multicast (ASM) and Transport Translator topologies as defined in RTP Topologies [RFC5117]. It may also occur in some implementations of RTP mixers that share the same SSRC/CSRC space across all participants. The second case is when the RTP session is terminated in a middlebox and the other participants' sources are projected or switched into each RTP session and rewritten on the RTP header level, including SSRC mappings.
For the first case, with a common RTP session or at least shared SSRC/CSRC values, all participants in multiparty communication are required to support multiple media types in an RTP session. A participant using two or more RTP sessions towards a multiparty session can't be collapsed into a single session with multiple media types. The reason is that in the case of multiple RTP sessions, the same SSRC value can be used in both RTP sessions without any issues, but when collapsed to a single session there is an SSRC collision. In addition, some collisions can't be represented in the multiple separate RTP sessions. For example, in a session with audio and video, an SSRC value used for video will not show up in the audio RTP session at the participant using multiple RTP sessions, and thus will not trigger any collision handling. Thus any application using this type of RTP session structure must have homogeneous support for multiple media types in one RTP session, or be forced to insert a translator node between that participant and the rest of the RTP session.
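The collision that appears when separate sessions are collapsed can be shown with a small Python sketch. The function name and the SSRC values are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Set


def ssrc_collisions_when_collapsed(
    audio_session: Iterable[int], video_session: Iterable[int]
) -> Set[int]:
    """Return the SSRC values that would collide if both sessions were merged.

    Reusing a value across two separate RTP sessions is harmless, so
    nothing flags it until the sessions share one SSRC space.
    """
    return set(audio_session) & set(video_session)


# Hypothetical SSRC values observed in two separate RTP sessions.
audio_ssrcs = {0x11111111, 0x22222222}
video_ssrcs = {0x22222222, 0x33333333}
collisions = ssrc_collisions_when_collapsed(audio_ssrcs, video_ssrcs)
```

Here the value 0x22222222 is legal in each separate session but becomes a collision once the two SSRC spaces are merged into one.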
For the second case, with separate RTP sessions between each multiparty participant and a central node, it is possible to mix single RTP session users and multiple RTP session users. This requires remapping the SSRCs used by a participant with multiple RTP sessions into unused values in the SSRC space of each participant that uses a single RTP session with multiple media types. Note that this type of implementation is required to understand any RTP/RTCP extension being used in the RTP sessions in order to translate them correctly between the RTP sessions.
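A minimal Python sketch of the SSRC remapping a central node might perform for one single-session participant follows. The class `SsrcRemapper`, its method names, and the session labels are hypothetical; the sketch covers only the mapping table, not the rewriting of RTP headers, RTCP packets, or extensions that a real middlebox also needs.

```python
import random
from typing import Dict, Optional, Set, Tuple


class SsrcRemapper:
    """Projects SSRCs from several source RTP sessions into one SSRC space."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()
        # (source session label, original SSRC) -> SSRC in the single session
        self._mapping: Dict[Tuple[str, int], int] = {}
        self._used: Set[int] = set()

    def project(self, source_session: str, ssrc: int) -> int:
        """Return a stable SSRC that is unused in the single-session space."""
        key = (source_session, ssrc)
        if key in self._mapping:
            return self._mapping[key]
        candidate = ssrc  # keep the original value when it is still free
        while candidate in self._used:
            candidate = self._rng.getrandbits(32)
        self._used.add(candidate)
        self._mapping[key] = candidate
        return candidate


# Hypothetical participant that reuses one SSRC value in two sessions.
remapper = SsrcRemapper()
projected_audio = remapper.project("audio", 0x1234ABCD)
projected_video = remapper.project("video", 0x1234ABCD)
```

The first source keeps its SSRC, while the second receives a different unused value, and repeated lookups return the same projection so the stream stays consistent.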
An RTP session with multiple media types in it has only a single 7-bit Payload Type range for all its payload types. All the different RTP payload configurations for all the media types must fit within the 128 available values, or within 96 or fewer if "Multiplexing RTP Data and Control Packets on a Single Port" [RFC5761] is used. For most applications this will not be a real problem, but the limitation exists and could be encountered.
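The payload type budget can be worked through in a short Python sketch. The excluded range 64-95 is taken from RFC 5761, which forbids those values when RTP and RTCP share a port; the function names and the example counts are hypothetical. The result is an upper bound, since other conventions may reserve further values.

```python
from typing import Dict, List

PAYLOAD_TYPE_BITS = 7
# Values that RFC 5761 excludes when RTP and RTCP are multiplexed on one port.
RTCP_MUX_EXCLUDED = range(64, 96)


def usable_payload_types(rtcp_mux: bool) -> List[int]:
    """List the payload type values available in one RTP session."""
    all_types = range(2 ** PAYLOAD_TYPE_BITS)
    if not rtcp_mux:
        return list(all_types)
    return [pt for pt in all_types if pt not in RTCP_MUX_EXCLUDED]


def configurations_fit(configs_per_media_type: Dict[str, int], rtcp_mux: bool) -> bool:
    """Check that all payload configurations fit in the shared range."""
    needed = sum(configs_per_media_type.values())
    return needed <= len(usable_payload_types(rtcp_mux))


# Hypothetical number of payload configurations per media type.
example_counts = {"audio": 20, "video": 30, "text": 2}
```

With these illustrative counts, 52 configurations fit comfortably in either case; the limit only becomes relevant for applications that offer a very large number of payload configurations.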
If network-level differentiation between media streams of different media types is desired, using this specification can cause severe limitations. All media streams in an RTP session, independent of the media type, will be sent over the same underlying transport flow. Any flow-based Quality of Service (QoS) mechanism will be unable to provide differentiated treatment between different media types, e.g. to prioritize audio over video. If that is desired, separate RTP sessions over different underlying transport flows need to be used. Any marking-based QoS scheme like DiffServ is not affected unless a network ingress marks based on flows.
There exist some RTP and RTCP extensions that rely on the existence of multiple RTP sessions. If the goal of using an RTP session with multiple media types is to have only a single RTP session, then these extensions can't be used. If one has no need to have different RTP sessions for the media types but is willing to have multiple RTP sessions, one for the main media transmission and one for the extension, they can be used. It should be noted that this assumes that it is possible to get the extension working when the related RTP session contains multiple media types.
Identified RTP/RTCP extensions that require multiple RTP Sessions are:
This section defines what needs to be done or avoided to make an RTP session with multiple media types function without issues.
Section 5.2 of "RTP: A Transport Protocol for Real-Time Applications" [RFC3550] states:
This specification changes both of these sentences. The first sentence is changed to:
The second sentence is changed to:
RFC-Editor Note: Please replace RFCXXXX with the RFC number of this specification when assigned.
TBD: Discussion of the motivations in Section 5.2 of the RTP Specification [RFC3550].
An SSRC in the RTP session MUST only send one media type (audio, video, text etc.) during the SSRC's lifetime. The main motivation is that a given SSRC has its own RTP timestamp and sequence number spaces. The same way that you can't send two streams of encoded audio on the same SSRC, you can't send one audio and one video encoding on the same SSRC. Each media encoding when made into an RTP stream needs to have the sole control over the sequence number and timestamp space. If not, one would not be able to detect packet loss for that particular stream. Nor can one easily determine which clock rate a particular SSRC's timestamp shall increase with.
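A receiver-side check of this rule could look like the following Python sketch. The class `SsrcMediaTypeTracker` and the payload type table are hypothetical, and the sketch ignores SSRC timeouts and reuse after a source has left the session.

```python
from typing import Dict


class SsrcMediaTypeTracker:
    """Binds each SSRC to the media type of the first packet seen from it."""

    def __init__(self, payload_type_to_media: Dict[int, str]) -> None:
        self._pt_to_media = dict(payload_type_to_media)
        self._bound: Dict[int, str] = {}

    def accept(self, ssrc: int, payload_type: int) -> bool:
        """Return True if the packet respects the one-media-type rule."""
        media_type = self._pt_to_media.get(payload_type)
        if media_type is None:
            return False  # payload type not configured for this session
        # The first packet binds the SSRC; later packets must match it.
        return self._bound.setdefault(ssrc, media_type) == media_type


# Hypothetical session with two audio payload types and one video one.
tracker = SsrcMediaTypeTracker({96: "audio", 97: "video", 98: "audio"})
```

Switching between payload types 96 and 98 on one SSRC is accepted, because both are audio, whereas a packet with payload type 97 on that same SSRC would be flagged.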
Most payload types have a native media type; for example, an audio codec naturally belongs to the audio media type. However, there exist a number of RTP payload types that don't have a native media type. For example, transport robustification mechanisms like RTP Retransmission [RFC4588] and Generic FEC [RFC5109] inherit their media type from what they protect. RTP Retransmission is explicitly bound to the payload type it is protecting, and thus will inherit its media type. However, Generic FEC is an excellent example of an RTP payload type that has no natural media type. The media type of what it protects is not relevant, as it is the recovered RTP packets that have a particular media type, and thus Generic FEC is best categorized as an application media type.
The above discussion is relevant to what limitations exist for RTP payload type usage within an RTP session that has multiple media types. When it comes to Generic FEC, is a configured payload type allowed to be used to protect both audio SSRCs and video SSRCs? Note that a particular SSRC carrying Generic FEC will clearly only protect a specific SSRC, and thus that instance is bound to the SSRC's media type. For this specific case, it appears possible to have one payload type be applicable to both. However, in cases when the signalling is set up to enable fallback to using separate RTP sessions, then using a different media type, e.g. application, than the media being protected can create issues.
TBD: What recommendations are needed here?
All SSRCs in an RTP session fall under the same set of RTCP configuration parameters, such as the RR and RS bandwidth and the trr-int parameter if AVPF or SAVPF is used. This means that at least the regular reporting period by, and on, a source will be equal, independent of the media type for that source. This should in most cases not be an issue, but it may result in more frequent reporting than is considered necessary for a particular media type or set of media sources. Having multiple media types in one RTP session also results in more SSRCs being present in this RTP session. This increases the amount of cross reporting between the SSRCs. From an RTCP perspective, two RTP sessions with half the number of SSRCs in each will be slightly more efficient. If someone needs either the higher efficiency due to the lesser number of SSRCs or the ability to tailor RTCP usage per media type, they need to use independent RTP sessions.
When it comes to handling multiple SSRCs in an RTP session there is a clarification under discussion in Real-Time Transport Protocol (RTP) Considerations for Multi-Stream Endpoints [I-D.lennox-avtcore-rtp-multi-stream]. When it comes to configuring RTCP, the need for regular periodic reporting needs to be weighed against any feedback or control messages being sent. Applications using AVPF or SAVPF are RECOMMENDED to consider setting the trr-int parameter to a value suitable for the application's needs, thus potentially reducing the need for regular reporting and releasing more bandwidth for feedback or control.
Another aspect of an RTP session with multiple media types is that the RTCP packets, RTCP Feedback Messages, or RTCP XR metrics used may not be applicable to all media types. Instead, all RTP/RTCP endpoints need to correlate the media type of the SSRC being referenced in a message or packet and only use those that apply to that particular SSRC and its media type. Signalling solutions may have shortcomings when it comes to indicating that a particular set of RTCP reports or feedback messages only apply to a particular media type within an RTP session.
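The correlation step described above can be sketched in Python as follows. The applicability table, the message labels, and the function name are assumptions made for illustration; an actual endpoint would derive the applicable set from its negotiated configuration.

```python
from typing import Dict, Set

# Hypothetical policy: the media types each feedback message applies to.
# A retransmission request can make sense for any media type, whereas a
# picture loss indication is only meaningful for video.
FEEDBACK_APPLICABILITY: Dict[str, Set[str]] = {
    "nack": {"audio", "video", "text"},
    "pli": {"video"},
}


def feedback_applies(
    message: str, target_ssrc: int, ssrc_media_type: Dict[int, str]
) -> bool:
    """Decide whether a feedback message is usable for the referenced SSRC."""
    media_type = ssrc_media_type.get(target_ssrc)
    if media_type is None:
        return False  # the referenced SSRC is unknown in this session
    return media_type in FEEDBACK_APPLICABILITY.get(message, set())


# Hypothetical SSRC-to-media-type table learned from received packets.
known_sources = {0x0A: "audio", 0x0B: "video"}
```

With this table, a picture loss indication is only considered for the video SSRC, while a retransmission request is considered for either source.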
This section discusses the impact on some RTP/RTCP extensions due to usage of multiple media types in one RTP session. Only extensions with something worth noting have been included.
SSRC-multiplexed RTP retransmission [RFC4588] is actually very straightforward. Each retransmission RTP payload type is explicitly connected to an associated payload type. If retransmission is only to be used with a subset of all payload types, this is not a problem, as it will be evident from the retransmission payload types which payload types have retransmission enabled for them.
Session-multiplexed RTP retransmission is also possible to use, where a retransmission session contains the retransmissions of the associated payload types in the source RTP session. The only difference from before is that the source RTP session is one which contains multiple media types. Thus it is even more likely that only a subset of the source RTP session's payload types and SSRCs are actually retransmitted.
Open Issue: Using SDP to signal retransmission for one RTP session with multiple media types and one RTP session for the retransmission data will cause a situation where multiple m= lines are grouped using FID, while the ones belonging to each respective RTP session are grouped using BUNDLE. This usage may contradict both the FID semantics [RFC5888] and an assumption in the RTP retransmission specification [RFC4588].
TBW:
The Signalling requirements
Establishing an RTP session with multiple media types requires signalling. This signalling needs to fulfill the following requirements:
The signalling of multiple media types in one RTP session in SDP is specified in "Multiplexing Negotiation Using Session Description Protocol (SDP) Port Numbers" [I-D.ietf-mmusic-sdp-bundle-negotiation].
This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an RFC.
Having an RTP session with multiple media types doesn't change the methods for securing a particular RTP session. One possible difference is that the different media have often had different security requirements. When combining multiple media types in one session, their security requirements must also be combined by selecting the most demanding for each property. Thus having multiple media types may result in increased overhead for security for some media types to ensure that all requirements are met.
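Selecting the most demanding requirement for each property can be sketched in Python as below. The property names and the numeric strength levels are hypothetical; the sketch assumes that a higher number denotes a stricter requirement.

```python
from typing import Dict


def combine_security_requirements(
    per_media_type: Dict[str, Dict[str, int]]
) -> Dict[str, int]:
    """Take the strictest level of every property across all media types."""
    combined: Dict[str, int] = {}
    for requirements in per_media_type.values():
        for prop, level in requirements.items():
            combined[prop] = max(combined.get(prop, level), level)
    return combined


# Hypothetical per-media requirements; higher numbers are stricter.
session_requirements = combine_security_requirements({
    "audio": {"confidentiality": 2, "integrity": 1},
    "video": {"confidentiality": 1, "integrity": 2},
})
```

In this example the combined session needs level 2 for both properties, so each media type carries more protection in one respect than it would have needed on its own.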
Otherwise, the recommendations for how to configure an RTP session do not add any additional requirements compared to normal RTP, except for the need to be able to ensure that the participants are aware that it is a multiple media type session. If that is not ensured, it can cause issues in the RTP session for both the unaware and the aware participants. Similar issues can also be produced in a normal RTP session by creating configurations for different end-points that don't match each other.
The authors would like to thank Christer Holmberg for the feedback on the document.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC3550] | Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003. |
[I-D.ietf-mmusic-sdp-bundle-negotiation] | Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation Using Session Description Protocol (SDP) Port Numbers", Internet-Draft draft-ietf-mmusic-sdp-bundle-negotiation-00, February 2012. |
[I-D.lennox-avtcore-rtp-multi-stream] | Lennox, J. and M. Westerlund, "Real-Time Transport Protocol (RTP) Considerations for Endpoints Sending Multiple Media Streams", Internet-Draft draft-lennox-avtcore-rtp-multi-stream-00, July 2012. |