Network Working Group | M. Westerlund |
Internet-Draft | Ericsson |
Intended status: Informational | C. Perkins |
Expires: April 23, 2013 | University of Glasgow |
October 22, 2012 |
Options for Securing RTP Sessions
draft-ietf-avtcore-rtp-security-options-01
The Real-time Transport Protocol (RTP) is used in a large number of different application domains and environments. This heterogeneity implies that different security mechanisms are needed to provide services such as confidentiality, integrity and source authentication of RTP/RTCP packets suitable for the various environments. The range of solutions makes it difficult for RTP-based application developers to pick the most suitable mechanism. This document provides an overview of a number of security solutions for RTP, and gives guidance for developers on how to choose the appropriate security mechanism.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 23, 2013.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Real-time Transport Protocol (RTP) [RFC3550] is widely used in a large variety of multimedia applications, including Voice over IP (VoIP), centralized multimedia conferencing, sensor data transport, and Internet television (IPTV) services. These applications can range from point-to-point phone calls, through centralised group teleconferences, to large-scale television distribution services. The types of media can vary significantly, as can the signalling methods used to establish the RTP sessions.
This multi-dimensional heterogeneity has so far prevented development of a single security solution that meets the needs of the different applications. Instead, a significant number of different solutions have been developed to meet different sets of security goals. This makes it difficult for application developers to know what solutions exist, and whether their properties are appropriate. This memo gives an overview of the available RTP security solutions, and provides guidance on their applicability for different application domains. The guidance provided is not exhaustive, and this memo does not provide normative recommendations.
It is important that application developers consider the security goals and requirements for their application. The IETF considers it important that protocols implement, and make available to the user, secure modes of operation [RFC3365]. Because of the heterogeneity of RTP applications and use cases, however, a single security solution cannot be mandated. Instead, application developers need to select mechanisms that provide appropriate security for their environment. It is strongly encouraged that common mechanisms are used by related applications in common environments. The IETF publishes guidelines for specific classes of applications, so it is worth searching for such guidelines.
The remainder of this document is structured as follows. Section 2 provides additional background. Section 3 outlines the available security mechanisms at the time of this writing, and lists their key security properties and constraints. That is followed by guidelines and important aspects to consider when securing an RTP application in Section 4. Finally, we give some examples of application domains where guidelines for security exist in Section 5.
RTP can be used in a wide variety of topologies, and combinations of topologies, due to its support for unicast, multicast groups, and broadcast topologies, and the existence of different types of RTP middleboxes. In the following we review the different topologies supported by RTP to understand their implications for the security properties and trust relations that can exist in RTP sessions.
The most basic use case is two directly connected end-points, shown in Figure 1, where A has established an RTP session with B. In this case the RTP security is primarily about ensuring that no third party can compromise the confidentiality and integrity of the media communication. This requires confidentiality protection of the RTP session, integrity protection of the RTP/RTCP packets, and source authentication of all the packets to ensure no man-in-the-middle attack is taking place.
The source authentication can also be tied to a user's or an end-point's verifiable identity to ensure that the peer knows who they are communicating with. Here the combination of the security protocol protecting the RTP session (and its RTP and RTCP traffic) and the key-management protocol determines which security statements one can make.
   +---+         +---+
   | A |<------->| B |
   +---+         +---+
Figure 1: Point to Point Topology
An RTP mixer is an RTP session-level middlebox that one can build a multi-party RTP-based conference around. The RTP mixer might actually perform media mixing, like mixing audio or compositing video images into a new media stream being sent from the mixer to a given participant; or it might provide a conceptual stream, for example the video of the current active speaker. From a security point of view, the important features of an RTP mixer are that it generates a new media stream and has its own source identifier, and does not simply forward the original media.
An RTP session using a mixer might have a topology like that in Figure 2. In this example, participants A-D each send unicast RTP traffic between themselves and the RTP mixer, and receive an RTP stream from the mixer, comprising a mixture of the streams from the other participants.
   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |            |      +---+
              |   Mixer    |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+
Figure 2: Example RTP Mixer topology
A consequence of an RTP mixer having its own source identifier, and acting as an active participant towards the other end-points, is that the RTP mixer needs to be a trusted device that is part of the security context(s) established. The RTP mixer can also become a security enforcing entity. For example, a common approach to secure the topology in Figure 2 is to establish a security context between the mixer and each participant independently, and have the mixer source authenticate each peer. The mixer then ensures that one participant cannot impersonate another.
RTP translators are middleboxes that provide various levels of in-network media translation and transcoding. Their security properties vary widely, depending on which type of operations they attempt to perform. We identify three different categories of RTP translator: transport translators, gateways, and media transcoders. We discuss each in turn.
A transport translator [RFC5117] operates on a level below RTP and RTCP. It relays the RTP/RTCP traffic from one end-point to one or more other addresses. This can be done based only on IP addresses and transport protocol ports, where each receive port on the translator has a very basic list of where to forward traffic. Transport translators should also implement ingress filtering to prevent random traffic from being forwarded that isn't coming from a participant in the conference.
Figure 3 shows an example transport translator, where traffic from any one of the four participants will be forwarded to the other three participants unchanged. The resulting topology is very similar to an Any Source Multicast (ASM) session (as discussed in Section 2.4), but implemented at the application layer.
   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |   Relay    |      +---+
              | Translator |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+
Figure 3: RTP relay translator topology
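The forwarding behaviour of a transport translator, including the ingress filtering mentioned above, can be illustrated with a minimal sketch. The function name, the address representation, and the example addresses are assumptions made for illustration only; a real translator would operate on live sockets and would typically keep separate state per RTP and RTCP port.

```python
from typing import List, Set, Tuple

Address = Tuple[str, int]  # (IP address, UDP port); an illustrative representation


def forward_targets(participants: Set[Address], source: Address) -> List[Address]:
    """Return the addresses to which a packet received from `source` is relayed.

    Packets from unknown sources are dropped (ingress filtering). Packets
    from a known participant are relayed unchanged to every other participant.
    """
    if source not in participants:
        return []
    return sorted(p for p in participants if p != source)


# Hypothetical four-party session corresponding to A, B, C and D in Figure 3.
participants = {("192.0.2.1", 5004), ("192.0.2.2", 5004),
                ("192.0.2.3", 5004), ("192.0.2.4", 5004)}
print(forward_targets(participants, ("192.0.2.1", 5004)))    # the other three
print(forward_targets(participants, ("203.0.113.9", 5004)))  # empty: filtered
```

Note that the decision uses only transport-layer information, which is why such a translator can stay outside the security context.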
A transport translator can often operate without needing to be in the security context, as long as the security mechanism does not provide protection over the transport-layer information. A transport translator does, however, make the group communication visible, and so can complicate keying and source authentication mechanisms. This is further discussed in Section 2.4.
Gateways are deployed when the endpoints are not fully compatible. Figure 4 shows an example topology. The functions a gateway provides can be diverse, and range from transport layer relaying between two domains not allowing direct communication, via transport or media protocol function initiation or termination, to protocol or media encoding translation. The supported security protocol might even be one of the reasons a gateway is needed.
   +---+      +-----------+      +---+
   | A |<---->|  Gateway  |<---->| B |
   +---+      +-----------+      +---+
Figure 4: RTP Gateway Topology
The choice of security protocol and the details of the gateway function will determine if the gateway needs to be a trusted part of the application security context or not. Many gateways need to be trusted by all peers to perform the translation; in other cases some or all peers might not be aware of the presence of the gateway. The security protocols have different properties depending on the degree of trust and visibility needed. Ensuring communication is possible without trusting the gateway can be a strong incentive for accepting different security properties. Some security solutions will be able to detect the gateway as manipulating the media stream, unless the gateway is a trusted device.
A media transcoder is a special type of gateway device that changes the encoding of the media being transported by RTP. The discussion in Section 2.3.2 applies. A media transcoder alters the media data, and so almost certainly needs to be a trusted device that is part of the security context.
Any Source Multicast [RFC1112] is the original multicast model where any multicast group participant can send to the multicast group, and get their packets delivered to all group members (see Figure 5). This form of communication has interesting security properties, due to the many-to-many nature of the group. Source authentication is important, but all participants in the group security context will have access to the necessary secrets to decrypt and verify integrity of the traffic. Thus use of any symmetric security functions fails if the goal is to separate individual sources within the security context; alternate solutions are needed.
              +-----+
   +---+     /       \     +---+
   | A |----/         \----| B |
   +---+   /   Multi-  \   +---+
          +    Cast     +
   +---+   \  Network  /   +---+
   | C |----\         /----| D |
   +---+     \       /     +---+
              +-----+
Figure 5: Any Source Multicast Group
In addition, the potentially large size of multicast groups creates some considerations for the scalability of the solution and how the key-management is handled.
Source Specific Multicast [RFC4607] allows only a specific end-point to send traffic to the multicast group. That end-point is labelled the Distribution Source in Figure 6. It distributes traffic from a number of RTP media sources, MS1 to MSm. Figure 6 also depicts the feedback part of the SSM RTP session using unicast feedback [RFC5760] from a number of receivers R1..Rn that send feedback to a Feedback Target (FT); the feedback is eventually aggregated and distributed to the group.
The use of SSM makes it more difficult to inject traffic into the multicast group, but not impossible. Source authentication requirements apply for SSM sessions too, and a non-symmetric verification of who sent the RTP and RTCP packets is needed.
The SSM communication channel needs to be securely established and keyed. In addition, the individual unicast feedback also needs to be secured.
    +-----+  +-----+          +-----+
    | MS1 |  | MS2 |   ....   | MSm |
    +-----+  +-----+          +-----+
       ^        ^                ^
       |        |                |
       V        V                V
   +---------------------------------+
   |       Distribution Source       |
   +--------+                        |
   | FT Agg |                        |
   +--------+------------------------+
     ^ ^           |
     :  .          |
     :   +...................+
     :             |          .
     :            / \          .
   +------+      /   \       +-----+
   | FT1  |<----+     +----->| FT2 |
   +------+    /       \     +-----+
     ^  ^     /         \     ^  ^
     :  :    /           \    :  :
     :  :   /             \   :  :
     :  :  /               \  :  :
     :   ./\               /\.   :
     :  /. \              / .\   :
     :  V  . V            V .  V :
    +----+ +----+        +----+ +----+
    | R1 | | R2 |  ...   |Rn-1| | Rn |
    +----+ +----+        +----+ +----+
Figure 6: SSM-based RTP session with Unicast Feedback
This section provides an overview of a number of currently defined security mechanisms that can be used with RTP.
The Secure RTP (SRTP) protocol [RFC3711] is one of the most commonly used mechanisms to provide confidentiality, integrity protection and source authentication for RTP. SRTP was developed with RTP header compression and third party monitors in mind. Thus the RTP header is not encrypted in RTP data packets, and the first 8 bytes of the first RTCP packet header in each compound RTCP packet are not encrypted. The entirety of each RTP packet and compound RTCP packet is integrity protected. This allows RTP header compression to work, and lets third party monitors determine what RTP traffic flows exist based on the SSRC fields, but protects the sensitive content.
The source authentication guarantees provided by SRTP are highly dependent on the cryptographic transform and key-management scheme used. In some cases all a receiver can determine is whether the packets come from someone in the group security context, and not which group member sent the packets. Thus, the source authentication guarantees depend also on the session topology. Some cryptographic transforms have stronger authentication properties which can guarantee a given source, even over a multi-party session, e.g. those based on TESLA [RFC4383].
SRTP can easily be extended with additional cryptographic transforms. At the time of this writing, the following transforms are defined or under definition:
[RFC4771] defines a variant of the authentication tag that enables a receiver to obtain the Roll-over Counter (ROC) for the RTP sequence number, which is part of the Initialization Vector (IV) for many cryptographic transforms. This enables quicker and easier options for joining a long-lived secure RTP group, for example a broadcast session.
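To see why knowledge of the roll-over counter matters, the following sketch shows how a receiver can derive the SRTP packet index from the 16-bit RTP sequence number and its locally maintained roll-over counter. It follows the index-determination pseudocode given in [RFC3711]; the function and parameter names are chosen here for illustration.

```python
def estimate_packet_index(seq: int, s_l: int, roc: int) -> int:
    """Estimate the SRTP packet index of a received packet.

    seq -- 16-bit RTP sequence number carried in the packet
    s_l -- highest sequence number received so far (receiver state)
    roc -- the receiver's current 32-bit roll-over counter
    """
    if s_l < 32768:
        # A much larger seq means the packet belongs to the previous cycle.
        v = (roc - 1) % 2**32 if seq - s_l > 32768 else roc
    else:
        # A much smaller seq means the sequence number has wrapped.
        v = (roc + 1) % 2**32 if s_l - 32768 > seq else roc
    return seq + v * 65536


# The sequence number wraps from 65530 to 5, so the index moves into cycle 1.
print(estimate_packet_index(seq=5, s_l=65530, roc=0))
```

A receiver that joins a long-running session has no way to learn the current roll-over counter from the sequence numbers alone, and an incorrect value yields an incorrect index and hence an incorrect IV. Carrying the counter in the authentication tag removes that dependency on out-of-band state.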
RTP header extensions are normally carried in the clear and only integrity protected in SRTP. This can be problematic in some cases, so [I-D.ietf-avtcore-srtp-encrypted-header-ext] defines an extension to also encrypt selected header extensions.
SRTP does not contain an integrated key-management solution, and instead relies on an external key management protocol. There are several protocols that can be used. The following sections outline some popular schemes.
A Datagram Transport Layer Security extension exists for establishing SRTP keys [RFC5763][RFC5764]. This extension provides secure key-exchange between two peers, including perfect forward secrecy, and enables strong identity verification to be bound to an end-point. The default key generation will generate a key that contains material contributed by both peers. The key-exchange happens in the media plane directly between the peers. The common key-exchange procedures will take two round trips assuming no losses. TLS resumption can be used when establishing additional media streams with the same peer, reducing the setup time to one RTT.
DTLS-SRTP key management can use the signalling protocol in three ways. First, to agree on using DTLS-SRTP for media security. Secondly, to determine the network location (address and port) where each side is running a DTLS listener, to let the parties perform the key-management handshakes that generate the keys used by SRTP. Finally, to exchange hashes of each side's certificates, to enable each side to verify that they have connected to the port indicated by the signalling and not to a man in the middle. This provides some binding between the key-exchange and the signalling. This usage is well defined for SIP/SDP in [RFC5763], and in most cases can be adopted for use with other bi-directional signalling solutions.
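The certificate hash check in the third step can be sketched as follows. The sketch assumes that the fingerprint is signalled in the form used by the SDP "fingerprint" attribute (a hash function name followed by colon-separated uppercase hex octets), that SHA-256 is the agreed hash, and that the peer's certificate is available as DER-encoded bytes after the DTLS handshake. The function names are illustrative, and the certificate bytes in the example are a placeholder rather than a real certificate.

```python
import hashlib
import hmac


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format the SHA-256 hash of a DER certificate as 'sha-256 AB:CD:...'."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    octets = (digest[i:i + 2] for i in range(0, len(digest), 2))
    return "sha-256 " + ":".join(octets)


def fingerprint_matches(signalled: str, cert_der: bytes) -> bool:
    """Compare the signalled fingerprint with the certificate seen in DTLS."""
    expected = sdp_fingerprint(cert_der).upper().encode("utf-8")
    received = signalled.strip().upper().encode("utf-8")
    return hmac.compare_digest(expected, received)


# Placeholder bytes standing in for the certificate presented in the handshake.
peer_cert = b"placeholder certificate bytes"
offered = sdp_fingerprint(peer_cert)
print(fingerprint_matches(offered, peer_cert))         # matching certificate
print(fingerprint_matches(offered, peer_cert + b"!"))  # substituted certificate
```

The check is only as strong as the integrity of the signalling path that carried the fingerprint, which is why the binding between signalling and key-exchange is described as partial.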
Multimedia Internet Keying (MIKEY) [RFC3830] is a keying protocol that has several modes with different properties. MIKEY can be used in point-to-point applications using SIP and RTSP (e.g., VoIP calls), but is also suitable for use in broadcast and multicast applications, and centralized group communications.
MIKEY can establish multiple security contexts or cryptographic sessions with a single message. It can be used in scenarios where one entity generates the key and needs to distribute the key to a number of participants. The different modes and the resulting properties are highly dependent on the cryptographic method used to establish the TEK Generation Key (TGK) that is used to derive the keys actually used by the security protocol, like SRTP.
MIKEY has the following modes of operation:
MIKEY messages have several different defined transports. [RFC4567] defines how MIKEY messages can be embedded in general SDP for usage with the signalling protocols SIP, SAP and RTSP. There also exists a 3GPP-defined usage of MIKEY that sends MIKEY messages directly over UDP to key the receivers of Multimedia Broadcast and Multicast Service (MBMS) [3GPP.33.246].
Given the many choices, it is important to consider the properties needed in one's solution and, based on that, evaluate which modes are candidates for one's usage. More information on the applicability of the different MIKEY modes can be found in [RFC5197].
SDP Security Descriptions [RFC4568] provide a keying solution based on sending plain text keys in SDP [RFC4566]. It is primarily used with SIP and SDP Offer/Answer, and is well-defined in point-to-point sessions where each side declares its own unique key. Using Security Descriptions to establish group keys is less well defined, and can have security issues as the SSRC uniqueness property can't be guaranteed.
Since keys are transported in plain text in SDP, they can easily be intercepted unless the SDP carrying protocol provides strong end-to-end confidentiality and authentication guarantees. This is not the common use of Security Descriptions with SIP, where instead hop-by-hop security is provided between signalling nodes using TLS. This still leaves the keying material sensitive to capture by the traversed signalling nodes. Thus in most cases the security properties of Security Descriptions are weak.
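How little effort interception takes can be shown with a short sketch. It assumes the common "inline" key method of the SDP "crypto" attribute, in which the base64 value is the concatenation of the SRTP master key and master salt, and a 128-bit master key as used by the AES_CM_128_HMAC_SHA1_80 suite. The helper names and the example key are illustrative, and optional session parameters are ignored.

```python
import base64


def build_crypto_attribute(tag: int, suite: str, key: bytes, salt: bytes) -> str:
    """Build an SDP crypto attribute carrying key||salt as inline base64."""
    inline = base64.b64encode(key + salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}"


def extract_key_material(line: str, key_len: int = 16):
    """Recover (suite, master key, master salt) from a crypto attribute line."""
    _, suite, key_params = line.split(" ", 2)
    method, _, value = key_params.partition(":")
    if method != "inline":
        raise ValueError("unsupported key method")
    # Drop optional lifetime/MKI fields ("|...") and any session parameters.
    encoded = value.split("|")[0].split(" ")[0]
    raw = base64.b64decode(encoded)
    return suite, raw[:key_len], raw[key_len:]


# Any node that can read the SDP can do exactly what the receiver does.
line = build_crypto_attribute(1, "AES_CM_128_HMAC_SHA1_80",
                              key=bytes(range(16)), salt=bytes(range(14)))
suite, key, salt = extract_key_material(line)
print(suite, key.hex(), salt.hex())
```

No cryptographic operation is needed to recover the key, so every signalling node that sees the SDP in the clear must be trusted with the media.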
Encrypted Key Transport (EKT) [I-D.ietf-avtcore-srtp-ekt] is an SRTP extension that enables group keying despite using a keying mechanism that can't support group keys, like DTLS-SRTP. It is designed for centralized conferencing, but can also be used in sessions where end-points connect to a conference bridge or a gateway, and need to be provisioned with the keys each participant on the bridge or gateway uses, to avoid decryption and re-encryption cycles on the bridge or gateway.
The mechanism is based on establishing an additional EKT key which everyone uses to protect their actual session key. The actual session key is sent in an expanded authentication tag to the other session participants. This key is only sent occasionally or periodically, depending on the use case and on what requirements exist for timely delivery or notification when the key is needed by someone.
There exists at least one additional SRTP key-management system, namely ZRTP [RFC6189]. This was a candidate for IETF standardization that wasn't chosen, and was published for information instead. Its properties are somewhat similar to DTLS.
There might exist additional non-IETF defined solutions.
Section 9 of the RTP standard [RFC3550] defines a DES or 3DES based encryption of RTP and RTCP packets. This mechanism is keyed using plain text keys in SDP [RFC4566] using the "k=" SDP field. This method of providing confidentiality has extremely weak security properties and is not to be used.
IPsec [RFC4301] can be used independent of mode to protect RTP and RTCP packets in transit from one network interface to another. This can be sufficient when the network interfaces have a direct relation, or in a secured environment where it can be controlled who can read the packets from those interfaces.
The main concern with using IPsec to protect RTP traffic is that in most cases using a VPN approach that terminates the security association at some node prior to the RTP end-point leaves the traffic vulnerable to attack between the VPN termination node and the end-point. Thus usage of IPsec requires careful thought and design so that it really meets the security goals. An important question is how one ensures that the IPsec terminating peer and the ultimate destination are the same.
IPsec with RTP is more commonly used as a security solution between central nodes in an infrastructure that exchanges many RTP sessions and media streams between the peers. The establishment of a secure tunnel between these peers minimizes the key-management overhead between these two boxes.
Datagram Transport Layer Security (DTLS) [RFC6347] can provide point-to-point security for RTP flows. The two peers would establish a DTLS association between each other, including the possibility to do certificate-based source authentication when establishing the association. All RTP and RTCP packets flowing will be protected by this DTLS association.
Note: using DTLS is different to using DTLS-SRTP key management. DTLS-SRTP has the core key-management steps in common with DTLS, but DTLS-SRTP uses SRTP for the per packet security operations, while DTLS uses the normal datagram TLS data protection. When using DTLS, RTP and RTCP packets are completely encrypted with no headers in the clear, while DTLS-SRTP leaves the headers in the clear.
DTLS can use similar techniques to those available for DTLS-SRTP to bind a signalling side agreement to communicate to the certificates used by the end-point when doing the DTLS handshake. This enables use without having a certificate based trust chain to a trusted certificate root.
When RTP is sent over TCP [RFC4571] it can also be sent over TLS over TCP [RFC4572], using TLS to provide point-to-point security services. The security properties TLS provides are confidentiality, integrity protection and possibly source authentication if the client or server certificates are verified and provide a usable identity. When used in multi-party scenarios using a central node for media distribution, the security provided is only between the central node and the peers, so the security properties for the whole session are dependent on what trust one can place in the central node.
Mechanisms have been defined that encrypt only the payload of the RTP packets, and leave the RTP headers and RTCP in the clear. There are several reasons why this might be appropriate, but a common rationale is to ensure that the content stored in RTP hint tracks in RTSP streaming servers has the media content in a protected format that cannot be read by the streaming server (this is mostly done in the context of Digital Rights Management). These approaches then use a key-management solution between the rights provider and the consuming client to deliver the key used to protect the content, usually after the appropriate method for charging has happened, and do not include the media server in the security context. Such methods have several security weaknesses, such as the fact that the same key is handed out to a potentially large group of receiving clients, increasing the risk of a leak.
Use of this type of solution can be of interest in environments that allow middleboxes to rewrite the RTP headers and select which streams are delivered to an end-point (e.g., some types of centralised video conference systems). The advantage of encrypting and possibly integrity protecting the payload but not the headers is that the middlebox can't eavesdrop on the media content, but can still provide stream switching functionality. The downside of such a system is that it likely needs two levels of security: the payload level solution to provide confidentiality and source authentication, and a second layer with additional transport security ensuring source authentication and integrity of the RTP headers associated with the encrypted payloads. This can also result in the need to have two different key-management systems, as the entities protecting the packets and the payloads are different, with different sets of keys.
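The division of labour described above can be sketched with the fixed 12-byte RTP header from [RFC3550]. In this illustrative example a switching middlebox rewrites the sequence number and SSRC for a given receiver while treating everything after the fixed header as opaque. The function name and the example values are assumptions, and the "payload" is a placeholder for ciphertext that the middlebox holds no key for.

```python
import struct

# Fixed RTP header: V/P/X/CC octet, M/PT octet, sequence number,
# timestamp, SSRC (all in network byte order), 12 bytes in total.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def rewrite_for_receiver(packet: bytes, new_seq: int, new_ssrc: int) -> bytes:
    """Rewrite sequence number and SSRC, leaving the rest of the packet as is."""
    b0, b1, _seq, timestamp, _ssrc = RTP_FIXED_HEADER.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    header = RTP_FIXED_HEADER.pack(b0, b1, new_seq & 0xFFFF,
                                   timestamp, new_ssrc & 0xFFFFFFFF)
    # CSRC list, header extension and encrypted payload are copied untouched.
    return header + packet[RTP_FIXED_HEADER.size:]


original = RTP_FIXED_HEADER.pack(0x80, 96, 1000, 160, 0x11111111) + b"opaque-ciphertext"
switched = rewrite_for_receiver(original, new_seq=7, new_ssrc=0x22222222)
print(switched[RTP_FIXED_HEADER.size:] == original[RTP_FIXED_HEADER.size:])
```

Because the header is modified in transit, any integrity protection covering it has to be applied hop by hop by the middlebox, which is the second layer of security the text refers to.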
The two-tier security approach is present in ISMACryp (see Section 3.6.1) and the deprecated 3GPP Packet Based Streaming Service Annex K [3GPP.23.234] solution.
The Internet Streaming Media Alliance (ISMA) has defined ISMA Encryption and Authentication 2.0 [ISMACrypt2]. This specification defines how one encrypts and packetizes the encrypted application data units (ADUs) in an RTP payload using the MPEG-4 Generic payload format [RFC3640]. The ADU types that are allowed are those that can be stored as elementary streams in an ISO Media File format based file. ISMAcryp uses SRTP for packet level integrity and source authentication from a streaming server to the receiver.
Key-management for an ISMACryp-based system can be achieved through Open Mobile Alliance (OMA) Digital Rights Management 2.0 [OMADRMv2], for example.
In the following we provide guidelines for how to choose appropriate security mechanisms for RTP applications.
This section discusses a number of application requirements that need to be considered. An application designer choosing security solutions requires a good understanding of what level of security is needed and what behaviour they strive to achieve.
When it comes to confidentiality of an RTP session there are several aspects to consider:
As can be seen, the actual confidentiality level likely has more to do with the application's usage of centralized nodes, and the details of the key-management solution chosen, than with the actual choice of encryption algorithm (although, of course, the encryption algorithm needs to be chosen appropriately for the desired security level).
Protection against modification of content by a third party, or due to errors in the network, is another factor to consider. The first aspect to consider is what resilience one has against modifications to the content. This can affect what cryptographic algorithm is used, and the length of the integrity tags. However, it is equally important to consider who is providing the integrity assertion, what the source of the integrity tag is, and what the risks are of modifications happening prior to the point where protection is applied. RTP applications that rely on central nodes need to consider if hop-by-hop integrity is acceptable, or if true end-to-end integrity protection is needed. Is it important to be able to tell if a middlebox has modified the data? There are some uses of RTP that require trusted middleboxes that can modify the data in a way that doesn't break integrity protection as seen by the receiver, for example local advertisement insertion in IPTV systems; there are also uses where it is essential that such in-network modification be detectable. RTP can support both, with appropriate choices of security mechanisms.
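The trade-off around integrity tag length can be made concrete with a small sketch. It computes an HMAC-SHA1 tag truncated to a chosen number of bits; 80 and 32 bits are used here because those are the tag lengths of the pre-defined SRTP authentication transforms. The sketch is not the SRTP authentication procedure itself (which covers additional state), and the key and message are placeholders.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes, tag_bits: int = 80) -> bytes:
    """Return an HMAC-SHA1 tag truncated to `tag_bits` bits."""
    full = hmac.new(key, message, hashlib.sha1).digest()
    return full[: tag_bits // 8]


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a truncated tag in constant time."""
    return hmac.compare_digest(make_tag(key, message, len(tag) * 8), tag)


key = b"k" * 20                 # placeholder authentication key
packet = b"header-and-payload"  # placeholder for the protected packet content
tag80 = make_tag(key, packet, 80)
tag32 = make_tag(key, packet, 32)
print(len(tag80), len(tag32))   # per-packet overhead in bytes
print(verify_tag(key, packet, tag80), verify_tag(key, packet + b"x", tag80))
```

A shorter tag saves bytes on every packet, which matters for low-rate audio, but a blind forgery of an n-bit tag succeeds with probability 2^-n per attempt, so the saving is paid for in resilience. Note also that anyone holding the key can produce a valid tag, which is why the identity of the party making the integrity assertion matters as much as the tag length.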
Integrity of the data is commonly closely tied to the question of source authentication. That is, it becomes important to know who makes an integrity assertion for the data.
Source authentication is about determining who sent a particular RTP or RTCP packet. It is normally closely tied with integrity, since you also want to ensure that what you received is what the claimed source really sent, so source authentication without integrity is not particularly useful. In a similar way, although not as definitively, integrity without source authentication is also not particularly useful: you need to know who claims that the packet wasn't changed.
Source authentication can be asserted in several different ways:
As seen in the previous section, having an identity provider system can benefit applications by enabling them to make strong assertions binding an identity to the actual media source. Therefore, the need for identity needs to be considered. However, having identity systems might not be suitable for all types of application, since they require trusted infrastructure.
RTP applications need to consider what privacy goals they have. As RTP applications communicate directly between peers in many cases, the IP addresses of any communication peer will be available. The main privacy concern with IP addresses is related to geographical location and the possibility to track a user of an end-point. The main way to avoid such concerns is the introduction of relays or centralized media mixers or forwarders that hide the address of a peer from any other peer. The security and trust placed in these relays obviously needs to be carefully considered.
RTP itself can contribute to enabling a particular user to be tracked between communication sessions if the CNAME is generated according to the RTP specification in the form of user@host. Such RTCP CNAMEs are likely to be stable over the long term across multiple sessions, allowing tracking of users. This can be desirable for long-term fault tracking and diagnosis, but clearly has privacy implications. Instead, cryptographically random CNAMEs could be used, as defined by the random algorithm for RTP CNAME generation [I-D.rescorla-avtcore-random-cname].
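A per-session random CNAME can be produced in a few lines. The sketch below assumes 96 bits of randomness encoded with base64; both the length and the encoding are assumptions of this example, and the referenced draft is the authority on the exact requirements.

```python
import base64
import secrets


def random_cname(num_bits: int = 96) -> str:
    """Return a fresh, unlinkable CNAME built from cryptographic randomness."""
    return base64.b64encode(secrets.token_bytes(num_bits // 8)).decode("ascii")


# A new value per session means two sessions cannot be linked via the CNAME.
print(random_cname())
print(random_cname())
```

Unlike a user@host CNAME, such a value reveals neither the user name nor the host, at the cost of losing the long-term stability that is useful for fault diagnosis.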
If privacy goals exist, these need to be considered, and the system designed with them in mind. In addition, certain RTP features might have to be configured to safeguard privacy, or have requirements on how the implementation is done.
When it comes to RTP security, the most appropriate solution is often highly dependent on the topology of the communication session. The signalling also impacts what information can be provided, and if this can be instance specific, or common for a group. In the end the key-management system will highly affect the security properties achieved by the application. At the same time, the communication structure of the application limits what key-management methods are applicable. As different key-management systems have different requirements on the underlying infrastructure, it is important to take that aspect into consideration early in the design.
Few RTP applications exist as independent applications that never interoperate with anything else. Rather, they enable communication with a potentially large number of other systems. To minimize the number of security mechanisms that need to be implemented, it is important to consider if one can use the same security mechanisms as other applications. This can also reduce the problems of determining what security level is actually negotiated in a particular session.
The desire to be interoperable can in some cases be in conflict with the security requirements determined for an application. To meet the security goals, it might be necessary to sacrifice interoperability. Alternatively, one can implement multiple security mechanisms, but then end up with an issue of ensuring that the user understands what it means to use a particular security level. In addition, the application can then become vulnerable to bid-down attacks.
In the following we describe a number of example security solutions for applications, services or frameworks that use RTP. These examples are provided to show the choices that can be made. They are not normative recommendations for security.
The IETF evaluated media security for RTP sessions established using point-to-point SIP sessions in 2009. A number of requirements were determined, and based on those, the existing solutions for media security and especially the keying methods were analysed, and the resulting requirements and analysis were published in [RFC5479]. Based on this analysis, and the working group discussion, DTLS-SRTP was determined to be the best solution, and the specifications were finalized.
The security solution for SIP using DTLS-SRTP is defined in the Framework for Establishing a Secure Real-time Transport Protocol (SRTP) Security Context Using Datagram Transport Layer Security (DTLS) [RFC5763]. On a high level it uses SIP with SDP offer/answer procedures to exchange the network addresses where the server end-point will have a DTLS-SRTP enabled server running. The SIP signalling is also used to exchange the fingerprints of the certificate each end-point will use in the DTLS establishment process. When the signalling is sufficiently completed, the DTLS-SRTP client performs DTLS handshakes and establishes SRTP session keys. The clients also verify the fingerprints of the certificates to ensure that no man in the middle has inserted themselves into the exchange.
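The fingerprint verification described above can be illustrated with a minimal sketch. It assumes the peer's certificate is already available as DER-encoded bytes from the DTLS handshake, and that the fingerprint arrived in an SDP "a=fingerprint" attribute written as a hash name followed by colon-separated uppercase hexadecimal bytes. The function names are hypothetical and chosen only for illustration; a real implementation would take these values from its DTLS and SDP stacks.

```python
import hashlib
import hmac
from typing import Tuple


def certificate_fingerprint(cert_der: bytes, hash_name: str = "sha-256") -> str:
    """Hash the DER-encoded certificate and format it as colon-separated hex."""
    # SDP writes "sha-256"; hashlib expects "sha256" (assumed mapping).
    digest = hashlib.new(hash_name.replace("-", ""), cert_der).digest()
    return ":".join(f"{byte:02X}" for byte in digest)


def parse_fingerprint_attribute(line: str) -> Tuple[str, str]:
    """Split an 'a=fingerprint:<hash> <value>' line into (hash name, value)."""
    prefix = "a=fingerprint:"
    if not line.startswith(prefix):
        raise ValueError("not a fingerprint attribute")
    hash_name, value = line[len(prefix):].split(None, 1)
    return hash_name.lower(), value.strip().upper()


def peer_certificate_matches(sdp_line: str, cert_der: bytes) -> bool:
    """Return True if the certificate seen in DTLS matches the signalled one."""
    hash_name, expected = parse_fingerprint_attribute(sdp_line)
    actual = certificate_fingerprint(cert_der, hash_name)
    # Constant-time comparison; a mismatch means the session must be rejected.
    return hmac.compare_digest(actual, expected)
```

The check is only as strong as the integrity of the signalling that carried the fingerprint: if the SDP can be modified in transit, an attacker can substitute its own fingerprint.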
At the basic level, DTLS has a number of good security properties. For example, to enable a man-in-the-middle attack, someone in the signalling path needs to take active action and modify the signalling message. There also exists a solution that enables the fingerprints to be bound to identities established by the first proxy for each user [RFC4916]. That reduces the set of nodes the connecting user's UA has to trust to the first-hop proxy, rather than the full signalling path.
Web Real-Time Communication (WebRTC) [I-D.ietf-rtcweb-overview] is a solution providing web applications with real-time media directly between browsers. The RTP-transported real-time media is protected using a mandatory-to-use application of SRTP. The keying of SRTP is done using DTLS-SRTP. The security configuration is further defined in the WebRTC Security Architecture [I-D.ietf-rtcweb-security-arch].
The hashes of the peers' certificates are provided to a JavaScript application that is part of a client-server system providing rendezvous services for the peers that a given peer wants to communicate with. Thus the handling of the hashes between the peers is not well defined; it becomes a matter of trust in the application. But unless the application and its server intend to compromise the communication security, they can provide a secure and integrity-protected exchange of the certificate hashes, thus preventing any man in the middle (MITM) from inserting itself into the key exchange.
The web application still has the possibility to insert a MITM, unless one uses an Identity Provider and the proposed identity solution [I-D.rescorla-rtcweb-generic-idp]. In this solution the Identity Provider, which is a third party to the web application, signs the DTLS-SRTP hash combined with a statement on which user identity has been used to sign the hash. The receiver of such an identity assertion then independently verifies the user identity to ensure that it is the identity it intended to communicate with and that the cryptographic assertion holds. That way a user can be certain that the application cannot perform a MITM attack and thereby acquire the keys to the media communication.
In the development of WebRTC there has also been considerable attention to privacy questions. The main concerns that have been raised and that are related to RTP are:
Note: The above cases are focused on providing privacy towards parties other than the web service.
To be written:
To be written:
This document makes no request of IANA.
Note to RFC Editor: this section can be removed on publication as an RFC.
This entire document is about security. Please read it.
We thank the IESG for their careful review of [I-D.ietf-avt-srtp-not-mandatory] which led to the writing of this memo.