Network Working Group | O.O. Ohlsson |
Internet-Draft | Ericsson |
Intended status: Informational | August 20, 2012 |
Expires: February 19, 2013 |
Support of SDES in WebRTC
draft-ohlsson-rtcweb-sdes-support-01
Which key management protocols to support has been lively debated in WebRTC on several occasions. This document explains the benefits of SDES and argues why allowing it as an alternative option has little impact on security.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http:/⁠/⁠datatracker.ietf.org/⁠drafts/⁠current/⁠.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 19, 2013.
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http:/⁠/⁠trustee.ietf.org/⁠license-⁠info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Which key management protocols to support has been lively debated in WebRTC on several occasions. The main question is the following: Should applications be restricted to DTLS-SRTP or could SDES be allowed as an alternative option?
In this document we identify and address the issues that have been raised. We explain the benefits of SDES and argue why allowing it as an alternative option has little impact on security.
Being able to communicate from WebRTC applications to existing SIP/RTP endpoints is a highly desirable use case. The SIP installed base is huge and contains millions of devices and a large number of applications (e.g. conferencing and voicemail). Even more important, nearly all mobile phones and landlines are reachable through SIP/RTP gateways deployed in service provider networks. The same can also be said for other signaling protocols, such as XMPP or H.323. As a sidenote, the recent work on the DTMF tone API in WebRTC proves that many members consider legacy interworking to be important.
Communication between the Browser and SIP/RTP endpoint will most likely require some form om media-plane gateway (due to the need to terminate ICE). The development and testing costs for such gateways are typically very high since they need to handle a large number of users and often contain special purpose hardware. It is definitely worthwhile to try to reduce costs by lowering the complexity and removing functionality that is not strictly required. This would result in lower prices which will lead to a higher degree of interconnectivity between WebRTC and existing SIP deployments.
Already today there are Session Border Controllers (SBC) that perform SRTP termination on behalf of endpoints with SDES based keying (there are SBCs that support DTLS-SRTP but this is uncommon). If the browser also supported SDES, the WebRTC gateway could simply forward all SRTP packets to the SBC and let it decide whether to terminate encryption or not (depending on the capabilities of the receiving endpoint).
A large part of modern SIP/RTP devices support SRTP and most of them that do, use SDES based keying. This is confirmed in the report from the latest [SIPit] event which stated that:
Although these figures may not be entirely accurate, they at least provide an indication of the current situation.
The 3rd Generation Partnership Project (3GPP) has also selected SDES for key management in the IP Multimedia Subsystem (IMS) [3GPP.33.328]. We can therefore expect the number of SDES capable devices to increase as Voice over LTE (VoLTE) and other IMS based systems become more widely deployed.
Provided SDES is included in browsers, calls between the WebRTC and SIP domains do not need to be encrypted/decrypted by an intermediate gateway when the SIP endpoint supports SDES. This leads to a substantial reduction in processing cost for the gateway in SIP domains where a large part of the devices support SDES. Another benefit is that for those endpoints that support SDES the call will be protected end-to-end for free. Achieving this with DTLS-SRTP would require the gateway to first decrypt and then re-encrypt traffic.
Note that the important question is whether the gateway needs to terminate SRTP at all. Processing wise there is probably not that much difference in terminating an SRTP + SDES or an SRTP + DTLS-SRTP call.
DTLS-SRTP with Encrypted Key Transport (EKT) [I-D.ietf-avtcore-srtp-ekt] has been suggested as an alternative to avoid expensive encryption/decryption in gateways. If browsers support DTLS-SRTP with EKT, a gateway can force the browser and the SDES endpoint to agree on the same set of SRTP keys and algorithm settings. Once this is done, the gateway will simply forward the SRTP (and SRTCP) packets in both directions. The downside of using this approach is the increased complexity of the gateway (new protocols and additional signaling are required) and the lack of implementation experience.
With SDES a peer can begin to send media as soon as an ICE candidate pair has been nominated for use and the connectivity check for that pair has succeeded. If DTLS-SRTP is being used the peer would also need to wait for the DTLS-SRTP handshake to complete, which requires two additional roundtrips.
Obviously, being able to start sending media quickly is not very useful unless the receiver knows how to process the incoming packets. One common argument against SDES is its inability to handle early media (i.e. media that arrives at the SDP offerer before the SDP answer arrives). However, this problem cannot occur if the offerer is ICE full. To see why, recall that sending media requires that a candidate pair has been nominated for use by the ICE controlling agent, which is always the offerer when the offerer is ICE full. Since nomination is done by sending a connectivity check (with the nomination flag set) which requires the password provided in the SDP answer, no pair gets nominated at the answerer and no media is sent before the SDP answer has arrived at the offerer.
If the offerer is ICE lite or if multiplexing is used (i.e. all media streams are sent over a single ICE candidate pair) and an additional media stream is added later in time via an updated offer, then the problem with early media could arise when SDES is used (but never with DTLS-SRTP).
At this point most readers should agree that SDES is favourable from an interworking point of view. It is also clear that implementing SDES in WebRTC is a relatively straight forward task. What remains to be considered are its impacts on security.
We distinguish between the following two types of attackers:
By requiring that signaling is secured using TLS, an outside attacker that monitors network traffic will not be able to extract the SDES keys. Therefore, in this scenario both SDES and DTLS-SRTP provide a sufficient level of protection.
The two other types of attacks that have been mentioned in this context are extraction of log data and code injection, each of which are considered below.
In this scenario the attacker manages to decrypt a previously recorded call by attacking the signaling server and extracting the SDES keys from the server log.
First of all, if the attacker gets as far as reading the logging data then eavesdropping of past calls is probably not the only problem. The effort required to break into the server is also related to the amount of trust the user assigns to the web application: well trusted sites often have well protected servers.
Secondly, it can be questioned how common this type of extensive logging really is. Storing passwords and other sensitive information in log files is an implementation mistake that can easily be avoided.
Finally, SDES will primarily be used when interworking with existing SIP systems deployed within enterprises or service providers. These have been using SDES for a long time and know that it is critical to protect the plain text keys.
In this scenario the attacker manages to inject his own piece of JavaScript into the WebRTC application. The next time a user downloads the application and places a call, the script will execute and start eavesdropping on the conversation.
There are three major ways in which code can be injected into a web application:
Modification en route is prevented by requiring HTTPS to be used for all content. Whether the two other injection techniques are feasible or not largely depends on the application.
If script injection occurs then there are other methods to intercept a call, like establishing additional PeerConnection objects or use a recording interface and send the data using WebSocket. As long as these methods are available it does not matter much whether the application uses SDES or DTLS-SRTP.
In general, if an attacker manages to execute even a small piece of JavaScript then he has effectively gained full control of the application (additional code can be included and HTML elements removed/inserted). Since this situation is exactly the same as the situation with an inside attacker, script injection will not be discussed further.
First of all, it can be questioned if we really want to protect ourselves against an inside attacker. If consent is required every time the application wants to record or forward media then the user experience will suffer. One could also imagine future applications that want to use their own codecs or filters (for example a voice scrambler or face detection software), something which is difficult to achieve without access to the underlying bitstreams.
We ignore this problem for now and simply assume that the application cannot access the media from within the browser. In other words, we only consider protection of the media during transport.
The major argument against SDES is that it would make it trivial for the application to perform interception. Let us compare what would be required in both cases.
Interception of SDES call:
Interception of DTLS-SRTP call:
Putting the modified TURN server into place is the hardest part of intercepting a DTLS-SRTP call. Once this is done however, the remaining steps are fairly straightforward. This shows that neither DTLS-SRTP nor SDES provides any significant protection against an inside attacker.
There is one benefit of DTLS-SRTP that is not directly apparent from the above description. If both users read their respective fingerprint values over the voice channel then they can detect if the conversation is being intercepted. However, it is very unlikely that the average user would bother doing this.
The comparison in the previous section is somewhat simplified since it does not consider DTLS-SRTP key continuity. The way this mechanism works is that the browser will notify the user whenever it receives a certificate which has not previously been seen (i.e. not present in the browser cache). Since the user will receive this notification every time he calls someone new and whenever someone changes browser, it is very likely that he/she will simply ignore it.
Reuse of public keys also has privacy implications as it enables user tracking. A user that wants to remain anonymous towards a service provider would need to generate a fresh key for each interaction. Furthermore, in order to avoid colluding service providers (e.g. medical clinics and insurance agencies) from linking a user's activities, separate certificates are needed for different domains. However, storing domain names together with the certificates might allow the next browser user (e.g. a family member) to see which sites the previous user visited. All of this leads to more certificates being generated which in turn results in even more "new key" notifications.
It is also important to understand that the cached certificates are not bound to any identity (the certificates are simple containers for the public key without any additional information). This means that if just one of the cached keys is compromised any user call can be intercepted without causing the "new key" notification to be displayed. Note that the risk of this happening is directly related to the size of the cache, which grows over time.
[I-D.rescorla-rtcweb-generic-idp] suggests a way to strengthen the security of DTLS-SRTP by validating the received fingerprint via an identity provider. At the time of writing there are still some details missing from the proposal (for example, it is not clear how the identity provider is selected in practice or how the peer identity is displayed to the user) but it definitely seems promising. Such a mechanism (including the necessary browser chrome) would make it significantly harder for the application to act as man-in-the-middle.
The question is whether the identity mechanism is optional or not, i.e. will it be possible for an application to use "plain" DTLS-SRTP. The answer is most likely "yes" due to the following reasons:
Note that allowing an application to be its own identity provider is effectively the same as allowing plain DTLS-SRTP (the user trusts the application) only more complicated.
We are not looking to replace DTLS-SRTP with SDES. The 20-line WebRTC developer will continue to use the default option which is DTLS-SRTP, while others who are interested in interworking will select SDES. The latter group will be required to use HTTPS for all content and can be informed of the necessary precautions (secure storage of log files or otherwise no extensive logging).
The main issue that appears to concern members is the application's ability to downgrade security. But as we have seen it is not significantly harder for the application to attack DTLS-SRTP. The main advantage of DTLS-SRTP is the possibility to detect when a call is being intercepted. However, doing so requires an effort from the user and a certain degree of technical skill.
It has been suggested that additional identity mechanisms could prevent the application from listening in on calls. While this is certainly true, any such mechanism would most likely be made optional. If that is the case or if an application can be its own identity provider, then we are back at the situation where the user has to decide which sites to trust.
It can also be questioned to what extent the application should be restricted from accessing media since this limits usability and innovativity. The W3C would need to update its specifications and ensure that a web application cannot record or forward a MediaStream without permission from the user.