Network Working Group                                            E. Ivov
Internet-Draft                                                     Jitsi
Intended status: Informational                                 H. Kaplan
Expires: December 19, 2014                                        Oracle
                                                                 D. Wing
                                                                   Cisco
                                                           June 17, 2014
Latching: Hosted NAT Traversal (HNT) for Media in Real-Time Communication
draft-ietf-mmusic-latching-08
This document describes the behavior of signaling intermediaries in Real-Time Communication (RTC) deployments, sometimes referred to as Session Border Controllers (SBCs), when performing Hosted NAT Traversal (HNT). HNT is a set of mechanisms, such as media relaying and latching, that such intermediaries use to enable other RTC devices behind NATs to communicate with each other.
This document is non-normative, and is only written to explain HNT in order to provide a reference to the IETF community, as well as an informative description to manufacturers and users.
Latching, which is one of the HNT components, has a number of security issues covered here. Because of those, and unless all security considerations explained here are taken into account and solved, the IETF advises against use of the latching mechanism over the Internet and recommends other solutions such as the Interactive Connectivity Establishment (ICE) protocol.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 19, 2014.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Network Address Translators (NATs) are widely used in the Internet by consumers and organizations. Although specific NAT behaviors vary, this document uses the term "NAT" for devices that map any IPv4 or IPv6 address and transport port number to another IPv4 or IPv6 address and transport port number. This includes consumer NATs, Firewall-NATs, IPv4-IPv6 NATs, Carrier-Grade NATs (CGNs) [RFC6888], etc.
The Session Initiation Protocol (SIP) [RFC3261], and other protocols that try to use a more direct path for media than for signaling, are difficult to use across NATs. These protocols use IP addresses and transport port numbers encoded in bodies such as the Session Description Protocol (SDP) [RFC4566] and, in the case of SIP, various header fields. Such addresses and ports are unusable unless all peers in a session are located behind the same NAT.
Mechanisms such as Session Traversal Utilities for NAT (STUN) [RFC5389], Traversal Using Relays around NAT (TURN) [RFC5766], and Interactive Connectivity Establishment (ICE) [RFC5245] did not exist when protocols like SIP began being deployed. Some mechanisms, such as the early versions of STUN [RFC3489], had started appearing, but they were unreliable and suffered from a number of issues typical of UNilateral Self-Address Fixing (UNSAF), as described in [RFC3424]. For these and other reasons, Session Border Controllers (SBCs) that were already being used by SIP domains for other SIP and media-related purposes began to use proprietary mechanisms to enable SIP devices behind NATs to communicate across the NATs. These mechanisms are often transparent to endpoints and rely on a dynamic address and port discovery technique called "latching".
The term often used for this behavior is Hosted NAT Traversal (HNT), although a number of manufacturers sometimes use other names such as "Far-end NAT Traversal" or "NAT assist" instead. The systems which perform HNT are frequently SBCs as described in [RFC5853], although other systems such as media gateways and "media proxies" sometimes perform the same role. For the purposes of this document, all such systems are referred to as SBCs, and the NAT traversal behavior is called HNT.
As of this document's creation time, a vast majority of SIP domains use HNT to enable SIP devices to communicate across NATs, despite the publication of ICE. There are many reasons for this, but those reasons are not relevant to this document's purpose and will not be discussed. It is however worth pointing out that the current deployment levels of HNT and NATs make the complete extinction of this practice highly unlikely in the foreseeable future.
The purpose of this document is to describe the mechanisms often used for HNT at the SDP and media layer, in order to aid in understanding the implications and limitations they impose. Although the mechanisms used in HNT are well known in the community, publication in an IETF document is useful as a means of providing common terminology and a reference for related documents.
This document does not attempt to make a case for HNT or present it as a solution that is somehow better than alternatives such as ICE. Due to the security issues presented in Section 5, the latching mechanism is considered inappropriate for general use on the Internet unless all security considerations are taken into account and solved. The IETF instead advises the use of the Interactive Connectivity Establishment (ICE) [RFC5245] and Traversal Using Relays around NAT (TURN) [RFC5766] protocols.
It is also worth mentioning that there are purely signaling-layer components of HNT as well. One such component is briefly described for SIP in [RFC5853], but that is not the focus of this document. SIP uses numerous expressive primitives for message routing. As a result, the HNT component for SIP is typically more implementation-specific and deployment-specific than the SDP and media components. For the purposes of this document it is hence assumed that signaling intermediaries handle traffic in a way that allows protocols such as SIP to function correctly across the NATs.
The rest of this document is going to focus primarily on use of HNT for SIP. However, the mechanisms described here are relatively generic and are often used with other protocols, such as XMPP [RFC6120], Media Gateway Control Protocol (MGCP) [RFC3435], H.248/MEGACO [RFC5125], and H.323 [H.323].
The general problems with NAT traversal for protocols such as SIP are:
In order to overcome these issues, signaling intermediaries such as SIP SBCs on the public side of the NATs perform HNT for both signaling and media. An example deployment model of HNT and SBCs is shown in Figure 1.
               +-----+       +-----+
               | SBC |-------| SBC |
               +-----+       +-----+
                 /               \
                /    Public Net   \
               /                   \
          +-----+                 +-----+
          |NAT-A|                 |NAT-B|
          +-----+                 +-----+
            /                         \
           /  Private Net  Private Net \
          /                             \
     +------+                         +------+
     | UA-A |                         | UA-B |
     +------+                         +------+
Figure 1: Signaling and Media Flows in a Common Deployment Scenario
Along with codec and other media-layer information, session establishment signaling also conveys potentially private and non-globally routable addressing information. Signaling intermediaries would hence modify such information so that peer UAs are given the (public) addressing information of a media relay controlled by the intermediary.
In typical deployments, the media relay and signaling intermediary (i.e., the SBC) are co-located, thereby sharing the same IP address. Also, the address of the media relay would typically belong to the same IP address family as the one used for signaling, as it is known to work for that UA. In other words, signaling and media would both travel over either IPv4 or IPv6.
The port numbers introduced in the signaling by the intermediary are typically allocated dynamically. Allocation strategies are entirely implementation dependent and they often vary from one product to the next.
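The address and port rewriting described above can be illustrated with a minimal sketch. It assumes an IPv4-only SDP body with a single media stream, and the function name rewrite_sdp and its parameters are hypothetical; real intermediaries handle many more cases (multiple streams, IPv6, RTCP attributes).

```python
def rewrite_sdp(sdp: str, relay_ip: str, relay_port: int) -> str:
    """Point the connection address and media port of an SDP body at a relay.

    Simplifying assumptions: IPv4 only, one media stream, and every
    c= line is rewritten. The o= line is left untouched.
    """
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Replace the (possibly private) address with the relay address.
            line = f"c=IN IP4 {relay_ip}"
        elif line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            parts = line.split(" ")
            parts[1] = str(relay_port)
            line = " ".join(parts)
        out.append(line)
    # SDP lines are terminated by CRLF.
    return "\r\n".join(out) + "\r\n"


offer = (
    "v=0\r\n"
    "o=alice 1 1 IN IP4 192.0.2.1\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.1\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)
rewritten = rewrite_sdp(offer, "198.51.100.2", 22007)
```

In this sketch the relay port is simply passed in by the caller, since allocation strategies are implementation dependent.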
The offer/answer media negotiation model [RFC3264] is such that once an offer is sent, the generator of the offer needs to be prepared to receive media on the advertised address/ports. In practice such media may or may not be received, depending on the implementations participating in a given session, local policies, and call scenario. For example, if a SIP SDP Offer originally came from a UA behind a NAT, the SIP SBC cannot send media to it until an SDP Answer is given to the UA and latching (described below) occurs. Another example is when a SIP SBC sends an SDP Offer in a SIP INVITE to a residential customer's UA and receives back SDP in an 18x response: the SBC may decide, for policy reasons, not to send media to that customer UA until a SIP 200 response has been received (e.g., to prevent toll-fraud).
A UA that is behind a NAT would stream media from an address and a port number (an address:port tuple) that are only valid in its local network. Once packets cross the NAT, that address:port tuple will be mapped to a public one. The UA, however, is not typically aware of the public mapping and would often advertise the private address:port tuple in signaling. As a result, while a session is still being set up, the signaling intermediary is not yet aware of which addresses and ports the caller and the callee would end up using for media traffic: it has only seen them advertise the private addresses they use behind their respective NATs. Therefore media relays used in HNT would often use a mechanism called "latching".
Historically, "latching" only referred to the process by which SBCs "latch" onto UDP packets from a given UA for security purposes, and "symmetric-latching" is when the latched address:port tuples are used to send media back to the UA. Today most people talk about them both as "latching", and thus this document does as well.
The latching mechanism works as follows:
Figure 2 describes how latching occurs for SIP where HNT is provided by an SBC connected to two networks: 203.0.113.0/24 facing towards the User Agent Client (UAC) network and 198.51.100.0/24 facing towards the User Agent Server (UAS) network.
   192.0.2.1 192.0.2.9/203.0.113.4                          198.51.100.33
     Alice            NAT   203.0.113.9-SBC-198.51.100.2         Bob
    -------           ---                ---                   -------
       |               |                  |                       |
   1.  |--SIP INVITE+offer c=192.0.2.1--->|                       |
       |               |                  |                       |
   2.  |               |    (SBC allocates 198.51.100.2:22007     |
       |               |       for inbound RTP from Bob)          |
       |               |                  |                       |
   3.  |               |                  |-----INVITE+offer----->|
       |               |                  | c=198.51.100.2:22007  |
       |               |                  |                       |
   4.  |               |                  |<------180 Ringing-----|
       |               |                  |                       |
   5.  |<------180 Ringing----------------|                       |
       |               |                  |                       |
   6.  |               |                  |<------200+answer------|
       |               |                  |                       |
   7.  |               |    (SBC allocates 203.0.113.9:36010      |
       |               |       for inbound RTP from Alice)        |
       |               |                  |                       |
   8.  |<-200+answer,c=203.0.113.9:36010--|    c=198.51.100.33    |
       |               |                  |                       |
   9.  |------------ACK------------------>|                       |
  10.  |               |                  |----------ACK--------->|
       |               |                  |                       |
  11.  |=====RTP,dest=203.0.113.9:36010==>|                       |
       |               |                  |                       |
  12.  |               |        (SBC latches to                   |
       |               |         source IP address and            |
       |               |         port seen at (11))               |
       |               |                  |                       |
  13.  |               |                  |<======= RTP ==========|
       |               |                  |dest:198.51.100.2:22007|
  14.  |<=====RTP, to latched address=====|                       |
       |               |                  |                       |
Figure 2: Latching by a SIP SBC across two interfaces
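The latching behavior in Figure 2 can be sketched as a small piece of relay state. This is only an illustration under simplifying assumptions: the Leg structure and on_media_packet function are hypothetical names, both legs latch in the same way, the first packet seen wins, and the NAT-mapped source port 40000 and Bob's source port 5004 are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)


@dataclass
class Leg:
    """One side of a relayed stream (hypothetical structure)."""
    local: Addr                     # address:port the SBC allocated for this UA
    latched: Optional[Addr] = None  # public source learned from the first packet


def on_media_packet(rx_leg: Leg, tx_leg: Leg, src: Addr) -> Optional[Addr]:
    """Handle a packet that arrived on rx_leg from source address src.

    Latches rx_leg onto the first source seen, then returns the address
    the packet should be forwarded to on the other leg, or None if that
    leg has not latched yet.
    """
    if rx_leg.latched is None:
        rx_leg.latched = src
    return tx_leg.latched


# Ports allocated by the SBC in steps 2 and 7 of Figure 2.
alice_leg = Leg(local=("203.0.113.9", 36010))
bob_leg = Leg(local=("198.51.100.2", 22007))

# Step 11: Alice's RTP arrives from her NAT's public mapping (port assumed).
first = on_media_packet(alice_leg, bob_leg, ("203.0.113.4", 40000))
# Step 13: Bob's RTP arrives; step 14: it is sent to Alice's latched address.
second = on_media_packet(bob_leg, alice_leg, ("198.51.100.33", 5004))
```

Note that the private address Alice advertised in signaling (192.0.2.1) never appears in the relay state; only the source observed on the wire is used.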
While XMPP implementations often rely on ICE to handle NAT traversal, there are some that also support a non-ICE transport called XMPP Jingle Raw UDP Transport Method [XEP-0177]. Figure 3 describes how latching occurs for one such XMPP implementation where HNT is provided by an XMPP server on the public Internet.
   192.0.2.1 192.0.2.9/203.0.113.4        203.0.113.9          198.51.100.8
     Romeo            NAT                 XMPP Server             Juliet
     -----            ---                     ---                  -----
       |               |                       |                     |
   1.  |----session-initiate cand=192.0.2.1--->|                     |
       |               |                       |                     |
   2.  |<------------ack-----------------------|                     |
       |               |                       |                     |
   3.  |               |     (Server allocates 203.0.113.9:2200      |
       |               |        for inbound RTP from Juliet)         |
       |               |                       |                     |
   4.  |               |                       |--session-initiate-->|
       |               |                       |cand=203.0.113.9:2200|
       |               |                       |                     |
   5.  |               |                       |<--------ack---------|
       |               |                       |                     |
   6.  |               |                       |<---session-accept---|
       |               |                       |  cand=198.51.100.8  |
       |               |                       |                     |
   7.  |               |                       |---------ack-------->|
       |               |                       |                     |
   8.  |               |     (Server allocates 203.0.113.9:3300      |
       |               |         for inbound RTP from Romeo)         |
       |               |                       |                     |
   9.  |<-session-accept cand=203.0.113.9:3300-|                     |
       |               |                       |                     |
  10.  |-----------------ack------------------>|                     |
       |               |                       |                     |
  11.  |======RTP, dest=203.0.113.9:3300======>|                     |
       |               |                       |                     |
  12.  |               |          (XMPP server latches to            |
       |               |           src IP 203.0.113.4 and            |
       |               |           src port seen at (11))            |
       |               |                       |                     |
  13.  |               |                       |<======= RTP ========|
       |               |                       |dest=203.0.113.9:2200|
  14.  |<======RTP, to latched address=========|                     |
       |               |                       |                     |
Figure 3: Latching by an XMPP server across two interfaces
The above is a general description, and some details vary between implementations or configuration settings. For example, some intermediaries perform additional logic before latching on received packet source information, in order to prevent malicious attacks or erroneous latching to previous media senders (often called "rogue-rtp" in the industry).
It is worth pointing out that latching is not an exclusively "server affair" and some clients may also use it in cases where they are configured with a public IP address and they are contacted by a NATed client with no other NAT traversal means.
In order for latching to function correctly, the UA behind the NAT needs to support symmetric RTP. That is, it needs to use the same ports for sending data as the ones it listens on for inbound packets. Today this is the case with almost all SIP and XMPP clients, for example. UAs also need to make sure they can begin sending media packets independently, without waiting for packets to arrive first. In theory, it is possible that some UAs would not send packets out first; for example, if a SIP session begins in 'inactive' or 'recvonly' SDP mode from the UA behind the NAT. In practice, however, SIP sessions from regular UAs (the kind that one could find behind a NAT) virtually never begin in 'inactive' or 'recvonly' mode, for obvious reasons. The media direction would also be problematic if the SBC side indicated 'inactive' or 'sendonly' modes when it sent SDP to the UA. However, SBCs providing HNT would always be configured to avoid this.
Given that, in order for latching to work properly, media relays need to begin receiving media before they start sending, it is possible for deadlocks to occur. This can happen when the UAC and the UAS in a session are connected to different signaling intermediaries that both provide HNT. In this case the media relays controlled by the signaling servers could end up each waiting upon the other to initiate the streaming. To prevent this, relays would often attempt to start streaming toward the address:port tuples provided in the offer/answer even before receiving any inbound traffic. If the entity they are streaming to is another HNT-performing server, it would have provided its relay's public address and ports, and the early stream would find its target.
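As an illustration of the media direction concern, the following sketch checks whether the SDP sent by a UA allows that UA to send media, which is what gives the relay something to latch onto. It assumes a single media stream and treats the last direction attribute found as the effective one; the function name ua_may_send is hypothetical.

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def ua_may_send(sdp_from_ua: str) -> bool:
    """Return True if the UA's own SDP lets it send media packets.

    Simplifying assumptions: one media stream; the last direction
    attribute seen wins; 'sendrecv' is the default when none is present.
    """
    direction = "sendrecv"
    for line in sdp_from_ua.splitlines():
        if line.startswith("a=") and line[2:] in DIRECTIONS:
            direction = line[2:]
    # Only these two directions have the UA emitting media.
    return direction in ("sendrecv", "sendonly")
```

With such a check, an intermediary could for instance log sessions in which latching cannot be expected to happen until a later re-negotiation.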
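The early-streaming behavior used to avoid such deadlocks can be sketched as a destination choice: use the latched source if one exists, and otherwise fall back to the address:port tuple advertised in the offer/answer. The function name and return shape are assumptions made for this example.

```python
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)


def pick_destination(latched: Optional[Addr], advertised: Addr) -> Tuple[Addr, bool]:
    """Choose where a relay sends media for one leg.

    Returns the destination and a flag telling whether it was learned by
    latching (True) or taken from signaling as an early-stream guess (False).
    """
    if latched is not None:
        return latched, True
    # Nothing received yet: stream toward the advertised tuple anyway.
    # If the peer is another HNT relay, this tuple is public and reachable.
    return advertised, False
```

If the advertised tuple is a private address behind a NAT, the early packets are simply lost, and the relay switches to the latched address once the first inbound packet arrives.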
Although many SBCs only support UDP-based media latching, and in particular RTP/RTCP, many SBCs support TCP-based media latching as well. TCP-based latching is more complicated, and involves forcing the UA behind the NAT to be the TCP client and sending the initial SYN-flagged TCP packet to the SBC (i.e., be the 'active' mode side of a TCP-based media session). If both UAs of a TCP-based media session are behind NATs, then SBCs typically force both UAs to be the TCP clients, and the SBC splices the TCP connections together. TCP splicing is a well-known technique, as described in [tcp-splicing].
HNT and latching in particular are generally found to be working reliably but they do have obvious caveats. The first one usually raised by IETF participants is that UAs are not aware of it occurring. This makes it impossible for the mechanism to be used with protocols such as ICE that try various traversal techniques in an effort to choose the one that best suits a particular situation. Overwriting address information in offers and answers may actually completely prevent UAs from using ICE because of the ice-mismatch rules described in [RFC5245].
The second issue raised by IETF participants is that it causes media to go through a relay instead of directly over the IP-routed path between the two participating UAs. While this has obvious drawbacks such as reduced scalability and often increased latency, it is also considered a benefit by SBC administrators: if a customer pays for "phone" service, for example, the media is what is truly being paid for, and the administrators usually like to be able to detect that media is flowing correctly, evaluate its quality, know if and why it failed, etc. Also, in some cases routing media through operator-controlled relays may route media over paths explicitly optimized for media and hence offer better performance than regular Internet routing.
A common concern is that an SBC (or an XMPP server; all security considerations apply to both) that implements HNT may latch to incorrect and possibly malicious sources. The ICE protocol [RFC5245], for example, provides authentication tokens (conveyed in the ice-ufrag and ice-pwd attributes) that allow the identity of a peer to be confirmed before engaging in media exchange with that peer. Without such authentication, a malicious source could, for example, attempt a resource exhaustion attack by flooding all possible media-latching UDP ports on the SBC in order to prevent calls from succeeding. SBCs have various mechanisms to prevent this from happening, or to alert an administrator when it does. Still, a sufficiently sophisticated attacker may be able to bypass them for some time. The most common such mechanism is typically referred to as "restricted-latching", whereby the SBC will not latch to any packets from a source public IP address other than the one the SIP UA uses for SIP signaling. This way the SBC simply ignores and does not latch onto packets coming from the attacker. In some cases the limitation may be loosened to allow media from a range of IP addresses belonging to the same network in order to allow for use cases such as decomposed UAs and various forms of third party call control. However, since relaxing the restrictions in such a way may provide attackers with a larger attack surface, such configurations are generally performed only on a case-by-case basis so that the specifics of individual deployments would be taken into account.
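A minimal sketch of the restricted-latching check follows, using Python's standard ipaddress module. The function name may_latch and the allowed_networks parameter are hypothetical; the latter models the loosened variant that accepts media from a configured range of addresses.

```python
import ipaddress
from typing import Iterable


def may_latch(src_ip: str, signaling_ip: str,
              allowed_networks: Iterable[str] = ()) -> bool:
    """Decide whether a media packet's source may be latched onto.

    Strict mode: the media source IP must equal the public IP address
    the UA used for signaling. Optionally, extra networks (in CIDR
    notation) can be allowed on a per-deployment basis.
    """
    src = ipaddress.ip_address(src_ip)
    if src == ipaddress.ip_address(signaling_ip):
        return True
    # Loosened variant: accept sources inside explicitly configured ranges.
    return any(src in ipaddress.ip_network(net) for net in allowed_networks)
```

For example, with signaling seen from 203.0.113.4, a packet from 198.51.100.77 is ignored in strict mode, while a packet from 203.0.113.20 is accepted only if 203.0.113.0/24 has been configured as an allowed network. This check does not help against an attacker who spoofs the UA's public source address.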
All of the above problems would still arise if the attacker knows the public source IP of the UA that is actually making the call. This would allow them to still flood all of the SBC's public IP addresses and ports with packets spoofing that SIP UA's public source IP address. However, this would only impact media from that IP (or range of IP addresses) rather than all calls that the SBC is servicing.
A malicious source could send media packets to an SBC media-latching UDP port in the hopes of being latched-to for the purpose of receiving media for a given SIP session. SBCs have various mechanisms to prevent this as well. Restricted latching, for example, would also help in this case: the SBC will not latch onto the attacker's media packets, not having seen the corresponding signaling packets first, so the attacker cannot make the SBC send media packets back to them. There could still be an issue if the attacker is either (1) in the IP routing path and thus can spoof the same IP as the real UA and get the media coming back, in which case the attacker hardly needs to attack at all to begin with, or (2) behind the same NAT as the legitimate SIP UA, in which case the attacker's packets will be latched-to by the SBC and the SBC will send media back to the attacker. In this latter case, which may be of particular concern with Carrier-Grade NATs, the legitimate SIP UA will likely end the call anyway when a human user who does not hear anything hangs up. In the case of a non-human call participant, such as an answering machine, this may not happen (although many such automated UAs would also hang up when they do not receive any media). The attacker could also redirect all media to the real SIP UA after receiving it, in which case the attack would likely remain undetected and succeed. Again, this would be of particular concern with larger scale NATs serving many different endpoints such as Carrier-Grade NATs. The larger the number of devices fronted by a NAT is, the more use cases would vary and the more the number of possible attack vectors would grow.
Naturally, SRTP [RFC3711] would help mitigate such threats and, if used with the appropriate key negotiation mechanisms, would protect the media from monitoring while in transit. It should therefore be used independently of HNT. [RFC3261] Section 26 provides an overview of additional threats and solutions on monitoring and session interception.
With SRTP, if the SBC that performs the latching is actually participating in the SRTP key exchange, then it would simply refuse to latch onto a source unless it can authenticate it. Failing to implement and use SRTP would represent a serious threat to users connecting from behind Carrier-Grade NATs [RFC6888] and is considered a harmful practice.
For SIP clients, HNT is usually transparent in the sense that the SIP UA does not know it occurs. In certain cases it may be detectable, such as when ICE is supported by the SIP UA and the SBC modifies the default connection address and media port numbers in SDP, thereby disabling ICE due to the mismatch condition. Even in that case, however, the SIP UA only knows a middle box is relaying media, but not necessarily that it is performing latching/HNT.
In order to perform HNT, the SBC has to modify SDP to and from the SIP UA behind a NAT, and thus the SIP UA cannot use S/MIME [RFC5751], and it cannot sign a sending request or verify a received request using [RFC4474] unless the SBC re-signs the request. However, neither S/MIME nor [RFC4474] is widely deployed, so not being able to sign or verify requests does not appear to be a concern at this time.
From a privacy perspective, media relaying is sometimes seen as a way of protecting one's IP address and not revealing it to the remote party. That kind of IP address masking is often perceived as important. However, this is no longer an exclusive advantage of HNT since it can also be accomplished by client-controlled relaying mechanisms such as TURN [RFC5766], if the client explicitly wishes to do so.
This document has no actions for IANA.
Note to the RFC-Editor: please remove this section prior to publication as an RFC.
The authors would like to thank Flemming Andreasen, Miguel A. Garcia, Alissa Cooper, Vijay K. Gurbani, Ari Keranen and Paul Kyzivat for their reviews and suggestions on improving this document.