Network Working Group H. Kaplan
Internet Draft Acme Packet
Intended status: Informational November 22, 2011
Expires: May 22, 2012
Requirements for Interworking WebRTC
with Current SIP Deployments
draft-kaplan-rtcweb-sip-interworking-requirements-02
Abstract
The IETF RTCWEB WG has been discussing how to interwork WebRTC with
deployed SIP equipment and domains. Doing so may require an
Interworking Function middlebox in the media-plane. This document
lists some WebRTC-to-SIP use-cases, the WebRTC requirements to
support such, and the complexity involved in interworking if the
requirements cannot be met.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 22, 2012.
Kaplan, et al Expires May 24, 2012 [Page 1]
Internet-Draft WebRTC-SIP Interworking Requirements November 2011
Copyright and License Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the BSD License.
Table of Contents
1. Terminology
2. Introduction
3. Existing SIP/RTP Devices
   3.1. SIP/RTP Devices in Enterprises
   3.2. SIP/RTP Devices in Service Providers
   3.3. The Need for an Interworking Function
4. WebRTC-SIP Interworking Architecture
   4.1. Interworking Function Goal: Lower Cost
   4.2. Interworking Function Goal: Knowing when it is needed
   4.3. Potential Interworking Functions and Complexity
      4.3.1 ICE Termination
      4.3.2 SRTP Termination
      4.3.3 RTP/RTCP Stream Multiplexing
      4.3.4 Multi-media Stream Multiplexing
      4.3.5 RFC-4733 DTMF Generation
      4.3.6 RTCP Generation
      4.3.7 Transcoding and Transrating
5. WebRTC-SIP Interworking Use-cases
   5.1. Basic Audio-Telephony Call
   5.2. Secure Basic Calls
   5.3. Conference Call in SIP Domain
   5.4. Call Hold and Mute in WebRTC and SIP Domains
      5.4.1 Legacy Call-Hold Devices Impacting RTCP
      5.4.2 RTP Generation when on Hold or Mute
      5.4.3 Clipping with Off-hold/off-mute
   5.5. Call Transfer in SIP Domain
   5.6. Audio/Video Call Transfer
   5.7. Find-Me-Follow-Me in SIP Domain
   5.8. Video in SIP Domain
      5.8.1 Video and SIP/SDP
      5.8.2 Video Codec Compatibility
      5.8.3 Separate Video RTP Stream
      5.8.4 Video RTP Packet Size
6. Signaling-plane Interworking Requirements
7. Media-plane Interworking Requirements
   7.1. RFC 4733 DTMF Requirements
8. Security Considerations
9. IANA Considerations
10. Acknowledgments
11. References
   11.1. Informative References
Author's Address
1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119. The
terminology in this document conforms to RFC 2828, "Internet
Security Glossary".
Browser: an Internet World-Wide-Web/HTTP browser capable of
executing JavaScript/ECMAScript, with an RTCWeb RTP Library and
associated WebRTC API.
Web-Server: an HTTP/S server capable of serving JavaScript to
Browsers, as well as executing local code (e.g., PHP).
WebRTC Client: the combination of Browser and JavaScript on the
user's host system.
RTP-Peer: another device communicating RTP/RTCP directly with the
local Client.
2. Introduction
One of the desired use-cases for the WebRTC architecture is to be
able to communicate from WebRTC applications to existing deployed
SIP/RTP-based Voice/Video-over-IP devices in the signaling and
media-planes. This document assumes such deployed devices
communicate using SIP at a signaling layer, but other protocols,
such as XMPP or H.323, may also be used.
For the signaling layer, it is assumed the Web-Server will have to
play a role in interworking with the SIP world, either using an
integrated Web Server module or a separate signaling gateway. In
either case it should be possible to communicate with deployed SIP
devices at a SIP and SDP layer.
For the media-plane, however, the preference expressed thus far in
this WG is that direct communication at an IP layer between the
Browser and existing SIP devices be possible, without requiring a
media-plane gateway. Doing so with most deployed SIP devices might
be impossible, depending on what requirements are imposed on WebRTC
Browsers. An Interworking Function in the media-plane might be
required, deployed by either the WebRTC domain or the SIP domain.
The goal of this document is to summarize the use-cases for
communicating with deployed SIP devices and domains, and capture the
requirements necessary to do so without using an Interworking
Function, or to minimize its cost and complexity when one is needed.
The impacts and difficulties of the various interworking functions
are also discussed, to help minimize the cost and complexity of
using them.
For those readers wishing to skip the background, the requirements
can be found in sections 6 and 7. Note that some of the
requirements are already documented and achieved in current IETF
RTCWEB and W3C WEBRTC Working Group drafts; some are likely
unachievable. This document simply lists what must be done, so that
the Working Groups can discuss and decide if and how they can be
done.
3. Existing SIP/RTP Devices
This document covers two large groups of existing SIP and RTP
devices that the Working Group should focus on communicating with:
those in Enterprises, and those in Service Providers.
It is extremely difficult, and undoubtedly contentious, to
generalize existing SIP devices as having a common set of
capabilities - they do not. Some SIP devices implement ICE and
iLBC, for example, while others do not even generate RTCP and only
support G.711. Several software-based SIP User Agents (i.e.,
softphones) implement ICE, but virtually no PSTN/TDM Gateways do,
very few PBXs do, and very few media servers do.
3.1. SIP/RTP Devices in Enterprises
The Enterprise market includes PBXs, desk-phones, conference
bridges, conference phones, soft-phones, PRI gateways, voicemail
servers, IVR systems, and recording systems. There are millions of
RTP devices already deployed in Enterprises today; some are
upgradeable, some are not.
Even for those devices that are upgradeable, it is difficult to
require upgrading them all at once, or to require upgrading devices
that are already working today simply in order to communicate with
WebRTC-enabled Browsers. An Enterprise that itself uses WebRTC-based
Web Applications would be more motivated to do so, or more willing
to deploy an Interworking Function, but not an Enterprise that just
happens to be the far-end peer for a voice/video call of a WebRTC
Application provided by someone else.
If an Interworking Function is required to communicate with deployed
Enterprise SIP devices, it is likely that the Enterprises that
deploy WebRTC-enabled applications, or WebRTC Application providers
wishing to communicate with SIP Enterprises, will be the ones to
deploy the Interworking Functions - not the SIP Enterprises with
deployed SIP devices. Therefore, it is beneficial for the RTCWEB WG
to minimize the cost of such Interworking Functions, or to avoid
needing any to begin with.
3.2. SIP/RTP Devices in Service Providers
The SIP Service Provider market represents an enormous population of
users and applications reachable through SIP and RTP. There are
over 100 Million deployed RTP devices in Service Providers, but more
importantly approximately 5 Billion mobile phones, 1.5 Billion
landlines, and an untold number of PRI PBX trunks, all reachable
through SIP/RTP gateways or hosts in SIP Service Providers. When
compared to only about 2 Billion IP hosts on the public Internet, it
becomes clear why connecting to existing RTP devices through SIP
Service Providers is desirable.
Unfortunately, many of the deployed RTP devices are not upgradeable
to change behavior to match WebRTC: some of them are from
manufacturers that no longer exist or have stopped providing
enhancements for them; some are incapable of supporting new codecs,
ICE, or RTCP due to hardware limitations; and in many cases a SIP
call will transit through the Service Provider to another Provider
or to an Enterprise, and the final RTP endpoint is not under the
local Service Provider's control to upgrade.
If an Interworking Function is required to communicate with deployed
Service Provider SIP devices, it is likely that the Service
Providers that deploy WebRTC-enabled applications, or WebRTC
Application providers wishing to communicate with SIP Service
Providers, will be the ones to deploy the Interworking Functions -
not the SIP Service Providers with deployed SIP devices. Therefore,
it is beneficial for the RTCWEB WG to minimize the cost of such
Interworking Functions, or to avoid needing any to begin with.
3.3. The Need for an Interworking Function
While the best-case scenario is one in which no Interworking
Function is needed, it is likely one will be needed for many SIP
deployments based on the current requirements and limitations in
both WebRTC and SIP-based devices.
For example, because the Javascript in Browsers cannot be fully
trusted, a means of peer-consent must be used in the media-plane
before the Browser can be allowed to send RTP packets. The
currently proposed means of establishing such peer-consent is ICE
using the STUN connectivity checks, whereby the STUN responses
implicitly prove peer consent. A WebRTC Browser cannot allow
session media to be used unless the peer uses ICE. Since many SIP-
based devices do not support ICE, and will not be upgraded to do so
for the reasons described previously, an ICE-interworking device is
needed.
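As a rough illustration of the STUN-based consent mechanism described
above, the following Python sketch builds the 20-byte header of a STUN
Binding Request as defined in RFC 5389 (message type 0x0001, magic
cookie 0x2112A442, 96-bit transaction ID) and checks whether a reply
is a matching Binding success response. It is a minimal sketch, not
an ICE implementation: the USERNAME, PRIORITY, and MESSAGE-INTEGRITY
attributes that real ICE connectivity checks carry are omitted, and
the function names are hypothetical.

```python
import os
import struct
from typing import Optional

STUN_BINDING_REQUEST = 0x0001
STUN_BINDING_SUCCESS = 0x0101
STUN_MAGIC_COOKIE = 0x2112A442


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a bare STUN Binding Request: 20-byte header, no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Message type, attribute length (0 here), magic cookie, transaction ID
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


def is_consent_response(request: bytes, response: bytes) -> bool:
    """True if `response` looks like a Binding success response to `request`.

    A real ICE agent must also verify the MESSAGE-INTEGRITY attribute
    (HMAC-SHA1 keyed with the peer's ICE password) before treating the
    response as proof of peer consent; that step is omitted here.
    """
    if len(response) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", response[:8])
    return (msg_type == STUN_BINDING_SUCCESS
            and cookie == STUN_MAGIC_COOKIE
            and response[8:20] == request[8:20])
```

A legacy SIP device that does not implement STUN never answers such
a request, which is why an ICE-interworking device has to answer on
its behalf.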
4. WebRTC-SIP Interworking Architecture
Due to the issues described in section 3, there will likely be a
need for an interworking function in the signaling or media-plane.
Therefore, this document assumes a WebRTC-SIP interworking
architecture similar to Figure 1 below:
              WebRTC domain    |    SIP domain
  +-----------+          +-----------+          +-----------+
  |           |          |           |          |           |
  |    Web    |   SIP    |    SIP    |   SIP    |    SIP    |
  |           |----------|   Inter-  |----------|           |
  |  Server   |          |  working  |          | User-Agent|
  |           |          |  Function |          |           |
  +-----------+          +-----------+          +-----------+
        |                      |                      |
        | Proprietary over     | Logical or           | Logical or
        | HTTP/Websockets      | Physical API         | Physical API
        |                      |                      |
  +-----------+                |                      |
  |JS/HTML/CSS|                |                      |
  +-----------+                |                      |
  +-----------+          +-----------+          +-----------+
  |           |          |   Media-  |          |           |
  |           |          |   plane   |          |   Media   |
  |  Browser  |----------|   Inter-  |----------|   Agent   |
  |           |          |  working  |          |           |
  |           |          |  Function |          |           |
  +-----------+          +-----------+          +-----------+

        Figure 1: WebRTC-SIP Interworking Architecture
Note that the "SIP Interworking Function" is a logical function; it
may be a separate physical device, or it may be built into the Web
Server or the SIP User Agent (UA). Likewise, "Media-plane
Interworking Function" is a logical function which may be a physical
device or built into the Media Agent, and the vertical lines may be
logical internal APIs or external physical protocols.
The SIP and Media-plane Interworking Functions may be deployed by
the WebRTC domain administrator or the SIP domain administrator.
4.1. Interworking Function Goal: Lower Cost
One of the main goals of this document is to provide requirements
for interworking based on the desire for the least cost and
complexity. Determining cost is difficult because it depends to a
large extent on device implementation specifics and other cost
factors that are not universally applicable. Even seemingly
unrelated costs, such as the cost of space or power, have an impact
on the costs of interworking WebRTC and SIP. This document, however,
makes assumptions regarding cost that the author believes to be
generally accurate, based on the assumption that the more
complicated a function is, the more it costs.
Even if one uses free software to perform all of the interworking
functions, there is a cost burden tied to CPU, memory, and
potentially bandwidth usage. If a function takes more CPU
instructions to perform, for example, then more CPUs are needed to
perform it for the same number of sessions, which makes it more
expensive.
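The relationship between per-packet work and hardware cost can be
made concrete with a back-of-the-envelope calculation. The Python
sketch below uses entirely hypothetical figures (session count,
packet rate, CPU cycles per packet, clock speed, and a 70% usable
fraction per core); it is an illustration of the scaling argument,
not a measured cost model.

```python
import math


def cpus_needed(sessions: int, packets_per_sec: float,
                cycles_per_packet: float, cpu_hz: float,
                usable_fraction: float = 0.7) -> int:
    """Estimate CPU cores needed by a per-packet interworking function.

    All inputs are illustrative assumptions; `usable_fraction` is the
    share of each core assumed to be available for packet processing.
    """
    cycles_per_sec = sessions * packets_per_sec * cycles_per_packet
    return math.ceil(cycles_per_sec / (cpu_hz * usable_fraction))


# Hypothetical figures: 10,000 audio sessions at 20 ms packetization
# (50 packets/s in each direction = 100 packets/s per session), 3 GHz cores.
baseline = cpus_needed(10_000, 100, 20_000, 3e9)
doubled = cpus_needed(10_000, 100, 40_000, 3e9)
```

Under these made-up numbers, doubling the per-packet work (for
example, adding SRTP termination to ICE termination) takes the
estimate from 10 to 20 cores for the same number of sessions.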
4.2. Interworking Function Goal: Knowing when it is needed
If an Interworking Function is needed, it is important that it be
invoked only when truly needed. This not only reduces its impact,
but also allows it to eventually be disabled. The Interworking
Function needs to detect when it needs to perform various
interworking functions, on a session-by-session basis. In
particular, this implies that the mismatch(es) must be detectable in
the signaling-plane itself: in SIP or SDP.
For most of the functions described in this document, such detection
ability exists in SIP or SDP. One function does not: RTCP
generation, as described in Section 4.3.6. An Interworking Function
cannot know, from either SIP or SDP exchanges, whether a deployed
SIP device will generate RTCP or not.
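The session-by-session detection described above can be sketched as
a check of the SIP-side SDP. The Python sketch below assumes,
hypothetically, that the WebRTC side always uses ICE, DTLS-SRTP, and
RTP/RTCP multiplexing, and inspects only a few SDP lines (a=ice-ufrag,
the m= line transport, a=fingerprint, a=rtcp-mux). Consistent with
the text, it cannot report whether RTCP generation is needed. The
function name and result labels are illustrative only.

```python
def needed_functions(sip_sdp: str) -> set:
    """Guess which media-plane interworking functions a session needs,
    based only on the SIP-side SDP.

    Assumes (hypothetically) that the WebRTC side always uses ICE,
    DTLS-SRTP, and RTP/RTCP multiplexing. RTCP generation is not
    returned because SDP does not say whether a peer will send RTCP.
    """
    lines = [line.strip() for line in sip_sdp.splitlines()]
    needed = set()

    if not any(line.startswith("a=ice-ufrag:") for line in lines):
        needed.add("ice-termination")

    protos = [line.split()[2] for line in lines
              if line.startswith("m=") and len(line.split()) >= 3]
    if any(proto in ("RTP/AVP", "RTP/AVPF") for proto in protos):
        needed.add("srtp-termination")      # plain RTP on the SIP side
    elif not any(line.startswith("a=fingerprint:") for line in lines):
        needed.add("srtp-termination")      # e.g. SDES (a=crypto) only

    if "a=rtcp-mux" not in lines:
        needed.add("rtcp-mux")
    return needed
```

For a typical legacy offer such as "m=audio 49170 RTP/AVP 0" with no
ICE attributes, all three functions are reported as needed.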
4.3. Potential Interworking Functions and Complexity
It is impossible to document the relative monetary costs of the
different interworking functions that may need to occur, because
they differ by manufacturer and system architecture. This section
highlights some of the complexities involved with the different
interworking functions that may need to be used, because complexity
usually translates to cost (though not always).
4.3.1 ICE Termination
If the Interworking Function has to terminate ICE (i.e., be an ICE
agent on behalf of the real SIP endpoint), this involves following
the procedures in [ICE], including calculating an HMAC-SHA1 message
integrity value for each STUN message, checking every UDP packet
received during the lifetime of the session to see if it is a STUN
request or indication rather than an RTP, RTCP, or other message,
and responding to STUN requests during ICE restarts. Being an
ICE-Lite agent is often simpler than being an ICE-Full agent,
however, because of the simpler logic and lack of timers.
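The per-packet check described above is usually a cheap test on the
first byte of each UDP payload. The Python sketch below follows the
first-byte ranges given in RFC 5764, Section 5.1.2 (0 or 1 for STUN,
20-63 for DTLS, 128-191 for RTP/RTCP) and additionally checks the
RFC 5389 magic cookie for STUN. The function name and result labels
are hypothetical.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442


def classify_packet(data: bytes) -> str:
    """Classify a UDP payload arriving on an ICE-terminated 5-tuple.

    First-byte ranges follow RFC 5764, Section 5.1.2; STUN packets are
    further checked for the RFC 5389 magic cookie.
    """
    if not data:
        return "unknown"
    first = data[0]
    if first in (0, 1):
        if len(data) >= 20 and struct.unpack("!I", data[4:8])[0] == STUN_MAGIC_COOKIE:
            return "stun"      # connectivity check or keepalive: answer it
        return "unknown"
    if 20 <= first <= 63:
        return "dtls"          # only expected if DTLS-SRTP is in use
    if 128 <= first <= 191:
        return "rtp-or-rtcp"   # RTP version 2: relay toward the SIP side
    return "unknown"
```

The check itself is trivial, but it must run on every packet for the
entire session, which is part of the cost described above.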
4.3.2 SRTP Termination
If the Interworking Function has to terminate SRTP (i.e.,
encrypt/decrypt SRTP on behalf of the real SIP endpoint), this
involves performing encryption/decryption and authentication
algorithms on every RTP/RTCP packet in both send/receive directions.
It should be noted that if SRTP is required to be used for every
call by WebRTC but the [SDES] key exchange model cannot be used on
the WebRTC side, then the Interworking Function likely has to
terminate SRTP from WebRTC even if the SIP-domain supports SRTP,
because [SDES] is the most commonly used form of key exchange in SIP
today.
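To terminate [SDES]-keyed SRTP on the SIP side, the Interworking
Function has to extract the master key and salt from the SDP
a=crypto attribute (RFC 4568). The Python sketch below handles only
the "inline" key method and the AES_CM_128 suites, whose inline value
is a base64-encoded 16-byte key followed by a 14-byte salt; the
function name is hypothetical. It also shows why [SDES] relies on
secured signaling: the keys travel in the SDP itself.

```python
import base64


def parse_sdes_crypto(line: str) -> dict:
    """Parse an SDP a=crypto line (RFC 4568) using the inline key method."""
    if not line.startswith("a=crypto:"):
        raise ValueError("not an a=crypto line")
    tag, suite, key_params = line[len("a=crypto:"):].split()[:3]
    if not key_params.startswith("inline:"):
        raise ValueError("only the inline key method is supported")
    # Drop optional lifetime and MKI parameters after '|'
    key_salt = base64.b64decode(key_params[len("inline:"):].split("|")[0])
    if suite.startswith("AES_CM_128") and len(key_salt) != 30:
        raise ValueError("AES_CM_128 suites need a 16-byte key + 14-byte salt")
    return {"tag": int(tag), "suite": suite,
            "master_key": key_salt[:16], "master_salt": key_salt[16:]}
```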
4.3.3 RTP/RTCP Stream Multiplexing
If the Interworking Function has to multiplex/de-multiplex RTP and
RTCP on the same 5-tuple, this involves checking every received
packet for the RTP vs. RTCP header format and de-multiplexing them
onto separate 5-tuple flows, and in the other direction taking
packets from two 5-tuple flows and sending them on a single
5-tuple.
In some interworking system architectures, such a mux/demux function
would be trivial, or even simpler to do than not do due to the
reduction in the number of ICE flows to terminate. Therefore, this
document recommends that it be possible to perform such muxing
separately from the media-type muxing described in sub-section 4.3.4.
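The de-multiplexing direction can be sketched as follows. RFC 5761,
Section 4, distinguishes RTCP from RTP on a shared 5-tuple by the
second octet, which for RTCP holds the packet type (192-223). The
Python sketch assumes the SIP-side peer expects RTCP on the next
higher port after its RTP port, the usual RFC 3550 convention; the
function names are hypothetical.

```python
def is_rtcp(packet: bytes) -> bool:
    """RFC 5761, Section 4: on a shared 5-tuple, a second octet in the
    range 192-223 identifies RTCP (its packet type field)."""
    return len(packet) >= 2 and 192 <= packet[1] <= 223


def demux(packet: bytes, sip_rtp_port: int) -> tuple:
    """Pick the SIP-side destination port for a packet received muxed.

    Assumes the SIP-side peer expects RTCP on the RTP port + 1. The
    reverse direction simply sends both flows out of the single
    WebRTC-side 5-tuple.
    """
    if is_rtcp(packet):
        return ("rtcp", sip_rtp_port + 1)
    return ("rtp", sip_rtp_port)
```

An RTP packet with the marker bit set and payload type 0 has a second
octet of 128, so it is correctly kept on the RTP flow.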
4.3.4 Multi-media Stream Multiplexing
If the Interworking Function has to multiplex/de-multiplex RTP/RTCP
for audio and video streams on the same 5-tuple, the behavior
depends on how such multiplexing is defined. If the 5-tuple
multiplexing means they're all part of the same RTP session, then
de-multiplexing them is very complicated; if multiplexing means
they're all separate RTP/RTCP sessions and use some fixed header-
field mode of separation, then mux/demux is likely far simpler.
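For the simpler case, where each medium remains a separate RTP/RTCP
session and a fixed header field separates them, the demux step
reduces to a table lookup. The Python sketch below assumes,
hypothetically, that the separating field is the SSRC and that the
SSRC-to-stream table has been learned from signaling; payload type
would be another possible choice. It does not address the harder
single-RTP-session case.

```python
import struct


def packet_ssrc(packet: bytes) -> int:
    """Return the SSRC identifying the sender of an RTP or RTCP packet."""
    if len(packet) >= 8 and 192 <= packet[1] <= 223:
        return struct.unpack("!I", packet[4:8])[0]    # RTCP: sender SSRC
    if len(packet) >= 12:
        return struct.unpack("!I", packet[8:12])[0]   # RTP: SSRC field
    raise ValueError("packet too short")


def route(packet: bytes, ssrc_to_stream: dict) -> str:
    """Map a muxed packet to its media stream via a table learned from
    signaling (hypothetical); unknown SSRCs are reported as such."""
    return ssrc_to_stream.get(packet_ssrc(packet), "unknown")
```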
4.3.5 RFC-4733 DTMF Generation
If the Interworking Function has to generate [RFC4733] DTMF event
RTP packets to the SIP-domain side, this involves keeping track of
RTP timestamps and sequence numbers, and inserting the appropriate
sequence of [RFC4733] packets, etc. If SRTP is also used, then the
Interworking Function has to terminate SRTP to be able to insert
[RFC4733] events.
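The bookkeeping involved can be seen in a sketch of the packets for
a single digit. Each [RFC4733] payload is 4 bytes: event code, an
E (end) bit, a reserved bit, a 6-bit volume, and a 16-bit duration.
All packets of one event share the event's start timestamp, the
first carries the RTP marker bit, the duration grows with each
update, and the final packet is sent three times. The Python sketch
assumes an 8000 Hz clock, 20 ms updates, and a 100 ms tone; function
names are hypothetical.

```python
import struct


def dtmf_payload(event: int, end: bool, volume: int, duration: int) -> bytes:
    """Pack one RFC 4733 telephone-event payload (4 bytes)."""
    flags = (0x80 if end else 0x00) | (volume & 0x3F)   # E bit, R=0, volume
    return struct.pack("!BBH", event, flags, duration)


def dtmf_packets(event, start_seq, start_ts, total_ms=100, step_ms=20,
                 clock_rate=8000, volume=10):
    """Return (seq, timestamp, marker, payload) tuples for one DTMF event.

    start_seq must continue the audio stream's sequence numbering, and
    audio resumes after the last sequence number used here.
    """
    step = clock_rate * step_ms // 1000      # timestamp units per update
    total = clock_rate * total_ms // 1000
    durations = list(range(step, total, step)) + [total] * 3
    packets = []
    for i, duration in enumerate(durations):
        end = i >= len(durations) - 3        # final packet sent three times
        packets.append(((start_seq + i) & 0xFFFF, start_ts, i == 0,
                        dtmf_payload(event, end, volume, duration)))
    return packets
```

For digit 5, this yields four update packets followed by three
end-of-event packets, all with the same RTP timestamp.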
4.3.6 RTCP Generation
Because some SIP audio-only RTP endpoints do not generate RTCP, if
WebRTC requires receiving RTCP for calls to continue, then the
Interworking Function has to generate RTCP on behalf of them. This
is only a known issue for audio calls.
Unfortunately, generating fake RTCP is more complicated than most
people realize. The SDP in SIP does not indicate whether an
endpoint will generate RTCP - it is implicitly assumed in the AVP
profile. Therefore, the Interworking Function will have to check
every packet from the SIP-domain side to detect an RTCP message; if
it does not see one for a certain period of time, it will need to
generate one. The RTCP messages it generates will need to appear to
be true RTCP messages, and thus contain information for both sender
and receiver reports, DLSR, SSRCs, etc. It will need to continue to
check every packet throughout the call and use expiration timers,
because the call could be silently transferred as described in
section 5.6, resulting in a new RTP endpoint that does generate RTCP
on its own.
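The logic above can be sketched as two pieces: a watchdog fed with
every packet from the SIP side for the whole call, and a generator
for the RTCP it sends when the peer stays silent. The Python sketch
builds a Receiver Report with one report block in the RFC 3550
format (DLSR in units of 1/65536 seconds). It omits the SDES CNAME
item that a compliant compound RTCP packet must also carry, assumes
packets are already decrypted if SRTCP is used, and uses a
hypothetical timeout value and hypothetical names.

```python
import struct


def build_receiver_report(sender_ssrc: int, source_ssrc: int,
                          fraction_lost: int, cumulative_lost: int,
                          ext_highest_seq: int, jitter: int,
                          lsr: int, dlsr_seconds: float) -> bytes:
    """Build an RTCP Receiver Report (RFC 3550, Section 6.4.2) with one
    report block. A compliant compound packet must also carry an SDES
    CNAME item, which this sketch omits."""
    header = struct.pack("!BBHI",
                         0x80 | 1,   # V=2, P=0, RC=1
                         201,        # PT=RR
                         7,          # length in 32-bit words minus one
                         sender_ssrc)
    dlsr = int(dlsr_seconds * 65536) & 0xFFFFFFFF   # units of 1/65536 s
    block = struct.pack("!IIIIII",
                        source_ssrc,
                        ((fraction_lost & 0xFF) << 24) | (cumulative_lost & 0xFFFFFF),
                        ext_highest_seq, jitter, lsr, dlsr)
    return header + block


class RtcpWatchdog:
    """Decides when to generate RTCP on behalf of a silent SIP-side peer.

    Every packet from the SIP side must be fed in for the whole call,
    because a silent transfer can bring in an endpoint that sends RTCP.
    """

    def __init__(self, timeout: float, start: float = 0.0):
        self.timeout = timeout          # hypothetical value, e.g. 5 s
        self.last_peer_rtcp = start

    def on_packet(self, packet: bytes, now: float) -> None:
        if len(packet) >= 2 and 192 <= packet[1] <= 223:
            self.last_peer_rtcp = now   # peer is producing its own RTCP

    def should_generate(self, now: float) -> bool:
        return now - self.last_peer_rtcp > self.timeout
```

Even this simplified version needs accurate loss, sequence, jitter,
and LSR/DLSR values, which means tracking the RTP stream in detail.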
Furthermore, it will have to terminate SRTP as well even if the SIP-
domain side supports SRTP, in order to be able to generate the fake
RTCP messages. Even though it may appear unlikely that an RTP
endpoint that would support SRTP does not support RTCP, as far as
the Interworking Function knows that could be the case. In fact,
it's not unlikely to be the case, because middleboxes perform SRTP
on behalf of endpoints today, without generating RTCP on their
behalf. For example, the call may be from a WebRTC Browser to the
Interworking Function deployed by the WebRTC domain owner, to a
Service Provider with an SBC performing SRTP termination, and then
on to a PSTN gateway that does not generate RTCP (some do not).
It is also possible that generating RTCP might actually require
transcoding in some system architectures, which would not only be
prohibitively expensive but also increase delay for RTP.
4.3.7 Transcoding and Transrating
If the Interworking Function has to perform transcoding, it is
likely the most expensive function described in this document.
Transcoding is typically performed in DSPs, which are expensive and
consume significant power and generate significant heat at large
scale. DSP technology has improved over the years in terms of cost
and density, but it is still one of the most expensive components of
interworking. Transcoding also impacts call quality at an audio
level, and introduces delay at an RTP level. Video transcoding DSPs
exist, of course, but scale far worse than audio transcoding DSPs.
Transrating (converting from one packetization rate to another) is
typically simpler and cheaper than transcoding, but still requires
terminating SRTP, RTP, and typically RTCP. It can sometimes be done
without using DSP technology, however, reducing the cost.
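For G.711 the no-DSP case is easy to see, because each sample is one
byte at 8000 Hz: converting from 20 ms to 30 ms packetization is just
re-chunking payload bytes and recomputing timestamps. The Python
sketch below assumes the payloads are already decrypted (i.e., SRTP
has been terminated) and leaves sequence-number assignment and RTCP
counter adjustment to the caller; the function name is hypothetical.

```python
def transrate_g711(payloads, start_ts: int, out_ptime_ms: int = 30,
                   clock_rate: int = 8000):
    """Repacketize G.711 payloads to a new packetization time.

    G.711 carries one byte per sample at 8000 Hz, so this is pure byte
    re-chunking with no DSP. Returns (list of (timestamp, payload),
    leftover bytes to prepend to the next input).
    """
    out_samples = clock_rate * out_ptime_ms // 1000   # 240 for 30 ms
    buffer = bytearray()
    packets = []
    ts = start_ts
    for payload in payloads:
        buffer.extend(payload)
        while len(buffer) >= out_samples:
            packets.append((ts & 0xFFFFFFFF, bytes(buffer[:out_samples])))
            del buffer[:out_samples]
            ts += out_samples
    return packets, bytes(buffer)
```

Three 20 ms input packets (480 bytes) become two 30 ms output
packets, with timestamps 240 units apart.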
5. WebRTC-SIP Interworking Use-cases
Although [draft-use-cases] covers general use-cases, there are no
specific use-cases which drive requirements for interworking with
already-deployed SIP domains and their RTP endpoints. This section
provides such use-cases.
5.1. Basic Audio-Telephony Call
A WebRTC domain user should be able to generate and receive audio-
based sessions with currently deployed SIP Enterprise and Service
Provider domains. The author assumes the SIP aspects for a basic
call will "just work" or be easily interworkable, but the media-
plane issues are as follows:
1) Most RTP endpoints do not support ICE.
2) Many RTP endpoints do not generate RTCP.
3) Most RTCP-capable endpoints only support RTCP on a separate UDP
port (i.e., the next higher, odd-numbered port after the RTP port).
4) Most RTP endpoints do not support SRTP.
5) Most SRTP-capable endpoints only support [SDES] key exchange.
The above limitations drive some of the requirements in section 7,
although it may not be possible to meet all of the requirements due
to WebRTC security issues.
5.2. Secure Basic Calls
A WebRTC domain user should be able to generate and receive calls
with protection from eavesdropping and impersonation, to/from
currently deployed SIP Enterprise and Service Provider domains. For
example, a WebRTC user should not be concerned about eavesdropping or
impersonation when using their laptop in public WiFi networks, or at
an IETF meeting, if their call goes to/from a SIP domain; likewise a
SIP-based user should not be concerned about it if their call goes
to/from a WebRTC domain.
Despite issue (4) in section 5.1, that most deployed RTP endpoints
do not support SRTP, the majority of those that do support it are
SIP devices used from outside of the Enterprise or Service
Provider's physical network, such as software clients. Within the
physical network (or VPN), most Enterprises and Service Providers
feel that eavesdropping and impersonation are sufficiently difficult
that the benefits of not using SIP/TLS and SRTP outweigh the risks;
beyond their own or their trusted partners' physical network(s) or
VPNs, however, they do not.
Therefore, SIP Enterprises and Service Providers may well *require*
SRTP be used in basic call scenarios with other WebRTC-application
domains. The way they handle such calls today, however, is by using
middleboxes to terminate SRTP and [SDES] based keying through secure
signaling (either SIP/TLS or SIP over IPsec). If [DTLS-SRTP] is
required to be used, then the WebRTC domain's Interworking Function will
have to interwork that to SRTP using [SDES], which will then likely
be terminated somewhere on the SIP Service Provider or Enterprise
side. This would be expensive for the WebRTC provider, and provide
dubious additional security beyond simply doing [SDES] in WebRTC.
In order to provide [SDES] in the Browser in a useful manner,
however, it needs to be secured with HTTPS to the Web Server.
5.3. Conference Call in SIP Domain
A WebRTC domain user should be able to call a conference bridge or
IVR service reachable through a SIP Enterprise or Service Provider,
make credit-card-based toll calls, and access such things as their
voicemail, when the media server is in an Enterprise or Service
Provider's SIP domain. Typically such services are based on DTMF
event indications.
One means of generating DTMF events is using SIP messages, such as
KPML [RFC4730] or SIP INFO messages, and it is assumed that such
mechanisms would be possible in a WebRTC context without new
requirements. Many deployed SIP/RTP systems, however, rely on DTMF
events to be indicated in RTP using [RFC4733] event packets.
The ability to interwork SIP-based DTMF indications, including KPML,
to [RFC4733] DTMF events is already supported by some vendors'
interworking products, but it adds complexity. For example, if
SRTP is used,
handling DTMF interworking will require the Interworking Function to
also perform SRTP termination. An alternative solution is to
provide the means for both a Javascript-driven signaling-plane
indication (which likely already exists), as well as a Javascript-
driven media-plane [RFC4733] method in the Browser.
It should be noted that some deployed systems only use DTMF in-band,
as tones in G.711 audio. These are, however, a much smaller
population of deployed media servers than of clients, and thus the
author believes this may not be an issue for WebRTC. In other words,
most servers that need to process received DTMF events also support
[RFC4733], whereas some endpoints can only generate DTMF in-band;
since the use-case involves WebRTC Browsers generating DTMF to
deployed SIP media servers, rather than deployed SIP endpoints
generating DTMF to WebRTC Browsers, this is likely a non-issue.
5.4. Call Hold and Mute in WebRTC and SIP Domains
A WebRTC domain user should be able to call to, or be called by, a
SIP Enterprise or Service Provider and put their call on hold or
mute, and un-hold/un-mute it at any time; and have their call put on
hold/mute by the SIP side.
This use-case may seem obvious and non-problematic, since SDP has
direction attributes to indicate inactive/sendonly/recvonly for such
things. A call-hold case, for example, is often performed by
sending an SDP offer with a 'sendonly' direction attribute and
muting the local inputs. There are subtle issues, however,
depending on whether RTCP is required, as well as depending on the
WebRTC API design and architecture.
5.4.1 Legacy Call-Hold Devices Impacting RTCP
From a legacy deployment perspective, there are still SIP devices
that generate SDP with a connection address of 0.0.0.0 to indicate
call hold, and that expect to receive such SDP when being put on
hold. SIP B2BUA
middleboxes already interwork such cases to/from an SDP sendonly or
inactive direction mode, but the device receiving the SDP connection
address of 0.0.0.0 will not generate RTCP until the call is taken
off hold. Therefore, if WebRTC requires Browsers to receive RTCP as
a consent-refresh to continue the call, the call will fail if it is
put on hold too long. To avoid the call failure, the Interworking
Function may have to generate RTCP, which is complicated and thus
expensive.
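An Interworking Function or B2BUA can distinguish the legacy hold
form from the direction-attribute form with a simple SDP check. The
Python sketch below is a simplification that handles a single media
stream and does not distinguish session-level from media-level
attributes; the function name and result labels are hypothetical.
The legacy form is reported separately because, as described above,
the device receiving it will not send RTCP until the call is taken
off hold.

```python
def hold_state(sdp: str) -> str:
    """Classify the hold state requested by a single-stream SDP body.

    Simplification: session- and media-level attributes are not
    distinguished. A c= line with address 0.0.0.0 is reported as
    "legacy-hold" because the device receiving it will not send RTCP
    until the call is taken off hold.
    """
    direction = "sendrecv"                  # default when none is given
    legacy_hold = False
    for line in (raw.strip() for raw in sdp.splitlines()):
        if line.startswith("c=") and line.split()[-1] == "0.0.0.0":
            legacy_hold = True
        elif line in ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"):
            direction = line[2:]
    return "legacy-hold" if legacy_hold else direction
```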
5.4.2 RTP Generation when on Hold or Mute
Another potential issue depends on what the Browser does when
Javascript tells it to put the session on mute (i.e., disable the
microphone/camera inputs), or full hold (i.e., also stop rendering
received media). If the Browser stops generating RTP, but does not
send SDP to the SIP domain indicating such, the call may fail.
The reason for this is that many SIP Enterprises and Service
Providers have middleboxes in various locations, which detect an
absence of RTP packets for a sendrecv-mode call as a call failure,
and will tear the call down by issuing BYEs. Therefore, if a WebRTC
user puts a call on mute or hold by no longer generating RTP but
does not send SDP to the SIP domain indicating the appropriate
direction attribute, the call will be terminated eventually by the
SIP domain.
One way to avoid this is to offer the ability for the Javascript to
tell the Browser to turn off the microphone/camera inputs, while
still generating RTP packets.
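What "muted but still generating RTP" means on the wire can be
sketched as sending silence at the normal packet rate. The Python
sketch below is not a WebRTC API; it assumes G.711 mu-law (static
payload type 0), where the code word 0xFF represents a zero-amplitude
sample, 20 ms packets, and hypothetical class and function names.
Sequence numbers and timestamps keep advancing, so middleboxes that
watch for RTP inactivity see a live call.

```python
import struct

PCMU_PAYLOAD_TYPE = 0      # static RTP/AVP payload type for G.711 mu-law
ULAW_SILENCE = 0xFF        # mu-law code word for a zero-amplitude sample


def rtp_packet(seq: int, ts: int, ssrc: int, payload: bytes,
               payload_type: int = PCMU_PAYLOAD_TYPE, marker: bool = False) -> bytes:
    """Build a minimal RTP packet: version 2, no padding, extension or CSRCs."""
    byte0 = 0x80
    byte1 = (0x80 if marker else 0x00) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       ts & 0xFFFFFFFF, ssrc & 0xFFFFFFFF) + payload


class MutedSender:
    """Keeps a sendrecv audio stream alive while the local input is muted,
    by sending G.711 mu-law silence at the normal packet rate."""

    def __init__(self, ssrc: int, seq: int = 0, ts: int = 0,
                 ptime_ms: int = 20, clock_rate: int = 8000):
        self.ssrc, self.seq, self.ts = ssrc, seq, ts
        self.samples = clock_rate * ptime_ms // 1000   # 160 for 20 ms

    def next_packet(self) -> bytes:
        payload = bytes([ULAW_SILENCE]) * self.samples
        packet = rtp_packet(self.seq, self.ts, self.ssrc, payload)
        self.seq = (self.seq + 1) & 0xFFFF             # 16-bit wrap
        self.ts = (self.ts + self.samples) & 0xFFFFFFFF
        return packet
```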
5.4.3 Clipping with Off-hold/off-mute
Another issue is the clipping that can occur when taking a call off
hold or off mute. If the SIP user puts a call on hold, and a new
SDP Offer is sent with a direction attribute of sendonly, and some
time later the user takes the call off hold, it will take some time
to get a new SDP Offer to the WebRTC side Browser; the extra time it
takes may cause clipping: the WebRTC user will be able to hear/see
but not speak/be-seen for a bit. Likewise for the reverse
direction: if the WebRTC user puts the call on/off-hold.
In SIP, this generally doesn't take too long because the signaling
is over UDP, on managed networks, going through tightly managed
servers. In WebRTC, it will likely be over lossy access mediums,
over TCP, across the public Internet, and through Web Servers
performing a lot of other functions. A clever Web-Application
developer, therefore, might realize that clipping can be avoided by
not notifying the Browser of any direction change when the call is
put on hold from SIP; such a developer could have the Javascript
change the SDP Offer to be sendrecv before giving it to the Browser.
What is needed, then, is the ability to tell the Browser not to
render media received from the peer and not to send media to the
peer, without changing the signaled direction, so that the peer
never stops sending RTP to the on-hold Browser. An even more clever
developer could instead send the direction information separately,
in a direct data channel for example.
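The SDP rewrite described above would normally be done in the
application's Javascript; it is sketched here in Python for
consistency with the other examples. The function name is
hypothetical, and the sketch assumes a single media stream. It
returns the original direction so the application can apply it
locally (for example, by not rendering received media).

```python
HOLD_DIRECTIONS = ("a=sendonly", "a=recvonly", "a=inactive")


def force_sendrecv(sdp: str):
    """Rewrite hold-related direction attributes in an SDP offer to
    sendrecv, returning (rewritten_sdp, original_directions).

    The original directions are returned so the application can apply
    them locally (for example, stop rendering) without the Browser
    changing what it sends.
    """
    out, original = [], []
    for line in sdp.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        if body in HOLD_DIRECTIONS:
            original.append(body[2:])
            line = "a=sendrecv" + line[len(body):]   # keep the line ending
        out.append(line)
    return "".join(out), original
```

Note that the application would also have to rewrite the Browser's
sendrecv answer into a direction that is valid for the original offer
(e.g., recvonly in answer to sendonly, per RFC 3264) before sending
it to the SIP domain.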
5.5. Call Transfer in SIP Domain
A WebRTC domain user should be able to call to, or be called by, an
Enterprise or Service Provider and have their call transferred to
another user in the same or different Enterprise or Service
Provider.
In the SIP signaling architecture model, this should either require
the SIP domain to issue a REFER request to the WebRTC domain's
logical SIP UA, to tell the logical UA to generate an INVITE to the
new party; or it should require the SIP domain to issue an INVITE
with Replaces header to the WebRTC domain's logical SIP UA, to
replace the original dialog. In the former case, it requires the
WebRTC application to issue a new SDP Offer for a new session; in
the latter case it causes the WebRTC application to receive an SDP
Offer for a new session. In both cases, however, the general
expectation of users is that the media impacts are minimal or
non-existent: they may hear a short-duration click or nothing at all
when the audio party changes. Likewise they would probably expect
to see the new transferred-to party in the same video window.
In practice, for audio-only calls, it is quite common for the SIP
transfer to occur without the transferred UA being aware of it,
because the REFER and INVITE signaling from the
transferor/transferred-to party is processed locally by B2BUAs, such
as a PBX, Application Server or SBC. It is not very common, for
example, to send REFER or INVITE-with-Replaces SIP requests across
SIP Enterprise-to-Service-Provider trunks or between Service
Providers. In practice, therefore, SIP and SDP signaling might never
reach the WebRTC domain for this call transfer use-case.
The RTP media source will change inside the Enterprise or Service
Provider, of course, but the change is hidden by the transfer-
processing B2BUA, at least at an IP:port transport layer. At an
audio codec and RTP layer, however, the change is frequently not
hidden, and the result is the transferred party suddenly starts
receiving RTP/RTCP packets from a new SSRC, sequence number space,
timestamp, CNAME, etc. The same Payload Type and codec are used, of
course. Naturally, this assumes SRTP is not used, or not used
end-to-end (i.e., it may be terminated at the transfer-processing
B2BUA).
From a WebRTC interworking perspective, what this means is that the
Browser has to be able to receive a new SSRC and timestamp/sequence
number space from the Interworking Function, without receiving a new
SDP Offer, without changing SRTP keys, and without ICE re-
negotiation.
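As an illustration of what this means for a receiver, the following
minimal sketch (Python, with hypothetical class and function names)
parses the fixed RTP header defined in RFC 3550 and treats a new
SSRC arriving on an existing 5-tuple as a reason to re-synchronize,
not to drop the packets. A real Browser would also have to reset its
jitter buffer and per-SSRC statistics; that is only noted in a
comment here.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtpHeader:
    marker: bool
    payload_type: int
    seq: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    return RtpHeader(bool(second & 0x80), second & 0x7F, seq, timestamp, ssrc)


class ReceiveStream:
    """Hypothetical receive state for one remote IP:port 5-tuple."""

    def __init__(self) -> None:
        self.ssrc: Optional[int] = None
        self.expected_seq: Optional[int] = None
        self.resyncs = 0

    def on_packet(self, packet: bytes) -> RtpHeader:
        header = parse_rtp_header(packet)
        if header.ssrc != self.ssrc:
            # A new source behind the same IP:port, e.g. after a transfer
            # hidden by a B2BUA: start fresh sequence/timestamp tracking
            # (and, in a real stack, jitter buffer state) instead of
            # discarding the packets as invalid.
            if self.ssrc is not None:
                self.resyncs += 1
            self.ssrc = header.ssrc
        self.expected_seq = (header.seq + 1) & 0xFFFF
        return header
```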
Note that this use-case describes Call Transfer cases, but similar
media-plane behavior sometimes occurs in Call Park and Pickup, Find-
Me-Follow-Me, Call Hunting, Rich-Ringtone, and Voicemail fallback
cases.
5.6. Audio/Video Call Transfer
A WebRTC domain user should be able to call to, or be called by, an
Enterprise or Service Provider and transfer their WebRTC call to
another user in the same or different WebRTC domain, SIP Enterprise
or Service Provider. This is similar to the previous use-case but
the WebRTC user is now the transferor.
In the SIP signaling architecture model, this should either require
the WebRTC domain to issue a REFER request to the SIP domain, to
tell the logical UA to generate an INVITE to the new party; or it
should require the SIP domain to issue an INVITE with Replaces
header to the WebRTC domain's logical SIP UA, to replace the
original dialog. In the former case, it requires the WebRTC
application to issue a new SDP Offer for a new session; in the
latter case it causes the WebRTC application to receive an SDP Offer
for a new session. In both cases, however, the general expectation
of users is that the media impacts are minimal or non-existent: they
may hear a short-duration click or nothing at all when the audio
party changes, and they likely expect the video rendering to replace
the previous video in the same window, even though the incoming SDP
Offer is for a new logical session.
5.7. Find-Me-Follow-Me in SIP Domain
A WebRTC domain user should be able to call to a SIP Enterprise or
Service Provider and have their call find the target user in the
same or different Enterprise or Service Provider, with a SIP Find-
Me-Follow-Me service (FMFM). FMFM service is similar to Call
Hunting and Call Forwarding services, but with the caller hearing a
"Please wait while we try to locate your party" type announcement
message. (Note that Call Hunting and Call Forwarding services
sometimes do this as well, in which case they're the same as FMFM.)
A common method of providing FMFM is for the SIP INVITE to be
logically or physically forked to a media server that generates the
announcement; the media server sends back an 18x response with an
initial SDP Answer, and then when the final UAS is reached the UAS
sends a 200 response with a final SDP Answer. To the SIP UAC (i.e.,
the Web Server), it often appears as a parallel-forked call case.
Therefore, the WebRTC model must support forked SIP calls, with two
or more SDP Answers for a given Offer. It is likely that Web-
Application developers will want this type of behavior as well, even
for WebRTC uses that do not go to SIP.
From an SDP offer/answer perspective, this means WebRTC needs to
support multiple, provisional SDP Answers. How it does so is beyond
the scope of this document.
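One way a SIP-side component might keep track of such forked Answers
is sketched below. It is only an illustration in Python under stated
assumptions: early dialogs are keyed by the To-tag of each
provisional response (as forked responses are in [RFC3261]), the
most recently answered fork is assumed to be the one whose media is
rendered, and the class and method names are hypothetical. It is not
a proposal for how WebRTC itself should expose forking.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ForkedOffer:
    """Hypothetical UAC-side record of one SDP Offer that may be forked.

    Early dialogs are keyed by the To-tag of each provisional response.
    Assumed policy: the most recently answered fork is the one whose
    media is rendered, until a 200 response settles the call.
    """
    offer_sdp: str
    answers: Dict[str, str] = field(default_factory=dict)
    active_tag: Optional[str] = None
    final_tag: Optional[str] = None

    def on_provisional(self, to_tag: str, answer_sdp: str) -> None:
        # e.g. an 18x with SDP from an FMFM announcement server
        self.answers[to_tag] = answer_sdp
        self.active_tag = to_tag

    def on_final(self, to_tag: str, answer_sdp: Optional[str]) -> str:
        # The 200 may carry no SDP if that dialog already sent a reliable
        # provisional Answer; otherwise it carries the final Answer.
        if answer_sdp is not None:
            self.answers[to_tag] = answer_sdp
        if to_tag not in self.answers:
            raise ValueError("no SDP Answer received on the answering fork")
        answer = self.answers[to_tag]
        self.answers = {to_tag: answer}   # other early dialogs are dropped
        self.final_tag = self.active_tag = to_tag
        return answer
```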
From a media perspective, this means the WebRTC Browser needs to be
able to receive and render media from different IP/RTP peers on the
same local listen IP:port at different times, without having
generated or received a new SDP Offer in-between.
Note that this use-case describes FMFM cases, but similar media-
plane behavior sometimes occurs in Call Park and Pickup, Call
Hunting, Rich-Ringtone, and Voicemail fallback cases.
It should also be noted that some media servers generate the
announcement message without sending a provisional 18x response with
SDP Answer. Such servers won't function correctly with UAs behind
NATs anyway, since an SDP Answer has to be sent to perform either
ICE or SBC-type Latching; and many PSTN Gateways won't accept media
until they get an SDP Answer either. Therefore, such media servers
have issues even in SIP, and can be effectively ignored for the
purposes of this document.
5.8. Video in SIP Domain
A WebRTC domain user should be able to make a video call to, or be
called by, a SIP Enterprise or Service Provider. While video is not
nearly as ubiquitous in SIP as audio-only calls, it does exist and
is a growing market, particularly now that most video-conferencing
vendors (both terminals and MCUs) have shifted from H.323 to SIP.
5.8.1 Video and SIP/SDP
From a SIP perspective there is nothing unique about this use-case;
but from an SDP perspective some video MCUs use the [SDP-CAP-NEG]
SDP capability negotiation mechanism. The author believes this
should not pose a problem for WebRTC, as [SDP-CAP-NEG] is
backwards-compatible with basic [RFC4566] SDP and falls back to it
when the peer does not support capability negotiation.
[Note: what are the impacts for video-conf calls if SDP-CAP-NEG is
not used? Video MCU vendors need to be consulted]
5.8.2 Video Codec Compatibility
Codec compatibility is a concern because transcoding video codecs in
the Interworking Function would be prohibitively expensive: DSPs
don't scale well for video, and are very expensive. If the
currently used video codecs in SIP are all encumbered by royalties,
then the author recognizes this may not be a solvable problem for
Browsers.
5.8.3 Separate Video RTP Stream
SIP-based video terminals/MCUs use separate RTP sessions, in
separate UDP port numbers, for video vs. audio media. Furthermore,
some use separate video RTP sessions for separate cameras/screens,
while some use the same one and de-multiplex using SSRC.
[Note: this latter use is believed but not known by the author]
5.8.4 Video RTP Packet Size
Video-codec RTP packet size is a concern if IP-layer fragmentation
occurs, because many NATs and middleboxes discard IP fragments;
otherwise they would have to re-assemble them to correctly process
the whole UDP packet, and such re-assembly is processing-intensive.
Carrier Grade NATs (CGNs), consumer NATs, and Firewalls have
similar behavior, and thus this is an issue for WebRTC video usage
in general on the public Internet.
In particular, although video codecs can "fragment" themselves at
the codec layer, in deployed SIP and H.323 uses it has been found
that some devices don't do so, resulting in IP-fragmented packets
that get dropped along the way. Other devices constrain themselves
to an IP MTU of 1500 bytes, without leaving overhead space for
packet growth on the path, as can be caused by IPv4-to-IPv6
conversion, IPsec tunneling/VPNs, SSL-VPNs, etc. Unfortunately,
path MTU discovery is generally not supported or used in practice.
Therefore, the Browser's maximum codec packet size needs to be
carefully thought out.
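The arithmetic behind this concern, and behind the 1400-byte limit
in requirement A2-11, can be sketched as follows. The header sizes
are the fixed IPv4, IPv6, UDP and RTP header lengths plus the 10-byte
HMAC-SHA1-80 SRTP authentication tag; IP options, RTP header
extensions, CSRCs and tunnel overhead are ignored, so the figures
understate the real overhead. The function name is hypothetical.

```python
# Fixed header sizes in bytes (no IP options, RTP CSRCs or extensions).
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
RTP_HEADER = 12
SRTP_AUTH_TAG = 10  # HMAC-SHA1-80 authentication tag


def max_rtp_payload(ip_packet_limit: int, ip_header: int, srtp: bool) -> int:
    """Largest codec payload that keeps the whole IP packet within
    ip_packet_limit bytes."""
    overhead = ip_header + UDP_HEADER + RTP_HEADER
    if srtp:
        overhead += SRTP_AUTH_TAG
    return ip_packet_limit - overhead


if __name__ == "__main__":
    # IPv4-to-IPv6 translation swaps a 20-byte header for a 40-byte one.
    growth = IPV6_HEADER - IPV4_HEADER
    for limit in (1500, 1400):
        payload = max_rtp_payload(limit, IPV4_HEADER, srtp=True)
        fits_after_translation = limit + growth <= 1500
        print(limit, payload, fits_after_translation)
    # prints:
    # 1500 1450 False
    # 1400 1350 True
```

A packet built to exactly 1500 bytes no longer fits a 1500-byte path
after translation, while a 1400-byte packet still has headroom.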
6. Signaling-plane Interworking Requirements
REQ-ID DESCRIPTION
----------------------------------------------------------------
A1-1 WebRTC MUST provide a means for a sent
SIP SDP Offer to be forked and receive
multiple SDP Answers; how WebRTC accomplishes
this internally is up to the RTCWEB WG,
and need not require SDP be used in WebRTC.
----------------------------------------------------------------
A1-2 WebRTC MUST provide a means for a received
SIP SDP Offer to be Answered to a completion
state; i.e., that the SIP-side can know to
send a final SDP Answer back to the SIP domain,
either in a 200 OK or reliable provisional
response.
----------------------------------------------------------------
A1-3 WebRTC MUST provide a means for a session
request to be received without an SDP Offer,
and to send an SDP Offer from WebRTC back to
the SIP side; i.e., that the SIP-side can
receive a SIP INVITE without SDP, and be able
to send back an SDP Offer in a response.
----------------------------------------------------------------
A1-4 WebRTC MUST provide a means for the
Browser to offer SRTP with [SDES], SRTP with
[DTLS-SRTP], or plain RTP, as options in SDP.
In other words, either [SDP-CAP-NEG] or some
similar mechanism, such as
[draft-best-effort-srtp], is needed in order
to make an SDP Offer that offers plaintext RTP
as well as both types of SRTP key exchange.
----------------------------------------------------------------
A1-5 It MUST be possible for an Interworking Function
to determine whether interworking is required based
on the signaling, on a session-by-session basis,
in order to avoid performing interworking at
either a signaling or media layer when it is not
needed.
----------------------------------------------------------------
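For requirement A1-4, an Offer using [SDP-CAP-NEG] might be built
along the lines of the following Python sketch. The function name and
parameters are hypothetical, the key and fingerprint values are
supplied by the caller, and the attribute layout reflects this
sketch's reading of [SDP-CAP-NEG] rather than a tested interoperable
Offer. The actual configuration on the m= line is plain RTP/AVP, so a
peer that ignores capability negotiation simply sees an ordinary RTP
Offer.

```python
from typing import List


def capneg_audio_section(port: int, payload_types: List[int],
                         sdes_crypto: str, dtls_fingerprint: str) -> str:
    """Build an audio m= section offering plain RTP as the actual
    configuration, with SDES-SRTP and DTLS-SRTP as potential ones.

    sdes_crypto:      crypto attribute value, e.g. "1 AES_CM_128_HMAC_SHA1_80 inline:<key>"
    dtls_fingerprint: fingerprint attribute value, e.g. "sha-256 <hex pairs>"
    """
    formats = " ".join(str(pt) for pt in payload_types)
    lines = [
        f"m=audio {port} RTP/AVP {formats}",   # actual config: plain RTP
        "a=tcap:1 RTP/SAVP",                    # transport capability 1
        "a=tcap:2 UDP/TLS/RTP/SAVP",            # transport capability 2
        f"a=acap:1 crypto:{sdes_crypto}",       # attribute capabilities
        f"a=acap:2 fingerprint:{dtls_fingerprint}",
        "a=acap:3 setup:actpass",
        # Potential configurations; a lower number is assumed to be
        # preferred, and the order chosen here is arbitrary.
        "a=pcfg:1 t=1 a=1",                     # SRTP keyed with SDES
        "a=pcfg:2 t=2 a=2,3",                   # SRTP keyed with DTLS
    ]
    return "\r\n".join(lines) + "\r\n"
```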
7. Media-plane Interworking Requirements
REQ-ID DESCRIPTION
----------------------------------------------------------------
A2-1 WebRTC MUST provide a means for the
Browser to generate and receive RTP
and RTCP using UDP transport.
----------------------------------------------------------------
A2-2 WebRTC Browsers MUST support the ability
to use separate, distinct RTP sessions
on separate UDP ports for separate media
streams, such as audio vs. video.
----------------------------------------------------------------
A2-3 WebRTC Browsers SHOULD support the ability
to use the same UDP port for RTP and
RTCP of the same media type, without
needing to also multiplex media types
on the same UDP port.
----------------------------------------------------------------
A2-4 WebRTC SHOULD provide a means for the
Browser to generate and receive RTP
without having to perform ICE.
----------------------------------------------------------------
A2-5 WebRTC MUST provide a means for the
Browser to generate and receive RTP
with an ICE-Lite peer.
----------------------------------------------------------------
A2-6 WebRTC Browsers MUST support the
G.711 PCMU and PCMA codecs for 10,
20, and 30ms packetization times.
----------------------------------------------------------------
A2-7 WebRTC Browsers MUST support the
G.729, G.722, G.722.1, AMR, and AMR-WB codecs.
----------------------------------------------------------------
A2-8 WebRTC Browsers MUST support the
H.263 and H.263+ codecs.
----------------------------------------------------------------
A2-9 WebRTC Browsers MUST support the
H.264-AVC and SVC codecs for Baseline
profile.
----------------------------------------------------------------
A2-10 WebRTC Browsers MUST support a
minimum of QCIF, QSIF, CIF, and SIF
resolutions, and optionally higher.
----------------------------------------------------------------
A2-11 WebRTC Browsers MUST NOT generate
RTP or RTCP packets larger than 1400
bytes at an IP layer using UDP transport.
----------------------------------------------------------------
A2-12 WebRTC MUST provide a means for the
Browser to generate and receive RTP
without receiving RTCP, for at least the
G.711 PCMU and PCMA codecs.
----------------------------------------------------------------
A2-13 WebRTC MUST provide a means for the
Browser to generate and receive RTP
and RTCP over UDP without using SRTP.
----------------------------------------------------------------
A2-14 WebRTC MUST provide a means for the
Browser to generate and receive SRTP
using [SDES]; at least if the Web-Server
connection is HTTPS.
----------------------------------------------------------------
A2-15 WebRTC MUST provide a means for the
Browser to receive RTP/RTCP from a different
peer RTP stack instance, over the same
IP and port 5-tuple, at any time. In other
words, the SSRC, timestamp, sequence number
space, etc., may change during the lifetime
of receiving a remote stream, without the
remote IP:port or SRTP key changing, and
without ICE restarting.
----------------------------------------------------------------
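Requirement A2-3 implies that a Browser receiving RTP and RTCP on one
UDP port must tell them apart. A minimal Python sketch of the rule
commonly used for this (RFC 5761, section 4) is shown below; the
function name is hypothetical. It relies on RTCP packet types
occupying second-octet values 192-223, which RTP packets never
produce as long as payload types 64-95 are avoided on a multiplexed
port.

```python
def is_rtcp(packet: bytes) -> bool:
    """Classify a packet received on a port carrying both RTP and RTCP.

    Both start with version 2 (top two bits 0b10). For RTP the second
    octet is marker bit + payload type; for RTCP it is the packet type
    (SR=200, RR=201, ...). Values 192..223 are treated as RTCP.
    """
    if len(packet) < 2 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP/RTCP version 2 packet")
    return 192 <= packet[1] <= 223
```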
7.1. RFC 4733 DTMF Requirements
The following requirements are driven by the need for WebRTC
applications to generate DTMF events to non-WebRTC media servers,
gateways, and devices, as described in section 5.3.
It should be noted that the following requirements only address a
subset of capabilities in [RFC4733], namely those the author
believes actually matter for WebRTC and that have the highest
chances of being interoperable. Note that these are also
backwards-compatible with RFC 2833, which is what most deployed
devices actually implement.
----------------------------------------------------------------
A3-1 WebRTC MUST provide a means for the
Browser to generate [RFC4733] DTMF RTP
Telephone-events for at least the events 0-15, in
an audio-type RTP packet stream.
----------------------------------------------------------------
A3-2 WebRTC MAY provide a means for the
Browser to receive [RFC4733] DTMF RTP
Events for the telephone-events 0-15.
----------------------------------------------------------------
A3-3 WebRTC MUST indicate [RFC4733] telephone-event
support in SDP, even if the browser cannot
receive and render [RFC4733] packets. This is
needed so that the far end knows the local
browser supports [RFC4733] and responds with it.
----------------------------------------------------------------
A3-4 WebRTC MUST provide a means for the
Javascript application to invoke [RFC4733]
DTMF events to be generated, and their
duration, with a default duration of 100ms.
In other words, the Javascript should be
able to tell the Browser to generate event
"0" for 100ms based on a button click, for
example.
----------------------------------------------------------------
A3-5 WebRTC MUST provide a means for the
Javascript application to enable or
disable [RFC4733] use, per session.
----------------------------------------------------------------
A3-6 WebRTC MUST NOT generate [RFC4733] events closer
than 50ms back-to-back. In other words, even if
the Javascript calls the API repeatedly or
provides a string of digits to send, the browser
must enforce a minimum of 50ms inter-event gap.
----------------------------------------------------------------
A3-7 WebRTC MUST generate [RFC4733] events using
the same SSRC as the audio codec(s) for a
stream.
----------------------------------------------------------------
A3-8 WebRTC MUST generate [RFC4733] events in the
same RTP sequence number space as the audio RTP
packets.
----------------------------------------------------------------
A3-9 WebRTC MUST generate [RFC4733] events with the
same clock frequency and timestamp space as
the audio.
----------------------------------------------------------------
A3-10 WebRTC MUST NOT generate audio RTP packets while
sending [RFC4733] DTMF event packets.
----------------------------------------------------------------
A3-11 WebRTC SHOULD generate [RFC4733] events using a
volume decimal value of 10 (binary 1010).
----------------------------------------------------------------
A3-12 WebRTC SHOULD NOT generate an [RFC4733] event for
longer than can be represented by the duration
field (about 8 seconds for an 8 kHz clock).
Although a means to do so is described in section
2.5.1.3 of [RFC4733], it is rarely supported.
----------------------------------------------------------------
A3-13 WebRTC MUST NOT concatenate multiple [RFC4733]
events into one RTP packet.
----------------------------------------------------------------
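To make requirements A3-1 and A3-7 through A3-13 concrete, the
following Python sketch builds the RTP packets for a single DTMF
event. The 4-byte payload layout (event, E bit, R bit, 6-bit volume,
16-bit duration) and the three transmissions of the final packet
come from [RFC4733]; the function names, the 50ms update interval,
and the choice to give each retransmitted final packet its own
sequence number are assumptions of this sketch. The caller supplies
the audio stream's SSRC, next sequence number and current timestamp
so that events share the audio's spaces.

```python
import struct
from typing import List

CLOCK_RATE = 8000      # telephone-event clock rate, matching G.711 audio
DEFAULT_VOLUME = 10    # A3-11: power level -10 dBm0, carried as 10
MAX_DURATION = 0xFFFF  # 16-bit duration field, about 8.19 s at 8 kHz


def event_payload(event: int, end: bool, volume: int, duration: int) -> bytes:
    """Encode the 4-byte RFC 4733 payload: event | E R volume | duration."""
    if not 0 <= event <= 15:
        raise ValueError("only DTMF events 0-15 are sent (A3-1)")
    if not 0 <= volume <= 63 or not 0 <= duration <= MAX_DURATION:
        raise ValueError("volume or duration out of range")
    flags = (0x80 if end else 0x00) | volume    # E bit; R bit stays 0
    return struct.pack("!BBH", event, flags, duration)


def dtmf_packets(event: int, ssrc: int, payload_type: int, first_seq: int,
                 start_ts: int, duration_ms: int = 100,
                 update_ms: int = 50) -> List[bytes]:
    """Return the RTP packets for one event, in sending order.

    Every packet carries the event's start timestamp (A3-9) and the audio
    SSRC (A3-7); sequence numbers continue the audio's space (A3-8). The
    duration grows with each update, the final packet (E bit set) is
    sent three times, and each packet holds one event (A3-13).
    """
    total = duration_ms * CLOCK_RATE // 1000
    step = update_ms * CLOCK_RATE // 1000
    if total > MAX_DURATION:
        raise ValueError("event longer than the duration field (A3-12)")
    if step <= 0:
        raise ValueError("update interval too short")
    updates = [(d, False) for d in range(step, total, step)]
    schedule = updates + [(total, True)] * 3
    packets = []
    for i, (duration, end) in enumerate(schedule):
        marker = 0x80 if i == 0 else 0x00       # M bit on first packet only
        header = struct.pack("!BBHII",
                             0x80,                        # RTP version 2
                             marker | payload_type,
                             (first_seq + i) & 0xFFFF,
                             start_ts & 0xFFFFFFFF,
                             ssrc)
        packets.append(header + event_payload(event, end,
                                              DEFAULT_VOLUME, duration))
    return packets
```

The caller remains responsible for suppressing audio packets while
the event is in progress (A3-10) and for waiting at least 50ms after
the final packet before starting the next event (A3-6).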
8. Security Considerations
From a SIP-signaling perspective, this document makes no
requirements which impact SIP-signaling security. SIP over TLS may
be used, or not, depending on what the WebRTC domain and SIP
Enterprise or Service Provider support, with the usual security
issues and implications.
If [RFC4474] is used, the Interworking Function would likely need to
change SDP and thus break the signature, and would have to verify
and re-sign the request using a certificate it owns. Alternatively,
the Interworking Function could be the trusted signer and verifier
for a domain to begin with, in which case it signs and verifies only
once. In practice, [RFC4474] is not used by most SIP Service
Providers and Enterprises, so this is rarely an issue.
From a media-plane perspective, the difficulty of communicating with
deployed SIP devices using SRTP is discussed in section 5.2. The
idea of not requiring that SRTP be used for all sessions is
controversial, but the author believes if the WebRTC Web-Server and
Browser are not using HTTPS but only plaintext HTTP, then a user
should not expect the session to be secure; thus, at least in this
case, SRTP should be optional. When HTTPS is being used, the idea
of not using SRTP becomes less appealing as the user likely expects
the session to be secure; but in such a case optionally using [SDES]
would also seem more reasonable than only allowing [DTLS-SRTP].
Technically, [SDES] is less secure than [DTLS-SRTP] in the sense
that the WebRTC Web-Server and Javascript can view the keys; and
with [DTLS-SRTP] the user could verify the session is secure end-to-
end by manually checking the fingerprint and asking the far-end user
if they sent it. Unless the user actually performs the manual
inspection and verification, however, [DTLS-SRTP] proves no more
than [SDES] does, since the Javascript could have maliciously sent
the call through a Man-in-the-Middle that terminated the DTLS-key-
based SRTP. In fact, in order to interwork with deployed SIP
devices it would have to use a middleman: the Interworking Function
itself. Therefore, there is little to be gained by refusing to
support [SDES] alongside [DTLS-SRTP]; those users who wish to verify
the security can still do so, in exactly the same way they would
verify [DTLS-SRTP] fingerprints, and would see that there is no
fingerprint to verify, with appropriate text explaining why.
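For reference, the fingerprint a user would compare in the
[DTLS-SRTP] case is a hash of the peer's certificate, written as
uppercase hex byte pairs separated by colons (the a=fingerprint
syntax of RFC 4572). The Python sketch below, with hypothetical
function names, shows that formatting and a case- and
whitespace-insensitive comparison of the kind a user-visible
verification step might perform; how a Browser would expose the
certificate is left open.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format the SHA-256 hash of a DER certificate like an SDP
    a=fingerprint value: hash name, space, uppercase hex pairs."""
    digest = hashlib.sha256(cert_der).digest()
    return "sha-256 " + ":".join(f"{byte:02X}" for byte in digest)


def fingerprints_match(shown_locally: str, read_by_far_end: str) -> bool:
    """Compare two fingerprint strings, ignoring case and extra spaces,
    as a user reading them aloud over another channel effectively would."""
    def normalize(value: str) -> str:
        return " ".join(value.split()).upper()
    return normalize(shown_locally) == normalize(read_by_far_end)
```

In the [SDES] case there is no such value to compare, which is the
situation the paragraph above suggests the Browser should explain to
the user.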
9. IANA Considerations
This document makes no request of IANA.
10. Acknowledgments
Thanks to Xavier Marjou, Victor Pascual, Parthasarathi Ravindran,
Roman Shpount, Randell Jesup, and others on the RTCWEB mailing list
for input. Funding for the RFC Editor function is provided by the
IETF Administrative Support Activity (IASA).
11. References
11.1. Informative References
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
"SIP: Session Initiation Protocol", RFC 3261, June 2002.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, July 2006.
[ICE] Rosenberg, J., "Interactive Connectivity Establishment (ICE):
A Protocol for Network Address Translator (NAT) Traversal for
Offer/Answer Protocols", RFC 5245, April 2010.
[SDES] Andreasen, F., Baugher, M., and D. Wing, "Session Description
Protocol (SDP) Security Descriptions for Media Streams", RFC
4568, July 2006.
[RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF Digits,
Telephony Tones, and Telephony Signals", RFC 4733, December
2006.
[KPML] Burger, E. and M. Dolly, "A Session Initiation Protocol (SIP)
Event Package for Key Press Stimulus (KPML)", RFC 4730,
November 2006.
[DTLS-SRTP] McGrew, D. and E. Rescorla, "Datagram Transport Layer
Security (DTLS) Extension to Establish Keys for the Secure
Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.
[RFC4474] Peterson, J. and C. Jennings, "Enhancements for
Authenticated Identity Management in the Session Initiation
Protocol (SIP)", RFC 4474, August 2006.
[SDP-CAP-NEG] Andreasen, F., "Session Description Protocol (SDP)
Capability Negotiation", RFC 5939, September 2010.
[draft-best-effort-srtp] Kaplan, H. and F. Audet, "Session Description
Protocol (SDP) Offer/Answer Negotiation For Best-Effort Secure
Real-Time Transport Protocol", draft-kaplan-mmusic-best-effort-srtp-01,
October 2006.
[draft-use-cases] Holmberg, C., Hakansson, S., and G. Eriksson, "Web
Real-Time Communication Use-cases and Requirements",
draft-ietf-rtcweb-use-cases-and-requirements-06, October 4, 2011.
Author's Address
Hadriel Kaplan
Acme Packet
Email: hkaplan@acmepacket.com