Internet DRAFT - draft-kaplan-rtcweb-api-reqs
draft-kaplan-rtcweb-api-reqs
Network Working Group H. Kaplan
Internet Draft Acme Packet
Intended status: Informational D. Burnett
Expires: April 2012 Voxeo
N. Stratford
Voxeo
Tim Panton
PhoneFromHere.com
October 31, 2011
API Requirements for WebRTC-enabled Browsers
draft-kaplan-rtcweb-api-reqs-01
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire in April 2012.
Copyright and License Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
Kaplan Expires April 24, 2012 [Page 1]
Internet-Draft Tao of Web October 2011
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the BSD License.
Abstract
This document discusses the advantages and disadvantages of several
proposed approaches to what type of API and architectural model
WebRTC Browsers should expose and use. The document then defines
the requirements for an API that treats the Browser as a library and
interface as opposed to a self-contained application agent.
Table of Contents
1. Terminology
2. Introduction
3. Defining a WebRTC Protocol in the Browser
4. Leaving Logic to Web Developers
5. API Requirements
   5.1. Browser User-Interface Requirements
   5.2. Media Properties
   5.3. RTP/RTCP Properties
   5.4. Data-stream Properties
   5.5. IP and ICE Properties
   5.6. API Design Recommendations
6. Security Considerations
7. IANA Considerations
8. Acknowledgments
9. References
   9.1. Informative References
Authors' Addresses
1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119. The
terminology in this document conforms to RFC 2828, "Internet
Security Glossary".
2. Introduction
There has been a long discussion in the RTCWEB Working Group mailing
list concerning whether any "signaling" or protocol should be
standardized between the Browser and its Server, other Browsers, or
the JavaScript it runs.
Within the context of the WebRTC Browser architecture, shown in
Figure 1 below, the discussion is centered on how much intelligence,
logic, state, and decisions are built into the Browser, vs. provided
by Javascript.
         +------------------------+  On-the-wire
         |                        |  Protocols
         |        Servers         |--------->
         |                        |
         |                        |
         +------------------------+
                      ^
                      |
                      |
                      | HTTP/
                      | Websockets
                      |
                      |
       +----------------------------+
       |    Javascript/HTML/CSS     |
       +----------------------------+
    Other  ^                 ^WebRTC
    APIs   |                 |API
       +---|-----------------|------+
       |   |                 |      |
       |                 +---------+|
       |                 | Browser ||  On-the-wire
       | Browser         | RTC     ||  Protocols
       |                 | Function|----------->
       |                 |         ||
       |                 |         ||
       |                 +---------+|
       +---------------------|------+
                             |
                             V
                    Native OS Services

                 Figure 1: Browser Model
There has been some discussion that the protocol running over the
HTTP/Websockets connection between the Javascript and the Server be
standardized, which will be discussed in this document.
There has also been discussion that the interface between the
Javascript and Browser be a protocol, rather than an API, such that
the JavaScript could pass the protocol's messages as opaque blobs
between two Browsers to establish the media-plane characteristics.
An example of such a protocol interface is described in
[draft-roap]. That will also be discussed in this document.
The conclusion of the discussion in this document concerning those
designs is that they are detrimental to the applications enabled
with WebRTC, and contrary to the Web Application model in general.
Therefore, this document defines the beginning list of requirements
for an API that we feel is more appropriate for Browsers to expose.
3. Defining a WebRTC Protocol in the Browser
Proposals have been made for integrating an entire session signaling
protocol into the Browser, in [draft-signaling], and for integrating
an SDP offer/answer protocol in the Browser, in [draft-roap]. This
section discusses the benefits and drawbacks of doing such things in
Browsers.
1) For a session signaling mechanism to work, it is not sufficient
to just implement something in the Browser. The Server also has
to be involved in the protocol, in order to forward the protocol
messages between the appropriate Browsers. Minimally, this
requires identity and location services, such as a user database
and a mapping of Browser connections to users. Often it involves
authentication and authorization decisions as well.
In SIP, for example, this would be the role of a Proxy and
Registrar. In Web applications, such things are typically handled
in an application-specific way based on the needs and architecture
of the specific Web application. For example, a gaming website
already knows who its users are and where they are as part of
their game application, while Facebook already knows who its users
are and where they are in its specific application. There is no
need to standardize this in any way, and attempting to do so would
be fruitless: a standard would have to make assumptions about the
applications that could not possibly be known in advance, and thus
would not be usable in practice.
2) For either session signaling or SDP offer/answer protocols,
integrating the protocols into the Browser means more logic and
state is in the Browser, and ultimately more code. This leads to
the following properties:
a. It is easier for simple Web-application developers to
initially deploy, if the code they need is built into the
Browser the way they need it to be.
b. But the more logic is placed into the Browser, the more
need there is to extend/enhance/fix that logic in the future,
and Web-application developers have little control over when
users upgrade their Browsers.
c. History has shown that the more complex the interface is
between two implementations, the more interoperability
problems occur. Ultimately the best way to provide
interoperability is to run the same actual source-code; short
of that, the less logic placed into the Browser, the better
the odds of interoperability.
3) For the SDP offer/answer protocol proposal, a benefit is that for
some very simple applications it makes deployment easier, if the
simple application does not need to know anything about the SDP
content or offer/answer semantics. If an application needs
control at the media layer, then it could create a fake
shim/interface from some real SDP in Javascript, to the SDP
offer/answer protocol in the Browser. Thus it trades off
additional simplicity for simple applications, against additional
complexity for advanced applications. If the goal of WebRTC is
purely simplicity, this might seem a reasonable trade-off; if the
goal is innovation, however, then making it harder for advanced
uses means making it harder to innovate.
4) For the SDP offer/answer protocol proposal, an argument has been
made that the logic/state required for media already has to exist
in the Browser itself, and thus splitting the domain of
responsibility between the Browser and Javascript is more
difficult than keeping it all in the Browser. We believe this
conclusion is drawn from an implicit assumption that the Browser
should be dealing with SDP to begin with. Unfortunately, SDP is
not just about media characteristics. There are numerous
attributes in SDP that are actually properties of a higher layer
than RTP and codecs. For example, the following IANA-registered
SDP attributes would be unknown to a media library in the browser
and only known to the Javascript: cat, keywds, tool, type,
charset, lang, setup, connection, confid, userid, floorid, and
probably a bunch more (we haven't investigated them all). The
point is that it is NOT true that "all the SDP information needs
to be handled by the Browser, so why not put offer/answer in it
too?".
5) Building the SDP offer/answer model into the Browser restricts
the Web application to only being able to do things that can be
encoded and communicated with the SDP offer/answer model. As an
example of something that cannot be accomplished because of this:
imagine a Web-application that allows the Browser to communicate
with a TelePresence (TP) system. TP systems have multiple
cameras, screen displays, microphones, and speakers. A PC-based
Browser typically only has a single microphone and camera, but can
display multiple video feeds separately and can render-mix the
incoming audio streams. Thus, a Browser to TP system would
produce an asymmetric media stream model: multiple video streams
from the TP system to the Browser, and one video stream from the
Browser to the TP system, and the same for audio. Each TP stream
is an independent RTP session and has unique attributes to
indicate position (left/center/right). Encoding that is currently
not possible with SDP offer/answer; not only because the SDP
attributes aren't yet defined, but because the offer/answer model
assumes a symmetric number of media-lines (m= lines), and also
that attributes represent media-receiving characteristics as
opposed to media-sending capabilities. Clearly if and when SDP is
changed to handle TelePresence cases, Browsers could be upgraded
to handle it as well sometime after; but they wouldn't need to if
the Browser hadn't been involved in SDP to begin with. SDP
information isn't strictly that of an RTP library layer; it's not
a one-to-one correlation.
6) Some Web application developers may prefer to make the decision
of which codecs/media-properties to use in the Server, and command
all the Browsers in a given session to do so. In some respects
this is the very simplest model possible; but with the SDP
offer/answer model being forced on the developer it becomes much
more complicated to achieve.
7) Since the SDP offer/answer mechanism is a protocol, involving both
state machines and encoding schemes, interoperability between
different vendor implementations is not guaranteed. In fact,
real-world SIP deployments have experienced interoperability
problems with both SDP and the offer/answer model.
8) For both session signaling and SDP offer/answer, troubleshooting
and debugging become difficult for the web-app provider if a
problem occurs in the protocol built in the Browser. Even if the
Javascript snoops on SIP or ROAP message exchanges and pushes back
copies to the server in case of failure, the developer has to
guess what the cause of an error response is. In other words,
it's the difference between having only Wireshark traces to debug
with vs. also having internal logs from code procedures.
9) Using the SDP offer/answer model provides a more rigid API
interaction model, enabling Browser vendors to perform less
testing and provide more robust implementations than exposing all
discrete components to a Javascript API would.
10) Using a higher-level API model, such as would be done with an
SDP offer/answer model, means the cross-browser vendor-specific
variances would be reduced. Exposing a lower-level API would
inevitably lead to some differences in different browsers due to
differences in their architectures/implementation.
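The split argued for in item 4 can be sketched in a few lines. This
is only an illustration under stated assumptions: the helper name
splitSdpAttributes and the result shape are hypothetical, the
attribute names are the ones listed in item 4 (and are not
exhaustive), and nothing here is a proposed API.

```typescript
// Attribute names that item 4 lists as properties of a layer above RTP
// and codecs. Taken from the text; not an exhaustive list.
const APPLICATION_LEVEL_ATTRIBUTES: string[] = [
  "cat", "keywds", "tool", "type", "charset", "lang",
  "setup", "connection", "confid", "userid", "floorid",
];

interface SplitAttributes {
  application: string[]; // lines only the Javascript would understand
  other: string[];       // lines it might map onto media-library settings
}

// Hypothetical helper: sorts the "a=" lines of an SDP body by who would
// need to understand them. Non-attribute lines are ignored.
function splitSdpAttributes(sdp: string): SplitAttributes {
  const result: SplitAttributes = { application: [], other: [] };
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.indexOf("a=") !== 0) continue;
    // The attribute name runs from after "a=" to the first ":" (if any).
    const body = line.slice(2);
    const colon = body.indexOf(":");
    const attributeName = colon === -1 ? body : body.slice(0, colon);
    if (APPLICATION_LEVEL_ATTRIBUTES.indexOf(attributeName) !== -1) {
      result.application.push(line);
    } else {
      result.other.push(line);
    }
  }
  return result;
}

const exampleSdp = [
  "v=0",
  "a=tool:example-app 1.0",
  "m=audio 49170 RTP/AVP 0",
  "a=rtpmap:0 PCMU/8000",
  "a=lang:en",
].join("\r\n");

const splitResult = splitSdpAttributes(exampleSdp);
```

Here the "tool" and "lang" lines land in the application bucket,
while the "rtpmap" line is left for whatever the Javascript chooses
to do with the media library.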
4. Leaving Logic to Web Developers
The alternative to embedding protocols in the Browser is to leave
the work up to Javascript, for whatever "work" might be required for
the particular application. After all, the actual knowledge of what
the specific Web application does, wants, how it encodes it, etc. is
only fully known by the Web developer for that application, and thus
by the Javascript+Server-code combination employed (i.e., the
application "source-code").
Clearly the Browser needs to perform quite a bit of "logic": for
implementing codec rendering/encoding, RTP/RTCP protocols, SRTP, and
ICE. That is unavoidable, and not in question. The question is who
should be in control, where any additional logic should be placed,
and what the API model should be.
There has been discussion that WebRTC should strive to enable a media
communication session with about "20 lines of code". We assert the
only means of achieving that goal in a production-deployment manner
is to use Javascript, and in particular Javascript libraries.
Javascript libraries are used by a huge number of Web applications,
and they work. Some of the libraries are so popular, reference
books have been published for them. Yes, there are a lot of
libraries, but that's a *good thing*.
The properties of using Javascript, and Javascript libraries, are as
follows:
1) The logic is under the control of the Web developer. This means:
a. If something is broken, the Web developer can generate log-
type debug information within the Javascript and push it back
to the server or log collector, and determine what is broken
and when to fix it; they do not need to rely on asking for
the user to provide Browser logs, rely on the Browser to
generate useful logs, understand the logs, nor rely on
Browser manufacturers to prioritize fixing them.
b. If an enhancement can be made, the Web developer can decide
if and when to do so; they do not need to rely on Browser
manufacturer decisions and timescales.
2) All the Browsers using a given application site run the same
literal Javascript source-code provided for that application.
There is no greater means of achieving interoperability than that.
3) If specific Browsers enable something proprietary, or some new
media extension, the Web developer can decide whether to use it or
not, when, and how. The mechanism can also be made flexible; for
example, new codecs do not need new Javascript code to be used,
unless the Javascript wishes to follow a model where they do need
new Javascript to be used (see Section 5.2 on this). In other
words, the Web-application developer can be as conservative or
liberal as he/she wishes. There are already known use-cases where
a Web-app will never want to use new codecs or capabilities
introduced into Browser RTP libraries, and there are known use-
cases for the opposite. Let the Web-app developer make that
choice.
4) The Javascript code does need to be downloaded (although Browser
caching does exist), and clearly the larger the Javascript, the
longer it takes to download. BUT, popular Javascript libraries
are so necessary in modern Web applications, that they are often
available for free and fast downloading by local delivery
networks.
5) There are properties of the media library API that Javascript may
need to access that cannot and should not be expressed in SDP.
Some of these are described in the "Hints" and "Stats" sections of
[draft-roap]. These will need a true API rather than an SDP
offer/answer protocol to learn, yet they are tied to the
information in the SDP regarding the media streams and codecs.
Therefore, it is not the case that the Javascript does not need to
understand SDP and could treat it as an opaque blob.
6) There are settings of the media library API that Javascript may
want to set that cannot be expressed in SDP. For example, setting
which local audio or video sources to fork to two or more remote
parties. Another example is local Javascript setting the media
library to use audio only even if an incoming session's remote
peer Browser indicates both audio and video, because the local
user only wants to use audio right now (e.g., they pressed some
Javascript-provided button which meant "audio-only" because
they're not wearing proper attire for this particular session).
These types of decisions and logic are not the domain of the
Browser, but rather the Javascript, yet they are also integral to
the SDP offer/answer.
7) For the Browser manufacturers, testing every discrete API
setting, in every possible permutation, and in all possible timing
sequences that a Javascript could invoke is extremely time-
consuming and error-prone.
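The audio-only case in item 6 is the kind of decision that is trivial
in Javascript and awkward to push through an offer/answer exchange. A
minimal sketch follows, assuming a hypothetical helper named
selectMediaKinds and a hypothetical preference flag set by some
Javascript-provided button; the result is what the application would
then command the media library to use.

```typescript
type MediaKind = "audio" | "video";

interface LocalPreferences {
  audioOnly: boolean; // hypothetical flag, e.g. set by an "audio-only" button
}

// Hypothetical decision helper: given the media kinds the remote peer
// indicated and the local user's preference, return the kinds the
// application would command the media library to use.
function selectMediaKinds(
  remoteKinds: MediaKind[],
  prefs: LocalPreferences
): MediaKind[] {
  const selected: MediaKind[] = [];
  for (const kind of remoteKinds) {
    if (kind === "video" && prefs.audioOnly) continue; // user declined video
    if (selected.indexOf(kind) === -1) selected.push(kind);
  }
  return selected;
}

const audioOnlyChoice = selectMediaKinds(["audio", "video"], { audioOnly: true });
const fullChoice = selectMediaKinds(["audio", "video"], { audioOnly: false });
```

The decision lives entirely in application code; the Browser is only
told the outcome.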
5. API Requirements
Some requirements for an API are already documented in
[draft-use-cases-and-requirements]. This section expands upon those
in further detail, and adds new ones. In all cases, the term
"Application" means the Javascript, and "Web API" refers to the
Javascript <-> Browser API.
It is not the goal of this document to define the actual API -
that's W3C's job. [Note: this is a strawman list]
5.1. Browser User-Interface Requirements
REQ-ID DESCRIPTION
----------------------------------------------------------------
A1-1 The Web API MUST provide a means for the
application to ask the browser for permission
to use cameras and microphones as input devices.
----------------------------------------------------------------
A1-2 The Web API MUST provide a means for the
application to ask the browser for permission
to use the screen, a certain area on the screen,
or what a certain application displays on the
screen as input to streams, and to indicate
which stream.
----------------------------------------------------------------
A1-3 The Web API MUST provide a means for the
application to disable/enable the microphone and
camera inputs. [Note: this does NOT mean
disabling RTP transmission]
----------------------------------------------------------------
A1-4 The Web API MUST provide a means for the
application to disable/enable the rendering of
received audio and video, per stream.
----------------------------------------------------------------
5.2. Media Properties
REQ-ID DESCRIPTION
----------------------------------------------------------------
A2-1 The Web API MUST provide a means for the
application to learn what codecs and codec
properties the Browser supports
----------------------------------------------------------------
A2-2 The Web API MUST provide a means for the
Browser to indicate codecs and codec properties
such that the application does not need to know
about the specific codec type in advance
----------------------------------------------------------------
A2-3 The Web API MUST provide a means for the
Browser to indicate codecs and codec properties
such that the application can use them in SDP,
for example by providing the IANA-registered
encoding name for the payload format, and the
format specific parameters as strings, such that
they could be used in the 'a=rtpmap' and 'a=fmtp'
lines of SDP should the Javascript wish to
create SDP containing codecs unknown to it.
----------------------------------------------------------------
A2-4 The Web API MUST provide means for the
application to get the following media codec
properties: bandwidth, clock rate, number of
channels, type (audio vs. video)
----------------------------------------------------------------
A2-5 The Web API MUST provide a means for the
application to get the bandwidth values for
codecs which support multiple levels, and set
it for codecs which can be controlled/primed.
----------------------------------------------------------------
A2-6 The Web API MUST provide a means for the
application to set whether to use silence
suppression or not, for codecs which support it.
----------------------------------------------------------------
A2-7 The Web API MUST provide a means for the
Browser to notify the application when a used
codec falls below a given quality threshold
[Note: it is TBD what "quality" means]
----------------------------------------------------------------
A2-8 The Web API MUST provide a means for the web
application to detect the level in audio
streams.
----------------------------------------------------------------
A2-9 The Web API MUST provide a means for the web
application to adjust the level in audio
streams.
----------------------------------------------------------------
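Requirement A2-3 can be illustrated with a short sketch. The
CodecDescription shape and the function name codecToSdpLines are
hypothetical and stand in for whatever W3C eventually defines; the
'a=rtpmap' and 'a=fmtp' syntax is that of SDP itself. The point is
that the Javascript can emit valid lines for a codec it knows nothing
about.

```typescript
// Hypothetical shape for what the Browser might report per codec
// (A2-1 through A2-4). Field names are illustrative only.
interface CodecDescription {
  kind: "audio" | "video";
  encodingName: string;      // IANA-registered encoding name
  clockRate: number;         // in Hz
  channels?: number;         // audio only; may be omitted when 1
  formatParameters?: string; // opaque string for the 'a=fmtp' line
}

// Builds the SDP attribute lines for one codec without interpreting the
// codec itself; the application only supplies the payload type.
function codecToSdpLines(codec: CodecDescription, payloadType: number): string[] {
  let rtpmap = "a=rtpmap:" + payloadType + " " +
    codec.encodingName + "/" + codec.clockRate;
  // The channel count is appended only for multi-channel audio.
  if (codec.kind === "audio" && codec.channels !== undefined && codec.channels > 1) {
    rtpmap += "/" + codec.channels;
  }
  const lines = [rtpmap];
  if (codec.formatParameters) {
    lines.push("a=fmtp:" + payloadType + " " + codec.formatParameters);
  }
  return lines;
}

const ilbc: CodecDescription = {
  kind: "audio",
  encodingName: "iLBC",
  clockRate: 8000,
  formatParameters: "mode=20",
};
const stereoPcm: CodecDescription = {
  kind: "audio",
  encodingName: "L16",
  clockRate: 44100,
  channels: 2,
};

const ilbcLines = codecToSdpLines(ilbc, 97);
// ilbcLines: ["a=rtpmap:97 iLBC/8000", "a=fmtp:97 mode=20"]
const pcmLines = codecToSdpLines(stereoPcm, 98);
// pcmLines: ["a=rtpmap:98 L16/44100/2"]
```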
5.3. RTP/RTCP Properties
REQ-ID DESCRIPTION
----------------------------------------------------------------
A3-1 The Web API MUST provide a means for the
application to get and set the SSRC value(s)
----------------------------------------------------------------
A3-2 The Web API MUST provide a means for the
application to get and set the CNAME value(s)
----------------------------------------------------------------
A3-3 The Web API MUST provide a means for the
application to get and set the Payload Type
value(s) for each of the codecs
----------------------------------------------------------------
A3-4 The Web API MUST provide a means for the
application to set the audio and video codecs
to be used for each stream, for both rendering
and generating separately, at any time.
----------------------------------------------------------------
A3-5 The Web API MUST provide means for the
application to set whether to use SRTP, its
encryption algorithm and key length, with or
without authentication
----------------------------------------------------------------
A3-6 The Web API MUST provide a means for the
application to set whether to use SRTP or not,
and which key exchange type to use
[Note: this is TBD pending SRTP decisions of WG]
----------------------------------------------------------------
A3-7 The Web API MUST provide a means for the
application to set the SRTP master key value(s)
----------------------------------------------------------------
A3-8 The Web API MUST provide a means for the
application to get DTLS-SRTP fingerprint value(s)
----------------------------------------------------------------
A3-10 The Web API MUST provide a means for the
application to enable/disable generating RTP per
stream [Note: this does not disable RTCP]
----------------------------------------------------------------
A3-11 The Web API MUST provide a means for the
application to be notified by the Browser if
RTCP is no longer being received from the far-end
----------------------------------------------------------------
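As one illustration of why A3-3 asks for Payload Type values to be
settable, an application might assign them itself. The sketch below
assumes a hypothetical helper named assignPayloadTypes; the only
outside fact it relies on is that dynamic RTP payload types occupy
the range 96 through 127.

```typescript
// Dynamic RTP payload types occupy 96 through 127.
const FIRST_DYNAMIC_PT = 96;
const LAST_DYNAMIC_PT = 127;

interface PayloadTypeAssignment {
  encodingName: string;
  payloadType: number;
}

// Hypothetical application-side helper: assigns one dynamic payload
// type per codec, in the order given. Throws when there are more
// codecs than dynamic payload types.
function assignPayloadTypes(encodingNames: string[]): PayloadTypeAssignment[] {
  const available = LAST_DYNAMIC_PT - FIRST_DYNAMIC_PT + 1;
  if (encodingNames.length > available) {
    throw new Error("more codecs than dynamic payload types");
  }
  return encodingNames.map((encodingName, index) => ({
    encodingName: encodingName,
    payloadType: FIRST_DYNAMIC_PT + index,
  }));
}

const assignments = assignPayloadTypes(["iLBC", "VP8"]);
// assignments[0].payloadType is 96, assignments[1].payloadType is 97
```

A Server-driven application (see item 6 of Section 3) could run the
same logic once and command every Browser in the session to use the
resulting values.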
5.4. Data-stream Properties
This section will detail requirements for the API for the client-to-
client data connection stream.
[TBD, since no other document has proposed anything for this yet
either]
5.5. IP and ICE Properties
REQ-ID DESCRIPTION
----------------------------------------------------------------
A5-1 The Web API MUST provide a means for the
application to get IPv4/v6 addresses and ports
for receiving ICE/RTP/RTCP on, per stream
----------------------------------------------------------------
A5-2 The Web API MUST provide a means for the
application to set a list of the remote IPv4/v6
addresses and ports to send to, per stream
----------------------------------------------------------------
A5-3 The Web API MUST provide a means for the
application to set a list of TURN servers to use,
including passwords
----------------------------------------------------------------
A5-4 The Web API MUST provide a means for the
application to set a list of STUN servers to use
----------------------------------------------------------------
A5-5 The Web API MUST provide a means for the
application to set the local ICE username and
password
----------------------------------------------------------------
A5-6 The Web API MUST provide a means for the
application to set the remote ICE username and
password to perform connectivity checks with
----------------------------------------------------------------
A5-7 The Web API MUST provide a means for the
application to set the remote IP Addresses and
ports to perform connectivity checks with
----------------------------------------------------------------
A5-8 The Web API MUST provide a means for the
application to get any IP Addresses and ports
learned by the Browser from STUN, TURN, or other
methods (such as UPnP, NAT-PMP, PCP), including
their candidate-type, foundation, etc.
----------------------------------------------------------------
A5-9 The Web API MUST provide a means for the
application to be notified by the Browser for
ICE event state changes
----------------------------------------------------------------
A5-10 The Web API MUST provide a means for the
application to be notified by the Browser if
the local in-use IP address changes or becomes
inactive (e.g., link loss)
----------------------------------------------------------------
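If candidates are exposed as discrete fields, as A5-8 asks, an
application that does want SDP can produce the ICE attribute itself.
The sketch below assumes a hypothetical IceCandidate shape and
function names; the priority formula, the recommended type
preferences, and the 'a=candidate' layout are those of RFC 5245.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Hypothetical discrete representation of one candidate (A5-8).
interface IceCandidate {
  foundation: string;
  componentId: number;     // 1 = RTP, 2 = RTCP
  transport: "UDP";
  address: string;
  port: number;
  type: CandidateType;
  localPreference: number; // 0..65535
  relatedAddress?: string; // required by ICE for non-host candidates
  relatedPort?: number;
}

// Recommended type preferences from RFC 5245.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126, prflx: 110, srflx: 100, relay: 0,
};

// priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)
function candidatePriority(c: IceCandidate): number {
  return Math.pow(2, 24) * TYPE_PREFERENCE[c.type]
    + Math.pow(2, 8) * c.localPreference
    + (256 - c.componentId);
}

// Serializes the discrete fields into an SDP candidate attribute.
function candidateToSdp(c: IceCandidate): string {
  const parts: (string | number)[] = [
    c.foundation, c.componentId, c.transport, candidatePriority(c),
    c.address, c.port, "typ", c.type,
  ];
  if (c.relatedAddress !== undefined && c.relatedPort !== undefined) {
    parts.push("raddr", c.relatedAddress, "rport", c.relatedPort);
  }
  return "a=candidate:" + parts.join(" ");
}

const hostCandidate: IceCandidate = {
  foundation: "1", componentId: 1, transport: "UDP",
  address: "192.0.2.1", port: 54321, type: "host", localPreference: 65535,
};
const hostLine = candidateToSdp(hostCandidate);
// hostLine: "a=candidate:1 1 UDP 2130706431 192.0.2.1 54321 typ host"
```

An application that does not use SDP at all can ship the same fields
to its Server in whatever encoding it prefers.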
5.6. API Design Recommendations
Technically the API design is the role of the W3C. That hasn't
stopped people on the IETF RTCWEB mailing list from discussing it ad
nauseam, however, and even defining a protocol for it. This
document therefore recommends the following to W3C:
1) That the API setters/getters function-arguments use
separate/discrete values, instead of one long string of separate
tokens in a pseudo-arbitrary order with weak and complex encoding
rules.
2) That when the Javascript calls an API setter function to the
Browser, that it be treated as a *command*, not a protocol
negotiation.
3) *IF* any "blob" of information should be passed from the Browser
to the Javascript and vice-versa, for use in such things as SDP,
that it be something for which there would not likely be any use
to a Javascript programmer and for which future extensions/changes
would require Browser changes only but would not be easily
representable in discrete fields. The most likely candidate for
such a need for a "blob" would be ICE-specific SDP attributes.
4) That when IETF documents start telling you how to build
Javascript APIs, you should run far away... quickly. :)
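Recommendations 1 and 2 can be made concrete with a small sketch.
Everything named here (StreamSettings, StreamController, setSettings)
is hypothetical and is not a proposal for the actual API, which
remains W3C's job. The class merely stands in for the Browser side of
a setter to show the intended semantics: discrete values in, and the
call either takes effect exactly as requested or fails.

```typescript
// Hypothetical per-stream settings expressed as discrete fields
// (recommendation 1), rather than one encoded string.
interface StreamSettings {
  codec: string;
  payloadType: number;
  silenceSuppression: boolean;
}

// Hypothetical stand-in for the Browser side of a setter. Per
// recommendation 2 it treats the call as a command: it applies exactly
// what was asked or throws, and never substitutes a different value.
class StreamController {
  private current: StreamSettings | null = null;
  private readonly supportedCodecs: string[];

  constructor(supportedCodecs: string[]) {
    this.supportedCodecs = supportedCodecs;
  }

  setSettings(requested: StreamSettings): void {
    if (this.supportedCodecs.indexOf(requested.codec) === -1) {
      throw new Error("unsupported codec: " + requested.codec);
    }
    // Store a copy so later changes by the caller have no effect.
    this.current = {
      codec: requested.codec,
      payloadType: requested.payloadType,
      silenceSuppression: requested.silenceSuppression,
    };
  }

  getSettings(): StreamSettings | null {
    return this.current;
  }
}

const controller = new StreamController(["iLBC", "PCMU"]);
controller.setSettings({ codec: "PCMU", payloadType: 0, silenceSuppression: false });
```

A rejected command leaves the previous settings in place, so the
Javascript always knows what state the media library is in.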
6. Security Considerations
There are no security implications for this document, yet - this is
just a strawman document.
7. IANA Considerations
This document makes no request of IANA.
8. Acknowledgments
Many of the topics discussed in this document came from numerous
email posts and threads on the IETF RTCWEB mailing list over the
past couple months, so we will likely forget to recognize some
people who have had their input written herein. We believe, though,
that the following folks have possibly emailed something we've
stolen^M^M borrowed: Matthew Kaufman, Roman Shpount, Inaki Baz
Castillo, Albert Einstein, Saul Ibarra Corretge, Victor Pascual,
Henry Sinnreich, and Bernard Aboba.
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).
9. References
9.1. Informative References
TBD
Authors' Addresses
Hadriel Kaplan
Acme Packet
Email: hkaplan@acmepacket.com
Dan Burnett
Voxeo
Email: dburnett@voxeo.com
Neil Stratford
Voxeo
Email: nstratford@voxeo.com
Tim Panton
PhoneFromHere.com
Email: tim@phonefromhere.com