Network Working Group P. Jones (Ed.)
Internet Draft N. Ismail
Intended status: Informational D. Benham
Expires: September 7, 2015 N. Buckles
Cisco Systems
J. Mattsson
Y. Cheng
Ericsson
R. Barnes
Mozilla
March 7, 2015
Requirements for Private Media in a Switched Conferencing Environment
draft-jones-avtcore-private-media-reqts-01
Abstract
This document specifies the requirements for ensuring the privacy and
integrity of real-time media flows between two or more endpoints
communicating in a switched conferencing environment. This document
also provides a high-level overview of switched conferencing in order
to establish a common understanding of the goals and objectives of
this work.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 7, 2015.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
Jones, et al. Expires September 7, 2015 [Page 1]
Internet-Draft Private Media Requirements March 2015
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Requirements Language
3. Terminology
4. Background
5. Motivation for Private Media in Switched Conferencing
5.1. Switched Conferencing in Cloud Services
5.2. Private Media Security through Switching
6. Private Media Trust Model
6.1. Trusted Elements
6.2. Untrusted Elements
7. Goals and Non-Goals
7.1. Goals
7.1.1. Ensure End-To-End Confidentiality
7.1.2. Ensure End-To-End Source Authentication of Media
7.1.3. Provide a More Efficient Service than "Full-Mesh"
7.1.4. Support Cloud-Based Conferencing
7.1.5. Limiting a User's Access to Content
7.1.6. Compatibility with the WebRTC Security Architecture
7.2. Non-Goals
7.2.1. Securing the Endpoints
7.2.2. Concealing that Communication Occurs
7.2.3. Individual Media Source Authentication
7.2.4. Support for Multicast in Switched Conferencing
8. Requirements
9. IANA Considerations
10. Security Considerations
11. References
11.1. Normative References
11.2. Informative References
12. Acknowledgments
13. Contributors
Authors' Addresses
1. Introduction
Users of multimedia communication products and services have privacy
expectations that are largely satisfied with the use of SRTP
[RFC3711] and related technologies when communicating point-to-point
over the Internet. When communicating in a conferencing environment
with two or more participants, though, it is necessary for an
endpoint to share the SRTP master key and salt with the conference
server so that it can authenticate and decrypt received RTP and RTCP
packets. The conference server also needs the master key and salt in
order to transmit media packets it receives to other participants in
the conference. The need for conferencing servers to have the master
key is a security risk for users.
Within a corporate or other isolated environment where conferencing
servers are tightly controlled, this security risk can be effectively
managed. However, managing this risk is becoming increasingly
difficult as conferencing resources are being deployed in networks
that are less trusted, including virtualized conferencing servers
deployed in cloud environments.
There are also public voice and video conferencing service providers
in which users must place full trust in order to use those services,
as it is necessary for an endpoint to share the SRTP master key with
those conferencing servers. This exposes corporations, for example,
to a higher risk of being subjected to corporate espionage. While it
is not the intent of this draft to suggest that any existing service
provider would permit or condone any illicit use of its service, the
fact is that security threats can come from external sources and
remain undiscovered for long periods of time.
It is possible to ensure communication privacy within the context of
a switched conferencing environment with limited changes in the
security mechanisms used today. This document discusses this
possibility in more detail and presents a set of requirements for
meeting this objective.
2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]
when they appear in ALL CAPS. These words may also appear in this
document in lower case as plain English words, absent their normative
meanings.
3. Terminology
Adversary - An unauthorized entity that may attempt to compromise the
performance of a conference server through various means, including,
but not limited to, the transmission of bogus media packets or
attempts to gain access to the plaintext of the media.
Media content - the portion of the RTP packet (i.e., the encrypted RTP
payload) or other packet containing the actual audio, video, or other
multimedia information that is considered confidential and is subject
to end-to-end encryption. This does not include, for example, RTP
headers, RTP header extensions, or RTCP packets.
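To make this boundary concrete, the following Python sketch splits an RTP packet, laid out as in RFC 3550, into its fixed header (with CSRC list), its header extension block, and the payload that this document treats as media content. It is a minimal illustration only: the class and function names are hypothetical, and it assumes a well-formed RTP version 2 packet.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpParts:
    header: bytes     # fixed 12-byte header plus CSRC list
    extension: bytes  # header extension block, empty if the X bit is clear
    payload: bytes    # media content: the part subject to end-to-end encryption


def split_rtp(packet: bytes) -> RtpParts:
    """Split an RTP packet (RFC 3550 layout) into header, extension, payload."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    offset = 12 + 4 * csrc_count
    header = packet[:offset]

    extension = b""
    if has_extension:
        # 16-bit profile-defined value, then the length in 32-bit words.
        _profile, words = struct.unpack_from("!HH", packet, offset)
        end_ext = offset + 4 + 4 * words
        extension = packet[offset:end_ext]
        offset = end_ext

    end = len(packet)
    if has_padding:
        end -= packet[-1]  # the last octet gives the padding length
    if offset > end:
        raise ValueError("truncated or malformed packet")
    return RtpParts(header, extension, packet[offset:end])


hdr = struct.pack("!BBHII", 0x90, 96, 1, 0, 0x11223344)  # V=2, X=1, PT=96
ext = struct.pack("!HH", 0xBEDE, 1) + bytes([0x10, 0x7F, 0x00, 0x00])
parts = split_rtp(hdr + ext + b"media")
print(parts.payload)  # b'media'
```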
Switching conference server - A conference server that does not
decrypt RTP media flows or perform processing on the media payload,
but instead simply forwards the received media from a sender to the
other participants in a multimedia conference. A switching
conference server may modify some RTP headers.
4. Background
Traditional multimedia conferencing servers would mix, transcode,
transrate, and/or recompose media flows from one or more conference
participants, sending out a different audio and video flow to each
participant. For audio, this might entail mixing some number of
input flows that appear to contain audio intended to be heard by the
other participants, with each participant receiving a flow that does
not contain that participant's own audio. For video, the conference
server may elect to send only video showing the current active
speaker, a tiled composition of all participants or the most recent
active speakers, a video flow with the active speaker presented
prominently with other participants presented as thumbnail images, or
some other composite arrangement. It is also common for audio or
video to be transcoded. A typical traditional conferencing server is
depicted in Figure 1.
+-------------------+
+---+ --{A}--> | | <--{C}-- +---+
| A | | Media Composition | | C |
+---+ <-{BCD}- | | -{ABD}-> +---+
| Transcoders |
+---+ --{B}--> | Transraters | <--{D}-- +---+
| B | | | | D |
+---+ <-{ACD}- | Decrypt/Encrypt | -{ABC}-> +---+
+-------------------+
Figure 1 - Traditional Conferencing Server
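The audio mixing described above is often called a "mix-minus": each participant receives the sum of every other participant's audio. The Python sketch below is a simplified, hypothetical illustration that operates on already-decoded 16-bit PCM samples. It shows why a traditional server needs access to decrypted media: it must read and combine every sample.

```python
from typing import Dict, List


def clip16(sample: int) -> int:
    """Clamp a mixed sample into the signed 16-bit PCM range."""
    return max(-32768, min(32767, sample))


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each participant, the mix of all *other* participants.

    Assumes at least one participant and equal-length frames of decoded
    (i.e., already decrypted) 16-bit PCM samples.
    """
    length = len(next(iter(frames.values())))
    total = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    # Subtracting each participant's own contribution avoids re-summing N-1 inputs.
    return {
        name: [clip16(total[i] - own[i]) for i in range(length)]
        for name, own in frames.items()
    }


mixes = mix_minus({"A": [100, -100], "B": [10, 20], "C": [1, 2]})
print(mixes["A"])  # [11, 22]
```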
Traditional conference servers require a significant amount of
processing power, which in turn translates into a high cost for
conferencing hardware manufacturers. Significantly, too, it is very
difficult to deploy these servers in a cloud environment due to the
high processing demands, as the specialized hardware found in the
traditional voice and video conferencing server does not exist in a
cloud environment.
To enable the traditional conferencing server to perform its job, the
server establishes an SRTP session with each of the conference
participants so that it can get the keys required to decrypt and
encrypt media flows from and to each participant. This means that
the conference server is necessarily a fully trusted entity in the
communication path. Anytime these servers are deployed in a network
that is not tightly controlled, it increases the risk that an
attacker might gain access to cryptographic key material, thus
allowing the attacker to be able to see and listen to ongoing
conferences. In some instances, depending on how the hardware is
designed and how keys and certificates are managed, it might be
possible for an attacker to see and listen to previously recorded
conferences or future conferences.
The Secure Real-time Transport Protocol (SRTP) [RFC3711] is a profile
of RTP, which can provide confidentiality, message authentication,
and replay protection to the RTP traffic and to the RTP Control
Protocol (RTCP). Encryption of Header Extensions in SRTP [RFC6904]
extends the mechanisms of [RFC3711] to selectively encrypt RTP header
extensions in SRTP. [RFC3711] and [RFC6904] solve end-to-end use
cases between two endpoints, but do not consider use cases where a
sender delivers media to a receiver via a cloud-based conferencing
service.
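For reference, the default SRTP message authentication in [RFC3711] (HMAC-SHA1 with an 80-bit tag) computes the tag over the authenticated portion of the packet (header plus encrypted payload) concatenated with the 32-bit rollover counter (ROC). The Python sketch below assumes the session authentication key has already been derived (the AES-based key derivation is omitted) and uses hypothetical function names. It illustrates that any entity holding this single key, such as a traditional conferencing server, can both verify and regenerate tags.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit HMAC-SHA1 tag, the SRTP default


def srtp_auth_tag(auth_key: bytes, authenticated_portion: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over (authenticated portion || ROC), truncated to 80 bits."""
    message = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:TAG_LEN]


def srtp_verify(auth_key: bytes, authenticated_portion: bytes, roc: int,
                tag: bytes) -> bool:
    """Constant-time comparison of the received and recomputed tags."""
    expected = srtp_auth_tag(auth_key, authenticated_portion, roc)
    return hmac.compare_digest(expected, tag)


key = bytes(range(20))  # 160-bit session authentication key (example value)
packet = b"\x80\x60\x00\x01" + bytes(8) + b"ciphertext"
tag = srtp_auth_tag(key, packet, roc=0)
print(srtp_verify(key, packet, 0, tag))  # True
```

Because verification and tag generation use the same key, a server trusted to authenticate packets under a single SRTP context is equally able to forge them.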
5. Motivation for Private Media in Switched Conferencing
5.1. Switched Conferencing in Cloud Services
There is a trend in the industry for enterprises to use cloud
services to host multi-party conferences and meet-me services, either
exclusively or to meet peak loads on-demand. At the same time, there
is a shift toward using light-weight, cost-effective switching
conference servers in cloud services that do not necessarily need to
mix audio or composite/transcode video. Also fueling the use of such
light-weight conference servers is the desire to fully exploit
virtualized computing resources and dynamic scalability potential
available in cloud computing environments.
The increased use of cloud services has exposed a problem. There are
two different trust domains from a media perspective: endpoints and
other devices in a trusted domain, and conference servers controlled
by the cloud service in an untrusted domain. Other examples of
conference devices spread across trusted and untrusted domains are
likely, but the cloud service trend is triggering the urgency to
address the need to allow for lightweight media conferencing while
enabling media privacy at the same time.
With a switching conference server, each participant transmits media
to the server as it would with a traditional conferencing server.
However, the switching conference server merely forwards media to the
other participants in the conference (where the other participant may
be associated with a cascaded conference server or an endpoint on the
same server), leaving composition to the receiving endpoint. Since
some endpoints may have a limited amount of bandwidth, each endpoint
might negotiate with the switching conference server to receive only
a subset of the available media flows. Each transmitting endpoint
might also send multiple media flows of varying frame sizes and/or
frame rates (e.g., simulcast or scalability layers), so that the
server can select the streams most appropriate for each receiver's
bandwidth and capabilities. This allows, for example, an endpoint to
receive and display higher quality video for the active speaker and
thumbnails for other participants. It is also worth noting that, for
switched media to work successfully, each endpoint in the conference
must support the media formats transmitted by all other entities in
the conference. More modern endpoints support multiple codecs and
formats, making this commercially practical.
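As an illustration of that selection, the following Python sketch picks, for one receiver, the highest-bitrate simulcast layer that fits the receiver's available bandwidth. The layer descriptions, bitrates, and function names are hypothetical; in practice this information would be learned from signaling or RTP header fields, so the decision does not require access to the encrypted media content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str
    width: int
    height: int
    kbps: int  # nominal bitrate of this simulcast layer (hypothetical figure)


def pick_layer(layers: List[Layer], budget_kbps: int) -> Optional[Layer]:
    """Highest-bitrate layer within the budget, or None if nothing fits."""
    fitting = [layer for layer in layers if layer.kbps <= budget_kbps]
    return max(fitting, key=lambda layer: layer.kbps) if fitting else None


LAYERS = [
    Layer("high", 1280, 720, 1500),
    Layer("mid", 640, 360, 500),
    Layer("low", 320, 180, 150),
]

# An active-speaker view might get a large budget and a thumbnail a small one.
print(pick_layer(LAYERS, 2000).name)  # high
print(pick_layer(LAYERS, 200).name)   # low
```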
Figure 2 depicts an example of a switching conference server wherein
each participant is receiving the media flows transmitted by each of
the other participants in the conference.
+--------------------+
+---+ --{A}--> | | <-{C}--- +---+
| A | <-{B}--- |Switching Conference| --{A}--> | C |
| | <-{C}--- | Server | --{B}--> | |
+---+ <-{D}--- | | --{D}--> +---+
| Packet |
+---+ --{B}--> | Authentication | <-{D}--- +---+
| B | <-{A}--- | | --{A}--> | D |
| | <-{C}--- | | --{B}--> | |
+---+ <-{D}--- | Media Privacy | --{C}--> +---+
+--------------------+
Figure 2 - Switching Conference Server
Note - The use of multiple arrows directed toward each endpoint is
not intended to suggest the use of separate RTP sessions.
By using methods such as the client-to-mixer audio level indication
described in [RFC6464], it is possible for the switching conference
server to transmit the appropriate audio and video flows to
conference participants without having knowledge of the contents of
the encrypted media. The examples that follow help to illustrate this
point.
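For example, with the [RFC6464] extension each audio packet carries a voice-activity flag and a 7-bit audio level (0 to 127, expressed in -dBov) in an RTP header extension, outside the encrypted payload. The Python sketch below reads that element from a one-byte-header extension block (the 0xBEDE format defined in RFC 5285) and picks the loudest active talker. The extension ID is negotiated in signaling; the value 1 used here, the function names, and the simple "loudest wins" policy are assumptions for illustration.

```python
import struct
from typing import Dict, Optional, Tuple


def audio_level(ext_block: bytes, ext_id: int) -> Optional[Tuple[bool, int]]:
    """Return (voice_activity, level_in_dBov) from an RFC 6464 element, or None."""
    profile, words = struct.unpack_from("!HH", ext_block, 0)
    if profile != 0xBEDE:  # not the one-byte header format
        return None
    data = ext_block[4:4 + 4 * words]
    i = 0
    while i < len(data):
        if data[i] == 0:  # padding byte between elements
            i += 1
            continue
        elem_id, length = data[i] >> 4, (data[i] & 0x0F) + 1
        if elem_id == 15:  # reserved ID: stop parsing
            break
        if elem_id == ext_id and i + 1 < len(data):
            value = data[i + 1]
            return bool(value & 0x80), -(value & 0x7F)
        i += 1 + length
    return None


def loudest_talker(levels: Dict[str, Tuple[bool, int]]) -> Optional[str]:
    """Pick the participant with voice activity and the highest level."""
    active = {who: lvl for who, (vad, lvl) in levels.items() if vad}
    return max(active, key=active.get) if active else None


def ext_with_level(vad: bool, level: int) -> bytes:
    """Build a one-element extension block: ID 1, one-byte audio level."""
    value = (0x80 if vad else 0) | level
    return struct.pack("!HH", 0xBEDE, 1) + bytes([0x10, value, 0, 0])


levels = {
    "A": audio_level(ext_with_level(True, 40), 1),
    "B": audio_level(ext_with_level(True, 12), 1),  # -12 dBov: louder
    "C": audio_level(ext_with_level(False, 5), 1),  # no voice activity
}
print(loudest_talker(levels))  # B
```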
In Figure 3 below, endpoints A, B, and D receive the video stream
from endpoint C, the currently active speaker, which is receiving
video from endpoint A, the previous active speaker. Later, when
endpoint B becomes the active speaker (Figure 4), endpoints A, C, and
D start to receive video from B, while endpoint B continues to
receive video from endpoint C. Finally, in Figure 5, endpoint A
becomes the active speaker.
+--------------------+
+---+ --{A}--> | | <--{C}-- +---+
| A | |Switching Conference| | C |*
+---+ <-{C}--- | Server | ---{A}-> +---+
| |
+---+ --{B}--> | | <--{D}-- +---+
| B | | | | D |
+---+ <-{C}--- | | ---{C}-> +---+
+--------------------+
Figure 3 - Endpoint "C" is the Active Speaker
+--------------------+
+---+ --{A}--> | | <--{C}-- +---+
| A | |Switching Conference| | C |
+---+ <-{B}--- | Server | ---{B}-> +---+
| |
+---+ --{B}--> | | <--{D}-- +---+
*| B | | | | D |
+---+ <-{C}--- | | ---{B}-> +---+
+--------------------+
Figure 4 - Endpoint "B" is the Active Speaker
+--------------------+
+---+ --{A}--> | | <--{C}-- +---+
*| A | |Switching Conference| | C |
+---+ <-{B}--- | Server | ---{A}-> +---+
| |
+---+ --{B}--> | | <--{D}-- +---+
| B | | | | D |
+---+ <-{A}--- | | ---{A}-> +---+
+--------------------+
Figure 5 - Endpoint "A" is the Active Speaker
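The switching behavior shown in Figures 3 through 5 can be captured with a small amount of state. The Python sketch below, with hypothetical names, tracks the current and previous active speakers and derives which sender's video each receiver is forwarded. It uses only speaker identities (which could come from audio level indications such as [RFC6464]) and never touches media content.

```python
from typing import Dict, List


class ActiveSpeakerSwitch:
    """Forward the active speaker's video to everyone else; the active
    speaker receives the previous active speaker's video."""

    def __init__(self, participants: List[str], active: str, previous: str):
        self.participants = list(participants)
        self.active = active
        self.previous = previous

    def on_active_speaker(self, speaker: str) -> None:
        if speaker != self.active:
            self.previous, self.active = self.active, speaker

    def routes(self) -> Dict[str, str]:
        """Map each receiver to the sender whose video it is sent."""
        return {
            p: (self.previous if p == self.active else self.active)
            for p in self.participants
        }


switch = ActiveSpeakerSwitch(["A", "B", "C", "D"], active="C", previous="A")
print(switch.routes())  # {'A': 'C', 'B': 'C', 'C': 'A', 'D': 'C'}  (Figure 3)
switch.on_active_speaker("B")
print(switch.routes())  # {'A': 'B', 'B': 'C', 'C': 'B', 'D': 'B'}  (Figure 4)
switch.on_active_speaker("A")
print(switch.routes())  # {'A': 'B', 'B': 'A', 'C': 'A', 'D': 'A'}  (Figure 5)
```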
Switched conferencing can also enable conferences to scale to include
many more simultaneous participants than would be possible with a
traditional conferencing server. Like traditional conferencing
servers, switching conference servers can also be cascaded or
interconnected in a meshed topology to increase the size of the
conference without putting undue burden on any particular server.
5.2. Private Media Security through Switching
A traditional conferencing server, or Multipoint Control Unit (MCU),
establishes a separate SRTP session with each participating endpoint
and needs to decrypt packets containing media presented to other
endpoints. By
using a switching conference server, it is possible to keep the media
encryption keys private to the endpoints such that the conference
server does not have access to the keys used for media encryption.
The switching conference server just forwards media received to each
of the other participants in the conference.
This provides for a significantly improved security model, as one
can, for example, utilize conferencing resources in the cloud that do
not necessarily have to be trusted. That said, there may be
situations where the switching conference server needs to modify the
RTP packet received from an endpoint, such as by adding or removing
an RTP header extension, modifying the payload type value, etc. It
would be the responsibility of the switching conference server to
ensure that media of the expected type and containing the correct
information is received by a recipient.
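A header-only modification of this kind is straightforward when the server can see the RTP header. The hypothetical Python helper below rewrites the 7-bit payload type while preserving the marker bit and leaving the payload bytes untouched. Note that with a single SRTP context such a change would invalidate the sender's authentication tag.

```python
def rewrite_payload_type(packet: bytes, new_pt: int) -> bytes:
    """Return a copy of an RTP packet with its payload type replaced.

    Only the second header octet changes: the marker bit (high bit) is kept
    and the low 7 bits carry the new payload type. The payload is untouched.
    """
    if not 0 <= new_pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if len(packet) < 12:
        raise ValueError("shorter than the RTP fixed header")
    out = bytearray(packet)
    out[1] = (out[1] & 0x80) | new_pt
    return bytes(out)


original = bytes([0x80, 0xE0]) + bytes(10) + b"payload"  # M=1, PT=96
rewritten = rewrite_payload_type(original, 111)
print(hex(rewritten[1]))  # 0xef (marker bit kept, PT=111)
```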
Thus, there is a need to utilize an end-to-end encryption and
authentication key (or pair of keys) and a hop-by-hop encryption and
authentication key (or pair of keys). The purpose of the hop-by-hop
encryption key is to optionally encrypt RTP header extensions. The
SRTP specification and related specifications do not currently
define a dual-key approach. However, such an approach is possible
and would ensure the privacy of media while also enabling the more
scalable switched conferencing model.
The assumption is that no changes are made to SRTCP, i.e., SRTCP is
protected hop-by-hop with a single security context.
This dual-key model does necessitate a change in the way that keys
are managed. Although key management is outside the scope of this
requirements document, high-level assumptions, such as whether the
end-to-end contexts use a group key as the SRTP master key or use
individual SRTP master keys (which may be derived or negotiated from
another group key), are likely to influence the solution derived
from this document.
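To illustrate the second alternative, the Python sketch below derives a distinct per-sender SRTP master key from a shared group key using HKDF (RFC 5869) with SHA-256, bound to the sender's SSRC. This is only one conceivable derivation: HKDF, the label string, and the function names are assumptions and are not specified by this document or by SRTP.

```python
import hashlib
import hmac
import struct

HASH_LEN = 32  # SHA-256 output size


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869: an absent salt is treated as HashLen zero octets.
    return hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    if length > 255 * HASH_LEN:
        raise ValueError("requested length too large for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def sender_master_key(group_key: bytes, ssrc: int, length: int = 16) -> bytes:
    """Hypothetical per-sender SRTP master key derived from a group key."""
    prk = hkdf_extract(b"", group_key)
    info = b"example e2e master key" + struct.pack("!I", ssrc)
    return hkdf_expand(prk, info, length)


group_key = bytes(32)  # placeholder; a real group key would come from the KMF
k1 = sender_master_key(group_key, 0x11111111)
k2 = sender_master_key(group_key, 0x22222222)
print(len(k1), k1 != k2)  # 16 True
```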
6. Private Media Trust Model
The architecture suggested in this specification enables switching
conference servers to be hosted in domains in which the network
elements may have low trust, or where the trustworthiness is
uncertain. This does not mean that the service provider is
untrusted; it simply means that high trust is not required. This has
the benefit of protecting the endpoints in the case of external
attacks against the conference server.
In this specification, certain elements are considered trusted and
others are considered untrusted. Trust in the context of this
specification means that the element can be in possession of the
media encryption key(s) for a past, current, or potentially future
conference (or portion thereof) used to protect media content.
There are very few elements that need to be trusted. However, it is
also recognized that in certain deployment models, some elements that
are classified as untrusted might be placed into the trusted domain
and considered trusted. This specification is not intended to
prevent such deployment models, but it does not rely upon them.
The elements discussed below have direct or indirect relationships
with one another. The following diagram depicts the trust
relationships described in the following sub-sections and the media
or signaling interfaces that exist between them, showing the trusted
elements on the left and untrusted elements on the right.
Note that this is a logical diagram and functional elements may be
co-located or further divided into multiple separate physical
entities. Also, not every interface needs to exist between all
elements. For example, both the endpoint and the call processing
function may have an interface to the key management function, but
it is not necessary for both interfaces to exist.
|
|
+--------------------------------------------+
v | |
+----------+ | +-----------------+ |
| Endpoint |--------------> | Call Processing | |
+----------+ | +-----------------+ |
^ | ^ ^ |
Trusted | | | | +------+
Elements | | | | |
| +-----------------------+ | |
| | | v v
| | | +----------------------+
| | +--------------> | Switching Conference |
| | | | | Server |
v v v | +----------------------+
+----------------+ |
| Key Management | | Untrusted
| Function | | Elements
+----------------+ |
|
|
Figure 6 - Relationship of Trusted and Untrusted Elements
6.1. Trusted Elements
The endpoint is considered a trusted element, as it will be sourcing
media flows transmitted to other conference participants and will be
receiving media for rendering for the human user. While it is
possible for an endpoint to be compromised and perform in unexpected
ways, such as transmitting a decrypted copy of media content to an
adversary, such security issues and defenses are outside the scope of
this document.
The other trusted element is a key management function (KMF). This
function is responsible for providing cryptographic keys to the
endpoints for encrypting and authenticating media content. The KMF
is also responsible for providing cryptographic keys to the
conferencing resources to enable authentication of media packets
received by a conference participant. Interaction between the KMF
and untrusted call processing functions may be necessary to ensure
conference participants are delivered the appropriate keys or are
directed to the appropriate conference server. It is expected that
the KMF will be tightly controlled and managed to prevent
exploitation by an adversary, as any kind of security compromise of
the KMF puts the security of all conferences at risk.
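The division of keys described in this section can be summarized as a simple distribution policy. The Python sketch below is a hypothetical model, not a protocol: a KMF releases the end-to-end media key only to authorized endpoints, gives the switching conference server only a hop-by-hop key, and gives call processing functions no media keys at all. A single hop key is used for brevity, although real deployments would likely use one per hop.

```python
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    ENDPOINT = "endpoint"                  # trusted
    CALL_PROCESSING = "call-processing"    # untrusted
    SWITCHING_SERVER = "switching-server"  # untrusted


class KeyManagementFunction:
    def __init__(self, e2e_key: bytes, hop_key: bytes, authorized: Set[str]):
        self._e2e_key = e2e_key
        self._hop_key = hop_key
        self._authorized = set(authorized)

    def keys_for(self, element_id: str, role: Role) -> Dict[str, bytes]:
        """Return the media keys an element may hold under this trust model."""
        if role is Role.SWITCHING_SERVER:
            return {"hop": self._hop_key}  # per-hop authentication only
        if role is Role.ENDPOINT and element_id in self._authorized:
            return {"e2e": self._e2e_key, "hop": self._hop_key}
        return {}  # call processing and unauthorized endpoints get nothing


kmf = KeyManagementFunction(b"E" * 16, b"H" * 16, authorized={"alice", "bob"})
print(sorted(kmf.keys_for("alice", Role.ENDPOINT)))          # ['e2e', 'hop']
print(sorted(kmf.keys_for("sfu-1", Role.SWITCHING_SERVER)))  # ['hop']
print(kmf.keys_for("mallory", Role.ENDPOINT))                # {}
```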
6.2. Untrusted Elements
The call processing function is responsible for such things as
authenticating the user, signing messages, and processing call
signaling messages. This element is responsible for ensuring the
integrity, and optionally the confidentiality, of call signaling
messages between itself, the endpoint, and other network elements.
However, it is considered an untrusted element for the purposes of
this specification, as it cannot be trusted to have access to or be
able to gain access to cryptographic key material that provides
privacy and integrity of media packets.
There might be several independent call processing functions within
an enterprise, service provider network, or the Internet that are
classified as untrusted. Any signaling information that passes
through these untrusted entities is subject to inspection by that
element and might be altered by an adversary.
Likewise, there may be certain deployment models where the call
processing function is considered trusted. In such cases, trusted
call processing functions MUST take responsibility for ensuring the
integrity of received messages before delivering those to the
endpoint. How signaling message integrity is ensured is outside the
scope of this document, but might use such methods as defined in
[RFC4474].
The final element is the switching conference server, which is
responsible for forwarding encrypted media packets and conference
control information to endpoints in the conference. It is also
responsible for conveying secured signaling between the endpoints and
the key management function, acquiring per-hop authentication keys
from the KMF, and performing per-hop authentication operations for
media packets. This function might also aggregate conference control
information and initiate various conference control requests.
Forwarding of media packets requires that the switching conference
server have access to RTP headers or header extensions and
potentially modify those message elements, but the actual media
content MUST NOT be decipherable by the switching conference server.
Further, the switching conference server does not have the ability to
determine whether an endpoint is authorized to have access to media
encryption keys. Merely joining a conference MUST NOT be interpreted
as having authority. Media encryption keys are conveyed to the
endpoint by the KMF in such a way as to prevent the switching
conference server from having access to those keys.
It is assumed that an adversary might have access to the switching
conference server and have the ability to read any of the contents
that pass through. For this reason, it is not trusted with access to
the media encryption keys.
As with the call processing functions, it is recognized that there
may be some deployments wherein the switching conference server is
trusted. However, for the purposes of this specification, the
switching conference server is considered untrusted to ensure that
the resulting solution will work even in more hostile environments.
7. Goals and Non-Goals
7.1. Goals
7.1.1. Ensure End-To-End Confidentiality
The content of the communication and all media needs to be
confidential within the group of entities explicitly invited into the
conference. An external monitoring adversary should not be able to
deduce the human-to-human communication that actually occurred from
capturing the media packets.
At the same time, it is necessary to allow switching conference
servers to manipulate certain RTP header fields, such as the payload
type value.
7.1.2. Ensure End-To-End Source Authentication of Media
In a conference system with multiple participants it is vital that
the media content presented to any of the human participants is from
the stated participant, and not an adversary that attempts to inject
misleading content. Nor should an adversary be able to fool the
system into treating it as a trusted party in the conference. Only
explicitly invited parties shall be able to contribute content.
7.1.3. Provide a More Efficient Service than "Full-Mesh"
A multi-party conference that has the goals of confidentiality and
source authentication can be established as a "full mesh" (i.e., each
participating endpoint directly addresses each of the other
participants). However, this approach consumes a significant amount
of resources on both the uplink and the downlink of each participant.
A switched conferencing model would yield the efficiencies desired.
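A rough per-participant stream count shows the difference. In a full mesh, each of N participants sends N-1 copies of its media and receives N-1 streams; with a switching conference server, each participant sends one copy and receives at most N-1, possibly fewer if the server forwards only a subset. The Python sketch below assumes a single hypothetical bitrate per stream and ignores simulcast and RTCP overhead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Load:
    uplink_streams: int
    downlink_streams: int
    uplink_kbps: int
    downlink_kbps: int


def full_mesh(n: int, kbps: int) -> Load:
    """Every participant sends to and receives from every other participant."""
    others = n - 1
    return Load(others, others, others * kbps, others * kbps)


def switched(n: int, kbps: int, forwarded: Optional[int] = None) -> Load:
    """One upload per participant; the server forwards up to N-1 streams."""
    down = n - 1 if forwarded is None else min(forwarded, n - 1)
    return Load(1, down, kbps, down * kbps)


mesh, sfu = full_mesh(10, 1500), switched(10, 1500)
print(mesh.uplink_kbps, sfu.uplink_kbps)  # 13500 1500
```

The uplink is where the full mesh is most costly, since many access links are asymmetric.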
7.1.4. Support Cloud-Based Conferencing
To achieve cost-effective and scalable conferencing, it must be
possible to run the conference server instances in a cloud-based
virtualized environment.
From a security standpoint, this is a significant issue since the
virtualized server instance and the underlying hardware and software
upon which it runs might not be secure from an adversary.
7.1.5. Limiting a User's Access to Content
Since an invited user will be provided with the content protection
keys, the user can decrypt content from time periods before the user
joined and after the user left the conference. However, this is not
always desirable. It should be possible to re-key the content
protection keys every time a user joins or leaves the conference so
that each particular set of conference participants uses a unique
key.
Re-keying in this way also raises the level of trust required in the
handling of the conference roster, as well as the question of how to
keep the roster accurate and secure at all times.
It should be noted that timely completion of the re-keying operations
can become an obstacle in system design and operation. Thus, it is a
goal to allow for this possibility when it is deemed essential, but
it should not be a requirement on a system to re-key each time the
participant list changes.
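The optional re-keying behavior can be sketched as follows. In this hypothetical Python model, which would reside in the trusted key management function, a fresh random content protection key and a new epoch number are generated on every roster change only when the policy enables it; otherwise the key stays the same. Distribution of the new key to the remaining participants, and its timing, are deliberately left out.

```python
import secrets
from typing import Set


class ConferenceKeying:
    def __init__(self, rekey_on_roster_change: bool, key_len: int = 16):
        self.rekey_on_roster_change = rekey_on_roster_change
        self.key_len = key_len
        self.roster: Set[str] = set()
        self.epoch = 0
        self.key = secrets.token_bytes(key_len)

    def _roster_changed(self) -> None:
        # Only re-key when the deployment has decided it is essential.
        if self.rekey_on_roster_change:
            self.epoch += 1
            self.key = secrets.token_bytes(self.key_len)

    def join(self, participant: str) -> None:
        if participant not in self.roster:
            self.roster.add(participant)
            self._roster_changed()

    def leave(self, participant: str) -> None:
        if participant in self.roster:
            self.roster.remove(participant)
            self._roster_changed()


strict = ConferenceKeying(rekey_on_roster_change=True)
for who in ("A", "B", "C"):
    strict.join(who)
strict.leave("B")
print(strict.epoch)  # 4 (one new key per membership change)
```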
7.1.6. Compatibility with the WebRTC Security Architecture
It is a goal of this work to ensure compatibility with the WebRTC
security architecture as described in [I.D-rtcweb-security-arch]. As
an example, local resources that are considered a part of the trusted
computing base (TCB), such as keying material derived using DTLS-
SRTP, will remain within the TCB and will not be exposed to untrusted
entities.
The browser is reliant on an external calling service to convey
signaling information that may open the door for a man-in-the-middle
attack, such as the conveyance of certificate fingerprints over the
interface between the browser and the calling service. However, as
described in [I.D-rtcweb-security-arch], the browser may utilize
additional services, such as a trusted identity provider, to mitigate
such risks.
That said, this document does not aim to define
requirements for end-to-end security for the WebRTC data channel.
7.2. Non-Goals
7.2.1. Securing the Endpoints
The security of a communication session requires that the endpoints
are not compromised and that the users are trustworthy. If not,
credentials and decrypted content may be shared with third parties.
However, this is hard to prevent through system design. Thus, it
should be assumed that the endpoint is secure and the user is
trustworthy; how to achieve this is out of scope for this document.
7.2.2. Concealing that Communication Occurs
A non-goal is to attempt to prevent a pervasive monitoring adversary
from knowing that the communication session has occurred. The reason
for excluding this as a goal is that it is extremely difficult to
achieve, as a pervasive monitoring adversary can be expected to have
knowledge of all IP flows that enter or exit local ISPs, cross links
that straddle national borders, or pass through Internet exchange
points. To hide the fact that communication occurred, the flows
required to achieve the communication session need to be highly
difficult to correlate between different legs of the communication.
At this stage, this is deemed too difficult to attempt and will need
to be a subject for further study. Existing attempts include The
Onion Router (Tor), which it has been claimed can be monitored, at
least partially, by an adversary with sufficient reach.
Also of consideration is that trying to conceal the fact that
communication occurred actually makes it more difficult for network
administrators to effectively manage and troubleshoot issues with
conference calls.
7.2.3. Individual Media Source Authentication
Although the participants in the conference are authenticated, it is
not a goal to provide source authentication of the media at the
individual user level. Instead, it is sufficient to be able to
determine whether or not media comes from an invited conference
participant.
Solutions exist that can provide individual media source
authentication (e.g., TESLA). However, they have an impact on
performance or on the security properties provided. Thus, further
study is required to determine the impact and resulting security
properties if individual source authentication is desired.
7.2.4. Support for Multicast in Switched Conferencing
Multicast traffic is, by design, transmitted to every participant in
a conference. The focus of this document is only on centralized
unicast conferencing that utilizes a switched conferencing
architecture.
8. Requirements
The following are the security solution requirements for switched
conferencing that enable end-to-end media privacy between all
conference participants.
Note that while some switching media servers might be fully trusted
entities, the intent of this solution, and the purpose of these
private media (PM) requirements, is to address those servers that are
not fully trusted.
PM-01: The switching conference server MUST be able to switch the
       media between participants in a conference without having
       access to unencrypted media content.
PM-02: The solution MUST maintain all current SRTP security goals,
       namely end-to-end confidentiality, hop-by-hop replay
       protection, and hop-by-hop and end-to-end message integrity.
       {Editor's Note: Question asked, "Does this include third
       parties?" Jonathan Lennox to suggest ways to make this more
       concrete.}
PM-03: The solution MUST extend replay protection to cover each hop
       in the media path, ensuring both that any received packet is
       destined for the recipient and that it is not a duplicate.
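The per-hop replay check in PM-03 could follow the sliding-window
approach that SRTP already uses for its replay list. The following is
a minimal sketch, assuming one window per SSRC at each hop, a
64-packet window, and packet indices that have already been
determined; the class and method names are hypothetical and not part
of any specified API.

```python
class ReplayWindow:
    """Hypothetical per-hop, per-SSRC sliding-window replay check."""

    WINDOW_SIZE = 64  # assumed window size, in packets

    def __init__(self) -> None:
        self.highest = -1  # highest packet index accepted so far
        self.bitmap = 0    # bit k set => index (highest - k) already seen

    def check_and_update(self, index: int) -> bool:
        """Return True and record `index` if fresh; False if replayed or too old."""
        if index > self.highest:
            shift = index - self.highest
            # Slide the window forward; bits shifted past the window are dropped.
            if shift >= self.WINDOW_SIZE:
                self.bitmap = 1
            else:
                mask = (1 << self.WINDOW_SIZE) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.WINDOW_SIZE:
            return False  # older than the window: cannot prove freshness
        if self.bitmap & (1 << offset):
            return False  # duplicate of a packet already accepted
        self.bitmap |= 1 << offset
        return True
```

Packets arriving out of order but within the window are still
accepted once, while exact duplicates and packets older than the
window are rejected.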
PM-04: Keys used for end-to-end encryption and authentication of RTP
payloads and other information deemed unsuitable for access
by the switching conference server MUST NOT be generated by
or accessible to any component that is not in the fully
trusted domain.
PM-05: The switching conference server MUST be capable of making
changes to the RTP header and, optionally, the RTP header
extensions.
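A switching server that rewrites RTP header fields under PM-05 needs
to locate where the header ends and the end-to-end protected payload
begins. The following is a minimal sketch of that boundary
computation, following the fixed header, CSRC list, and header
extension layout of RFC 3550; the `RtpHeader` type and
`parse_rtp_header` function are hypothetical, and RTP padding and
SRTP trailers are not handled.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: list
    header_length: int  # bytes from the start of the packet to the payload


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RTP fixed header, CSRC list, and optional header extension."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    cc = b0 & 0x0F  # CSRC count
    offset = 12 + 4 * cc
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack_from(f"!{cc}I", packet, 12))
    extension = bool(b0 & 0x10)
    if extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # 16-bit profile-defined field, then extension length in 32-bit words
        _profile, ext_words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * ext_words
        if len(packet) < offset:
            raise ValueError("truncated header extension")
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=extension,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
        header_length=offset,
    )
```

Everything before `header_length` is the part a switching server
might modify; the bytes after it carry the end-to-end encrypted
payload that the server cannot read.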
PM-06: The SRTP cryptographic context, which is identified in part
       by the SSRC, contains transform-independent parameters used by
       the sending endpoint, including the RTP packet sequence number
       and the rollover counter (ROC). These parameters, which are
       required for packet decryption and authentication, MUST be
       protected end-to-end, along with the value of the SSRC.
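PM-06 matters because an SRTP receiver derives the 48-bit packet
index i = 2^16 * ROC + SEQ from these fields, and in the default AES
counter mode transform that index, together with the SSRC, feeds the
keystream IV. A server that silently renumbered SEQ would therefore
be expected to break end-to-end decryption. The following sketch
transcribes the receiver-side index estimation from the pseudocode in
RFC 3711 (Section 3.3.1 and Appendix A); the function name is
hypothetical.

```python
def estimate_packet_index(seq: int, roc: int, s_l: int) -> int:
    """Estimate the SRTP packet index for a received packet.

    seq: 16-bit sequence number from the received RTP header
    roc: receiver's current 32-bit rollover counter
    s_l: highest sequence number received so far in this context
    """
    if s_l < 32768:
        # A SEQ far above s_l most likely predates the last rollover.
        v = (roc - 1) % 2**32 if seq - s_l > 32768 else roc
    else:
        # A SEQ far below s_l most likely follows a new rollover.
        v = (roc + 1) % 2**32 if s_l - 32768 > seq else roc
    return seq + v * 65536
```

The estimate is only correct if SEQ and ROC seen by the receiver are
the values the sender used, which is why both need end-to-end
protection.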
PM-07: The switching conference server, or any entity that is not
fully trusted, MUST NOT be involved in the user or device
authentication for the purpose of media key distribution.
PM-08: The switching conference server MUST be able to switch an
       already active SRTP stream to a new receiver while
       guaranteeing timely synchronization of the SRTP context
       between the transmitter and its current and new receivers.
PM-09: It MUST be possible for the switching conference server to
determine if a received media packet was transmitted by a
conference participant in possession of the end-to-end media
encryption keys and hop-by-hop authentication keys.
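In SRTP today, the hop-by-hop part of the check in PM-09 corresponds
to verifying the authentication tag. The following is a minimal
sketch using the default HMAC-SHA1 transform with an 80-bit tag from
RFC 3711, where the tag is computed over the authenticated portion
(RTP header and encrypted payload) concatenated with the 32-bit ROC.
It assumes the session authentication key has already been derived
and that no MKI is present; the function names are hypothetical. It
covers only the hop-by-hop key, since the server does not hold the
end-to-end keys.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit tag, as in the SRTP default HMAC-SHA1 transform


def compute_tag(auth_key: bytes, authenticated_portion: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over (authenticated portion || ROC), truncated to 80 bits."""
    msg = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, msg, hashlib.sha1).digest()[:TAG_LEN]


def verify_hop_packet(auth_key: bytes, packet: bytes, roc: int) -> bool:
    """Check the trailing tag of `packet` (header + encrypted payload + tag)."""
    if len(packet) < TAG_LEN:
        return False
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(compute_tag(auth_key, body, roc), tag)
```

A packet that fails this check was not produced by a holder of the
hop-by-hop key, or was modified in transit, and would be discarded.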
PM-10: It MUST be possible to re-key a conference when desired, such
       as each time a participant joins or leaves the conference.
       {Editor's note: Who is allowed to know who leaves and joins?
       Do you trust the conference server to tell you reliably?}
PM-11: Any solution satisfying these requirements MUST provide a
       means by which WebRTC-compliant endpoints can participate in
       a switched conference using private media as outlined herein.
PM-12: All RTP senders, including the switching conference server,
       MUST adhere to all congestion control requirements of the RTP
       profile and topology in use, including RTP circuit breakers
       [I.D-ietf-avtcore-rtp-circuit-breakers]. Since the switching
       conference server is unable to perform transcoding or
       transrating, which require access to the unencrypted media,
       its reaction to congestion signals is often limited to
       dropping packets that would otherwise be forwarded in the
       absence of congestion and signaling congestion to the RTP
       source. This is similar to the congestion control behavior
       of the Media Switching Mixer and the Selective Forwarding
       Middlebox/Unit described in
       [I.D-ietf-avtcore-rtp-topologies-update].
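As one hypothetical illustration of the packet-dropping reaction
described in PM-12, a server could keep a per-receiver token bucket
and drop packets that exceed the current forwarding budget. This is a
minimal sketch under that assumption only: it is not the RTP circuit
breaker algorithm, the class name is hypothetical, and the budget
parameters would come from congestion feedback that is not modeled
here.

```python
class ForwardingBudget:
    """Hypothetical per-receiver token bucket deciding forward vs. drop."""

    def __init__(self, rate_bps: float, burst_bits: float, start_time: float = 0.0) -> None:
        self.rate_bps = rate_bps      # sustained bitrate allowed toward the receiver
        self.burst_bits = burst_bits  # largest burst that may be forwarded at once
        self.tokens = burst_bits      # start with a full bucket
        self.last_time = start_time

    def allow(self, packet_len_bytes: int, now: float) -> bool:
        """Refill for elapsed time, then forward (True) or drop (False) the packet."""
        elapsed = max(0.0, now - self.last_time)
        self.last_time = now
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        needed = packet_len_bytes * 8
        if self.tokens >= needed:
            self.tokens -= needed
            return True
        return False  # over budget: drop rather than forward
```

Because the server cannot transrate encrypted media, lowering
`rate_bps` in response to congestion feedback only changes which
packets are dropped, not the bitrate the sender encodes at.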
9. IANA Considerations
There are no IANA considerations for this document.
10. Security Considerations
[TBD]
11. References
11.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol
(SRTP)", RFC 3711, March 2004.
[RFC6464] Lennox, J., Ivov, E., and E. Marocco, "A Real-time
Transport Protocol (RTP) Header Extension for Client-to-
Mixer Audio Level Indication", RFC 6464, December 2011.
[I.D-rtcweb-security-arch]
          Rescorla, E., "WebRTC Security Architecture", Work in
          Progress, July 2014.
[RFC6904] Lennox, J., "Encryption of Header Extensions in the Secure
          Real-time Transport Protocol (SRTP)", RFC 6904, December
          2013.
[I.D-ietf-avtcore-rtp-topologies-update]
          Westerlund, M. and S. Wenger, "RTP Topologies", Work in
          Progress, March 2015.
[I.D-ietf-avtcore-rtp-circuit-breakers]
          Perkins, C. S. and V. Singh, "Multimedia Congestion
          Control: Circuit Breakers for Unicast RTP Sessions", Work
          in Progress, March 2015.
11.2. Informative References
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
A., Peterson, J., Sparks, R., Handley, M., and E.
Schooler, "SIP: Session Initiation Protocol", RFC 3261,
June 2002.
[RFC4474] Peterson, J. and C. Jennings, "Enhancements for
Authenticated Identity Management in the Session
Initiation Protocol (SIP)", RFC 4474, August 2006.
12. Acknowledgments
The authors would like to thank Marcello Caramma, Matthew Miller,
Christian Oien, Magnus Westerlund, Cullen Jennings, Christer
Holmberg, Bo Burman, Jonathan Lennox, Suhas Nandakumar, Dan Wing,
Roni Even, and Mo Zanaty for their invaluable input.
13. Contributors
[TBD]
Authors' Addresses
Paul E. Jones
Cisco Systems, Inc.
7025 Kit Creek Rd.
Research Triangle Park, NC 27709
USA
Phone: +1 919 476 2048
Email: paulej@packetizer.com
Nermeen Ismail
Cisco Systems, Inc.
170 W Tasman Dr.
San Jose
USA
Email: nermeen@cisco.com
David Benham
Cisco Systems, Inc.
170 W Tasman Dr.
San Jose
USA
Email: dbenham@cisco.com
Nathan Buckles
Cisco Systems, Inc.
170 W Tasman Dr.
San Jose
USA
Email: nbuckles@cisco.com
John Mattsson
Ericsson AB
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 43 501
Email: john.mattsson@ericsson.com
Yi Cheng
Ericsson
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 17 589
Email: yi.cheng@ericsson.com
Richard Barnes
Mozilla
331 E Evelyn Ave.
Mountain View
USA
Email: rlb@ipv.sx