Internet DRAFT - draft-liu-webrtc-http-interactive-protocol
Network Working Group D. Liu
Internet-Draft Y. He
Intended status: Standards Track X. Yu
Expires: 11 January 2024 X. Kai
S. Li
Alibaba Group
July 2023
WebRTC-HTTP Interactive Signaling Protocol (WHISP)
draft-liu-webrtc-http-interactive-protocol-00
Abstract
This document introduces a signaling protocol that allows WebRTC-based
pulling, merging and switching of content delivered over a media
transmission network.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 11 January 2024.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
Liu, et al. Expires 11 January 2024 [Page 1]
Internet-Draft Protocol for interactive low-latency med July 2023
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
2. System Architecture
3. Protocol Operation
4. Overview
5. Signaling Specification
5.1. Merging signaling message
5.2. Switching signaling message
5.3. Grabbing signaling message
5.4. Pulling signaling message
6. Acknowledgements
7. IANA Considerations
8. Security Considerations
9. Normative References
Authors' Addresses
1. Introduction
Emerging real-time interactive video/audio communication applications
bring new challenges for existing protocols. This document
introduces the use cases, requirements and protocol for a WebRTC-HTTP
interactive low-latency multimedia transmission network over the
Internet.
Interactive real-time media communication is becoming popular with the
rapid growth of short video, on-line education, on-line gaming and
other similar applications. Some application providers build their
own interactive real-time media communication networks to support
their applications, but face high costs and technical issues. For
example, interactive communication between users is unpredictable,
which leads to high costs when a dedicated entity is used for
interaction and to wasted resources that were reserved for it.
To avoid these issues, other application providers use a third-party
interactive real-time media communication network provided by cloud
operators. However, existing protocols face several challenges in
supporting these scenarios:
1. Interactive online broadcasting is flexible and much more
complicated than traditional media broadcasting. In interactive
online broadcasting applications, audiences may occasionally request
to set up bidirectional real-time communication with the broadcaster,
and all the other audiences are expected to receive the merged
interactive media traffic containing the broadcaster and the
connected audience. To this end, a standardized signaling protocol
is needed that supports media stream merging, switching and pulling
in these complicated scenarios.
2. Applications such as interactive online broadcasting, short
video, on-line education and on-line gaming are very delay sensitive.
Thus, the protocols for media stream merging, switching and pulling
are expected to meet the latency requirements of those applications.
3. WebRTC is now widely used in the multimedia ecosystem. The
protocols for media stream merging, switching and pulling are
expected to be compatible with WebRTC in order to deliver interactive
media services to customers.
2. System Architecture
This section specifies the system architecture of the interactive
real-time media communication system.
Server for media streaming control
+-----------+
| |
| |
| |
| |
| |
+-----+-----+ +-----+
+-----+ | | |
| | | | |
| | | | |
| |<------+ | +------->| |
| | | +-----------------v------------------+ | | |
| | | | | | +-----+
+-----+ | | | | Audience
Broadcaster | | +----+ +-----+
+--------------->| | | |
| | | |
| WHISP communication network +------------>| |
+-----+ +--------------->| | | |
| | | | | | |
| | | | +----+ +-----+
| |<------+ | | | Audience
| | | | | +-----+
| | +-------------^-----+----------------+ | | |
+-----+ | | | | |
Audience | | +------->| |
connected with | | | |
the broadcaster +-+-----v--+ | |
| | +-----+
| | Audience
| |
| |
| |
+----------+
Server for media stream merging
Figure 1: Architecture
The WHISP communication network can be provided by cloud providers.
The network provides fundamental media streaming capabilities,
including media pulling. In addition, it can support capabilities
such as media merging and media switching. These capabilities are
triggered by the control server and the server for media stream
merging, which can be provided by a third party. Based on these
capabilities, the audience can seamlessly receive either the media
from the broadcaster or the merged media of the broadcaster and the
audience requesting interaction.
3. Protocol Operation
4. Overview
This section defines the signaling procedure of the WHISP
communication network.
Audience Interactive real-time
connected with Server for media media communication
Broadcaster the broadcaster Control Server stream merging network Audience
+--------------+ +--------------+ +--------------+ +--------------+ +--------------+ +--------------+
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
+------+-------+ +-------+------+ +-------+------+ +------+-------+ +-------+------+ +------+-------+
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | push media stream | | | |
+----------------------------+---------------------------+-------------------------+---------------------------->| |
| | | | | |
| | | | | |
| | | | | |
| | | | | pull media stream |
| | | | |<------------------------+
| | | | | |
| | | | | |
| | | push media stream | | |
| +---------------------------+----------------------------+------------------------->| |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| pull media stream | | | | |
+----------------------------+---------------------------+----------------------------+------------------------->| |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | pull media stream | | | |
| +---------------------------+----------------------------+------------------------->| |
| | | | | |
| | | Command for stream merging | | |
| | +--------------------------->| pull stream for merging | |
| | | +------------------------->| |
| | | | | |
| | | | push merged stream | |
| | | +------------------------->| |
| | |Command for stream switching| | |
| | +----------------------------+------------------------->| |
| | | | +--+ |
| | | | | | |
| | | | |<-+ |
| | | | |Perform stream switch |
| | | | | |
Figure 2: Procedure
Figure 2 shows the signaling procedure of interactive real-time media
communication among the broadcaster, the audience requesting
interaction and the other audiences. HTTP POST is used for the
signaling in this procedure. The broadcaster and the audience first
ingest their media streams into the interactive real-time media
communication network. An audience that wishes to interact with the
broadcaster sends an interaction request to the control server. The
control server processes the request and sends a media merging
command to the server for media stream merging. Upon receipt of the
merging request from the control server, the server for media stream
merging pulls the corresponding streams of both the broadcaster and
the audience requesting interaction and proceeds with the media
merging.
After the media merging is complete, the server for media stream
merging ingests the merged media into the interactive real-time media
communication network, which then sends the merged media to the edge
media distribution servers connected to the audiences that consume
the media. After the distribution, the control server sends the
media switching command to the interactive real-time media
communication network. The network then forwards the switching
signaling message to the edge node. Upon receipt of the signaling
message, the edge node performs the media switching by delivering the
merged media to the audiences.
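The control-server side of this procedure can be sketched as two HTTP
POSTs issued in order. The sketch is illustrative only: the helper
names (run_interaction, media), the injected post callable and the
example URLs are assumptions, and "media ID" is modelled as a JSON
object, which is also an assumption. Only the two endpoint paths and
the member names are taken from the message figures in Section 5.

```python
from typing import Callable, Dict, List, Tuple

# Endpoint paths as they appear in the message figures of Section 5.
MERGING_PATH = "/whisp/merging/endpoint"
SWITCHING_PATH = "/whisp/switching/endpoint"

# Hypothetical transport hook: post(path, json_body) sends one HTTP POST.
Post = Callable[[str, Dict], None]


def media(url: str) -> Dict:
    """Describe one media source; the ID values mirror the draft's examples."""
    return {"media ID": {"amsid": ["rts audio"], "vmsid": ["rts video"]},
            "URL": url}


def run_interaction(post: Post, broadcaster_url: str, audience_url: str,
                    merged_url: str, template_id: str = "01") -> None:
    """Issue the merge command, then the switch command (hypothetical helper)."""
    # Step 1: ask the merging server to merge broadcaster and audience media.
    post(MERGING_PATH, {
        "main media": media(broadcaster_url),
        "secondary media": media(audience_url),
        "merge template": {"merge template id": [template_id]},
    })
    # Step 2: after the merged stream has been distributed, ask the network
    # to switch viewers from the broadcaster stream to the merged stream.
    post(SWITCHING_PATH, {
        "source media": media(broadcaster_url),
        "destination media": media(merged_url),
    })


# Usage with a recording stub in place of a real HTTP client.
calls: List[Tuple[str, Dict]] = []
run_interaction(lambda path, body: calls.append((path, body)),
                "http://demo.example.com/live/broadcaster",
                "http://demo.example.com/live/audience",
                "http://demo.example.com/live/merged")
print([path for path, _ in calls])
```

Injecting the transport keeps the ordering logic separate from any
particular HTTP client. A real control server would also have to wait
for confirmation that the merged media has been distributed before
issuing the second POST, which this sketch does not model.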
5. Signaling Specification
This section defines the signaling specification for interactive
real-time media communication. To achieve the merging and switching
functionalities for different media sources, signaling messages need
to be delivered to the corresponding entities (e.g., control server,
edge node) so that they perform the proper operations. All the
messages below are transmitted using HTTP POST. The signaling
message of the interactive media control protocol is shown as
follows:
Interactive Media Control Message {
Message Type (i),
Message Length (i),
Message Payload (..),
}
Figure 3: Interactive media signaling message
To process a signaling message, the receiving entity needs to
identify its type. This is achieved using the message type carried
in the message header. The message types of the interactive media
control protocol are as follows:
+=====+===========+
| ID  | Messages  |
+=====+===========+
| 0x0 | Merging   |
+-----+-----------+
| 0x1 | Switching |
+-----+-----------+
| 0x2 | Pulling   |
+-----+-----------+
| 0x3 | Grabbing  |
+-----+-----------+
Table 1: Message types
of Interactive media
control protocol
The message length indicates the total length of the message payload
field in bytes. The message payload contains the information for
controlling media merging, switching, pulling and grabbing. The
subsequent subsections describe each message type and its payload in
detail.
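As a minimal sketch, the message types of Table 1 and the three
fields of Figure 3 can be modelled as follows. The names MessageType
and describe are hypothetical, the payload is assumed to be UTF-8
encoded JSON, and the example URL is a placeholder. The sketch does
not define a wire encoding for the "(i)" fields.

```python
import json
from enum import IntEnum


class MessageType(IntEnum):
    """Message type IDs from Table 1."""
    MERGING = 0x0
    SWITCHING = 0x1
    PULLING = 0x2
    GRABBING = 0x3


def describe(message_type: MessageType, payload: dict) -> dict:
    """Return the three fields of Figure 3 for a JSON payload (illustrative)."""
    body = json.dumps(payload).encode("utf-8")  # assumption: UTF-8 JSON
    return {
        "type": int(message_type),
        "length": len(body),  # length of the payload field in bytes
        "payload": body,
    }


msg = describe(MessageType.PULLING,
               {"media": {"URL": "http://demo.example.com/live/stream"}})
print(msg["type"], msg["length"])
```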
5.1. Merging signaling message
The merging signaling message is used to request the server for media
stream merging to merge the media of a broadcaster and an audience.
The merging signaling message is shown as follows:
Merging Message:
{
POST /whisp/merging/endpoint HTTP/1.1
Host: whisp.example.com
Content-Type: application/json
Content-Length:
{
  "main media": {
    "media ID": {
      "amsid": [
        "rts audio"
      ],
      "vmsid": [
        "rts video"
      ]
    },
    "URL": "http://demo.example.com/liveapp****/liveStream****1"
  },
  "secondary media": {
    "media ID": {
      "amsid": [
        "rts audio"
      ],
      "vmsid": [
        "rts video"
      ]
    },
    "URL": "http://demo.example.com/liveapp****/liveStream****2"
  },
  "merge template": {
    "merge template id": [
      "01"
    ]
  }
}
}
Figure 4: Merging signaling message
The payload type field, carried as the request path
"/whisp/merging/endpoint" in the HTTP request line, indicates the
merging signaling message. The main media determines the
media-related parameters (such as video format) of the merged media,
and the secondary media needs to comply with these parameters during
merging. The merge template determines the video layout of the
merged media when the main media and the secondary media are merged.
The merge template id is the ID of the merge template. The media ID
is the ID of a media. Amsid and vmsid stand for audio stream ID and
video stream ID, respectively. Each ID is a string that uniquely
identifies a media source, and the format of the media ID follows the
definition in RFC 8830 [3]. The media URL is the address of the edge
node that interacts with the audience, and the format of the URL
follows the definition in RFC 3986 [2].
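A receiver of the merging message could check the fields described
above before acting on them. The following is a minimal sketch under
stated assumptions: the function names are hypothetical, "media ID"
is assumed to be a JSON object holding "amsid" and "vmsid" lists,
both lists are assumed to be required, and the URL check only
requires a scheme and an authority rather than full RFC 3986
validation.

```python
from typing import Dict, List
from urllib.parse import urlsplit


def check_media(name: str, media: Dict) -> List[str]:
    """Return a list of problems found in one media entry."""
    errors: List[str] = []
    ids = media.get("media ID")
    for key in ("amsid", "vmsid"):
        value = ids.get(key) if isinstance(ids, dict) else None
        if not (isinstance(value, list) and value
                and all(isinstance(v, str) for v in value)):
            errors.append(f"{name}: '{key}' must be a non-empty list of strings")
    # Simplified check: an absolute URI with an authority component.
    parts = urlsplit(str(media.get("URL", "")))
    if not (parts.scheme and parts.netloc):
        errors.append(f"{name}: 'URL' must be an absolute URI")
    return errors


def check_merging(payload: Dict) -> List[str]:
    """Return a list of problems found in a merging message payload."""
    errors: List[str] = []
    for name in ("main media", "secondary media"):
        entry = payload.get(name)
        if isinstance(entry, dict):
            errors += check_media(name, entry)
        else:
            errors.append(f"missing '{name}'")
    template = payload.get("merge template")
    if not (isinstance(template, dict) and template.get("merge template id")):
        errors.append("missing 'merge template id'")
    return errors


example = {
    "main media": {
        "media ID": {"amsid": ["rts audio"], "vmsid": ["rts video"]},
        "URL": "http://demo.example.com/liveapp****/liveStream****1",
    },
    "secondary media": {
        "media ID": {"amsid": ["rts audio"], "vmsid": ["rts video"]},
        "URL": "http://demo.example.com/liveapp****/liveStream****2",
    },
    "merge template": {"merge template id": ["01"]},
}
print(check_merging(example))
```

Returning a list of problems instead of raising on the first one lets
the receiver report every defect of a rejected message in a single
response.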
5.2. Switching signaling message
The switching signaling message is used to instruct the interactive
real-time media communication system to perform media switching upon
receipt of the request from the control server. The switching
signaling message is shown as follows:
Switching Message:
{
POST /whisp/switching/endpoint HTTP/1.1
Host: whisp.example.com
Content-Type: application/json
Content-Length:
{
  "source media": {
    "media ID": {
      "amsid": [
        "rts audio"
      ],
      "vmsid": [
        "rts video"
      ]
    },
    "URL": "http://demo.example.com/liveapp****/liveStream****1"
  },
  "destination media": {
    "media ID": {
      "amsid": [
        "rts audio"
      ],
      "vmsid": [
        "rts video"
      ]
    },
    "URL": "http://demo.example.com/liveapp****/liveStream****2"
  }
}
}
Figure 5: Switching signaling message
The payload type field, carried as the request path
"/whisp/switching/endpoint" in the HTTP request line, indicates the
switching signaling message. The source media contains the
information about the source media from the broadcaster. The
destination media contains the information about the destination
media, which is the merged media of the broadcaster and the audience
requesting interaction. Each media entry contains the media ID and
the media URL.
The switching signaling message is sent to the edge node that manages
media delivery for the audience. If the edge node acknowledges the
media switching, it redirects the audience to the destination media
using the WebRTC protocol. Upon receipt of the switching signaling
message, the media transmission protocol determines the timestamp,
the I-frame information and, optionally, the sequence number needed
to redirect to the new merged media. This ensures that the audience
can switch smoothly to the merged media without a negative impact on
user experience.
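One possible way for an edge node to use the timestamp and I-frame
information is to keep forwarding the source media until the
destination media offers an I-frame, and to switch at that point.
The sketch below illustrates this idea only; it is not the procedure
defined by this document. The Frame type, the function name and the
assumption that both streams share one timestamp base are all
hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Frame:
    stream: str        # "source" or "destination"
    timestamp: int     # media timestamp on a shared time base (assumption)
    is_keyframe: bool  # True for an I-frame


def switch_at_keyframe(source: Iterable[Frame], destination: Iterable[Frame],
                       switch_requested_at: int) -> Iterator[Frame]:
    """Forward source frames until the destination offers an I-frame at or
    after the requested time, then forward destination frames only."""
    dest = list(destination)
    # First destination I-frame at or after the request; None if none exists.
    switch_ts = next((f.timestamp for f in dest
                      if f.is_keyframe and f.timestamp >= switch_requested_at),
                     None)
    for f in source:
        if switch_ts is None or f.timestamp < switch_ts:
            yield f
    if switch_ts is not None:
        for f in dest:
            if f.timestamp >= switch_ts:
                yield f


source = [Frame("source", t, t == 0) for t in range(6)]
merged = [Frame("destination", t, t in (0, 4)) for t in range(6)]
out = list(switch_at_keyframe(source, merged, switch_requested_at=2))
print([(f.stream, f.timestamp) for f in out])
```

Starting the destination media on an I-frame means the receiver can
decode the first forwarded frame without reference to frames it never
received, which is what keeps the switch free of visible corruption.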
5.3. Grabbing signaling message
The grabbing signaling message is used to instruct the interactive
real-time media communication system to switch the edge node serving
an audience, for example, in a mobility scenario. In the mobility
case, the system may decide to switch to a more suitable edge node
for media ingestion for an audience according to the location
information. The grabbing signaling message is shown as follows:
Grabbing Message:
{
POST /whisp/grabbing/endpoint HTTP/1.1
Host: whisp.example.com
Content-Type: application/json
Content-Length:
{
  "new media": {
    "media ID": {
      "amsid": [
        "rts audio"
      ],
      "vmsid": [
        "rts video"
      ]
    },
    "URL": "http://demo.example.com/liveapp****/liveStream****"
  },
  "error_code": <error code from Table 2>
}
}
Figure 6: Grabbing signaling message
The grabbing signaling message is sent from the interactive real-time
media communication system to the edge node. A new edge node first
starts ingesting media to the audience. Meanwhile, it registers the
service with the interactive real-time media communication system.
The system detects that the media ingesting service already exists
and thus sends the grabbing signaling message to the old edge node.
For the old edge node, the grabbing signaling message is an
instruction to drop the media ingestion to the audience. The error
code indicates the reason for dropping. The reasons are shown below:
+========+=====================+
| Code   | Reason              |
+========+=====================+
| 0x0    | Dropped by Mobility |
+--------+---------------------+
| 0x1    | Proactive dropping  |
+--------+---------------------+
| 0x2    | Passive dropping    |
+--------+---------------------+
Table 2: Error code for
grabbing signaling message
Dropped by Mobility indicates that a new edge node has taken over and
ingests the media to the audience in place of the old edge node.
Proactive dropping indicates that an edge node has encountered issues
with the media ingestion, in which case the audience can request
re-connection for the delivery of the media. Passive dropping
indicates that the corresponding media has been banned and thus
cannot be ingested anymore.
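The error codes of Table 2 and the behaviour of the old edge node can
be sketched as follows. This is illustrative only: the names
DropReason, AUDIENCE_ACTION and handle_grabbing are hypothetical, the
error code is assumed to arrive as a JSON integer, and the "new
media" URL is assumed to name the media whose delivery is dropped.
The audience-side actions paraphrase the descriptions above and are
not normative.

```python
from enum import IntEnum
from typing import Dict, Set


class DropReason(IntEnum):
    """Error codes from Table 2."""
    DROPPED_BY_MOBILITY = 0x0
    PROACTIVE_DROPPING = 0x1
    PASSIVE_DROPPING = 0x2


# Hypothetical audience-side reaction to each reason, read from the prose.
AUDIENCE_ACTION = {
    DropReason.DROPPED_BY_MOBILITY: "continue on the new edge node",
    DropReason.PROACTIVE_DROPPING: "request re-connection",
    DropReason.PASSIVE_DROPPING: "stop: media is banned",
}


def handle_grabbing(active_urls: Set[str], payload: Dict) -> DropReason:
    """Old edge node: stop delivering the named media and report why."""
    reason = DropReason(payload["error_code"])  # ValueError if code is unknown
    active_urls.discard(payload["new media"]["URL"])
    return reason


active = {"http://demo.example.com/live/a", "http://demo.example.com/live/b"}
reason = handle_grabbing(active, {
    "new media": {"URL": "http://demo.example.com/live/a"},
    "error_code": 0x1,
})
print(reason.name, "->", AUDIENCE_ACTION[reason])
```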
5.4. Pulling signaling message
The pulling signaling message is sent from the audience to the edge
node. Once the pulling signaling message is acknowledged, the edge
node sends the corresponding media to the audience. The pulling
signaling message is shown below:
Pulling Message:
{
POST /whisp/pulling/endpoint HTTP/1.1
Host: whisp.example.com
Content-Type: application/json
Content-Length:
{
  "media": {
    "URL": "http://demo.example.com/liveapp****/liveStream****"
  }
}
}
Figure 7: Pulling signaling message
The payload type field, carried as the request path
"/whisp/pulling/endpoint" in the HTTP request line, indicates the
pulling signaling message. The media URL indicates the address of
the target media, which can be obtained from the edge node.
The edge node allocates a media ID for the broadcaster or the
audience requesting interaction so that the media can be uniquely
identified in the communication system. Upon receipt of the pulling
signaling message, the edge node acknowledges it with the media ID
that uniquely identifies the target media.
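The allocation and acknowledgement steps can be sketched as follows.
This document does not define the format of the acknowledgement, so
the response shape used here ({"media ID": ...}), the EdgeNode class
and the use of a random UUID as the media ID are all assumptions made
for illustration.

```python
import uuid
from typing import Dict, Optional


class EdgeNode:
    """Hypothetical edge node that maps media URLs to allocated media IDs."""

    def __init__(self) -> None:
        self._media_ids: Dict[str, str] = {}

    def register(self, url: str) -> str:
        """Allocate a media ID for an ingested stream (idempotent per URL)."""
        return self._media_ids.setdefault(url, uuid.uuid4().hex)

    def handle_pull(self, payload: Dict) -> Optional[Dict]:
        """Acknowledge a pulling message with the media ID, or None if the
        requested URL is unknown to this edge node."""
        media_id = self._media_ids.get(payload["media"]["URL"])
        return None if media_id is None else {"media ID": media_id}


node = EdgeNode()
url = "http://demo.example.com/live/stream"
allocated = node.register(url)
ack = node.handle_pull({"media": {"URL": url}})
print(ack == {"media ID": allocated})
```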
6. Acknowledgements
7. IANA Considerations
TBD.
8. Security Considerations
The signaling messages defined in this document should be protected
by a security mechanism.
9. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://xml.resource.org/public/rfc/html/rfc2119.html>.
[2] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", RFC 3986,
DOI 10.17487/RFC3986, January 2005,
<https://www.rfc-editor.org/rfc/rfc3986>.
[3] Alvestrand, H., "WebRTC MediaStream Identification in the
Session Description Protocol", RFC 8830,
DOI 10.17487/RFC8830, January 2021,
<https://www.rfc-editor.org/rfc/rfc8830>.
[4] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
DOI 10.17487/RFC2629, June 1999,
<https://www.rfc-editor.org/info/rfc2629>.
Authors' Addresses
Dapeng(Max) Liu
Alibaba Group
Email: max.ldp@alibaba-inc.com
Yaming He
Alibaba Group
Email: heyaming.hym@alibaba-inc.com
Xiaobo Yu
Alibaba Group
Email: shibo.yxb@alibaba-inc.com
Xiao Kai
Alibaba Group
Email: xiaokaikai.xk@alibaba-inc.com
Songlin Li
Alibaba Group
Email: songlin.lsl@alibaba-inc.com