Network Working Group                                             D. Liu
Internet-Draft                                                     Y. He
Intended status: Standards Track                                   X. Yu
Expires: 11 January 2024                                          X. Kai
                                                                   S. Li
                                                           Alibaba Group
                                                               July 2023


          WebRTC-HTTP Interactive Signaling Protocol (WHISP)
             draft-liu-webrtc-http-interactive-protocol-00

Abstract

   This document introduces a signaling protocol that enables
   WebRTC-based pulling, merging, and switching of media content
   carried by a media transmission network.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 11 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.






Liu, et al.              Expires 11 January 2024                [Page 1]

Internet-Draft  Protocol for interactive low-latency med       July 2023


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  System Architecture
   3.  Protocol Operation
   4.  Overview
   5.  Signaling Specification
     5.1.  Merging signaling message
     5.2.  Switching signaling message
     5.3.  Grabbing signaling message
     5.4.  Pulling signaling message
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Normative References
   Authors' Addresses

1.  Introduction

   Emerging real-time interactive video and audio communication
   applications bring new challenges for existing protocols.  This
   document introduces the use cases, requirements, and protocol for a
   WebRTC-HTTP interactive low-latency multimedia transmission network
   over the Internet.

   Interactive real-time media communication is becoming popular with
   the rapid growth of short video, online education, online gaming,
   and similar applications.  Some application providers build their own
   interactive real-time media communication networks to support their
   applications, but they face high costs and technical issues.  For
   example, interaction between users is unpredictable, so dedicating an
   entity to interaction is costly and the resources reserved for
   interaction are often wasted.

   To avoid these issues, other application providers attempt to use a
   third-party interactive real-time media communication network
   provided by a cloud operator.  However, existing protocols face
   several challenges in supporting these scenarios:

   1.  Interactive online broadcasting is more flexible and much more
   complicated than traditional media broadcasting.  In interactive
   online broadcasting applications, an audience may occasionally
   request to set up bidirectional real-time communication with the
   broadcaster, and all the other audiences are expected to receive the
   merged interactive media traffic containing both the broadcaster and
   the connected audience.  To this end, a standardized signaling
   protocol is needed that supports media stream merging, switching, and
   pulling.

   2.  Applications such as interactive online broadcasting, short
   video, online education, and online gaming are very delay sensitive.
   Thus, the protocols for media stream merging, switching, and pulling
   are expected to meet the latency requirements of those applications.

   3.  WebRTC is now widely used in the multimedia ecosystem.  The
   protocols for media stream merging, switching, and pulling are
   expected to be compatible with WebRTC in order to deliver interactive
   media services to customers.

2.  System Architecture

   This section specifies the system architecture of the Interactive
   real-time media communication system.

                                       Server for media streaming control
                                                +-----------+
                                                |           |
                                                |           |
                                                |           |
                                                |           |
                                                |           |
                                                +-----+-----+                        +-----+
   +-----+                                            |                              |     |
   |     |                                            |                              |     |
   |     |                                            |                              |     |
   |     |<------+                                    |                     +------->|     |
   |     |       |                +-----------------v------------------+    |        |     |
   |     |       |                |                                    |    |        +-----+
   +-----+       |                |                                    |    |       Audience
  Broadcaster    |                |                                    +----+        +-----+
                 +--------------->|                                    |             |     |
                                  |                                    |             |     |
                                  |    WHISP communication network     +------------>|     |
   +-----+       +--------------->|                                    |             |     |
   |     |       |                |                                    |             |     |
   |     |       |                |                                    +----+        +-----+
   |     |<------+                |                                    |    |        Audience
   |     |                        |                                    |    |        +-----+
   |     |                        +-------------^-----+----------------+    |        |     |
   +-----+                                      |     |                     |        |     |
  Audience                                      |     |                     +------->|     |
connected with                                  |     |                              |     |
 the broadcaster                              +-+-----v--+                           |     |
                                              |          |                           +-----+
                                              |          |                           Audience
                                              |          |
                                              |          |
                                              |          |
                                              +----------+
                                    Server for media stream merging

                        Figure 1: Architecture

   The WHISP communication network can be provided by cloud providers.
   The network provides the fundamental media streaming capabilities,
   including media pulling.  In addition, the network can support media
   merging and media switching.  These capabilities are triggered by the
   control server and the server for media stream merging, both of which
   can be provided by a third party.  Based on those capabilities, the
   audience can seamlessly receive either the media from the broadcaster
   or the merged media of the broadcaster and the audience that
   requested interaction.

3.  Protocol Operation

4.  Overview

   This section defines the signaling procedure of the WHISP
   communication network.

                            Audience                                                                     Interactive real-time
                            connected with                                        Server for media         media communication
Broadcaster                the broadcaster               Control Server            stream merging                network                    Audience
+--------------+            +--------------+            +--------------+           +--------------+          +--------------+           +--------------+
|              |            |              |            |              |           |              |          |              |           |              |
|              |            |              |            |              |           |              |          |              |           |              |
|              |            |              |            |              |           |              |          |              |           |              |
|              |            |              |            |              |           |              |          |              |           |              |
+------+-------+            +-------+------+            +-------+------+           +------+-------+          +-------+------+           +------+-------+
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |       push media stream   |                            |                          |                         |
    +----------------------------+---------------------------+-------------------------+---------------------------->|                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |     pull media stream   |
    |                            |                           |                            |                          |<------------------------+
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |      push media stream     |                          |                         |
    |                            +---------------------------+----------------------------+------------------------->|                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |    pull media stream       |                           |                            |                          |                         |
    +----------------------------+---------------------------+----------------------------+------------------------->|                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |                          |                         |
    |                            |     pull media stream     |                            |                          |                         |
    |                            +---------------------------+----------------------------+------------------------->|                         |
    |                            |                           |                            |                          |                         |
    |                            |                           | Command for stream merging |                          |                         |
    |                            |                           +--------------------------->| pull stream for merging  |                         |
    |                            |                           |                            +------------------------->|                         |
    |                            |                           |                            |                          |                         |
    |                            |                           |                            |  push merged stream      |                         |
    |                            |                           |                            +------------------------->|                         |
    |                            |                           |Command for stream switching|                          |                         |
    |                            |                           +----------------------------+------------------------->|                         |
    |                            |                           |                            |                          +--+                      |
    |                            |                           |                            |                          |  |                      |
    |                            |                           |                            |                          |<-+                      |
    |                            |                           |                            |                          |Perform stream switch    |
    |                            |                           |                            |                          |                         |

                         Figure 2: Procedure

   Figure 2 shows the signaling procedure of interactive real-time media
   communication among the broadcaster, the audience that requests
   interaction, and the other audiences.  HTTP POST is used for the
   signaling in this procedure.  The broadcaster and the audience first
   ingest their media streams into the interactive real-time media
   communication network.  An audience that wishes to interact with the
   broadcaster sends a request for interaction to the control server.
   The control server processes the request and sends a command for
   media merging to the server for media stream merging.  Upon receipt
   of the merging request from the control server, the server for media
   stream merging pulls the corresponding streams of both the
   broadcaster and the requesting audience and performs the media
   merging.

   After the media merging is complete, the server for media stream
   merging ingests the merged media into the interactive real-time media
   communication network, which then sends the merged media to the
   corresponding edge media distribution servers that serve the
   audiences consuming the media.  After the distribution, the control
   server sends the command for media switching to the interactive
   real-time media communication network.  The network then forwards the
   switching signaling message to the edge node.  Upon receipt of the
   signaling message, the edge node performs the media switching by
   ingesting the merged media to the audiences.

5.  Signaling Specification

   This section defines the signaling specification for interactive
   real-time media communication.  To achieve the merging and switching
   functionalities for different media sources, signaling messages need
   to be delivered to the corresponding entities (e.g., control server,
   edge node) so that they perform the proper operations.  All the
   messages below are transmitted using HTTP POST.  The signaling
   message of the interactive media control protocol is shown as
   follows:

   Interactive Media Control Message {
     Message Type (i),
     Message Length (i),
     Message Payload (..),
   }

               Figure 3: Interactive media signaling message

   To process a signaling message, the receiving entity needs to
   identify its type.  This is achieved with the message type, which is
   carried in the message header.  The message types of the interactive
   media control protocol are as follows:

                            +=====+===========+
                            |  ID | Messages  |
                            +=====+===========+
                            | 0x0 | Merging   |
                            +-----+-----------+
                            | 0x1 | Switching |
                            +-----+-----------+
                            | 0x2 | Pulling   |
                            +-----+-----------+
                            | 0x3 | Grabbing  |
                            +-----+-----------+

                           Table 1: Message types
                            of Interactive media
                              control protocol

   The message length indicates the total length of the message payload
   field in bytes.  The message payload contains the information for
   controlling media merging, switching, pulling, and grabbing.  The
   following subsections describe each message type and its payload in
   detail.
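
   As a non-normative illustration, the message types of Table 1 could
   be represented as shown below.  The names MessageType,
   ENDPOINT_PATHS, and message_type_for_path are hypothetical.  The
   paths are those that appear in the figures of Section 5; the pairing
   of each message type ID with a path is an assumption, since this
   document does not state it explicitly.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Message type IDs as listed in Table 1."""
    MERGING = 0x0
    SWITCHING = 0x1
    PULLING = 0x2
    GRABBING = 0x3


# Assumption: each message type is addressed by the request path that
# appears in the corresponding figure of Section 5.
ENDPOINT_PATHS = {
    MessageType.MERGING: "/whisp/merging/endpoint",
    MessageType.SWITCHING: "/whisp/switching/endpoint",
    MessageType.PULLING: "/whisp/pulling/endpoint",
    MessageType.GRABBING: "/whisp/grabbing/endpoint",
}


def message_type_for_path(path: str) -> MessageType:
    """Return the message type addressed by an HTTP request path."""
    for message_type, known_path in ENDPOINT_PATHS.items():
        if known_path == path:
            return message_type
    raise ValueError(f"not a WHISP signaling path: {path!r}")
```

   A receiving entity could call message_type_for_path on the request
   path of an incoming POST to decide how to interpret the payload.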

5.1.  Merging signaling message

   The merging signaling message is used to request the server for media
   stream merging to perform media merging between a broadcaster and an
   audience.  The merging signaling message is shown as follows:

   Merging Message:
   {

    POST /whisp/merging/endpoint HTTP/1.1
    Host: whisp.example.com
    Content-Type: application/json
    Content-Length:

    {
      "main media": {
        "media ID": {
          "amsid": [
            "rts audio"
          ],
          "vmsid": [
            "rts video"
          ]
        },
        "URL": "http://demo.example.com/liveapp****/liveStream****1"
      },
      "secondary media": {
        "media ID": {
          "amsid": [
            "rts audio"
          ],
          "vmsid": [
            "rts video"
          ]
        },
        "URL": "http://demo.example.com/liveapp****/liveStream****2"
      },
      "merge template": {
        "merge template id": [
          "01"
        ]
      }
    }
   }

                    Figure 4: Merging signaling message

   The request path "/whisp/merging/endpoint" in the HTTP request line
   indicates the merging signaling message.  The main media determines
   the media-related parameters (such as video format) of the merged
   media, and the secondary media needs to comply with these parameters
   during merging.  The merge template determines the video layout used
   when merging the main media and the secondary media.  The merge
   template id represents the id of the merge template.  The media ID
   represents the ID of a media.  Amsid and vmsid stand for audio stream
   id and video stream id, respectively.  The ID is a string that
   uniquely identifies a media source, and the format of the media ID
   follows the definition in RFC 8830 [3].  The media URL represents the
   address of the edge node that interacts with the audience, and the
   format of the URL follows the definition in RFC 3986 [2].
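
   The following non-normative sketch shows one way a control server
   might assemble the merging signaling message as an HTTP POST request.
   It assumes that "media ID" is modeled as a JSON object holding the
   "amsid" and "vmsid" lists.  The helper names media_entry,
   build_merging_payload, and build_post_request, as well as the stream
   URLs, are hypothetical.

```python
import json


def media_entry(url: str, amsid: str = "rts audio",
                vmsid: str = "rts video") -> dict:
    """Describe one media: its stream IDs and the edge node URL."""
    # Assumption: "media ID" is a JSON object with two ID lists.
    return {"media ID": {"amsid": [amsid], "vmsid": [vmsid]}, "URL": url}


def build_merging_payload(main_url: str, secondary_url: str,
                          template_id: str) -> dict:
    """Build the JSON payload of a merging signaling message."""
    return {
        "main media": media_entry(main_url),
        "secondary media": media_entry(secondary_url),
        "merge template": {"merge template id": [template_id]},
    }


def build_post_request(path: str, host: str, payload: dict) -> bytes:
    """Serialize the payload and frame it as an HTTP/1.1 POST request."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # empty line separating the header section from the body
    )
    return head.encode("ascii") + body


# Hypothetical stream URLs, used only for illustration.
request = build_post_request(
    "/whisp/merging/endpoint",
    "whisp.example.com",
    build_merging_payload(
        "http://demo.example.com/app/stream1",
        "http://demo.example.com/app/stream2",
        "01",
    ),
)
```

   The same build_post_request helper could frame the other signaling
   messages, since they differ only in request path and payload.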

5.2.  Switching signaling message

   The switching signaling message is used to instruct the interactive
   real-time media communication system to perform media switching upon
   receipt of the request from the control server.  The switching
   signaling message is shown as follows:

   Switching Message:
   {

    POST /whisp/switching/endpoint HTTP/1.1
    Host: whisp.example.com
    Content-Type: application/json
    Content-Length:

    {
      "source media": {
        "media ID": {
          "amsid": [
            "rts audio"
          ],
          "vmsid": [
            "rts video"
          ]
        },
        "URL": "http://demo.example.com/liveapp****/liveStream****1"
      },
      "destination media": {
        "media ID": {
          "amsid": [
            "rts audio"
          ],
          "vmsid": [
            "rts video"
          ]
        },
        "URL": "http://demo.example.com/liveapp****/liveStream****2"
      }
    }
   }

                   Figure 5: Switching signaling message

   The request path "/whisp/switching/endpoint" in the HTTP request line
   indicates the switching signaling message.  The source media contains
   the information about the source media from the broadcaster.  The
   destination media contains the information about the destination
   media, which is the merged media of the broadcaster and the audience
   that requested interaction.  Each media entry contains a media ID and
   a media URL.

   The switching signaling message is sent to the edge node that manages
   media delivery for the audience.  If the edge node acknowledges the
   media switching, it redirects the audience to the destination media
   using the WebRTC protocol.  Upon receipt of the switching signaling
   message, the media transmission protocol determines the timestamp,
   the I-frame information, and optionally the sequence number needed to
   redirect to the new merged media.  This ensures that the audience
   switches smoothly to the merged media without a negative impact on
   user experience.
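
   The following non-normative sketch illustrates one possible way an
   edge node could use the timestamp, I-frame information, and sequence
   number when switching.  It is not the procedure mandated by this
   document.  It assumes RTP-style 16-bit sequence numbers and 32-bit
   timestamps; the Packet type, the switch_stream function, and the
   ts_gap parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

SEQ_MOD = 1 << 16  # assumed 16-bit sequence number space
TS_MOD = 1 << 32   # assumed 32-bit timestamp space


@dataclass(frozen=True)
class Packet:
    seq: int
    timestamp: int
    starts_iframe: bool  # True on the first packet of an I-frame


def switch_stream(last_sent: Packet, destination: Iterable[Packet],
                  ts_gap: int) -> Iterator[Packet]:
    """Yield destination packets renumbered to continue the old stream.

    Packets are dropped until the first I-frame so that the audience
    can decode immediately.  ts_gap is the timestamp distance between
    the last frame sent and the first frame of the new stream (for
    example, 3000 for 30 frames per second on a 90 kHz clock).
    """
    seq_offset = None
    ts_offset = None
    for pkt in destination:
        if seq_offset is None:
            if not pkt.starts_iframe:
                continue  # wait for an I-frame before switching
            seq_offset = (last_sent.seq + 1 - pkt.seq) % SEQ_MOD
            ts_offset = (last_sent.timestamp + ts_gap - pkt.timestamp) % TS_MOD
        yield Packet(
            seq=(pkt.seq + seq_offset) % SEQ_MOD,
            timestamp=(pkt.timestamp + ts_offset) % TS_MOD,
            starts_iframe=pkt.starts_iframe,
        )
```

   With this approach the audience sees contiguous sequence numbers and
   monotonically advancing timestamps across the switch, and the first
   packet it receives from the merged media begins an I-frame.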

5.3.  Grabbing signaling message

   The grabbing signaling message is used to instruct the interactive
   real-time media communication system to switch the edge node serving
   an audience, for example, in a mobility scenario.  In the mobility
   case, the system may decide to switch to a more suitable edge node
   for media ingestion for an audience according to the location
   information.  The grabbing signaling message is shown as follows:

   Grabbing Message:
   {

    POST /whisp/grabbing/endpoint HTTP/1.1
    Host: whisp.example.com
    Content-Type: application/json
    Content-Length:

    {
      "new media": {
        "media ID": {
          "amsid": [
            "rts audio"
          ],
          "vmsid": [
            "rts video"
          ]
        },
        "URL": "http://demo.example.com/liveapp****/liveStream****"
      },
      "error_code": <error code from Table 2>
    }
   }

                    Figure 6: Grabbing signaling message




   The grabbing signaling message is sent from the interactive real-time
   media communication system to the edge node.  A new edge node first
   starts ingesting media to the audience.  Meanwhile, it registers the
   service with the interactive real-time media communication system.
   The system detects that the media ingesting service already exists
   and thus sends the grabbing signaling message to the old edge node.
   The grabbing signaling message instructs the old edge node to drop
   the media ingestion to the audience.  The error code indicates the
   reason for dropping.  The reasons are shown below:

                     +========+=====================+
                     |   Code | Reason              |
                     +========+=====================+
                     |    0x0 | Dropped by Mobility |
                     +--------+---------------------+
                     |    0x1 | Proactive dropping  |
                     +--------+---------------------+
                     |    0x2 | Passive dropping    |
                     +--------+---------------------+

                         Table 2: Error code for
                        grabbing signaling message

   Dropped by Mobility indicates the case where a new edge node has
   taken over and ingests the media to the audience instead of the old
   edge node.  Proactive dropping indicates the case where an edge node
   encounters issues with the media ingestion and the audience can
   request reconnection for the delivery of the media.  Passive dropping
   indicates the case where the corresponding media has been banned and
   thus cannot be ingested anymore.
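
   The grabbing behavior described above can be sketched as follows.
   This is a non-normative illustration: the names DropReason,
   audience_may_reconnect, and IngestRegistry are hypothetical, and
   identifying audiences and edge nodes by plain strings is an
   assumption made for brevity.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class DropReason(IntEnum):
    """Error codes of the grabbing signaling message (Table 2)."""
    DROPPED_BY_MOBILITY = 0x0
    PROACTIVE_DROPPING = 0x1
    PASSIVE_DROPPING = 0x2


def audience_may_reconnect(reason: DropReason) -> bool:
    """Only proactive dropping invites the audience to reconnect."""
    return reason == DropReason.PROACTIVE_DROPPING


class IngestRegistry:
    """Tracks which edge node currently ingests media to each audience."""

    def __init__(self) -> None:
        self._serving: Dict[str, str] = {}

    def register(self, audience_id: str,
                 edge_node: str) -> Optional[Tuple[str, DropReason]]:
        """Register a service; return the grab to send, if any.

        When another edge node already serves the audience, the old
        node and the reason it should be told to drop are returned.
        """
        old = self._serving.get(audience_id)
        self._serving[audience_id] = edge_node
        if old is not None and old != edge_node:
            return old, DropReason.DROPPED_BY_MOBILITY
        return None
```

   In this sketch, the first registration for an audience returns None,
   while a later registration by a different edge node returns the old
   edge node together with the Dropped by Mobility code.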

5.4.  Pulling signaling message

   The pulling signaling message is sent from the audience to the edge
   node.  Once the pulling signaling message is acknowledged, the edge
   node sends the corresponding media to the audience.  The pulling
   signaling message is shown below:

   Pulling Message:
   {

    POST /whisp/pulling/endpoint HTTP/1.1
    Host: whisp.example.com
    Content-Type: application/json
    Content-Length:

    {
      "media": {
        "URL": "http://demo.example.com/liveapp****/liveStream****"
      }
    }
   }

                    Figure 7: Pulling signaling message

   The request path "/whisp/pulling/endpoint" in the HTTP request line
   indicates the pulling signaling message.  The media URL indicates the
   address of the target media, which can be obtained from the edge
   node.

   The edge node allocates a media ID for the broadcaster or the
   audience that requested interaction so that the media can be uniquely
   identified in the communication system.  Upon receipt of the pulling
   signaling message, the edge node acknowledges the signaling message
   with the media ID that uniquely identifies the target media.
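
   A minimal, non-normative sketch of this edge node behavior is given
   below.  This document does not define the format of the
   acknowledgement or of the allocated media ID, so the response fields
   ("status", "media ID"), the "media-N" identifier format, and the
   EdgeNode class itself are assumptions made for illustration only.

```python
import itertools
from typing import Dict


class EdgeNode:
    """Allocates media IDs and answers pulling signaling messages."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}
        self._counter = itertools.count(1)

    def publish(self, url: str) -> str:
        """Allocate a unique media ID for a newly ingested media URL."""
        if url not in self._ids:
            # Hypothetical ID format; the real format is not specified.
            self._ids[url] = f"media-{next(self._counter)}"
        return self._ids[url]

    def handle_pull(self, payload: dict) -> dict:
        """Acknowledge a pulling message with the target media ID."""
        url = payload["media"]["URL"]
        media_id = self._ids.get(url)
        if media_id is None:
            return {"status": "not found"}
        return {"status": "ok", "media ID": media_id}
```

   An edge node would call publish when a broadcaster or an interacting
   audience starts ingesting media, and handle_pull for each pulling
   signaling message it receives.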

6.  Acknowledgements


7.  IANA Considerations

   TBD.

8.  Security Considerations

   The signaling messages defined in this document should be protected
   by an appropriate security mechanism.

9.  Normative References

   [1]        Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [2]        Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", RFC 3986,
              DOI 10.17487/RFC3986, January 2005,
              <https://www.rfc-editor.org/rfc/rfc3986>.

   [3]        Alvestrand, H., "WebRTC MediaStream Identification in the
              Session Description Protocol", RFC 8830,
              DOI 10.17487/RFC8830, January 2021,
              <https://www.rfc-editor.org/rfc/rfc8830>.

   [4]        Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999,
              <https://www.rfc-editor.org/info/rfc2629>.

Authors' Addresses

   Dapeng(Max) Liu
   Alibaba Group
   Email: max.ldp@alibaba-inc.com


   Yaming He
   Alibaba Group
   Email: heyaming.hym@alibaba-inc.com


   Xiaobo Yu
   Alibaba Group
   Email: shibo.yxb@alibaba-inc.com


   Xiao Kai
   Alibaba Group
   Email: xiaokaikai.xk@alibaba-inc.com


   Songlin Li
   Alibaba Group
   Email: songlin.lsl@alibaba-inc.com