Internet DRAFT - draft-huang-mmusic-multiview
draft-huang-mmusic-multiview
INTERNET-DRAFT R. Huang
Intended Status: Standard Track Huawei
October 19, 2015
Multi-view streams in SDP and RTP Sessions
draft-huang-mmusic-multiview-00
Abstract
This document analyses the streaming options of multi-view
applications,and describes the required SDP signaling for them.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Copyright and License Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
<R. Huang> Expires April 21, 2016 [Page 1]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.1. 3D Channel of IPTV . . . . . . . . . . . . . . . . . . . . 4
3.2 Multi-view video conference . . . . . . . . . . . . . . . . 4
3.2.3 Tele-education with Free Viewpoint Video . . . . . . . . 4
3.2.4 Immersive Telepresence with Multi-view Video . . . . . . 4
4. Multi-view Video Transmission . . . . . . . . . . . . . . . . 4
5. SDP Signaling Requirements for Multi-view Video . . . . . . . . 5
6. Gap Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.1. RFC5583 . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.2. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.3. RID . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.4. CLUE . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
6.5. 3D Signaling . . . . . . . . . . . . . . . . . . . . . . . 7
7. Possible Solutions . . . . . . . . . . . . . . . . . . . . . . 7
8. Security Considerations . . . . . . . . . . . . . . . . . . . 7
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 7
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 7
11.1 Normative References . . . . . . . . . . . . . . . . . . . 8
11.2 Informative References . . . . . . . . . . . . . . . . . . 8
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9
<R. Huang> Expires April 21, 2016 [Page 2]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
1 Introduction
Multi-view video consists of multiple views that are taken by
multiple cameras from different positions and angles. 3D video and
free viewpoint video are two typical use cases. The first offers a 3D
depth impression of the observed scenery, while the second allows for
interactive selection of viewpoint and direction within a certain
operating range as known from computer graphics. Streaming of such
multi-view applications on Internet is usually offered at varying
speeds and costs over a variety of physical infrastructures. However,
since the multi-view video consists of multiple video sequences, the
traffic is several times larger than traditional multimedia, which
brings the dramatic increase in the bandwidth requirement. For
streaming of multi-view representations two operating options are
usually used: streaming all views in a highly compressed MVC bit-
stream with little possibility of random access or encoding all views
independently and streaming only required views.
This document analyses the streaming options of multi-view
applications,and describes the required SDP signaling for them.
2 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
This document uses the following terms:
Multi-view Video Compression: Using efficient compression techniques
to transmit multi-view video. Usually, inter-view similarity between
adjacent views and temporal similarity between temporally successive
images of each video are exploited.
MVC (Multiview Video Coding): An extension of the AVC standard,
standardised in 2009, covers a wide range of 3D video applications
including 3D video streaming, free-viewpoint video. It is inherently
backward compatible with AVC which serves as the base-view that can
be decoded independently in the absence of the MVC decoder. Any
additional views are referred to as enhancement views and are
typically coded using interview prediction within the same bitstream.
3. Use Cases
Multi-view video communications can be applied on a wide variety of
use cases. In this section, several of the most likely usage
scenarios are introduced.
<R. Huang> Expires April 21, 2016 [Page 3]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
3.1. 3D Channel of IPTV
The multi-view video content is sent to a group of end users by
multicast, which is widely applied in 3D channels of IPTV. Different
views can be separately transmitted in different multicast group or
just be transmitted in one multicast group in MVC.
3.2 Multi-view video conference
Two or more users engage in a remote video conversation using mobile
or desktop terminals, which have stereoscopic capture and display
capabilities. The multi-view video stream requires the transmission
of 2 or more views or, alternately, of one view and a depth map. The
views or depth maps may be transmitted as separate transport streams,
or together,depending on the choice of multi-view video
transmissions. In the case of a multiparty session, an intermediate
media server should be used for mixing. Video mixing for the multi-
view video stream needs to be aware of the additional views or depth
maps.
3.2.3 Tele-education with Free Viewpoint Video
A teacher intends to give a real-time lecture to students in one or
more remote sites. The teacher's site is equipped with a multi-view
capture setup, and has a 2D display to show feedback from the
students, who are equipped with regular 2D cameras and displays.
Multi-view video content is transmitted so that the student sites are
capable of selecting perspectives of the captured scene within a
certain operating range, and rendering them in real time, thus some
signaling interaction between students and the teacher may be
required.
3.2.4 Immersive Telepresence with Multi-view Video
A group of users wants to conduct a meeting through a telepresence
system, connecting to the session from several sites, and with each
terminal supporting multiple users. Participants are arranged around
a shared virtual table, and each remote participant is shown in a
separate screen, which is autostereoscopic and displays two different
views for each user in the room. At each site, several
autostereoscopic displays and a multi-view capture setup are
deployed. The viewpoint of these views is adjusted during the session
so as to match the position of observing users. The use of multi-view
video makes telepresence systems having perfect eye contact and
spatial faithfulness.
4. Multi-view Video Transmission
<R. Huang> Expires April 21, 2016 [Page 4]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
Multi-view video, which has two or more views, increases the encoding
complexity and bandwidth requirement for transmission. There are
several ways that can be used to transmit these views:
* Multi-view simulcast: encode each view and/or depth map
independently using a monocular video codec, which enables
streaming each view over separate channels; and clients can
requests as many views as their displays require without worrying
about inter-view dependencies. While this method has its benefits,
it does not exploit the redundancies that are preset in between
the views.
* Multi-view video compression: encode views using specific
technique to decrease the overall bit rate by exploiting the
inter-view redundancies. However, although it exploits the
similarities that are present between the views, it increases the
effect of transmission errors. And the inter-view efficient
compression techniques make the views depend on each others. In
order to decode a frame correctly, the frames it depend on must be
decoded at first, which will bring more unnecessary transmission
and delay if the views should be displayed is far away from the
reference view. MVC is a typical and often used video compression
format applied to multi-view videos.
* Adaptive Multi-view video transmission: Adaptive streaming of
multi-view video is also considered when functionalities such as
rate scalability, resolution scalability, view scalability, and
packet-loss resilience should be offered. In such a case,
simulcast or Multi-view video compression will be used together
with adaptive streaming techniques, e.g., SVC.
* Combination transmission: Sending multi-view video by using a
combination of simulcast and multi-view video compression
techniques may be also a good option for some specific scenarios.
For example, if one multi-view video compression technique works
well for closely related views but not for widely differing views,
sending several multi-view video compression streams
simultaneously can solve the problem.
5. SDP Signaling Requirements for Multi-view Video
The following requirements need to be met to support streaming multi-
view video in previous sections:
REQ-1: It must be possible to signal whether multi-view simulcast or
multi-view video compression is used.
REQ-2: It must be possible to signal adaptive multi-view video
<R. Huang> Expires April 21, 2016 [Page 5]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
transmission, e.g., multi-view video simulcast is used together with
adaptive simulcast.
REQ-3: It must be possible to signal combination transmission where
multi-view video compression is used together with simulcast.
REQ-4: Bundled [I-D.ietf-mmusic-sdp-bundle-negotiation] usage must be
considered.
REQ-5: It must be possible to signal multi-view video related decoder
constraints, e.g., maximum number of view streams that can be
provided at the sender and maximum number of view streams that can be
received at the receiver.
REQ-6: It must be possible to support both declarative SDP and SDP
offer/answer.
REQ-7: When multi-view simulcast is used, it must be possible to have
some ways to allow receivers ask for required view streams that they
wish to receive.
REQ-8: It must be compatible with existing other mechanisms, e.g.,
RTP retransmission [RFC4588], Forward Error Correction [RFC5109].
6. Gap Analysis
6.1. RFC5583
This specification defines a SDP mechanism to signaling the decoding
dependency of different media descriptions with the same media type
[RFC5583]. It can be used to signal the use of SVC or MVC. However,
it cannot differentiate the usages between multi-view simulcast and
MVC, not mention when multi-view transmission is used together with
simulcast or SVC.
6.2. Simulcast
[I.d-ietf-mmusic-sdp-simulcat] describes simulcast as the scenarios
where sending multiple differently encoded versions of the same media
source in different RTP streams. It is mainly used for the same video
source encoded with different video encoder types or image
resolutions. It still can not deal with the case when multi-view
transmission is used together with simulcast or SVC.
6.3. RID
[I.d-pthatcher-mmusic-rid] defines a framework to identify Source RTP
<R. Huang> Expires April 21, 2016 [Page 6]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
streams with constraints on its payload format in SDP. It can
effectively identify the source RTP stream within a RTP session,
which is quite useful for simulcast. Thus, it may be also helpful for
multi-view video signaling. It need to be considered further for
potential problems and issues.
6.4. CLUE
CLUE is dedicated for telepresence systems which provide high
definition, high quality audio/video enabling a "being-there"
experience. It involves multiple devices like multiple cameras,
displays, microphones, and loudspeakers. It specifies spatial
relationship of these devices, viewpoint, field of view/capture for
these devices, and related information [I.d-ietf-clue-signal].
However, the usages of 3D or free view point video in CLUE are not
considered in current CLUE scope. Thus, the supporting multi-view
view of CLUE needs to be considered further.
6.5. 3D Signaling
There were some work in MMUSIC to propose some 3D signaling
solutions. [I.d-greevenbosch-mmusic-signal-3d-format] and [I.d-
greevenbosch-mmusic-sdp-parallax] introduce new SDP attributes to
provide format description and depth position signaling in 3D
applications. [I.d-capelastegui-mmusic-3dv-sdp] introduces a
mechanism to describe 3D video streams composed of multiple video
video views, or of a combination of views and depth maps. They are
not directly multi-view, but should be evaluated when proposing
possible solutions.
7. Possible Solutions
TBD.
8. Security Considerations
TBD.
9. IANA Considerations
TBD.
10. Acknowledgments
11. References
<R. Huang> Expires April 21, 2016 [Page 7]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
11.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.,
Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
July 2006.
[RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error
Correction", RFC 5109, December 2007.
[I-D.ietf-mmusic-sdp-bundle-negotiation] Holmberg, C., Alvestrand,
H., and C. Jennings, "Negotiating Media Multiplexing Using
the Session Description Protocol (SDP)", draft-ietf-
mmusic-sdp-bundle-negotiation-23 (work in progress), July
2015.
[I.d-ietf-mmusic-sdp-simulcat] Burman, B., Westerlund, M.,
Nandakumar, S., and M. Zanaty, "Using Simulcast in SDP and
RTP Sessions", draft-ietf-mmusic-sdp-simulcast-02 (work in
progress), October 6, 2015.
[I.d-pthatcher-mmusic-rid] Thatcher, P., Zanaty, M., Nandakuma, S.,
Roach, A., Burman, B., and B. Campen, "RTP Payload Format
Constraints", draft-pthatcher-mmusic-rid-01 (work in
progress), October 2015.
[I.d-ietf-clue-signal] Kyzivat, P., Xiao, L., Groves, C., and R.,
Hansen, "CLUE Signaling", draft-ietf-clue-signaling-06,
August 2015.
[I.d-greevenbosch-mmusic-signal-3d-format] Greevenbosch, B., and Y.,
Hui, "Signal 3D format", draft-greevenbosch-mmusic-sdp-3d-
format-01, October 2012.
[I.d-greevenbosch-mmusic-sdp-parallax] Greevenbosch, B., and Y., Hui,
"SDP attribute to signal parallax", draft-greevenbosch-
mmusic-sdp-parallax-01, October 2012.
[I.d-capelastegui-mmusic-3dv-sdp] Capelastegui, P., "3D Video in the
Session Description Protocol (SDP)", draft-capelastegui-
mmusic-3dv-sdp-00, April 2012.
11.2 Informative References
<R. Huang> Expires April 21, 2016 [Page 8]
INTERNET DRAFT <Multi-view in SDP> October 19, 2015
Authors' Addresses
Rachel Huang
Huawei
101 Software Avenue
Nanjing, China
EMail: rachel.huang@huawei.com
<R. Huang> Expires April 21, 2016 [Page 9]