avtcore S. Gudumasu
Internet-Draft A. Hamza
Intended status: Standards Track InterDigital
Expires: 28 March 2024 25 September 2023
Viewport and Region-of-Interest-Dependent Delivery of Visual Volumetric
Media
draft-gudumasu-avtcore-rtp-volumetric-media-roi-01
Abstract
This document describes RTCP messages and RTP header extensions to
enable partial access and support viewport- and region-of-interest-
dependent delivery of visual volumetric media such as visual
volumetric video-based coding (V3C). Partial access refers to the
ability to access, retrieve, or deliver only a subset of the media
content. The RTCP messages and RTP header extensions described in
this document are useful for XR services which transport coded visual
volumetric content, such as point clouds.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 28 March 2024.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
Gudumasu & Hamza Expires 28 March 2024 [Page 1]
Internet-Draft VOLUMETRIC-MEDIA-ROI-DELIVERY September 2023
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
  1.1. Background on Visual Volumetric Video-based Coding (V3C)
2. Conventions
3. Definitions, and Abbreviations
  3.1. Definitions
4. Format of RTCP feedback messages
  4.1. Static 3D regions request
    4.1.1. Message format
  4.2. Arbitrary spatial region request
    4.2.1. Message format
  4.3. Viewport request
    4.3.1. Message format
5. RTP header extension for signaling transmitted 3D regions information
  5.1. Response to a static 3D regions request
    5.1.1. Message format
  5.2. Response to an arbitrary spatial region request
    5.2.1. Message format
  5.3. Response to a 3D viewport request
    5.3.1. Message format
  5.4. Dynamic 3D regions information transmission
    5.4.1. Message format
6. SDP signaling for Viewport and Region-of-Interest dependent delivery of V3C data
  6.1. SDP signaling of static 3D regions
  6.2. SDP signaling for region-of-interest feedback messages capability
    6.2.1. Request for static 3D regions
    6.2.2. Request for arbitrary spatial region
    6.2.3. Request for a viewport
  6.3. SDP signaling for 3D regions transported using RTP header extension
  6.4. SDP signaling for dynamic 3D regions information transported using RTP header extension
  6.5. Offer/Answer Considerations
7. Security Considerations
8. IANA Considerations
9. References
  9.1. Normative References
  9.2. Informative References
Authors' Addresses
1. Introduction
Unlike traditional 2D videos, visual volumetric media represent 3D
shapes or objects. Examples of such media include point clouds,
meshes, and volumetric videos. For example, a point cloud is a set
of data points in space which may represent a 3D shape or object.
Each point position has its set of Cartesian coordinates (X, Y, Z)
and attribute information such as texture/color, reflectance, or
transparency.
To enable parallel processing, partial access, as well as a variety
of other functionalities, a visual volumetric media frame can be
divided into a number of independently decodable tiles. For partial
access use cases, these tiles are mapped to three-dimensional (3D)
sub-divisions of the space encompassing the volumetric object,
referred to here as 3D regions. The 3D regions are axis-aligned
cuboids, i.e., with no associated orientation or rotation, defined in
Cartesian space using an anchor point and size of the spatial region
along the three axes. The position of the anchor point and the size
of the spatial region are defined in terms of volumetric pixels
relative to the origin of the volumetric content's coordinate system.
Each 3D region has bounding box information of that spatial region
and an association with one or more tiles present in that spatial
region. The 3D regions information can be used by the receiving
devices to stream or access only a subset of the coded media content.
With the information provided by the 3D spatial regions, a player can
access relevant parts of the immersive media content (e.g., by
determining which spatial regions and/or objects fall within the
boundaries of the user's viewport or region(s)-of-interest and
mapping those to tiles).
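As an illustration of this mapping, the following Python sketch models a 3D region as an anchor point plus a size, tests it for overlap with a requested axis-aligned region of interest, and collects the associated tile identifiers. The names Region3D, overlaps, and tiles_for_roi are hypothetical and are not defined by this document. Treating each axis as a half-open interval is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region3D:
    """Axis-aligned cuboid: anchor point plus size along x, y, z (hypothetical model)."""
    region_id: int
    position: Tuple[int, int, int]          # anchor point
    size: Tuple[int, int, int]              # extent along each axis
    tile_ids: List[int] = field(default_factory=list)


def overlaps(a_pos, a_size, b_pos, b_size) -> bool:
    """Return True if two axis-aligned cuboids share a non-empty volume.

    Assumes half-open intervals [pos, pos + size) on each axis.
    """
    return all(
        a_pos[i] < b_pos[i] + b_size[i] and b_pos[i] < a_pos[i] + a_size[i]
        for i in range(3)
    )


def tiles_for_roi(regions, roi_pos, roi_size):
    """Return the sorted tile ids of every region that overlaps the region of interest."""
    tiles = set()
    for region in regions:
        if overlaps(region.position, region.size, roi_pos, roi_size):
            tiles.update(region.tile_ids)
    return sorted(tiles)
```

A viewport would first have to be converted into one or more such cuboids (or a frustum test) before this lookup applies; that step is outside the scope of the sketch.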
When the bounding box information of a spatial region and its
association with one or more tiles in the visual volumetric frame is
not changing over time, those 3D regions are referred to as static 3D
regions. Otherwise, if the bounding box information of a spatial
region or its association with one or more tiles changes over time,
those 3D regions are referred to as dynamic 3D regions. An
immersive media content provider provides static or dynamic 3D
regions information to the immersive media receivers. The media
player requests one or more 3D regions of interest based on that
information. In some cases, the media player can also request an
arbitrary 3D region within the immersive media content.
This document defines RTCP messages and RTP header extensions to
enable partial access and support viewport- and region-of-interest-
dependent delivery of visual volumetric media such as visual
volumetric video-based coding (V3C) [ISO.IEC.23090-5]. The defined
RTCP messages and RTP header extensions can be used with the RTP
payload format for V3C in [I-D.draft-ietf-avtcore-rtp-v3c].
1.1. Background on Visual Volumetric Video-based Coding (V3C)
Volumetric media content may be coded using the visual volumetric
video-based coding standard ISO/IEC 23090-5 [ISO.IEC.23090-5]. V3C is
a generic mechanism for volumetric video coding and it can be used by
applications targeting volumetric content, such as point clouds,
immersive video with depth, mesh representations of visual volumetric
frames, etc. Examples of such applications are Video-based Point
Cloud Compression (V-PCC) [ISO.IEC.23090-5], and MPEG Immersive Video
(MIV) [ISO.IEC.23090-12]. V3C encodes a volumetric frame by
converting it from its 3D representation to multiple 2D
representations and generating the associated data. V3C supports
the concept of tiling, where the volumetric frame is encoded in a
number of tiles to enable parallel encoding/decoding and easy access
to one or more regions of V3C content, especially in streaming
scenarios. The ISO/IEC 23090-5
specification also defines a set of Volumetric Annotation SEI
messages providing information on different objects within the V3C
content and the spatial regions or V3C atlas tiles associated with
those objects. Moreover, ISO/IEC 23090-10 [ISO.IEC.23090-10]
defines information on the different
spatial regions defined for the V3C content, including the bounding
box for the spatial region and its association with one or more V3C
atlas tiles. The RTP payload format for V3C content is defined in
[I-D.draft-ietf-avtcore-rtp-v3c]. This allows for packetization of
one or more V3C Network Abstraction Layer (NAL) units in an RTP packet
payload as well as fragmentation of a V3C NAL unit into multiple RTP
packets.
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Definitions, and Abbreviations
3.1. Definitions
The following terms are defined here for convenience:
Coordinate Systems: The reference coordinate system is a
right-handed 3D Cartesian coordinate system with 6 degrees of freedom
(DoF): 3 translations along the x, y, and z axes, and 3 rotations
about the x, y, and z axes following the right-hand rule. The
following variations can be derived:
Cartesian coordinate system: the reference coordinate system with
the 3 translations but without the 3 rotations.
World coordinate space (referring to scene space, where
manipulation is done relative to the scene origin): the reference
coordinate system with the origin at the scene origin and with the
3 translations and 3 rotations limited to the scene space (or
scene viewing space).
cuboid: A volume having six rectangular faces placed at right
angles.
field of view: The extent of the observable world in captured/
recorded content or in a physical display device.
tile: An independently decodable rectangular 2D region of a video
frame or cuboid 3D region of a volumetric frame.
4. Format of RTCP feedback messages
The 3D regions present in a volumetric media object can be signaled
using an SDP extension. This document extends the RTCP feedback
messages defined in the RTP/AVPF [RFC4585] RTP profile and in
[RFC5104] to define RTCP feedback messages for requesting static 3D
regions, an arbitrary spatial region, or a certain viewport. These
messages can be transmitted by the receiver to inform the sender of
the desired region(s)-of-interest.
These feedback messages follow a message format similar to that of
the RTCP Full Intra Request and Temporal-Spatial Trade-off Request
messages defined in [RFC5104]. A message may be sent in a regular
full compound
RTCP packet or in an early RTCP packet, as per the RTP/AVPF profile
rules.
4.1. Static 3D regions request
When the 3D regions available at the sender-side are static, the RTCP
feedback message for requesting one or more 3D regions-of-interest
contains the required number of 3D regions and a list of region_id
parameters. The values of region_id SHALL be acquired from the
"a=3d-regions" attributes defined in section 6.1 that are signaled by
the sender during SDP negotiation.
4.1.1. Message format
The static 3D regions request feedback message is identified by the
RTCP payload type value PT=PSFB, which indicates payload-specific
Feedback messages, and message type FMT=18.
The FCI field MUST contain a list of one or more static 3D region
ids.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             mode              |          num_regions          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      one or more region ids (16 bits for each region id)      |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |     OPTIONAL Zero padding     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
mode (16 bits): This field is set to all ones to identify a static
3D regions request.
num_regions (16 bits): indicates the number of requested 3D regions
of interest.
region_id (16 bits): identifies a pre-defined 3D region.
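The FCI layout above can be sketched in Python as follows. This is a minimal illustration rather than a normative encoder: the function name pack_static_regions_fci is hypothetical, and the sketch builds only the FCI field, not the common feedback message header that precedes it.

```python
import struct

STATIC_MODE = 0xFFFF  # mode field: all ones for a static 3D regions request


def pack_static_regions_fci(region_ids):
    """Build the FCI: mode, num_regions, region ids, then zero padding to 32 bits."""
    if not region_ids:
        raise ValueError("at least one region id is required")
    if any(not 0 <= rid <= 0xFFFF for rid in region_ids):
        raise ValueError("region ids must fit in 16 bits")
    # "!" selects network (big-endian) byte order with no alignment padding.
    fci = struct.pack("!HH", STATIC_MODE, len(region_ids))
    fci += struct.pack("!%dH" % len(region_ids), *region_ids)
    fci += b"\x00" * (-len(fci) % 4)  # pad to the next 32-bit boundary
    return fci
```

With an odd number of region ids, two bytes of zero padding are appended; with an even number, no padding is needed.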
4.2. Arbitrary spatial region request
The RTCP feedback message for a desired spatial region SHALL contain
the parameters position_x, position_y, position_z, size_x, size_y and
size_z. The value of each parameter is indicated using
four bytes. The sender SHALL ignore arbitrary spatial region
requests describing a region outside the original volumetric content.
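A sender-side check for this rule might look like the Python sketch below. It assumes that "outside" means the requested cuboid has no overlap at all with the bounding box of the content, which this document does not state explicitly. The function name and the content_pos and content_size parameters are hypothetical.

```python
def region_outside_content(req_pos, req_size, content_pos, content_size):
    """Return True if the requested cuboid has no overlap with the content bounding box.

    Assumption: "outside" means entirely outside; each axis is treated as a
    half-open interval [pos, pos + size).
    """
    for i in range(3):
        req_min, req_max = req_pos[i], req_pos[i] + req_size[i]
        con_min, con_max = content_pos[i], content_pos[i] + content_size[i]
        if req_max <= con_min or req_min >= con_max:
            return True  # the two cuboids are separated along this axis
    return False
```

A request that only partially overlaps the content is not rejected by this sketch; the sender could answer it with a reduced region.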
4.2.1. Message format
The arbitrary spatial region request feedback message is identified
by an RTCP payload type value PT=PSFB and message type FMT=18.
The FCI field for the RTCP feedback message for arbitrary spatial
region request is formatted as follows:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | position_x(h) |  position_x   |  position_x   | position_x(l) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | position_y(h) |  position_y   |  position_y   | position_y(l) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | position_z(h) |  position_z   |  position_z   | position_z(l) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   size_x(h)   |    size_x     |    size_x     |   size_x(l)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   size_y(h)   |    size_y     |    size_y     |   size_y(l)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   size_z(h)   |    size_z     |    size_z     |   size_z(l)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
position_x (32 bit signed int): specifies the origin position of
the 3D bounding box in the Cartesian coordinates along the x axis
position_y (32 bit signed int): specifies the origin position of
the 3D bounding box in the Cartesian coordinates along the y axis
position_z (32 bit signed int): specifies the origin position of
the 3D bounding box in the Cartesian coordinates along the z axis
size_x (32 bit unsigned int): specifies the extension of the 3D
bounding box of the volumetric media in Cartesian coordinates
along the x axis relative to the origin position
size_y (32 bit unsigned int): specifies the extension of the 3D
bounding box of the volumetric media in Cartesian coordinates
along the y axis relative to the origin position
size_z (32 bit unsigned int): specifies the extension of the 3D
bounding box of the volumetric media in Cartesian coordinates
along the z axis relative to the origin position
The four-byte values of the position_x, position_y, position_z,
size_x, size_y and size_z parameters are expressed in big-endian
(network byte) order.
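A minimal Python sketch of this 24-byte FCI is given below, using three signed and three unsigned 32-bit integers in network byte order as described above. The function names are hypothetical, and the sketch covers the FCI field only.

```python
import struct

# Three signed 32-bit positions followed by three unsigned 32-bit sizes,
# all in network (big-endian) byte order: 24 bytes in total.
_SPATIAL_REGION_FCI = struct.Struct("!iiiIII")


def pack_spatial_region_fci(position, size):
    """Encode (position_x, position_y, position_z) and (size_x, size_y, size_z)."""
    return _SPATIAL_REGION_FCI.pack(*position, *size)


def unpack_spatial_region_fci(fci):
    """Decode a 24-byte FCI back into (position, size) tuples."""
    px, py, pz, sx, sy, sz = _SPATIAL_REGION_FCI.unpack(fci)
    return (px, py, pz), (sx, sy, sz)
```

Because the positions are signed, a position of -1 is carried as 0xFFFFFFFF (two's complement), while sizes use the full unsigned range.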
4.3. Viewport request
4.3.1. Message format
The RTCP feedback message for requesting a viewport is identified by
the RTCP payload type value PT=PSFB and message type FMT=19. The FCI
SHALL contain exactly one 3D viewport. The FCI format for the 3D
viewport request feedback message is as follows:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|C|I|F|R| CT  | cam_pos_x(h)  |   cam_pos_x   |   cam_pos_x   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | cam_pos_x(l)  | cam_pos_y(h)  |   cam_pos_y   |   cam_pos_y   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | cam_pos_y(l)  | cam_pos_z(h)  |   cam_pos_z   |   cam_pos_z   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | cam_pos_z(l)  | cam_quat_x(h) |  cam_quat_x   |  cam_quat_x   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | cam_quat_x(l) | cam_quat_y(h) |  cam_quat_y   |  cam_quat_y   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | cam_quat_y(l) | cam_quat_z(h) |  cam_quat_z   |  cam_quat_z   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | cam_quat_z(l) |                horizontal_fov                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                 vertical_fov                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |              clipping_near_plane              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |              clipping_far_plane               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |             OPTIONAL Zero padding             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The desired viewport information in the RTCP feedback viewport
message is composed of the following parameters:
ext_camera_flag (E) [1 bit]: This flag value equal to 1 indicates
that extrinsic camera parameters information is present in the
message. Value 0 indicates that extrinsic camera parameters
information is not present in the message.
center_view_flag (C) [1 bit]: This flag indicates whether the
signalled viewport position corresponds to the center of the
viewport or to one of two stereo positions of the viewport. Value
1 indicates that the signalled viewport position corresponds to
the center of the viewport. Value 0 indicates that the signalled
viewport position corresponds to one of two stereo positions of
the viewport. When ext_camera_flag is set to 0, this flag
is set to 0; otherwise, it is set to 1.
int_camera_flag (I) [1 bit]: Intrinsic camera flag value equal to
1 indicates that intrinsic camera parameters information is
present in the message. Value 0 indicates that intrinsic camera
parameters information is not present in the message.
equal_fov_flag (F) [1 bit]: This flag indicates whether the
horizontal FOV and the vertical FOV of the viewport are equal.
Value 1 indicates that the horizontal FOV and vertical FOV are
equal. Value 0 indicates that the horizontal FOV and vertical FOV
are not equal. When int_camera_flag is 0, this flag is set to
1; otherwise, it is set to 0.
resv (R) [1 bit]: This bit is reserved for future definition.
camera_type (CT) [3 bits]: indicates the projection method of the
viewport. Value 0 specifies equirectangular projection (ERP).
Value 1 specifies a perspective projection. Value 2 specifies an
orthographic projection. Values in the range 3 to 7 are
reserved for future use.
cam_pos_x, cam_pos_y, and cam_pos_z (32 bits): respectively,
indicate the x, y, and z coordinates of the position of the camera
in metres in the global reference coordinate system. The value
for each field is expressed in 32-bit binary floating-point format
with the 4 bytes in big-endian order and with the parsing process
as specified in IEEE 754. This information shall be present only
when the ext_camera_flag (E bit) is set to 1.
cam_quat_x, cam_quat_y, and cam_quat_z (32 bits): indicate the x,
y, and z components, respectively, of the rotation of the camera
using the quaternion representation. The values are in the range
of -2^30 to 2^30, inclusive. When the component of rotation is
not present, its value is inferred to be equal to 0. This
information shall be present only when the ext_camera_flag (E bit)
is set to 1.
The value of rotation components may be calculated as follows:
qX = cam_quat_x / 2^30
qY = cam_quat_y / 2^30
qZ = cam_quat_z / 2^30
The fourth component, qW, for the rotation of the current
camera model using the quaternion representation is calculated
as follows:
qW = Sqrt( 1 - ( qX^2 + qY^2 + qZ^2 ) )
The point (w,x,y,z) represents a rotation around the axis
directed by the vector (x,y,z) by an angle
2 * cos^{-1}(w) = 2 * sin^{-1}( Sqrt( x^2 + y^2 + z^2 ) ).
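The derivation above can be sketched in Python as follows. The function name decode_rotation is hypothetical, and clamping the radicand at zero is an added safeguard against floating-point rounding that is not part of the text.

```python
import math


def decode_rotation(cam_quat_x, cam_quat_y, cam_quat_z):
    """Recover the unit quaternion (qW, qX, qY, qZ) and the rotation angle in radians."""
    scale = float(1 << 30)  # the signalled components are fixed-point, scaled by 2^30
    qx = cam_quat_x / scale
    qy = cam_quat_y / scale
    qz = cam_quat_z / scale
    # Clamp at zero so rounding cannot yield a slightly negative radicand
    # (an assumption of this sketch, not a rule from the text).
    qw = math.sqrt(max(0.0, 1.0 - (qx * qx + qy * qy + qz * qz)))
    angle = 2.0 * math.acos(qw)
    return (qw, qx, qy, qz), angle
```

Since qW is taken as the non-negative square root, the recovered angle always lies in the range 0 to pi.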
horizontal_fov (32 bits): indicates the longitude range
corresponding to the horizontal size of the viewport region, in
units of radians, when camera_type is ERP projection. The value
is in the range 0 to 2 pi. When camera_type is perspective
projection, this value specifies the horizontal field of view in
radians. The value is in the range 0 to pi. When camera_type
is orthographic projection, this value specifies the horizontal
size of the orthographic projection in metres. The value is
expressed in 32-bit binary floating-point format with the 4 bytes
in big-endian order and with the parsing process as specified in
IEEE 754. This information shall be present only when the
int_camera_flag (I bit) is set to 1.
vertical_fov (32 bits): specifies the latitude range corresponding
to the vertical size of the viewport region, in units of radians,
when camera_type is ERP projection. The value is in the range 0
to pi. When camera_type is perspective projection or orthographic
projection, this value specifies the relative aspect ratio of the
viewport (horizontal/vertical). The value is expressed in
32-bit binary floating-point format with the 4 bytes in big-endian
order and with the parsing process as specified in IEEE 754. This
information shall be present only when the int_camera_flag (I bit)
is set to 1 and equal_fov_flag (F) is set to 0. In all other cases,
vertical FOV information shall not be present.
clipping_near_plane and clipping_far_plane (32 bits): indicate the
near and far depths (or distances) based on the near and far
clipping planes of the viewport in meters. The values are
expressed in 32-bit binary floating-point format with the 4 bytes
in big-endian order and with the parsing process as specified in
IEEE 754. This information shall be present only when the
int_camera_flag (I bit) is set to 1.
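The first octet of the viewport request FCI carries the E, C, I, F, and R bits followed by the 3-bit CT field. The Python sketch below packs and unpacks that octet, assuming the usual convention that bit 0 of the diagram is the most significant bit. The function names are hypothetical, and the R bit is always written as 0.

```python
def pack_viewport_flags(ext_camera, center_view, int_camera, equal_fov, camera_type):
    """Pack E, C, I, F, R (left at 0) and the 3-bit CT field into one octet."""
    if not 0 <= camera_type <= 7:
        raise ValueError("camera_type is a 3-bit field")
    return (
        (int(bool(ext_camera)) << 7)      # E: extrinsic camera parameters present
        | (int(bool(center_view)) << 6)   # C: position is the viewport center
        | (int(bool(int_camera)) << 5)    # I: intrinsic camera parameters present
        | (int(bool(equal_fov)) << 4)     # F: horizontal and vertical FOV are equal
        | camera_type                     # CT: 0 = ERP, 1 = perspective, 2 = orthographic
    )                                     # bit 3 (R) stays 0


def unpack_viewport_flags(octet):
    """Split the first FCI octet back into its named fields."""
    return {
        "ext_camera_flag": (octet >> 7) & 1,
        "center_view_flag": (octet >> 6) & 1,
        "int_camera_flag": (octet >> 5) & 1,
        "equal_fov_flag": (octet >> 4) & 1,
        "camera_type": octet & 0x07,
    }
```

A parser would read this octet first, since the E, I, and F bits determine which of the remaining fields are present.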
5. RTP header extension for signaling transmitted 3D regions
information
The sender's response may not match the exact 3D regions of
interest requested by the receiver; it may instead contain an
extended or reduced version of the requested spatial region(s),
depending on the number and size of the 3D regions available in the
content that overlap with the requested spatial region(s). A
response to a request for 3D regions of interest therefore involves
the sender signaling information about the volumetric media 3D
regions that are included in the response. This signaling helps the
receiver determine when to send subsequent spatial region requests,
e.g., in response to head movement sensor information and based on
the spatial volume covered by the 3D regions transmitted by the
sender. Moreover, signaling the 3D regions sent by the sender also
indicates the start of an RTP media flow belonging to a requested 3D
region of interest.
5.1. Response to a static 3D regions request
If the transmitted 3D regions information response corresponds to a
request for one or more of the static 3D regions signaled during SDP
negotiation, then the transmitted 3D regions information SHALL be
carried using the RTP header extension and includes a num_regions
field and a list of region ids corresponding to the static 3D regions
included in the response. The num_regions field and each region_id
are indicated using two bytes each.
5.1.1. Message format
The payload of the transmitted static 3D regions information header
extension element can be encoded using the two-byte header defined in
[RFC8285].
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ID       |    len=xx     |          num_regions          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      one or more region ids (16 bits for each region id)      |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |     OPTIONAL Zero padding     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ID (8 bit): is the local identifier.
len (8 bit): is the length of extension data in bytes not
including the ID and length fields. The value zero indicates
there is no data following.
num_regions (16 bits): indicates the number of transmitted 3D
regions.
region_id (16 bit): is a unique identifier for a pre-defined
static 3D region in the encoded media.
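A single header extension element with this layout can be sketched in Python as shown below. The function name pack_static_regions_ext is hypothetical. The sketch builds only the element (ID, len, and data) and leaves out the enclosing RTP header extension block and its padding, which are defined in [RFC8285].

```python
import struct


def pack_static_regions_ext(ext_id, region_ids):
    """Build one two-byte-header extension element: ID, len, num_regions, region ids."""
    if not 1 <= ext_id <= 255:
        raise ValueError("local identifier must be in the range 1 to 255")
    # Extension data: 16-bit num_regions followed by 16-bit region ids, big-endian.
    data = struct.pack("!H%dH" % len(region_ids), len(region_ids), *region_ids)
    if len(data) > 255:
        raise ValueError("extension data does not fit the 8-bit len field")
    # len counts the data bytes only, not the ID and len fields themselves.
    return struct.pack("!BB", ext_id, len(data)) + data
```

Because len is an 8-bit field, at most 126 region ids fit into one element in this layout.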
5.2. Response to an arbitrary spatial region request
If the transmitted 3D region information response corresponds to a
request for an arbitrary spatial region, the transmitted 3D regions
information SHALL be carried using the RTP header extensions as
specified in [RFC8285].
5.2.1. Message format
The payload of the transmitted 3D regions information header
extension element can be encoded using the two-byte header defined in
[RFC8285].
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | position_x(h) |  position_x   |  position_x   | position_x(l) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | position_y(h) |  position_y   |  position_y   | position_y(l) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | position_z(h) |  position_z   |  position_z   | position_z(l) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   size_x(h)   |    size_x     |    size_x     |   size_x(l)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   size_y(h)   |    size_y     |    size_y     |   size_y(l)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   size_z(h)   |    size_z     |    size_z     |   size_z(l)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | region_id(h)  | region_id(l)  | num_tiles(h)  | num_tiles(l)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        one or more tile ids (16 bits for each tile id)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ID       |    len=xx     | num_regions(h)| num_regions(l)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +            one or more spatial regions information            +
   |                                                               |
   +                                                               +
   |                                                               |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |     OPTIONAL zero padding     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ID (8 bit): is the local identifier.
len (8 bit): is the length of extension data in bytes not
including the ID and length fields. The value zero indicates
there is no data following.
num_regions (16 bits): indicates the number of transmitted 3D
regions.
position_x (32 bits): specifies the origin position of the 3D
bounding box in the Cartesian coordinates along the x axis.
position_y (32 bits): specifies the origin position of the 3D
bounding box in the Cartesian coordinates along the y axis.
position_z (32 bits): specifies the origin position of the 3D
bounding box in the Cartesian coordinates along the z axis.
size_x (32 bits): specifies the extension of the 3D bounding box
of the volumetric media in the Cartesian coordinates along the x
axis relative to the origin position.
size_y (32 bits): specifies the extension of the 3D bounding box
of the volumetric media in the Cartesian coordinates along the y
axis relative to the origin position.
size_z (32 bits): specifies the extension of the 3D bounding box
of the volumetric media in the Cartesian coordinates along the z
axis relative to the origin position.
region_id (16 bits): is a unique identifier for a 3D region in the
encoded media.
num_tiles (16 bits): indicates the number of tile identifiers
associated with that spatial region.
tile_id (16 bits): identifies a tile associated with
that spatial region.
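The per-region record described by these fields can be parsed as in the Python sketch below. Two points are assumptions of the sketch rather than statements of this document: positions are read as signed and sizes as unsigned integers (as in the request format of Section 4.2.1), and records follow one another without per-record padding. The function name parse_spatial_regions is hypothetical.

```python
import struct

# Fixed part of one record: 3 signed positions, 3 unsigned sizes,
# a 16-bit region_id and a 16-bit num_tiles, big-endian (28 bytes).
_REGION_FIXED = struct.Struct("!iiiIIIHH")


def parse_spatial_regions(data, num_regions):
    """Parse num_regions consecutive region records from the extension data."""
    regions = []
    offset = 0
    for _ in range(num_regions):
        px, py, pz, sx, sy, sz, region_id, num_tiles = _REGION_FIXED.unpack_from(data, offset)
        offset += _REGION_FIXED.size
        # The variable part is num_tiles 16-bit tile identifiers.
        tile_ids = list(struct.unpack_from("!%dH" % num_tiles, data, offset))
        offset += 2 * num_tiles
        regions.append({
            "region_id": region_id,
            "position": (px, py, pz),
            "size": (sx, sy, sz),
            "tile_ids": tile_ids,
        })
    return regions
```

Truncated input raises struct.error from unpack_from, so a caller can treat malformed extension data as a parse failure.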
If the requested region-of-interest is an arbitrary spatial region,
the sender may choose to send one or more pre-defined 3D regions
that overlap with the requested arbitrary spatial region and that
were signaled to the receiver during SDP negotiation. In this case,
the transmitted 3D regions information SHALL be carried using the RTP
header extension.
The payload of the transmitted static 3D regions information header
extension element can be encoded using the two-byte header defined in
[RFC8285].
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ID       |    len=xx     |          num_regions          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      one or more region ids (16 bits for each region id)      |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |     OPTIONAL Zero padding     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ID (8 bit): is the local identifier.
len (8 bit): is the length of extension data in bytes not
including the ID and length fields. The value zero indicates
there is no data following.
num_regions (16 bits): indicates the number of transmitted 3D
regions.
region_id (16 bit): is a unique identifier for a pre-defined
static 3D region in the encoded media.
5.3. Response to a 3D viewport request
When an RTCP feedback message for a desired 3D viewport is received
by a sender, the sender SHALL respond to the receiver with
information on one or more 3D spatial regions that overlap with the
requested viewport. As the transmitted 3D regions correspond to the
static 3D regions (indicated via the URN
urn:ietf:params:rtp-hdrext:static-3d-regions-sent in the SDP
negotiation), the signaling of the transmitted 3D regions uses the
RTP header extension.
5.3.1. Message format
The payload of the transmitted static 3D regions information header
extension element can be encoded using the two-byte header defined in
[RFC8285].
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ID       |    len=xx     |          num_regions          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      one or more region ids (16 bits for each region id)      |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |     OPTIONAL Zero padding     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ID (8 bit): is the local identifier.
len (8 bit): is the length of extension data in bytes not
including the ID and length fields. The value zero indicates
there is no data following.
num_regions (16 bits): indicates the number of transmitted 3D
regions.
region_id (16 bits): is a unique identifier for a pre-defined
static 3D region in the encoded media.
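The element layout above can be illustrated with a short Python sketch. It is not part of the specification: the function names are hypothetical, and it assumes that region ids are packed back to back in network byte order and that any zero padding to a 32-bit boundary is added by the caller when the complete header extension block is assembled.

```python
import struct


def pack_static_regions(ext_id: int, region_ids: list[int]) -> bytes:
    """Build one two-byte-header element: ID, len, num_regions, region ids."""
    if not 1 <= ext_id <= 255:
        raise ValueError("ext_id must be non-zero and fit the 8-bit ID field")
    # Extension data: 16-bit num_regions followed by one 16-bit id per region.
    data = struct.pack(f"!H{len(region_ids)}H", len(region_ids), *region_ids)
    if len(data) > 255:
        raise ValueError("extension data does not fit the 8-bit len field")
    # len counts only the extension data, not the ID and len bytes.
    return bytes([ext_id, len(data)]) + data


def unpack_static_regions(element: bytes) -> tuple[int, list[int]]:
    """Parse one element and return (ID, list of region ids)."""
    ext_id, length = element[0], element[1]
    data = element[2:2 + length]
    if len(data) != length or length < 2:
        raise ValueError("truncated element")
    (num_regions,) = struct.unpack_from("!H", data, 0)
    if length != 2 + 2 * num_regions:
        raise ValueError("len does not match num_regions")
    return ext_id, list(struct.unpack_from(f"!{num_regions}H", data, 2))
```

Because len is an 8-bit field, a single element built this way can carry at most 126 region ids (2 bytes for num_regions plus 2 bytes per id).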
5.4. Dynamic 3D regions information transmission
When the 3D regions information of volumetric media content changes
over time, the updated 3D regions information SHALL be carried in an
RTP header extension. The RTP header extension payload carries the
total number of spatial regions present in the volumetric media and
the information for each spatial region.
5.4.1. Message format
The payload of the transmitted dynamic 3D regions information header
extension element can be encoded using the two-byte header defined in
[RFC8285].
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      ID       |    len=xx     | num_regions(h)| num_regions(l)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+            one or more spatial regions information            +
|                                                               |
+                                                               +
|                                                               |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |     OPTIONAL zero padding     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| position_x(h) |  position_x   |  position_x   | position_x(l) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| position_y(h) |  position_y   |  position_y   | position_y(l) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| position_z(h) |  position_z   |  position_z   | position_z(l) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   size_x(h)   |    size_x     |    size_x     |   size_x(l)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   size_y(h)   |    size_y     |    size_y     |   size_y(l)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   size_z(h)   |    size_z     |    size_z     |   size_z(l)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| region_id(h)  | region_id(l)  | num_tiles(h)  | num_tiles(l)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        one or more tile ids (16 bits for each tile id)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ID (8 bits): is the local identifier.
len (8 bits): is the length of the extension data in bytes, not
including the ID and length fields. The value zero indicates
there is no data following.
num_regions (16 bits): indicates the total number of dynamic 3D
regions present in the volumetric media.
position_x (32 bits): specifies the origin position of the 3D
bounding box in the Cartesian coordinates along the x axis.
position_y (32 bits): specifies the origin position of the 3D
bounding box in the Cartesian coordinates along the y axis.
position_z (32 bits): specifies the origin position of the 3D
bounding box in the Cartesian coordinates along the z axis.
size_x (32 bits): specifies the extension of the 3D bounding box
of the volumetric media in the Cartesian coordinates along the x
axis relative to the origin position.
size_y (32 bits): specifies the extension of the 3D bounding box
of the volumetric media in the Cartesian coordinates along the y
axis relative to the origin position.
size_z (32 bits): specifies the extension of the 3D bounding box
of the volumetric media in the Cartesian coordinates along the z
axis relative to the origin position.
region_id (16 bits): is an identifier for a 3D region.
num_tiles (16 bits): identifies the number of tile identifiers
associated with that spatial region.
tile_id (16 bits): identifies a tile identifier associated with
that spatial region.
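As an illustration of the per-region record described by the fields above, the following Python sketch serializes a list of spatial regions into extension data. The class and function names are hypothetical. The sketch assumes that positions and sizes are unsigned 32-bit integers in network byte order (the field definitions do not state signedness) and that the tile ids of each region directly follow its num_tiles field.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class SpatialRegion:
    region_id: int
    position: tuple[int, int, int]   # position_x, position_y, position_z
    size: tuple[int, int, int]       # size_x, size_y, size_z
    tile_ids: list[int] = field(default_factory=list)

    def pack(self) -> bytes:
        # Six 32-bit fields, then 16-bit region_id and num_tiles (28 bytes).
        head = struct.pack("!6I2H", *self.position, *self.size,
                           self.region_id, len(self.tile_ids))
        # One 16-bit tile id per tile associated with the region.
        return head + struct.pack(f"!{len(self.tile_ids)}H", *self.tile_ids)


def pack_dynamic_regions(regions: list[SpatialRegion]) -> bytes:
    """Extension data: 16-bit num_regions followed by each region record."""
    body = b"".join(region.pack() for region in regions)
    return struct.pack("!H", len(regions)) + body
```

Each record occupies at least 28 bytes, so under these assumptions an 8-bit len field leaves room for at most nine regions without tile ids in one element (2 + 9 * 28 = 254 bytes).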
When the spatial regions information is too large to fit into a
single RTP packet, due to RTP header extension size limitations or
RTP packet size limitations, the information of all updated spatial
regions present in the immersive media content is signaled over
multiple RTP packets. When the dynamic spatial regions information
is sent in multiple RTP packets, the first and last RTP packets
carrying the dynamic spatial regions information in their RTP header
extension data are identified using the 'appbits' values.
In the two-byte header form, the 16-bit value required by the RTP
specification for a header extension, labeled in the RTP
specification as "defined by profile", is defined in [RFC8285] as
shown below.
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         0x100         |appbits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The 'appbits' field in the RTP header extension SHALL be defined as
below for the transmitted dynamic 3D regions information (indicated
via the URN urn:ietf:params:rtp-hdrext:dynamic-3d-regions-sent in the
SDP negotiation).
 0
 0 1 2 3
+-+-+-+-+
|0|0|S|E|
+-+-+-+-+
S (1 bit): This bit is set to 1 if this is the first RTP packet
carrying the dynamic 3D regions information; otherwise, it is set
to 0.
E (1 bit): This bit is set to 1 if this is the last RTP packet
carrying the dynamic 3D regions information; otherwise, it is set
to 0.
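The S and E bits can be combined with the fixed 0x100 pattern as in the Python sketch below. The function names are hypothetical, and the sketch assumes that when all the information fits into a single RTP packet, that packet is both the first and the last one, so both bits are set. How the regions data is split into fragments is left to the caller.

```python
def profile_word(first: bool, last: bool) -> int:
    """Return the 16-bit value: 0x100 in the top 12 bits, appbits 0,0,S,E below."""
    # Bit 2 of appbits is S and bit 3 (the least significant bit) is E.
    appbits = (int(first) << 1) | int(last)
    return (0x100 << 4) | appbits


def flag_fragments(fragments: list[bytes]) -> list[tuple[int, bytes]]:
    """Pair each fragment of regions data with the value for its RTP packet."""
    last_index = len(fragments) - 1
    return [(profile_word(index == 0, index == last_index), fragment)
            for index, fragment in enumerate(fragments)]
```

A receiver following the same convention would start collecting regions information when it sees S set and would treat the set as complete once a packet with E set arrives.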
6. SDP signaling for Viewport and Region-of-Interest dependent delivery
of V3C data
6.1. SDP signaling of static 3D regions
The 3D regions present in a volumetric media object can be signaled
as an SDP extension. A sender MAY offer information on static 3D
regions present in the volumetric media in the initial offer-answer
negotiation by carrying it in the SDP message. This is done by
including the "a=3d-regions" attribute under the relevant media line.
The following parameters are provided in the attribute for each
static 3D region:
region_id: identifies a pre-defined 3D region.
position_x: specifies the origin position of the 3D region in the
Cartesian coordinate system along the x axis.
position_y: specifies the origin position of the 3D region in the
Cartesian coordinate system along the y axis.
position_z: specifies the origin position of the 3D region in the
Cartesian coordinate system along the z axis.
size_x: specifies the extension of the 3D region in the Cartesian
coordinates along the x axis relative to the origin position.
size_y: specifies the extension of the 3D region in the Cartesian
coordinates along the y axis relative to the origin position.
size_z: specifies the extension of the 3D region in the Cartesian
coordinates along the z axis relative to the origin position.
name: specifies the name of the pre-defined 3D region.
The syntax for the "a=3d-regions" attribute conforms to the following
ABNF (byte-string defined in [RFC8866] and WSP and DIGIT defined in
[RFC5234]):
3d-regions = "3d-regions:" PT 1*WSP attr-list
PT         = 1*DIGIT / "*"
attr-list  = ( set *(1*WSP set) ) / "*"
             ; WSP and DIGIT defined in [RFC5234]
set        = "[" "region_id=" idvalue "," "position_x=" posvalue ","
             "position_y=" posvalue "," "position_z=" posvalue ","
             "size_x=" sizevalue "," "size_y=" sizevalue ","
             "size_z=" sizevalue "," "name=" namevalue "]"
idvalue    = "0" / ( onetonine *2DIGIT )
             ; "0", or a digit between 1 and 9 that is followed by
             ; 0 to 2 other digits
posvalue   = sizevalue / "0"
             ; position may be "0"
sizevalue  = onetonine *5DIGIT
             ; Digit between 1 and 9 that is followed by 0 to 5
             ; other digits
onetonine  = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
             ; Digit between 1 and 9
namevalue  = byte-string
             ; byte-string defined in [RFC8866]
An example use of the "a=3d-regions" attribute relative to a media
line:
m=application 40008 RTP/AVP 100
a=rtpmap:100 v3c/90000
a=fmtp:100 v3c-unit-header=08000000; // atlas
a=mid:4
a=3d-regions:100 [region_id=0,position_x=0,position_y=0,position_z=0,
size_x=540,size_y=360,size_z=360,name=Head] [region_id=1,
position_x=0,position_y=360,position_z=0,size_x=1080,size_y=360,
size_z=360,name=Arms] [region_id=2,position_x=0,position_y=720,
position_z=0,size_x=540,size_y=360,size_z=360,name=Body]
[region_id=3,position_x=0,position_y=1080,position_z=0,size_x=540,
size_y=360,size_z=360,name=Legs]
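A receiver could read the attribute value with a small parser such as the Python sketch below. It is illustrative only: the function name is hypothetical, the input is assumed to be the attribute value unfolded onto a single line (the example above is wrapped for display), and region names are assumed not to contain "," or "]", which the byte-string rule would otherwise permit.

```python
import re

# One bracketed parameter set per static 3D region.
SET_RE = re.compile(r"\[([^\]]*)\]")


def parse_3d_regions(value: str) -> tuple[str, list[dict]]:
    """Parse the text after 'a=3d-regions:' into (payload type, regions)."""
    pt, _, rest = value.strip().partition(" ")
    regions = []
    for body in SET_RE.findall(rest):
        pairs = dict(item.split("=", 1) for item in body.split(","))
        # Parameter names are matched case-insensitively, as in ABNF.
        region = {key.strip().lower(): val.strip() for key, val in pairs.items()}
        name = region.pop("name")
        numeric = {key: int(val) for key, val in region.items()}
        regions.append({"name": name, **numeric})
    return pt, regions
```

An attribute list of "*" contains no bracketed sets, so the sketch returns an empty region list in that case.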
6.2. SDP signaling for region-of-interest feedback messages capability
A client supporting region-of-interest-dependent streams SHALL
support at least one of the following modes of requesting a desired
region-of-interest (signaled from a receiver to a sender):
* Static 3D regions
* Arbitrary spatial region
* Viewport
6.2.1. Request for static 3D regions
A client supporting the static 3D regions mode SHALL include the
a=rtcp-fb attribute with the static 3D regions feedback type under
the relevant media line scope. The static 3D regions type in
conjunction with the RTCP feedback method is expressed with the
following parameter: static-3d-regions. A wildcard payload type
("*") may be used to indicate that the RTCP feedback capability
attribute for signaling static 3D regions request capability applies
to all payload types. If several types of 3D regions signaling are
supported and/or the same static 3D regions are specified for a
subset of the payload types, several "a=rtcp-fb" lines can be used.
Here is an example usage of this attribute to signal static 3D
regions relative to a media line based on the RTCP feedback method:
a=rtcp-fb:* ack static-3d-regions
6.2.2. Request for arbitrary spatial region
A client that supports requests for an arbitrary spatial region SHALL
indicate this in the SDP offer for the volumetric media where
arbitrary spatial region request capabilities are desired. This is
done by including the a=rtcp-fb attribute line within the scope of
the relevant media line in the SDP message with a feedback message
type corresponding to the arbitrary spatial region mode. The RTCP
feedback message type corresponding to the arbitrary spatial region
request is expressed with the parameter: arbitrary-spatial-region. A
wildcard payload type ("*") may be used to indicate that the RTCP
feedback capability attribute for signaling arbitrary spatial region
request capability applies to all payload types. If the same
arbitrary spatial region capability is specified for a subset of the
payload types, several "a=rtcp-fb" lines can be used.
Here is an example for the usage of this attribute to signal support
for arbitrary spatial region requests in an SDP message based on the
RTCP feedback method:
a=rtcp-fb:* ack arbitrary-spatial-region
6.2.3. Request for a viewport
A client (sender or receiver) supporting streaming of immersive media
content based on the user's viewport SHALL offer the 'Viewport-
dependent streaming (VDS)' capability in SDP for all volumetric media
content where viewport-based immersive media streaming is desired.
VDS support is offered by including the a=rtcp-fb attribute under the
relevant media line scope. The VDS support using the RTCP feedback
method is expressed with the following parameter: 3d-viewport. A
wildcard payload type ("*") may be used to indicate that the RTCP
feedback capability attribute for VDS capability applies to all
payload types. If the same VDS capability is specified for a subset
of the payload types, several "a=rtcp-fb" lines can be used. Here is
an example usage of this attribute to signal viewport-dependent
streaming capability relative to a media line based on the RTCP
feedback method:
a=rtcp-fb:* ack 3d-viewport
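The three request modes of Section 6.2 can be detected in a media section with a scan such as the following Python sketch. The function name is hypothetical, and the sketch only recognizes the "ack" form used in the examples of this section.

```python
ROI_MODES = {"static-3d-regions", "arbitrary-spatial-region", "3d-viewport"}


def roi_feedback_modes(media_section: str) -> dict[str, set[str]]:
    """Map each payload type (or '*') to the ROI request modes it advertises."""
    prefix = "a=rtcp-fb:"
    modes: dict[str, set[str]] = {}
    for line in media_section.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Expected shape: <payload type> ack <mode>
        parts = line[len(prefix):].split()
        if len(parts) >= 3 and parts[1] == "ack" and parts[2] in ROI_MODES:
            modes.setdefault(parts[0], set()).add(parts[2])
    return modes
```

An endpoint can compare the returned sets with the modes it implements to decide which feedback messages it is allowed to send for a given payload type.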
6.3. SDP signaling for 3D regions transported using RTP header
extension
A client that supports receiving static 3D regions, arbitrary spatial
regions, and viewport information feedback messages SHOULD include the
transported 3D regions information signaling capability in its SDP
offer for all volumetric media streams. The transported 3D regions
information is signaled by extending the RTP header extension
mechanism defined in [RFC8285].
The transported 3D regions signaling capability is offered by
including the a=extmap attribute under the relevant media line scope.
The URN corresponding to an arbitrary spatial region is
urn:ietf:params:rtp-hdrext:arbitrary-3d-regions-sent.
The URN corresponding to static 3D regions is
urn:ietf:params:rtp-hdrext:static-3d-regions-sent.
Here is an example usage of this URN to signal transmitted 3D regions
relative to a media line (e.g., this signaling can be part of the
atlas component media line):
a=extmap:9 urn:ietf:params:rtp-hdrext:static-3d-regions-sent
a=extmap:10 urn:ietf:params:rtp-hdrext:arbitrary-3d-regions-sent
The numbers 9 and 10 in the example may be replaced with any number
in the range 1-254 using the two-byte header extension mechanism.
6.4. SDP signaling for dynamic 3D regions information transported using
RTP header extension
When the 3D regions in an immersive media content are changing over
time, a sender transmits all the dynamic 3D regions information to
the receiver whenever the 3D regions are updated or changed. This
information is not sent in response to any RTCP feedback message
received from a receiver.
A sender supporting the transmission of dynamic 3D regions
information SHOULD offer the dynamic 3D regions signaling capability
in the SDP offer for all volumetric media content. The dynamic 3D
regions information transmission capability signaling in SDP is
offered by including the a=extmap attribute under the relevant media
line scope.
The URN corresponding to the transmitted dynamic 3D regions
information is
urn:ietf:params:rtp-hdrext:dynamic-3d-regions-sent.
Here is an example usage of this URN to signal transmitted dynamic 3D
regions relative to a media line (e.g., this signaling can be part of
the atlas component media line):
a=extmap:255 urn:ietf:params:rtp-hdrext:dynamic-3d-regions-sent
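To route incoming header extension elements, an endpoint needs the local identifier negotiated for each URN. The Python sketch below extracts that mapping from a media section. The function names and the short labels in REGION_URNS are hypothetical, and each a=extmap attribute is assumed to be on a single unfolded line.

```python
from typing import Optional

# Short labels are illustrative; only the URNs come from this document.
REGION_URNS = {
    "urn:ietf:params:rtp-hdrext:static-3d-regions-sent": "static",
    "urn:ietf:params:rtp-hdrext:arbitrary-3d-regions-sent": "arbitrary",
    "urn:ietf:params:rtp-hdrext:dynamic-3d-regions-sent": "dynamic",
}


def parse_extmap(line: str) -> tuple[int, Optional[str], str]:
    """Split 'a=extmap:<id>[/<direction>] <URI>' into (id, direction, URI)."""
    prefix = "a=extmap:"
    if not line.startswith(prefix):
        raise ValueError("not an extmap attribute")
    id_part, rest = line[len(prefix):].split(None, 1)
    ext_id, _, direction = id_part.partition("/")
    return int(ext_id), direction or None, rest.split()[0]


def region_extension_ids(media_section: str) -> dict[str, int]:
    """Return the local ID negotiated for each 3D regions extension found."""
    ids: dict[str, int] = {}
    for line in media_section.splitlines():
        line = line.strip()
        if line.startswith("a=extmap:"):
            ext_id, _direction, uri = parse_extmap(line)
            if uri in REGION_URNS:
                ids[REGION_URNS[uri]] = ext_id
    return ids
```

The returned identifiers are the values to match against the ID field of each received extension element.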
6.5. Offer/Answer Considerations
The following SDP offer/answer examples are provided for V3C content.
An example offer that supports providing information on the static 3D
regions present in the volumetric media and providing region-of-
interest-dependent streams with the RTCP feedback request modes
static 3D regions, arbitrary spatial region, and viewport:
a=group:v3c 1 2 3 4 v3c-ptl-level-idc=10;
v3c-parameter-set=AF6F00939921878
m=video 40000 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=rtpmap:97 H265/90000
a=rtpmap:98 H266/90000
a=fmtp:96 v3c-unit-type=2;v3c-vps-id=0;v3c-atlas-id=0
a=fmtp:97 v3c-unit-type=2;v3c-vps-id=0;v3c-atlas-id=0
a=fmtp:98 v3c-unit-type=2;v3c-vps-id=0;v3c-atlas-id=0
a=sendonly
a=mid:1
m=video 40002 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=rtpmap:97 H265/90000
a=rtpmap:98 H266/90000
a=fmtp:96 v3c-unit-type=3;v3c-vps-id=0;v3c-atlas-id=0;
a=fmtp:97 v3c-unit-type=3;v3c-vps-id=0;v3c-atlas-id=0;
a=fmtp:98 v3c-unit-type=3;v3c-vps-id=0;v3c-atlas-id=0;
a=mid:2
a=sendonly
m=video 40004 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=rtpmap:97 H265/90000
a=rtpmap:98 H266/90000
a=fmtp:96 v3c-unit-type=4;v3c-vps-id=0;v3c-atlas-id=0
a=fmtp:97 v3c-unit-type=4;v3c-vps-id=0;v3c-atlas-id=0
a=fmtp:98 v3c-unit-type=4;v3c-vps-id=0;v3c-atlas-id=0
a=mid:3
a=sendonly
m=application 40006 RTP/AVP 100
a=rtpmap:100 v3c/90000
a=fmtp:100 v3c-unit-type=1;v3c-vps-id=0;v3c-atlas-id=0
a=mid:4
a=sendonly
a=3d-regions:100 [region_id=0,position_x=0,position_y=0,position_z=0,
size_x=540,size_y=360,size_z=360,name=Head]
[region_id=1,position_x=0,position_y=360,position_z=0,size_x=1080,
size_y=360,size_z=360,name=Arms]
[region_id=2,position_x=0,position_y=720,position_z=0,size_x=540,
size_y=360,size_z=360,name=Body]
[region_id=3,position_x=0,position_y=1080,position_z=0,size_x=540,
size_y=360,size_z=360,name=Legs]
a=rtcp-fb:* ack static-3d-regions
a=rtcp-fb:* ack arbitrary-spatial-region
a=rtcp-fb:* ack 3d-viewport
An example answer that accepts the information on the static 3D
regions present in the volumetric media and requests region-of-
interest and viewport-dependent content with the RTCP feedback
request modes static 3D regions, arbitrary spatial region, and
viewport:
...
a=group:v3c 1 2 3 4
m=video 50000 RTP/AVP 96
a=rtpmap:96 H264/90000
a=recvonly
m=video 50002 RTP/AVP 97
a=rtpmap:97 H265/90000
a=recvonly
m=video 50004 RTP/AVP 98
a=rtpmap:98 H266/90000
a=recvonly
m=application 50006 RTP/AVP 100
a=rtpmap:100 v3c/90000
a=recvonly
a=3d-regions:100 [region_id=0,position_x=0,position_y=0,position_z=0,
size_x=540,size_y=360,size_z=360,name=Head] [region_id=1,
position_x=0,position_y=360,position_z=0,size_x=1080,size_y=360,
size_z=360,name=Arms] [region_id=2,position_x=0,position_y=720,
position_z=0,size_x=540,size_y=360,size_z=360,name=Body]
[region_id=3,position_x=0,position_y=1080,position_z=0,size_x=540,
size_y=360,size_z=360,name=Legs]
a=rtcp-fb:* ack static-3d-regions
a=rtcp-fb:* ack arbitrary-spatial-region
a=rtcp-fb:* ack 3d-viewport
An example offer that supports the transported 3D regions
information signaling capability:
a=group:v3c 1 2 3 4 v3c-ptl-level-idc=10;
v3c-parameter-set=AF6F00939921878
m=video 40000 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=fmtp:96 v3c-unit-type=2;v3c-vps-id=0;v3c-atlas-id=0
a=sendonly
a=mid:1
m=video 40002 RTP/AVP 96 97 98
a=rtpmap:97 H265/90000
a=fmtp:97 v3c-unit-type=3;v3c-vps-id=0;v3c-atlas-id=0;
a=mid:2
a=sendonly
m=video 40004 RTP/AVP 96 97 98
a=rtpmap:98 H266/90000
a=fmtp:98 v3c-unit-type=4;v3c-vps-id=0;v3c-atlas-id=0
a=mid:3
a=sendonly
m=application 40006 RTP/AVP 100
a=rtpmap:100 v3c/90000
a=fmtp:100 v3c-unit-type=1;v3c-vps-id=0;v3c-atlas-id=0
a=mid:4
a=sendonly
a=3d-regions:100 [region_id=0,position_x=0,position_y=0,position_z=0,
size_x=540,size_y=360,size_z=360,name=Head] [region_id=1,
position_x=0,position_y=360,position_z=0,size_x=1080,size_y=360,
size_z=360,name=Arms] [region_id=2,position_x=0,position_y=720,
position_z=0,size_x=540,size_y=360,size_z=360,name=Body]
[region_id=3,position_x=0,position_y=1080,position_z=0,size_x=540,
size_y=360,size_z=360,name=Legs]
a=rtcp-fb:* ack static-3d-regions
a=rtcp-fb:* ack arbitrary-spatial-region
a=rtcp-fb:* ack 3d-viewport
a=extmap:9/sendonly urn:ietf:params:rtp-hdrext:static-3d-regions-sent
a=extmap:10/sendonly
urn:ietf:params:rtp-hdrext:arbitrary-3d-regions-sent
An example answer that supports sending only static region-of-
interest RTCP feedback request messages and receiving the transported
3D regions information:
...
a=group:v3c 1 2 3 4
m=video 50000 RTP/AVP 96
a=rtpmap:96 H264/90000
a=recvonly
m=video 50002 RTP/AVP 97
a=rtpmap:97 H265/90000
a=recvonly
m=video 50004 RTP/AVP 98
a=rtpmap:98 H266/90000
a=recvonly
m=application 50006 RTP/AVP 100
a=rtpmap:100 v3c/90000
a=recvonly
a=3d-regions:100 [region_id=0,position_x=0,position_y=0,position_z=0,
size_x=540,size_y=360,size_z=360,name=Head] [region_id=1,
position_x=0,position_y=360,position_z=0,size_x=1080,size_y=360,
size_z=360,name=Arms] [region_id=2,position_x=0,position_y=720,
position_z=0,size_x=540,size_y=360,size_z=360,name=Body]
[region_id=3,position_x=0,position_y=1080,position_z=0,size_x=540,
size_y=360,size_z=360,name=Legs]
a=rtcp-fb:* ack static-3d-regions
a=extmap:9/recvonly urn:ietf:params:rtp-hdrext:static-3d-regions-sent
An example offer that supports transmission of dynamic 3D regions
information and its signaling capability:
a=group:v3c 1 2 3 4 v3c-ptl-level-idc=10;
v3c-parameter-set=AF6F00939921878
m=video 40000 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=fmtp:96 v3c-unit-type=2;v3c-vps-id=0;v3c-atlas-id=0
a=sendonly
a=mid:1
m=video 40002 RTP/AVP 96 97 98
a=rtpmap:97 H265/90000
a=fmtp:97 v3c-unit-type=3;v3c-vps-id=0;v3c-atlas-id=0;
a=mid:2
a=sendonly
m=video 40004 RTP/AVP 96 97 98
a=rtpmap:98 H266/90000
a=fmtp:98 v3c-unit-type=4;v3c-vps-id=0;v3c-atlas-id=0
a=mid:3
a=sendonly
m=application 40006 RTP/AVP 100
a=rtpmap:100 v3c/90000
a=fmtp:100 v3c-unit-type=1;v3c-vps-id=0;v3c-atlas-id=0
a=mid:4
a=sendonly
a=3d-regions:100 [region_id=0,position_x=0,position_y=0,position_z=0,
size_x=540,size_y=360,size_z=360,name=Head] [region_id=1,
position_x=0,position_y=360,position_z=0,size_x=1080,size_y=360,
size_z=360,name=Arms] [region_id=2,position_x=0,position_y=720,
position_z=0,size_x=540,size_y=360,size_z=360,name=Body]
[region_id=3,position_x=0,position_y=1080,position_z=0,size_x=540,
size_y=360,size_z=360,name=Legs]
a=extmap:255/sendonly
urn:ietf:params:rtp-hdrext:dynamic-3d-regions-sent
An example answer that accepts receiving dynamic 3D regions
information and its signaling capability:
...
a=group:v3c 1 2 3 4
m=video 50000 RTP/AVP 96
a=rtpmap:96 H264/90000
a=recvonly
m=video 50002 RTP/AVP 97
a=rtpmap:97 H265/90000
a=recvonly
m=video 50004 RTP/AVP 98
a=rtpmap:98 H266/90000
a=recvonly
m=application 50006 RTP/AVP 100
a=rtpmap:100 v3c/90000
a=recvonly
a=3d-regions:100 [region_id=0,position_x=0,position_y=0,position_z=0,
size_x=540,size_y=360,size_z=360,name=Head] [region_id=1,
position_x=0,position_y=360,position_z=0,size_x=1080,size_y=360,
size_z=360,name=Arms] [region_id=2,position_x=0,position_y=720,
position_z=0,size_x=540,size_y=360,size_z=360,name=Body]
[region_id=3,position_x=0,position_y=1080,position_z=0,size_x=540,
size_y=360,size_z=360,name=Legs]
a=extmap:255/recvonly
urn:ietf:params:rtp-hdrext:dynamic-3d-regions-sent
7. Security Considerations
RTCP feedback messages and RTP packets using the header extension
format defined in this specification are subject to the security
considerations discussed in the RTP specification [RFC3550], and in
any applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF
[RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124].
8. IANA Considerations
For the Session Description Protocol, the following attribute needs
to be registered:
- "3d-regions"
The following RTCP feedback type parameters need to be registered:
- "static-3d-regions"
- "arbitrary-spatial-region"
- "3d-viewport"
Within the RTCP payload type value PSFB range, the following two
format (FMT) values need to be registered:
- 18: Spatial region
- 19: Viewport
The following new extension URIs need to be registered in the RTP
Header Extensions subregistry of the Real-Time Transport Protocol
(RTP) Parameters registry:
Extension URI: urn:ietf:params:rtp-hdrext:static-3d-regions-sent
Description: Transmitted static 3D regions
Extension URI: urn:ietf:params:rtp-hdrext:arbitrary-3d-regions-sent
Description: Transmitted arbitrary spatial regions
Extension URI: urn:ietf:params:rtp-hdrext:dynamic-3d-regions-sent
Description: Transmitted dynamic 3D regions
9. References
9.1. Normative References
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
"Extended RTP Profile for Real-time Transport Control
Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
DOI 10.17487/RFC4585, July 2006,
<https://www.rfc-editor.org/rfc/rfc4585>.
[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234,
DOI 10.17487/RFC5234, January 2008,
<https://www.rfc-editor.org/rfc/rfc5234>.
[RFC8285] Singer, D., Desineni, H., and R. Even, Ed., "A General
Mechanism for RTP Header Extensions", RFC 8285,
DOI 10.17487/RFC8285, October 2017,
<https://www.rfc-editor.org/rfc/rfc8285>.
[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
Session Description Protocol", RFC 8866,
DOI 10.17487/RFC8866, January 2021,
<https://www.rfc-editor.org/rfc/rfc8866>.
9.2. Informative References
[I-D.draft-ietf-avtcore-rtp-v3c]
Ilola, L. and L. Kondrad, "RTP Payload Format for Visual
Volumetric Video-based Coding (V3C)", Work in Progress,
Internet-Draft, draft-ietf-avtcore-rtp-v3c-03, 27 July
2023, <https://datatracker.ietf.org/doc/html/draft-ietf-
avtcore-rtp-v3c-03>.
[ISO.IEC.23090-10]
ISO/IEC, "Information technology - Coded representation of
immersive media - Part 10: Carriage of visual volumetric
video-based coding data", ISO/IEC FDIS 23090-10, 2022,
<https://www.iso.org/standard/78991.html>.
[ISO.IEC.23090-12]
ISO/IEC, "Information technology - Coded representation of
immersive media - Part 12: MPEG Immersive video (MIV)",
ISO/IEC 23090-12, 2022,
<https://www.iso.org/standard/79113.html>.
[ISO.IEC.23090-5]
ISO/IEC, "Information technology - Coded representation of
immersive media - Part 5: Visual volumetric video-based
coding (V3C) and video-based point cloud compression
(V-PCC)", ISO/IEC 23090-5, 2021,
<https://www.iso.org/standard/73025.html>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
July 2003, <https://www.rfc-editor.org/rfc/rfc3550>.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
Video Conferences with Minimal Control", STD 65, RFC 3551,
DOI 10.17487/RFC3551, July 2003,
<https://www.rfc-editor.org/rfc/rfc3551>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, DOI 10.17487/RFC3711, March 2004,
<https://www.rfc-editor.org/rfc/rfc3711>.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
"Codec Control Messages in the RTP Audio-Visual Profile
with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
February 2008, <https://www.rfc-editor.org/rfc/rfc5104>.
[RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
Real-time Transport Control Protocol (RTCP)-Based Feedback
(RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
2008, <https://www.rfc-editor.org/rfc/rfc5124>.
Authors' Addresses
Srinivas Gudumasu
InterDigital
Canada
Email: srinivas.gudumasu@interdigital.com
Ahmed Hamza
InterDigital
Canada
Email: ahmed.hamza@interdigital.com