Internet DRAFT - draft-vitali-ietf-avt-mdc-lc
Network Working Group A. Vitali
Internet Draft <draft-vitali-ietf-avt-mdc-lc-00.txt> STMicroelectronics
Expires: January 2006 M.Fumagalli
CEFRIEL - Politecnico di Milano
July 2005
Standard-compatible Multiple-Description Coding (MDC) and
Layered Coding (LC) of Audio/Video Streams
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Comments are solicited and should be addressed to the AVT WG mailing
list at avt@ietf and/or the author(s).
Abstract
This document specifies an efficient way to ensure erasure resilient,
scalable transmission of encoded multimedia sources via RTP using
standard-compatible Multiple Description Coding (MDC) and Layered
Coding (LC) together with interleaving, thus allowing a graceful
degradation of the application quality with increasing packet loss
rate and decreasing bandwidth/throughput on the network.
This document describes what information needs to be communicated
out-of-band to use a specific MDC/LC scheme and what information is
A. Vitali Standards Track [Page 1]
RFC PROPOSED standard-compatible MDC/LC May 2005
needed in RTP packets to identify what description/layer they carry.
Definitions for SDP and MIME are provided.
Table of Contents
1. Introduction.................................................. 3
1.1. Performance and efficiency of MDC/LC.................... 3
1.2. Application scenarios................................... 4
1.3. Standard-compatible framework, compatibility issues..... 4
1.4. Complexity of MDC/LC, implementation issues............. 5
1.5. Interaction with other techniques (ARQ/FEC)............. 6
1.6. Joint Source-Channel Coding (JSCC)...................... 6
2. Conventions................................................... 7
3. MDC/LC identification parameters.............................. 7
3.1. SDP media-level attributes.............................. 7
3.2. MIME registration....................................... 11
4. Typical cases and examples.................................... 11
4.1. Polyphase Downsampling Multiple Description (PDMD)...... 11
4.2. Multiple Description by filter bank..................... 13
4.3. Frame-expanded Multiple Description..................... 15
4.4. Unbalanced Multiple Description (UMD)................... 17
4.5. Classical Layered Coding (LC)........................... 18
4.6. Haar-wavelet Layered Coding............................. 19
5. Packetization................................................. 20
5.1. Synchronization......................................... 20
5.2. Multiplexing, Interleaving.............................. 20
6. Security Considerations....................................... 20
7. Congestion Control and bandwidth management................... 21
8. IANA Considerations........................................... 21
9. Informative appendix: rationale for std-compatible MDC/LC..... 21
9.1. Rationale for video MDC................................. 21
9.2. MDC in the error-prediction or in the transform domain.. 22
9.3. MDC in the pixel domain................................. 22
10. References.................................................... 23
10.1. Normative references.................................... 23
10.2. Informative references.................................. 23
Authors' Addresses................................................ 24
Full Copyright Statement.......................................... 24
Intellectual Property Right....................................... 25
Acknowledgements.................................................. 25
1. Introduction
Multiple Description Coding (MDC) and Layered Coding (LC) are
analogous. Each description/layer can contribute to one or more
characteristics of multimedia data: spatial/temporal resolution and
quality (SNR). Descriptions/layers can also contribute to frequency
content in the transform domain.
The difference between MDC and LC lies in the dependency among
bitstreams. The simplest case is when two bitstreams are created.
In the case of LC they are referred to as "base layer" and
"enhancement layer"; the latter depends on the former and cannot be
decoded independently. On the other hand, in the case of MDC, each
description can be individually decoded to get a base quality. The
more descriptions are decoded, the higher the output quality.
The base layer is clearly more important than enhancement layers. On
the other hand, descriptions can have the same importance (as for
balanced MDC) or they can have different importance (as for
unbalanced MDC).
1.1. Performance and efficiency of MDC/LC
MDC and LC are useful in case of varying bandwidth/throughput and in
case of losses/erasures due to congestion (as on the Internet) and
uncorrectable errors (as on wireless channels).
MDC greatly improves loss/erasure resilience because each bitstream
can be decoded independently, making it unlikely to have the same
portion of data corrupted in every description. LC can improve error
resilience when the protection level for a given layer can be adapted
to its importance so that the base layer is more protected.
MDC/LC eases the management of variable bandwidth/throughput by
transmitting a suitable number of descriptions/layers. It must be
noted that, when neither LC nor MDC is used, an expensive transcoding
process is needed to match the channel capacity.
MDC/LC can also exploit path diversity.
Coding efficiency is somewhat reduced depending on the amount of
redundancy left among descriptions/layers. However, any technique
that introduces scalability or resilience does reduce coding
efficiency.
1.2. Application scenarios
Foreseen applications are summarized in the following list:
- Easy cell hand-over: different descriptions can be streamed from
different base stations exploiting multi-paths on a cell boundary.
- Adaptation to low resolution/memory/power: mobiles decode as many
descriptions/layers as they can based on their display size,
available memory, processor speed, and battery level.
- Easy picture-in-picture: with the classical solution, a second
full-decoding is needed plus downsizing; with MDC/LC, it is
sufficient to decode one description or the base layer and paste it
on the display.
- Better delivered quality: in a broadcast scenario, there is no need
to heavily protect the stream with FEC for the farthest user, which
would lower quality at the same gross rate (media+FEC); nearer
users will experience a better quality by receiving more
descriptions/layers.
- Adaptation to varying bandwidth: the base station can simply drop
descriptions/layers; more users can be easily served, and no
transcoding process is needed.
- Multi-standard support (simulcast without simulcast): descriptions
can be encoded with different codecs (MPEG-2, H.263, H.264);
there's no waste of capacity as descriptions carry different
information.
- Divide-et-impera approach for HDTV distribution: HDTV sequences can
be split into SDTV descriptions; no custom high-bandwidth h/w is
required.
- Enhanced carousel: instead of repeating the same data over and over
again, different descriptions are transmitted one after another;
the decoder can store and combine them to get a higher quality.
- "pay-per-quality" services: users can decide at which quality level
to enjoy a service, from low-cost low-resolution (base layer or one
description only) to higher cost high-resolution (by paying for
enhancement layers / more descriptions).
1.3. Standard-compatible framework, compatibility issues
The implementation of the proposed MDC/LC scheme is completely
independent of the underlying multimedia codec. The creation of
descriptions/layers is performed in the data domain. This is done in
a pre-processing stage. Descriptions/layers can then be coded
independently. At the decoder side, there is a post-processor stage
in which decoded descriptions/layers are merged.
Specific information needs to be communicated out-of-band via SDP or
MIME to specify which MDC/LC scheme is in use. Standard decoders
will ignore this specific information. However, such decoders will
still be able to decode each successfully received description/layer.
At the same time, MDC/LC-aware decoders will parse this information
in order to properly decode and merge descriptions/layers.
Balanced MDC can even be beneficial for standard decoders.
Multiplexed descriptions can be marked so that standard decoders
understand they are multiple copies of the same data. Of course,
decoded data will have a smaller resolution/quality. As an example,
when balanced descriptions are transmitted, standard decoders will
understand that the same data is transmitted multiple times in a way
similar, but not equal to, repetition codes. Actually, there is no
repetition but slightly different data packets are transmitted.
Decoders can be instructed to decode only the first successfully
received 'copy'.
1.4. Complexity of MDC/LC, implementation issues
Pre- and post-processing stages can be completely decoupled from the
underlying multimedia codec. However, it must be noted that keeping
MDC/LC decoupled from the underlying codec prevents MDC/LC from
giving its best. To get maximum quality for the decoded MDC/LC and to
do MDC/LC encoding with the least effort, joint or coordinated
encoding could be used. Also, to exploit MDC/LC redundancy and to
maximize the error resilience, joint MDC/LC decoding is recommended.
As an example, video encoders can share expensive encoding decisions
(motion vectors) instead of each computing them; they can also coordinate
encoding decisions (quantization policies) to enhance quality or
enhance resilience (interleaved multiframe prediction policies, intra
refresh policies). Decoders can share decoded data to ease error
concealment; also they can share critical internal variables (anchor
frame buffer) to stop error propagation due to prediction.
It is worth mentioning that, if balanced descriptions are properly
compressed and packetized, losses/erasures can be recovered before
the decoding stage. In this case, decoders are preceded by a
special processor in which lost packets are recovered by copying
similar packets from other descriptions. Similar packets are those
that carry the same portion of data.
1.5. Interaction with other techniques (ARQ/FEC)
Several techniques have been proposed to enhance error/loss
resilience of multimedia streams sent through unreliable channels.
Among these, there are techniques like forward error
detection/correction codes (FEC) or automatic repeat request
(ARQ).
ARQ is very effective but it requires a feedback channel and it can
be used only in point-to-point communications, not for broadcast. Of
course, time must be allowed for retransmissions. Conversely,
FEC does not require a feedback channel and it is suitable for
broadcast.
FEC usually needs to be complex (plus it introduces a substantial
coding and interleaving delay) in order to be effective and it has an
all-or-nothing performance: if the correction capability is exceeded,
almost nothing is delivered to the receiver. Capacity may be wasted
if the worst case (worst channel conditions, farthest user) must be
considered. Conversely, when the channel is better than
expected and there are no losses, FEC redundancy is useless. On the
other hand, a particular MDC scheme (frame-expanded MDC) can yield a
superior quality to the user in this case as redundant data may
contribute to the reduction of quantization noise power.
Both these techniques, ARQ and/or FEC, can be used together with
MDC/LC.
It is suggested to adapt the protection level of a given description/
layer to its importance, a technique commonly known as unequal error
protection. It is suggested to use unequal error protection even in
the case of equally important descriptions (balanced MDC). In fact,
armoring only one description may be more effective than trying to
protect all descriptions. If this is done, there's one description
which is heavily protected. If the channel becomes really bad, this
description is likely to survive losses. Then the decoder will be
able to guarantee a basic quality, thanks to this description.
1.6. Joint Source-Channel Coding (JSCC)
It is often useful to optimize the parameters of the source and
channel encoders jointly (joint source-channel coding, JSCC). In the
case of multimedia communications, this means exploiting the error
resilience that may be embedded in compressed multimedia bitstreams
rather than using complex forward error detection/correction codes or
complex communication protocols.
As an example, in MPEG-x/H.26x video encoders, it is possible to
increase error resilience through one or more of the following
techniques: more frequent intra pictures to reset the motion
prediction loop; suitable intra macroblock update policy; more slices
per picture to reset differential motion vector and DC coefficient
coding; Flexible Macroblock Ordering (FMO) or Arbitrary Slice Ordering
(ASO) or concealment motion vectors or interleaved multiframe
prediction policies to ease error concealment; reversible variable
length coding (RVLC, with MPEG-4) or an error resilient entropy
coding scheme (EREC) or more sync markers; etc.
MDC can be seen as another way of enhancing error resilience without
using complex channel coding schemes. Channel coding can then be
reduced, compensating for the reduced coding efficiency due to
redundancy left among descriptions. This can be seen as a form of
joint source-channel coding.
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [3].
3. MDC/LC identification parameters
Specific information needs to be communicated out-of-band to use an
MDC/LC scheme. There are two sets of parameters. The first set of
parameters is related to the pre-processor: it describes the creation
of each description/layer so that smart MDC/LC-aware receivers can
compute how to merge them depending on the loss pattern. The second
set of parameters is related to the post-processor: it describes the
merge of descriptions/layers for basic MDC/LC-aware receivers that
cannot compute how to merge them. Smart MDC/LC-aware receivers can
compute the second set for the merge given the first set.
One of these sets MUST be sent via Session Description Protocol (SDP)
using media-level attributes ("a="). Both sets MAY be sent at the
same time. A mapping of the parameters into MIME optional parameters
is also provided. Equivalent parameters could be defined elsewhere
for use with control protocols that do not use MIME or SDP.
Standard receivers that are not MDC/LC-aware MUST ignore these sets.
MDC/LC-aware receivers MAY ignore these sets.
3.1. SDP media-level attributes
Media-level unregistered attributes ("X-attribute:value") are
defined. Their formatting in SDP is defined using the Augmented BNF
(ABNF) grammar for syntax specifications (RFC 2234). Note: this
draft is not yet perfectly consistent with RFC 2234.
- "a=X-mdclc-tag:" payload-type mdclc-tag
All descriptions/layers used in MDC/LC MUST be identified by a
media identification tag.
Tag "S" MUST NOT be used: it is reserved to indicate the original
multimedia data which is the default starting point for
descriptions/layers generation.
"_Q" MUST NOT be used as part of the tag: it is a suffix reserved
to indicate decoded (and quantized) descriptions/layers. When
lossless co/decoding is used (no quantization), the "_Q" suffix
SHOULD NOT be used. "T_" MUST NOT be used as part of the tag: it
is a prefix reserved to indicate temporary descriptions/layers for
computations in pre- and post-processor.
"D" SHOULD be included as part of the tag to indicate that it is an
independent description. "L" SHOULD be included as part of the tag
to indicate that it is a dependent layer. Numbers MAY be used.
Lower numbers SHOULD be used to indicate more important
descriptions/layers. "_" SHOULD be used as separator to improve
readability.
The following suffixes are reserved for video data components and
MUST NOT be used as part of the tag: ".Y" (luma), ".Cb" (first
chroma), ".Cr" (second chroma), ".R" (red), ".G" (green), ".B"
(blue). The following suffixes are reserved for audio channels and
MUST NOT be used as part of the tag: ".L" (left), ".R" (right),
".C" (center), ".RL" (rear left) or ".SL" (surround left), ".RR"
(rear right) or ".SR" (surround right), ".S" (surround, rear
center), ".LFE" (low frequency effects). When they are used, they
MUST be the last suffix. Alternatively, numbers can be used
instead of labels for video components or audio channels. When
they are not used as part of the tag, the same computations MUST be
applied on all components or channels that are present.
- "a=X-mdclc-group:" group-tag *(space mdclc-tag)
Groups of descriptions/layers in MDC/LC are identified by a group
tag followed by zero or more media identification tags. The media
name in the "m=" line of SDP MUST be the same for all
descriptions/layers in the group.
- "a=X-mdclc-pre:" *(tag"="expression";")
The creation of the indicated description/layer is specified as a
list of one or more expressions that are computed in the pre-
processor. The tag for the starting point is "S"; it indicates the
original data.
Valid expressions MAY use every tagged description/layer. Decoded
descriptions/layers MAY be used; they are indicated by appending
"_Q" to the tag. Temporary descriptions/layers MAY be used; they
SHOULD be indicated by prepending "T_" to the tag. If an invalid
expression is listed, the line MUST be entirely ignored.
- "a=X-mdclc-post:" group-tag *(tag"="expression";")
The merge of descriptions/layers in the indicated group is
specified as a list of one or more expressions that are computed in
the post-processor. The tag for the ending point is "E"; it
indicates the reconstructed original data.
The ending point MUST be initialized to zero, which is equivalent
to the following expression at the beginning: "E=0;".
Valid expressions MAY use every tagged description/layer. Only
decoded descriptions/layers are available at the receiver,
therefore "_Q" SHOULD be added to every tag. Temporary
descriptions/layers MAY be used; they SHOULD be indicated by
prepending "T_" to the tag. If an invalid expression is listed, the
line MUST be ignored.
All descriptions/layers listed in the indicated group MUST be
received in order for the merge to take place. If there is more
than one group for which all descriptions/layers have been
received, the first group listed SHOULD be used.
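As a non-normative illustration of the attributes above, the
following Python sketch collects the "a=X-mdclc-tag",
"a=X-mdclc-group" and "a=X-mdclc-post" lines of a session description
and applies the rule that the first listed group whose
descriptions/layers have all been received is used. The names
parse_mdclc, select_group and SAMPLE are hypothetical, expressions
are kept as opaque strings, and no ABNF validation is attempted.

```python
SAMPLE = """\
a=X-mdclc-tag: 97 D1
a=X-mdclc-tag: 98 D2
a=X-mdclc-group: lp0 D1 D2
a=X-mdclc-post: lp0 E= up(2,0,Y,D1) + up(2,1,Y,D2);
a=X-mdclc-group: lp1 D1
a=X-mdclc-group: lp2 D2
a=X-mdclc-group: lp3
"""


def parse_mdclc(sdp_text):
    """Collect MDC/LC tags, groups and post expressions from SDP text."""
    tags, groups, post = {}, [], {}
    prefix = "a=X-mdclc-"
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # not an MDC/LC attribute: ignored by this sketch
        name, _, value = line[len(prefix):].partition(":")
        fields = value.split()
        if name == "tag" and len(fields) == 2:
            tags[int(fields[0])] = fields[1]        # payload type -> tag
        elif name == "group" and fields:
            groups.append((fields[0], fields[1:]))  # listing order is kept
        elif name == "post" and fields:
            parts = value.split(None, 1)            # group tag, expressions
            post[parts[0]] = parts[1] if len(parts) > 1 else ""
    return tags, groups, post


def select_group(groups, received):
    """Return the first listed group whose members were all received."""
    for group_tag, members in groups:
        if all(member in received for member in members):
            return group_tag
    return None


tags, groups, post = parse_mdclc(SAMPLE)
chosen = select_group(groups, {"D2"})  # only D2 arrived
```

Because a group with no members is trivially complete, listing it
last makes it the fallback for the case where nothing is received.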
Valid expressions are standard mathematical expressions: they are made
of operands (tagged descriptions/layers) together with their
operators; parentheses MAY be used to prioritize computations.
Operands are numbers or tags. The set of valid operators for the
pre- and post-processor is listed below. Also, there is a set of
available functions for up/downsampling, filtering, conversion.
Labels are case insensitive.
Computations are done with the precision of the operands. Typically
for video each pixel is represented with 1 byte per component
(YCbCr). Typically for audio each sample is represented with 2
bytes. Precision can be changed by using specific conversion
functions.
- Arithmetic operators.
Plus +
Unary plus +
Minus -
Unary minus -
Multiply *
- Relational operators.
Equal ==
Not equal ~= or !=
Less than <
Greater than >
Less than or equal <=
Greater than or equal >=
- Logical operators.
Short-circuit logical AND &&
Short-circuit logical OR ||
Bit-wise logical AND &
Bit-wise logical OR |
Logical NOT ~ or !
Logical EXCLUSIVE OR ^
- up(F,P,A,tag): up-sampling of indicated description/layer by zero
insertion along axis A. For every incoming sample, F output
samples are created; F-1 zeros have to be inserted. Valid factors
are greater than or equal to one. The phase P indicates the output
position of the incoming sample; valid phases range from 0 to F-1.
The axis A is indicated by a letter: X (horizontal), Y (vertical)
or Z (temporal); more than one letter can be specified at the same
time.
- dn(F,P,A,tag): down-sampling of indicated description/layer by
sample deletion along axis A. For every F incoming samples, one
output sample is selected; F-1 samples have to be deleted. Valid
factors are greater than or equal to one. The phase P indicates
the position of the incoming sample that survives the downsampling;
valid phases range from 0 to F-1. The axis A is indicated by a
letter: X (horizontal), Y (vertical) or Z (temporal); more than one
letter can be specified at the same time.
- fir("["coefficients"]",tag[,IC]): finite impulse response (FIR)
filter for indicated description/layer. Coefficients are listed
from left to right and MAY be separated by commas; from first row
to last row, where rows MUST be separated by semicolons; and from
first plane to last plane, where planes MUST be separated by dots.
Unlisted coefficients MUST be set to 0.
There MUST be an odd number of coefficients per row, an odd number
of rows, an odd number of planes. The coefficients of the filter
are centered around the output sample.
Initial conditions are specified by IC: 0=unavailable samples are
set to zero, 1=unavailable samples are copies of the first nearest
available sample, 2=unavailable samples are taken by mirroring
nearest available samples. The default is IC=0.
To be added: conversion functions as long(), int(), etc; clipping
function. To be discussed: allow for the definition of new
functions? C-like operators as "? :".
3.2. MIME registration
Equivalent MIME parameters are defined.
4. Typical cases and examples
Several typical cases are discussed in this section for the case of
video data. To be done: audio data.
Multiple description (MDC) schemes. Basic polyphase downsampling MDC
schemes (PDMD) with two balanced descriptions (2MD) and four balanced
descriptions (4MD). Slightly unbalanced alias-free PDMD by filter
bank. Frame expanded MDC with three descriptions (3MD) and five
descriptions (5MD). Unbalanced MDC (UMD) where a low resolution
redundant description is added.
Layered coding (LC) schemes. Classical layered coding (LC) with two
layers (2LC) and three layers (3LC). 1D and 2D Haar wavelet layered
coding.
4.1. Polyphase Downsampling Multiple Description (PDMD)
Two balanced descriptions can be generated by separating odd/even
lines. The short name of this scheme is 2MD. The erasure resilience
is low, but the overhead is also low.
m=video 49170 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 D1
a=X-mdclc-pre: D1= dn(2,0,Y, S);
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 D2
a=X-mdclc-pre: D2= dn(2,1,Y, S);
Note that the quantity of data to be encoded is 2*1/2.
The behaviour of the basic MDC/LC-aware receiver SHOULD be specified
for every loss pattern. In the case of 2MD, there are four loss
patterns, indicated as lpN, and four merge scenarios are specified.
When all descriptions are available, they are merged as they are. When
only one description is available, missing lines are computed by
averaging neighboring lines. When no description is received, pixels
are set to gray (128). Smart MDC/LC-aware receivers MAY ignore this
and implement a more effective concealment (e.g. copy-previous or
motion-compensated copy).
a=X-mdclc-group: lp0 D1 D2
a=X-mdclc-post: lp0 E= up(2,0,Y,D1) + up(2,1,Y,D2);
a=X-mdclc-group: lp1 D1
a=X-mdclc-post: lp1 E= fir([0.5;1;0.5], up(2,0,Y,D1));
a=X-mdclc-group: lp2 D2
a=X-mdclc-post: lp2 E= fir([0.5;1;0.5], up(2,1,Y,D2));
a=X-mdclc-group: lp3
a=X-mdclc-post: lp3 E= 128;
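The four merge scenarios above can be illustrated with a short Python
sketch operating on an image stored as a list of rows. It is a
sketch only: the function names split_2md and merge_2md are
hypothetical, an even number of lines is assumed, and a lost
description is represented by None. Missing lines are interpolated
as fir([0.5;1;0.5], ...) would do with the default IC=0, so a missing
border line has a single available neighbour and is halved.

```python
def split_2md(image):
    """D1 = dn(2,0,Y,S) and D2 = dn(2,1,Y,S): alternate lines."""
    return image[0::2], image[1::2]


def merge_2md(d1, d2, height, width):
    """Merge per loss pattern lp0..lp3; a lost description is None."""
    if d1 is None and d2 is None:                 # lp3: nothing received
        return [[128] * width for _ in range(height)]
    rows = [None] * height
    if d1 is not None:
        for k, line in enumerate(d1):
            rows[2 * k] = list(line)              # up(2,0,Y,D1)
    if d2 is not None:
        for k, line in enumerate(d2):
            rows[2 * k + 1] = list(line)          # up(2,1,Y,D2)
    zero = [0] * width                            # IC=0 outside the picture
    for y in range(height):
        if rows[y] is None:                       # lp1 or lp2: interpolate
            above = rows[y - 1] if y > 0 else zero
            below = rows[y + 1] if y + 1 < height else zero
            rows[y] = [0.5 * a + 0.5 * b for a, b in zip(above, below)]
    return rows


image = [[10, 20], [30, 40], [50, 60], [70, 80]]
d1, d2 = split_2md(image)
lp0 = merge_2md(d1, d2, 4, 2)
lp1 = merge_2md(d1, None, 4, 2)
```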
Two balanced descriptions can also be generated by separating
odd/even frames.
Four balanced descriptions can be generated by separating odd/even
lines and taking every other pixel. The short name of this scheme is
4MD. The erasure resilience is high, but the overhead is also high.
m=video 49170 RTP/AVP 97 98 99 100
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 D1
a=X-mdclc-pre: D1= dn(2,0,XY, S);
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 D2
a=X-mdclc-pre: D2= dn(2,1,X, dn(2,0,Y, S));
a=rtpmap:99 H264/90000
a=fmtp:99 ...
a=X-mdclc-tag: 99 D3
a=X-mdclc-pre: D3= dn(2,0,X, dn(2,1,Y, S));
a=rtpmap:100 H264/90000
a=fmtp:100 ...
a=X-mdclc-tag:100 D4
a=X-mdclc-pre: D4= dn(2,1,XY, S);
Note that the quantity of data to be encoded is 4*1/4.
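A minimal Python sketch of the 4MD split follows. It assumes that the
four descriptions are the four 2x2 polyphase components of the
picture, keyed here by (vertical phase, horizontal phase); mapping D2
to (0,1) and D3 to (1,0) is an assumption of this sketch. The names
split_4md and merge_4md are hypothetical. Samples belonging to a lost
description are left as None, to be concealed by the receiver.

```python
def split_4md(image):
    """Return the four polyphase components keyed by (y_phase, x_phase)."""
    return {(py, px): [row[px::2] for row in image[py::2]]
            for py in (0, 1) for px in (0, 1)}


def merge_4md(parts, height, width):
    """Put every received sample back in place; lost samples stay None."""
    out = [[None] * width for _ in range(height)]
    for (py, px), part in parts.items():
        for j, row in enumerate(part):
            for i, value in enumerate(row):
                out[2 * j + py][2 * i + px] = value
    return out


image = [[4 * y + x for x in range(4)] for y in range(4)]
parts = split_4md(image)

# Each description carries 1/4 of the samples, 4*1/4 in total.
total = sum(len(row) for part in parts.values() for row in part)

# Loss of the description at phase (1, 1): one sample in four is missing.
received = {key: part for key, part in parts.items() if key != (1, 1)}
damaged = merge_4md(received, 4, 4)
missing = sum(value is None for row in damaged for value in row)
```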
In the case of 4MD there are sixteen loss patterns. The advantage of
a smart MDC/LC-aware receiver is clear: merge scenarios
need not be specified explicitly. The concealment of missing data
may be driven by actual received data rather than being independently
fixed once and for all (e.g. edge-driven interpolation).
4.2. Multiple Description by filter bank
Downsampling may cause aliasing in a given description if there are
high frequencies in the original data. The aliasing is canceled
during the merge process. The aliasing may also be avoided if
original data is lowpass filtered prior to downsampling. Original data
may be reconstructed (except for quantization noise) if filters are
properly designed.
Two balanced descriptions can be generated by different mild lowpass
filters. Each odd line (A) is combined with the following even line
(B) with different weights.
D1 = +0.75 A +0.25 B
D2 = +0.25 A +0.75 B
Corresponding SDP attributes are:
m=video 49170 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 D1
a=X-mdclc-pre: D1= dn(2,0,Y, fir([0;0.75;0.25], S));
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 D2
a=X-mdclc-pre: D2= dn(2,0,Y, fir([0;0.25;0.75], S));
Note that in this example the phase of downsampling is not changed.
Instead coefficients of FIRs are changed.
If descriptions are quantized, their quantization noise affects the
reconstructed lines. Pairs of odd (A) and even lines (B) can be
reconstructed as follows:
A = +1.5 D1 -0.5 D2
B = -0.5 D1 +1.5 D2
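The reconstruction equations above can be checked with exact
arithmetic. The Python sketch below encodes one pair of lines (each
reduced to a single sample for brevity) and decodes it again; the
function names are hypothetical. It also evaluates the sum of the
squared reconstruction coefficients, which is the factor by which
quantization noise power would grow under the assumption, made only
for this sketch, that the noise on D1 and D2 is independent and of
equal power.

```python
from fractions import Fraction as F


def encode(a, b):
    """D1 = 0.75 A + 0.25 B, D2 = 0.25 A + 0.75 B."""
    return F(3, 4) * a + F(1, 4) * b, F(1, 4) * a + F(3, 4) * b


def decode(d1, d2):
    """A = 1.5 D1 - 0.5 D2, B = -0.5 D1 + 1.5 D2."""
    return F(3, 2) * d1 - F(1, 2) * d2, -F(1, 2) * d1 + F(3, 2) * d2


d1, d2 = encode(100, 140)
a, b = decode(d1, d2)

# Noise power gain on each reconstructed line: 1.5^2 + 0.5^2.
noise_gain = F(3, 2) ** 2 + F(1, 2) ** 2
```

The two descriptions are closer to each other than the original
lines are, which is the price paid for the lowpass filtering: the
inverse has to amplify their difference, and any quantization noise
with it.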
The generation of descriptions can be seen as a filter bank whose
output is downsampled. Conversely, the merge of descriptions can be
seen as a filter bank whose input is upsampled. Corresponding SDP
attributes are:
a=X-mdclc-group: lp0 D1 D2
a=X-mdclc-post: lp0 E= fir([0;-0.5;+1.5], up(2,0,Y, D1)) +
fir([0;+1.5;-0.5], up(2,0,Y, D2));
Reconstruction can also be done in this trivial way:
a=X-mdclc-group: lp0 D1 D2
a=X-mdclc-post: lp0 E= up(2,0,Y,1.5*D1) + up(2,0,Y,-0.5*D2) +
up(2,1,Y,-0.5*D1) + up(2,1,Y,+1.5*D2);
Four slightly unbalanced descriptions can be generated by different
lowpass filters. In the following example, the first description is
not filtered and it may be aliased; the second description is lowpass
filtered horizontally and it is free of horizontal alias; the third
description is lowpass filtered vertically and it is free of vertical
alias; the fourth description is lowpass filtered both horizontally
and vertically and it is completely alias free.
Pixels can be labeled in the following way:
A B A' ..
C D C'
A" B" A"'
:
The invertible equation system is:
D1 = 1/1 A
D2 = 1/4 A + 1/2 B + 1/4 A'
D3 = 1/4 A + 1/2 C + 1/4 A"
D4 = 1/16 A + 1/8 B + 1/16 A' ...
1/8 C + 1/4 D + 1/8 C' ...
1/16 A" + 1/8 B" + 1/16 A"'
Corresponding SDP attributes are:
m=video 49170 RTP/AVP 97 98 99 100
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 D1
a=X-mdclc-pre: D1= dn(2,0,XY, S);
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 D2
a=X-mdclc-pre: D2= dn(2,0,XY, fir([1,2,1]/4, S));
a=rtpmap:99 H264/90000
a=fmtp:99 ...
a=X-mdclc-tag: 99 D3
a=X-mdclc-pre: D3= dn(2,0,XY, fir([1;2;1]/4, S));
a=rtpmap:100 H264/90000
a=fmtp:100 ...
a=X-mdclc-tag:100 D4
a=X-mdclc-pre: D4= dn(2,0,XY, fir([1,2,1; 2,4,2; 1,2,1]/16, S));
Note that the gain of FIR filters is 1. The gain is computed as the
sum of the absolute values of coefficients. If the gain is 1, the
range of data (max value - min value) is not changed. However, an
offset may be required to have the same range of original data (typ.
0-255). It is better to have the same range because clipping due to
over/underflows is avoided.
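The gain rule stated above is easy to evaluate. The following Python
sketch computes the sum of the absolute values of the coefficients
for the three lowpass kernels used for D2, D3 and D4; the function
name gain and the dictionary layout are hypothetical. The last kernel
is an illustrative filter with negative taps, included only to show a
gain greater than 1, in which case the output range can exceed the
input range.

```python
from fractions import Fraction as F


def gain(kernel):
    """Sum of the absolute values of all coefficients of a 2-D kernel."""
    return sum(abs(c) for row in kernel for c in row)


kernels = {
    "D2": [[F(1, 4), F(2, 4), F(1, 4)]],                  # [1,2,1]/4
    "D3": [[F(1, 4)], [F(2, 4)], [F(1, 4)]],              # [1;2;1]/4
    "D4": [[F(c, 16) for c in row]                        # 3x3 kernel /16
           for row in ([1, 2, 1], [2, 4, 2], [1, 2, 1])],
}
gains = {name: gain(kernel) for name, kernel in kernels.items()}

# Illustrative kernel with negative taps: the gain is 3, not 1.
sharpening_gain = gain([[F(-1, 2), F(2), F(-1, 2)]])
```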
Received multiplexed descriptions can be labeled in the following way:
D1 D2 D1' ..
D3 D4 D3'
D1" D2" D1"'
:
The inverse system is:
A = +1 D1
B = -1/2 D1 +2 D2 -1/2 D1'
C = -1/2 D1 +2 D3 -1/2 D1"
D = +1/4 D1 -1 D2 +1/4 D1' ...
-1 D3 +4 D4 -1 D3' ...
+1/4 D1" -1 D2" +1/4 D1"'
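The inverse system above can be verified on a single 3x3
neighbourhood with exact arithmetic. The Python sketch below applies
the forward equations to the pixels A, B, A', C, D, C', A", B", A"'
and then the inverse equations to the resulting description samples.
The function names and the way samples are grouped into lists are
hypothetical, and quantization is not modelled.

```python
from fractions import Fraction as F

Q, H = F(1, 4), F(1, 2)


def describe(p):
    """p = [[A, B, A'], [C, D, C'], [A", B", A"']]."""
    (a, b, a1), (c, d, c1), (a2, b2, a3) = p
    d1 = [[a, a1], [a2, a3]]                          # D1 D1' / D1" D1"'
    d2 = [Q * a + H * b + Q * a1,                     # D2
          Q * a2 + H * b2 + Q * a3]                   # D2"
    d3 = [Q * a + H * c + Q * a2,                     # D3
          Q * a1 + H * c1 + Q * a3]                   # D3'
    d4 = F(a + 2 * b + a1 + 2 * c + 4 * d + 2 * c1 + a2 + 2 * b2 + a3, 16)
    return d1, d2, d3, d4


def reconstruct(d1, d2, d3, d4):
    """Apply the inverse system to recover A, B, C and D."""
    (e, e1), (e2, e3) = d1
    a = e
    b = -H * e + 2 * d2[0] - H * e1
    c = -H * e + 2 * d3[0] - H * e2
    d = (Q * (e + e1 + e2 + e3)
         - d2[0] - d2[1] - d3[0] - d3[1] + 4 * d4)
    return a, b, c, d


pixels = [[12, 7, 200], [33, 91, 5], [64, 128, 17]]
recovered = reconstruct(*describe(pixels))
```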
Corresponding SDP attributes for reconstruction are:
a=X-mdclc-group: lp0 D1 D2 D3 D4
a=X-mdclc-post: lp0 E= fir([1/4,-1/2,1/4; -1/2,+1,-1/2;
1/4,-1/2,1/4], up(2,0,XY, D1)) + fir([-1,0,0; 2,0,0; -1,0,0],
up(2,0,XY,D2)) + fir([-1,2,-1; 0,0,0; 0,0,0], up(2,0,XY,D3)) +
fir([4,0,0; 0,0,0; 0,0,0], up(2,0,XY,D4));
4.3. Frame-expanded Multiple Description
Frame expansion is a way to expand the original data so that some
controlled redundancy is added. In the literature, frame expansion has
been used with quantized filter banks. Here frame expansion is used
with downsampled filter banks.
In this example: 2 descriptions can be generated by separating odd
and even lines as for 2MD; the 3rd description is simply the average
of odd and even lines. The short name of this scheme is 3MD. It is
clear that perfect reconstruction (except for quantization noise) is
achieved if any 2 descriptions out of 3 are correctly received. The
3MD system can be seen as equivalent to a FEC code with rate 2/3: one
single erasure can be recovered.
m=video 49170 RTP/AVP 97 98 99
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 D1
a=X-mdclc-pre: D1= dn(2,0,Y,S);
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 D2
a=X-mdclc-pre: D2= dn(2,1,Y,S);
a=rtpmap:99 H264/90000
a=fmtp:99 ...
a=X-mdclc-tag: 99 D3
a=X-mdclc-pre: D3= dn(2,0,Y, fir([0;0.5;0.5], S));
Note that the quantity of data to be encoded is 3*1/2.
It must be appreciated that if there is quantization noise, its power
on the reconstruction is reduced when all three descriptions are
received. Unlike FEC, when the channel is better than expected, there
is an improvement in received quality.
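The claim that any two descriptions out of three suffice can be
illustrated with the short Python sketch below, in which each line is
reduced to a single sample and quantization is ignored. The function
names encode_3md and decode_3md are hypothetical. When all three
descriptions are present this sketch simply uses D1 and D2; it does
not attempt the noise reduction discussed above.

```python
from fractions import Fraction as F
from itertools import combinations


def encode_3md(a, b):
    """D1 = A, D2 = B, D3 = average of A and B."""
    return {"D1": F(a), "D2": F(b), "D3": (F(a) + F(b)) / 2}


def decode_3md(rx):
    """Recover (A, B) from a dict holding at least two descriptions."""
    if "D1" in rx and "D2" in rx:
        return rx["D1"], rx["D2"]
    if "D1" in rx and "D3" in rx:
        return rx["D1"], 2 * rx["D3"] - rx["D1"]
    if "D2" in rx and "D3" in rx:
        return 2 * rx["D3"] - rx["D2"], rx["D2"]
    raise ValueError("at least two descriptions are needed")


sent = encode_3md(100, 140)
results = [decode_3md({name: sent[name] for name in pair})
           for pair in combinations(sent, 2)]
```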
The filter bank made by two all-pass and one lowpass filter can be
seen as an encoding matrix, where two data (an odd line A and the
following even line B) are encoded into three descriptions (D1, D2
and D3):
D1 = +1.0 A
D2 = +1.0 B
D3 = +0.5 A +0.5 B
Every 2x2 submatrix is invertible. Hence when two descriptions are
received, the original data is reconstructed except for quantization
noise using the corresponding inverse matrix. When all three
descriptions are received, the pseudo-inverse matrix can be used:
A = +0.833 D1 -0.166 D2 +0.333 D3
B = -0.166 D1 +0.833 D2 +0.333 D3
The range of the quantization noise from D1/D2/D3 to A/B can be found
by summing the absolute values of the coefficients along the
corresponding row. The power of the quantization noise can be found
by summing the squared coefficients.
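The pseudo-inverse coefficients and the noise gains can be checked
numerically. The sketch below is only an illustration: it computes
the pseudo-inverse as (G^T G)^-1 G^T for the 3x2 encoding matrix, and
it assumes that the noise range is taken as the sum of the coefficient
magnitudes and that the quantization noise on the three descriptions
is uncorrelated with equal power. All names are hypothetical.

```python
# Sketch only: pseudo-inverse of the 3MD encoding matrix and the
# resulting noise gains for the reconstruction of A.

G = [[1.0, 0.0],   # D1 = A
     [0.0, 1.0],   # D2 = B
     [0.5, 0.5]]   # D3 = (A + B) / 2


def pseudo_inverse_3x2(g):
    """Return (g^T g)^-1 g^T as a 2x3 list of lists."""
    # Gram matrix g^T g (2x2)
    m = [[sum(g[k][i] * g[k][j] for k in range(3)) for j in range(2)]
         for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    # Multiply the 2x2 inverse by g^T (2x3)
    return [[sum(inv[i][j] * g[k][j] for j in range(2)) for k in range(3)]
            for i in range(2)]


P = pseudo_inverse_3x2(G)
row_a = P[0]  # coefficients of D1, D2, D3 in the reconstruction of A

# Assumption: range gain is the sum of coefficient magnitudes
noise_range_gain = sum(abs(c) for c in row_a)
# Assumption: uncorrelated, equal-power noise on D1, D2, D3
noise_power_gain = sum(c * c for c in row_a)
```

Under these assumptions the power gain for A is 5/6, lower than the
gain of 1 obtained when A is taken from D1 alone.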
The redundancy can be controlled easily by quantizing more heavily
the third downsampled description. This can be signalled via the
"fmtp" attribute in SDP. When all three descriptions are received,
an MDC/LC-aware receiver MAY decide not to use the third description,
as it has a lower quality.
Another example: 4 descriptions can be generated by separating odd
and even lines and taking every other pixel as for 4MD; the 5th
description can be generated by averaging separated pixels (i.e.
averaging descriptions). The short name of this scheme is 5MD.
Note that the quantity of data to be encoded is 5*1/4.
4.4. Unbalanced Multiple Description (UMD)
In this example one description corresponds to the original; the
other description is simply a downsampled version having 1/4th the
size of the original. The second description is more important and
should be better protected than the first. It is used in case of
losses to enhance concealment.
m=video 49170 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 D1
a=X-mdclc-pre: D1= S;
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 D2
a=X-mdclc-pre: D2= dn(2,0,XY, S);
Note that the quantity of data to be encoded is 1+1/4.
The redundancy can be controlled easily by quantizing more heavily
the second downsampled description. The protection level of the
second description can be increased by simply increasing the intra
refresh rate. This is useful because the second description is used
only for the concealment of the first.
When both descriptions are received, the second is discarded. When
only the second description is received, it is upsampled by a linear
interpolation filter.
a=X-mdclc-group: lp01 D1
a=X-mdclc-post: lp01 E= D1;
a=X-mdclc-group: lp2 D2
a=X-mdclc-post: lp2 E= fir([0.0625,0.125,0.0625; 0.125,0.25,0.125;
0.0625,0.125,0.0625], up(2,0,XY, D2));
a=X-mdclc-group: lp3
a=X-mdclc-post: lp3 E= 128;
4.5. Classical Layered Coding (LC)
Two layers are created as follows: the original data is downsampled
to 1/4th the original and encoded. This is the base layer. The base
layer is decoded, upsampled and subtracted from the original data,
generating what can be seen as a prediction error to be encoded.
This is the enhancement layer.
m=video 49170 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 L1
a=X-mdclc-pre: L1= dn(2,0,XY, S)
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 L2
a=X-mdclc-pre: L2= S - fir([0.0625,0.125,0.0625; 0.125,0.25,0.125;
0.0625,0.125,0.0625], up(2,0,XY, L1_Q));
Note that the quantity of data to be encoded is 1+1/4.
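The two-layer construction can be sketched as follows. This is an
illustration under stated assumptions, not the normative processing:
the codec is replaced by a plain uniform quantizer applied to the
base layer only, upsampling uses pixel replication instead of the
interpolation filter shown in the SDP example, and all helper names
are hypothetical.

```python
# Sketch only: two-layer coding with a toy quantizer in place of the codec.

def dn2(img):
    """Keep every other sample in X and Y (phase 0)."""
    return [row[0::2] for row in img[0::2]]


def up2_replicate(img):
    """Upsample by 2 in X and Y by pixel replication (an assumption;
    the SDP example uses zero insertion followed by a FIR filter)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def quantize(img, step):
    """Toy stand-in for encoding followed by decoding."""
    return [[step * round(v / step) for v in row] for row in img]


def encode_lc(s, step=8):
    """Return the decoded base layer L1_Q and the enhancement layer L2."""
    l1_q = quantize(dn2(s), step)
    pred = up2_replicate(l1_q)
    l2 = [[a - b for a, b in zip(rs, rp)] for rs, rp in zip(s, pred)]
    return l1_q, l2


def decode_lc(l1_q, l2=None):
    """Base layer alone gives a coarse frame; adding L2 refines it."""
    pred = up2_replicate(l1_q)
    if l2 is None:
        return pred
    return [[p + e for p, e in zip(rp, re)] for rp, re in zip(pred, l2)]


s = [[12, 14, 40, 42],
     [13, 15, 41, 43],
     [90, 92, 60, 62],
     [91, 93, 61, 63]]
l1_q, l2 = encode_lc(s)
```

Because the enhancement layer is computed from the decoded base layer,
the base-layer quantization error is absorbed by the enhancement layer
in this sketch.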
Three layers are created as follows: the original data is downsampled
to 1/16th the original and encoded. This is the base layer. The
base layer is decoded, upsampled and subtracted from the original
downsampled to 1/4th. This is the first enhancement layer. The
first enhancement layer is decoded, upsampled and subtracted from the
original. This is the second enhancement layer.
m=video 49170 RTP/AVP 97 98 99
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 L1
a=X-mdclc-pre: L1= dn(4,0,XY, S)
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 L2
a=X-mdclc-pre: L2= dn(2,0,XY, S) - fir([0.0625,0.125,0.0625;
0.125,0.25,0.125; 0.0625,0.125,0.0625], up(2,0,XY, L1_Q));
a=rtpmap:99 H264/90000
a=fmtp:99 ...
a=X-mdclc-tag: 99 L3
a=X-mdclc-pre: L3= dn(1,0,XY, S) - fir([0.0625,0.125,0.0625;
0.125,0.25,0.125; 0.0625,0.125,0.0625], up(2,0,XY, L2_Q));
Note that the quantity of data to be encoded is 1+1/4+1/16.
4.6. Haar-wavelet Layered Coding
Classical layered encoding suffers from overhead. Wavelet coding
does not, because the enhancement data is critically downsampled.
Two layers are created using the 1D vertical Haar wavelet. The base
layer is the average of odd and even lines. The enhancement layer is
the difference between odd and even lines.
m=video 49170 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 L1
a=X-mdclc-pre: L1= dn(2,0,Y, fir([+0.5;+0.5], S));
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 L2
a=X-mdclc-pre: L2= dn(2,0,Y, fir([+0.5;-0.5], S));
Note that the quantity of data to be encoded is 2*1/2.
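The vertical Haar split and its inverse can be sketched as below. The
sketch assumes a frame stored as a list of rows with an even row
count, no quantization, and a particular sign convention for the
difference (first line minus second line); the helper names are
hypothetical.

```python
# Sketch only: 1D vertical Haar analysis and synthesis on a list of rows.

def haar_rows_analysis(frame):
    """Return L1 (average of line pairs) and L2 (half-difference)."""
    first, second = frame[0::2], frame[1::2]
    l1 = [[(a + b) / 2 for a, b in zip(ra, rb)]
          for ra, rb in zip(first, second)]
    l2 = [[(a - b) / 2 for a, b in zip(ra, rb)]
          for ra, rb in zip(first, second)]
    return l1, l2


def haar_rows_synthesis(l1, l2):
    """Rebuild the frame: first line = L1 + L2, second line = L1 - L2."""
    frame = []
    for ra, rd in zip(l1, l2):
        frame.append([a + d for a, d in zip(ra, rd)])
        frame.append([a - d for a, d in zip(ra, rd)])
    return frame


frame = [[10, 20], [30, 40], [50, 60], [70, 80]]
l1, l2 = haar_rows_analysis(frame)
```

Each layer holds half the lines of the original, so the two layers
together carry as many samples as the original frame.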
Four layers are created using the 2D Haar wavelet. The base layer is
the average of odd and even lines and columns. The other three
enhancement layers are the horizontal, vertical and diagonal
difference with respect to the base layer.
m=video 49170 RTP/AVP 97 98 99 100
a=rtpmap:97 H264/90000
a=fmtp:97 ...
a=X-mdclc-tag: 97 L1
a=X-mdclc-pre: L1= dn(2,0,XY, fir([+0.25 +0.25;+0.25 +0.25], S));
a=rtpmap:98 H264/90000
a=fmtp:98 ...
a=X-mdclc-tag: 98 L2
a=X-mdclc-pre: L2= dn(2,0,XY, fir([+0.25 -0.25;+0.25 -0.25], S));
a=rtpmap:99 H264/90000
a=fmtp:99 ...
a=X-mdclc-tag: 99 L3
a=X-mdclc-pre: L3= dn(2,0,XY, fir([+0.25 +0.25;-0.25 -0.25], S));
a=rtpmap:100 H264/90000
a=fmtp:100 ...
a=X-mdclc-tag: 100 L4
a=X-mdclc-pre: L4= dn(2,0,XY, fir([+0.25 -0.25;-0.25 +0.25], S));
Note that the quantity of data to be encoded is 4*1/4.
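The four-layer split can be sketched on 2x2 blocks, using the same
signs as the filter kernels in the SDP example above. The sketch
assumes even frame dimensions, no quantization, and that each kernel
is applied to the block [a b; c d] in reading order; the helper names
are hypothetical.

```python
# Sketch only: 2D Haar analysis and synthesis on 2x2 blocks [a b; c d].

def haar2d_analysis(frame):
    """Return the four layers L1..L4, each 1/4 the size of the frame."""
    l1, l2, l3, l4 = [], [], [], []
    for top, bot in zip(frame[0::2], frame[1::2]):
        r1, r2, r3, r4 = [], [], [], []
        for a, b, c, d in zip(top[0::2], top[1::2], bot[0::2], bot[1::2]):
            r1.append((a + b + c + d) / 4)  # kernel [+ +; + +]
            r2.append((a - b + c - d) / 4)  # kernel [+ -; + -]
            r3.append((a + b - c - d) / 4)  # kernel [+ +; - -]
            r4.append((a - b - c + d) / 4)  # kernel [+ -; - +]
        l1.append(r1)
        l2.append(r2)
        l3.append(r3)
        l4.append(r4)
    return l1, l2, l3, l4


def haar2d_synthesis(l1, l2, l3, l4):
    """Invert the analysis block by block."""
    frame = []
    for r1, r2, r3, r4 in zip(l1, l2, l3, l4):
        top, bot = [], []
        for p, q, r, s in zip(r1, r2, r3, r4):
            top += [p + q + r + s, p - q + r - s]  # a, b
            bot += [p + q - r - s, p - q - r + s]  # c, d
        frame.append(top)
        frame.append(bot)
    return frame


frame = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
layers = haar2d_analysis(frame)
```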
5. Packetization
5.1. Synchronization
The post-processor needs to merge the same portion of the data in
order to produce the correct result. Synchronization of
descriptions/layers is therefore critical and can be accomplished in
one of the following ways.
Timestamp synchronization. Data to be merged can be sent in packets
having the same timestamp. This is possible if the sampling clock is
the same.
Sequence number synchronization. Data to be merged can be sent in
packets having the same sequence number. This is possible if each
packet contains the same portion of the data. This means that packets
may have variable length.
Payload synchronization. Data to be merged can be identified by
looking at the payload.
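Timestamp synchronization can be sketched as a simple grouping step
in the receiver. The sketch below assumes that each received packet
has already been reduced to a (tag, timestamp, payload) tuple, where
the tag is the description name signalled in SDP; this tuple layout
and the function name are hypothetical and not defined by this
document.

```python
# Sketch only: group received data by RTP timestamp so that the
# post-processor knows which descriptions are available for each portion.
from collections import defaultdict


def group_by_timestamp(packets):
    """packets: iterable of (tag, timestamp, payload) tuples."""
    groups = defaultdict(dict)
    for tag, timestamp, payload in packets:
        groups[timestamp][tag] = payload
    return dict(groups)


received = [
    ("D1", 9000, b"a0"),
    ("D2", 9000, b"b0"),
    ("D1", 12000, b"a1"),  # D2 for this timestamp was lost
]
groups = group_by_timestamp(received)
available = {ts: sorted(group) for ts, group in groups.items()}
```

The set of tags found for a given timestamp could then be used to
pick the matching reconstruction expression.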
5.2. Multiplexing, interleaving
Descriptions should be offset as much as possible when streams are
multiplexed. In this way a burst of losses does not cause the loss
of the same portion of data in all descriptions at the same time.
+--------------------------------------> time
D1_Nth_frame...
|                    D2_Nth_frame...
|<------offset------>|
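The effect of the offset on burst losses can be illustrated with a
toy model. It assumes that time is divided into slots, that each
description sends frame n in slot n + k*offset (k being the index of
the description), and that a burst erases every packet sent in a
contiguous range of slots. The model and the function name are
hypothetical simplifications.

```python
# Sketch only: which frames are lost in ALL descriptions for a given
# offset between descriptions and a given burst of losses.

def frames_lost_in_all(num_frames, offset, burst_start, burst_len,
                       num_desc=2):
    """Return the frames whose copies are erased in every description."""
    burst = range(burst_start, burst_start + burst_len)
    lost_everywhere = []
    for n in range(num_frames):
        # Slot in which each description carries frame n
        slots = [n + k * offset for k in range(num_desc)]
        if all(slot in burst for slot in slots):
            lost_everywhere.append(n)
    return lost_everywhere


no_offset = frames_lost_in_all(20, offset=0, burst_start=5, burst_len=3)
with_offset = frames_lost_in_all(20, offset=4, burst_start=5, burst_len=3)
```

In this model a frame is lost in both descriptions only when the
burst is longer than the offset.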
If interleaving is used, the same criterion is to be used:
descriptions are spaced as much as possible. In this way a burst of
losses does not cause the loss of the same portion of data in all
descriptions at the same time.
+--------------------------------------> time
D1_Nth_frame...D2_Nth_frame...|D1_N+1th_frame...
|                             |
|<-----interleaver depth----->|
6. Security Considerations
By tampering with the proposed SDP parameters, an entity that manages
to modify the session descriptions exchanged between the participants
to establish a multimedia session could prevent the participants from
performing the correct decoding/merge process.
The attacker can modify datagrams, particularly the RTP header which
is used for synchronization, to impede the correct decoding/merge
process.
The attacker can insert pathological expressions into the SDP
"X-mdclc-pre:" or "X-mdclc-post:" attributes that are complex to
decode and that cause the post-processor in the receiver to be
overloaded.
Integrity mechanisms provided by the protocols used to exchange
session descriptions, together with media encryption, can be used to
prevent these attacks.
7. Congestion control and bandwidth management
Congestion control for RTP SHALL be used in accordance with RFC 3550
[6], and with any applicable RTP profile; e.g., RFC 3551 [16].
An additional requirement if best-effort service is being used is:
users of this payload format MUST monitor packet loss to ensure that
the packet loss rate is within acceptable parameters. Packet loss is
considered acceptable if a TCP flow across the same network path, and
experiencing the same network conditions, would achieve an average
throughput, measured on a reasonable timescale, that is not less than
the RTP flow is achieving. This condition can be satisfied by
implementing congestion control mechanisms to adapt the transmission
rate (or the number of descriptions/layers subscribed), or by
arranging for a receiver to leave the session if the loss rate is
unacceptably high.
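As an illustration of adapting the number of subscribed
descriptions/layers, the sketch below estimates a TCP-comparable rate
and subscribes to as many descriptions as fit within it. It relies on
assumptions that are not part of this document: the simplified
steady-state TCP throughput approximation sqrt(3/2) * MSS / (RTT *
sqrt(p)), known per-description bit rates, and hypothetical function
names.

```python
# Sketch only: choose how many descriptions/layers to subscribe to so
# that the total rate stays below an estimated TCP-comparable rate.
import math


def tcp_friendly_rate_bps(mss_bytes, rtt_s, loss_rate):
    """Simplified TCP throughput approximation (an assumption here)."""
    if loss_rate <= 0:
        return math.inf  # no loss observed: no bound from this model
    return 8 * mss_bytes * math.sqrt(1.5) / (rtt_s * math.sqrt(loss_rate))


def descriptions_to_subscribe(rates_bps, mss_bytes, rtt_s, loss_rate):
    """Count the leading descriptions whose cumulative rate fits."""
    budget = tcp_friendly_rate_bps(mss_bytes, rtt_s, loss_rate)
    total, count = 0.0, 0
    for rate in rates_bps:
        if total + rate > budget:
            break
        total += rate
        count += 1
    return count


rates = [500_000] * 4  # hypothetical: four descriptions of 500 kbit/s
low_loss = descriptions_to_subscribe(rates, 1460, 0.1, 0.01)
high_loss = descriptions_to_subscribe(rates, 1460, 0.1, 0.04)
```

With these hypothetical figures the receiver would keep fewer
descriptions as the measured loss rate grows.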
8. IANA Considerations
This document defines several unregistered SDP attributes.
IANA has not registered MIME types.
9. Informative appendix: rationale for std-compatible MDC/LC
In the following paragraph the rationale for video MDC is presented.
Neither video LC nor audio MDC/LC are covered.
9.1. Rationale for video MDC
There are many techniques to create multiple descriptions: MDC
quantization, correlating transforms and filters, quantized frames or
redundant bases, FEC combined with layered coding, spatial or
temporal polyphase downsampling.
Many of these schemes can be adapted to existing video codecs which
are based on prediction, transform, quantization and entropy coding:
it is possible to create descriptions in the pixel domain, in the
error-prediction domain or in the transform domain.
9.2. MDC in the error-prediction or in the transform domain
Working in the error-prediction domain or in the transform domain
yields very efficient but complex schemes. If not all descriptions
are received correctly, the prediction drifts because the frame
memory in the decoder will not be the same as the one used in the
encoder. To solve this problem, prediction may be removed, but this
greatly reduces the video compression capability of the codec. A
solution is to send a drift compensation term together with each
description, but if there are more than two descriptions, the number
of drift compensation terms increases dramatically, again reducing
efficiency. Because of this, we restricted our attention to schemes
that work in the pixel domain.
9.3. MDC in the pixel domain
Working in the pixel domain has the advantage that MDC can be
completely decoupled from the underlying video codec: descriptions
can be created in a pre-processing stage before compression;
successfully received descriptions can be merged in a post-processing
stage after decompression. Spatial and temporal descriptions can be
created by using Polyphase Downsampling (PDMD), where programmable
lowpass filters control redundancy; SNR descriptions can be created
by means of MDC quantizers (either scalar or vector), where the
structure of the quantizers controls redundancy.
Replicated headers/syntax and replicated motion vectors among
bitstreams greatly impede coding efficiency in SNR MDC. Replicated
headers/syntax also hinder temporal MDC, and motion compensation is
less effective because of the increased temporal distance between
frames. Spatial MDC is hindered by headers/syntax as well but unlike
with temporal MDC, motion compensation is affected to a smaller
extent, particularly when 8x8 blocks are split into smaller blocks,
as in the latest H.264 codec.
According to our experience, spatial PDMD is preferable over temporal
PDMD: at very low bitrates, when temporal PDMD is used, there is an
annoying flashing in the decoded sequence due to independent
compression of descriptions; instead, when spatial PDMD is used,
there is a kind of dithering due to the same reason. Dithering is
not annoying, it can be easily detected and eliminated, and it can
even improve the perceived quality.
10. References
10.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[2] Crocker & Overell, "Augmented BNF for Syntax Specifications:
ABNF", RFC 2234, November 1997.
[3] Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
RFC 2327, April 1998.
[4] Camarillo et al., "Grouping of Media Lines in the Session
Description Protocol (SDP)", RFC 3388, December 2002.
[5] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
Session Description Protocol (SDP)", RFC 3264, June 2002.
[6] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", STD 64,
RFC 3550, July 2003.
10.2. Informative References
[7] John G. Apostolopoulos "Video Streaming: Concepts, Algorithms and
Systems" HP Laboratories, report HPL-2002-260, September 2002.
[8] Vivek K. Goyal "Multiple Description Coding: Compression Meets
the Network" IEEE Signal Processing Magazine, September 2001.
[9] Jelena Kovacevic, Vivek K. Goyal, "Multiple Descriptions -
Source-Channel Coding Methods for Communications", Bell Labs,
Innovation for Lucent Technologies, 1998.
[10] R. Singh, A. Ortega, L. Perret, and W. Jiang, "Comparison of
Multiple Description Coding and Layered Coding based on Network
Simulations," in Proc. SPIE Image Video Proc., San Jose, CA, Jan.
2000, pp. 929-939.
[11] A. Vitali, F. Rovati, R. Rinaldo, R. Bernardini, and M. Durigon,
"Video Streaming over Lossy/Variable Bandwidth Networks by means
of Multiple Description," in Proceedings of MMSP 2004, Siena,
Italy, pp. 498-501, IEEE, September 2004.
[12] R. Bernardini, R. Rinaldo, A. Tonello, and A. Vitali, "Frame
based Multiple Description for Multimedia Transmission over
Wireless Networks," in Proceedings of WPMC 2004, Abano Terme,
Italy, July 2004.
[13] R. Bernardini, M. Durigon, R. Rinaldo, L. Celetto, and A.
Vitali, "Polyphase Spatial Subsampling Multiple Description
Coding of Video Streams with H264," in Proceedings of ICIP 2004,
Singapore, pp. 3213-3216, October 2004.
[14] N. Franchi, M. Fumagalli, R. Lancini, S. Tubaro, "Multiple
Description Video Coding for Scalable and Robust Transmission
over IP" IEEE Transactions on CSVT, vol. 15, no. 3, pp. 321-334,
March 2005.
[15] R. Bernardini, L. Celetto, R. Rinaldo, A. Vitali, P. Zontone,
"Bit Allocation and Quantizer Optimization in Multiple
Description Coding with Oversampled Filterbanks" paper 1930, IEEE
International Conference on Image Processing, Genova, Italy,
September 2005.
Authors' Addresses:
Andrea L. Vitali
STMicroelectronics
via C. Olivetti 2
20041 Agrate Brianza (MI)
Italy
Phone: +39-039-603-7244
EMail: andrea.vitali@st.com
Marco Fumagalli
Cefriel - Politecnico di Milano
via R. Fucini, 2
20133 Milano
Phone: +39-02-2395-4208
Email: marco.fumagalli@cefriel.it
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property Right
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at ietf-
ipr@ietf.org.
Acknowledgement
We wish to thank Nicola Franchi of CEFRIEL - Politecnico di Milano
(ICT Center of Excellence For Research, Innovation, Education and
industrial Labs partnership) for their help in defining this
standard-compatible framework.
We also wish to thank Roberto Rinaldo and Riccardo Bernardini of
DIEGM - Universita` degli Studi di Udine for their work in the field.