Internet DRAFT - draft-cho-netvc-applypvq
draft-cho-netvc-applypvq
NETVC (Internet Video Codec) Y. Cho
Internet-Draft Mozilla Corporation
Intended status: Informational November 14, 2016
Expires: May 18, 2017
Applying PVQ Outside Daala
draft-cho-netvc-applypvq-03
Abstract
This document describes the Perceptual Vector Quantization (PVQ)
outside of the Daala video codec, where PVQ was originally developed.
It discusses the issues arising while integrating PVQ into a
traditional video codec, AV1.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 18, 2017.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Cho Expires May 18, 2017 [Page 1]
Internet-Draft Applying PVQ Outside Daala November 2016
Table of Contents
1. Background . . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Integration of PVQ into non-Daala codec, AV1 . . . . . . . . 3
2.1. Signaling Skip for Partition and Transform Block . . . . 4
2.2. Issues . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Performance of PVQ in AV1 . . . . . . . . . . . . . . . . . . 5
3.1. Coding Gain . . . . . . . . . . . . . . . . . . . . . . . 5
3.2. Speed . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4. Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 8
5. Development Repository . . . . . . . . . . . . . . . . . . . 8
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 9
8.1. Informative References . . . . . . . . . . . . . . . . . 9
8.2. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 10
1. Background
Perceptual Vector Quantization (PVQ)
[Perceptual-VQ][I-D.valin-netvc-pvq] has been proposed as a
quantization and coefficient coding tool for an internet video codec.
PVQ was originally developed for the Daala video codec [1]
[PVQ-demo], which does a gain-shape coding of transform coefficients
instead of more traditional scalar quantization. (The original
abbreviation of PVQ, "Pyramid Vector Quantizer", as in
[I-D.valin-netvc-pvq] is now commonly expanded as "Perceptual Vector
Quantization".)
The most distinguishing idea of PVQ is the way it references a
predictor. With PVQ, we do not subtract the predictor from the input
to produce a residual, which is then transformed and coded. Both the
predictor and the input are transformed into the frequency domain.
Then, PVQ applies a reflection to both the predictor and the input
such that the prediction vector lies on one of the coordinate axes,
and codes the angle between them. By not subtracting the predictor
from the input, the gain of the predictor can be preserved and is
explicitly coded, which is one of the benefits of PVQ. Since DC is
not quantized by PVQ, the gain can be viewed as the amount of
contrast in an image, which is an important perceptual parameter.
Also, an input block of transform coefficients is split into
frequency bands based on their spatial orientation and scale. Then,
each band is quantized by PVQ separately. The 'gain' of a band
indicates the amount of contrast in the corresponding orientation and
scale. It is simply the L2 norm of the band. The gain is non-
linearly companded and then scalar quantized and coded. The
Cho Expires May 18, 2017 [Page 2]
Internet-Draft Applying PVQ Outside Daala November 2016
remaining information in the band, the 'shape', is then defined as a
point on the surface of a unit hypersphere.
Another benefit of PVQ is activity masking based on the gain, which
automatically controls the quantization resolution based on the image
contrast without any signaling. For example, for a smooth image area
(i.e. low contrast thus low gain), the resolution of quantization
will increase, thus fewer quantization errors will be shown. A
succinct summary on the benefits of PVQ can be found in the
Section 2.4 of [Terriberry_16].
Since PVQ has only been used in the Daala video codec, which contains
many non-traditional design elements, there has not been any chance
to see the relative coding performance of PVQ compared to scalar
quantization in a more traditional codec design. We have tried to
apply PVQ in the AV1 video codec, which is currently being developed
by Alliance for Open Media (AOM) as an open source and royalty-free
video codec. While most of benefits of using PVQ arise from the
enhancement of subjective quality of video, compression results with
activity masking enabled are not available yet in this draft because
the required parameters, which were set for Daala, have not been
adjusted to AV1 yet. These results were achieved optimizing solely
for PSNR.
2. Integration of PVQ into non-Daala codec, AV1
Adopting PVQ in AV1 requires replacing both the scalar quantization
step and the coefficient coding of AV1 with those of PVQ. In terms
of inputs to PVQ and the usage of transforms as shown in Figure 1 and
Figure 2, the biggest conceptual changes required in a traditional
coding system, such as AV1, are
o Introduction of a transformed predictor both in encoder and
decoder. For this, we apply a forward transform to the
predictors, both intra-predicted pixels and inter-predicted (i.e.,
motion-compensated) pixels. This is because PVQ references the
predictor in the transform domain, instead of using a pixel-domain
residual as in traditional scalar quantization.
o Absence of a difference signal (i.e. residue) defined as "input
source - predictor". Hence AV1 with PVQ does not do any
'subtraction' in order for an input to reference the predictor.
Instead, PVQ takes a different approach to referencing the
predictor which happens in the transform domain.
Cho Expires May 18, 2017 [Page 3]
Internet-Draft Applying PVQ Outside Daala November 2016
input X --> +-------------+ +-------------+
| Subtraction | --> residue --> | Transform T |
predictor --> +-------------+ signal R +-------------+
P |
| v
v T(R)
[+]--> decoded X |
^ |
| v
| +-----------+ +-----------+ +-----------+
decoded <-- | Inverse | <--| Inverse | <-- | Scalar |
R | Transform | | Quantizer | | | Quantizer |
+-----------+ +-----------+ | +-----------+
v
+-------------+
bitstream <--| Coefficient |
of coded T(R) | Coder |
+-------------+
Figure 1: Traditional architecture containing Quantization and
Transforms
+-------------+ +-----------+
input X-->| Transform T |--> T(X)--> | PVQ |
|_____________| | Quantizer | +-------------+
+----> +-----------+ | PVQ |
+-------------+ | |------> | Coefficient |
predictor-->| Transform T |--> T(P) v | Coder |
P |_____________| | +-----------+ +-------------+
| | PVQ | |
+----> | Inverse | v
| Quantizer | bitstream
+-----------+ of coded T(X)
|
+-----------+ v
decoded X <--| Inverse | <--------- dequantized T(X)
| Transform |
+-----------+
Figure 2: AV1 with PVQ
2.1. Signaling Skip for Partition and Transform Block
In AV1, a skip flag for a partition block is true if all of the
quantized coefficients in the partition are zeros. The signaling for
the prediction mode in a partition cannot be skipped. If the skip
flag is true with PVQ, the predicted pixels are the final decoded
Cho Expires May 18, 2017 [Page 4]
Internet-Draft Applying PVQ Outside Daala November 2016
pixels (plus frame wise in-loop filtering such as deblocking) as in
AV1 then a forward transform of a predictor is not required.
While AV1 currently defines only one 'skip' flag for each 'partition'
(a unit where prediction is done), PVQ introduces another kind of
'skip' flag, called 'ac_dc_coded', which is defined for each
transform block (and thus for each Y'CbCr plane as well). AV1 allows
that a transform size can be smaller than a partition size which
leads to partitions that can have multiple transform blocks. The
ac_dc_coded flag signals whether DC and/or whole AC coefficients are
coded by PVQ or not (PVQ does not quantize DC itself though).
2.2. Issues
o PVQ has its own rate-distortion optimization (RDO) that differs
from that of traditional scalar quantization. This leads the
balance of quality between luma and chroma to be different from
that of scalar quantization. When scalar quantization of AV1 is
done for a block of coefficients, RDO, such as trellis coding, can
be optionally performed. The second pass of 2-pass encoding in
AV1 currently uses trellis coding. When doing so it appears a
different scaling factor is applied for each of Y'CbCr channels.
o In AV1, to optimize speed, there are inverse transforms that can
skip applying certain 1D basis functions based on the distribution
of quantized coefficients. However, this is mostly not possible
with PVQ since the inverse transform is applied directly to a
dequantized input, instead of a dequantized difference (i.e. input
source - predictor) as in traditional video codec. This is true
for both encoder and decoder.
o PVQ was originally designed for the 2D DCT, while AV1 also uses a
hybrid 2D transform consisting of a 1D DCT and a 1D ADST. This
requires PVQ to have new coefficient scanning orders for the two
new 2D transforms, DCT-ADST and ADST-DCT (ADST-ADST uses the same
scan order as for DCT-DCT). Those new scan orders have been
produced based on that of AV1, for each PVQ-defined-band of new 2D
transforms.
3. Performance of PVQ in AV1
3.1. Coding Gain
With the encoding options specified by both NETVC ([2]) and AOM
testing for the high latency case, PVQ gives similar coding
efficiency to that of AV1, which is measured in PSNR BD-rate. Again,
PVQ's activity masking is not turned on for this testing. Also,
Cho Expires May 18, 2017 [Page 5]
Internet-Draft Applying PVQ Outside Daala November 2016
scalar quantization has matured over decades, while video coding with
PVQ is much more recent.
We compare the coding efficiency for the IETF test sequence set
"objective-1-fast" defined in [3], which consists of sixteen of
1080p, seven of 720p, and seven of 640x360 sequences of various types
of content, including slow/high motion of people and objects,
animation, computer games and screen casting. The encoding is done
for the first 30 frames of each sequence. The encoding options used
is : "-end-usage=q -cq-level=x --passes=2 --good --cpu-used=0 --auto-
alt-ref=2 --lag-in-frames=25 --limit=30", which is official test
condition of IETF and AOM for high latency encoding except limiting
30 frames only.
For comparison reasons, some of the lambda values used in RDO are
adjusted to match the balance of luma and chroma quality of the PVQ-
enabled AV1 to that of current AV1.
o Use half the value of lambda during intra prediction for the
chroma channels.
o Scale PVQ's lambda by 0.9 for the chroma channels.
o Do not do RDO of DC for the chroma channels.
The results are shown in Table 1, which is the BD-Rate change for
several image quality metrics. (The encoders used to generate this
result are available from the author's git repository [4] and AOM's
repository [5].)
+-----------+----------------------+
| Metric | AV1 --> AV1 with PVQ |
+-----------+----------------------+
| PSNR | -0.17% |
| | |
| PSNR-HVS | 0.27% |
| | |
| SSIM | 0.93% |
| | |
| MS-SSIM | 0.14% |
| | |
| CIEDE2000 | -0.28% |
+-----------+----------------------+
Table 1: Coding Gain by PVQ in AV1
Cho Expires May 18, 2017 [Page 6]
Internet-Draft Applying PVQ Outside Daala November 2016
3.2. Speed
Total encoding time increases roughly 20 times or more when intensive
RDO options, such as "--passes=2 --good --cpu-used=0 --auto-alt-ref=2
--lag-in-frames=25", are enabled. The significant increase in
encoding time is due to the increase of computation by the PVQ. The
PVQ tries to find asymptotically-optimal codepoints (in RD
optimization sense) on a hypersphere with a greedy search, which has
time complexity close to O(n*n) for n coefficients. Meanwhile,
scalar quantization has time complexity of O(n).
Compared to Daala, the search space for a RDO decision in AV1 is far
larger because AV1 considers ten intra prediction modes and four
different transforms (for the transform block sizes 4x4, 8x8, and
16x16 only), and the transform block size can be smaller than the
prediction block size. Since the largest transform and the
prediction sizes are currently 32x32 and 64x64 in AV1, PVQ can be
called approximately 5,160 times more in AV1 than in Daala. Also,
AV1 applies transform and quantization for each candidate of RDO.
As an example, AV1 calls the PVQ function 632,520 times to encode the
grandma_qcif (176x144) clip in intra frame mode while Daala calls
3843 times only (for QP = 30 and 39 for AV1 and daala respectively,
which corresponds to actual quantizer used for quantization being
38). So, PVQ was called 165 times more in AV1 than Daala.
Table 2 shows the frequency of function calls to PVQ and scalar
quantizers in AV1 at each speed level (where AV1 encoding mode is
'good') for the same sequence and the QP as used in the above
example. The first column indicates speed level, the second column
shows the number of calls to PVQ's search inside each band (function
pvq_search_rdo_double() in [6]), the third column shows the number of
calls to PVQ quantization of a transform block (function
od_pvq_encode() in [7]), and the fourth column shows the number of
calls to AV1's block quantizer. Smaller speed level gives slower
encoding but better quality for the same rate by doing more RDO
optimizations. The major difference from speed level 4 to 3 is
enabling a use of the transform block smaller than the prediction
(i.e. partition) block.
Cho Expires May 18, 2017 [Page 7]
Internet-Draft Applying PVQ Outside Daala November 2016
+--------+-----------------+-----------------+----------------------+
| Speed | # of calls to | # of calls to | # of calls to PVQ |
| Level | AV1 quantizer | PVQ quantizer | search inside a band |
+--------+-----------------+-----------------+----------------------+
| 5 | 28,028 | 26,786 | 365,913 |
| | | | |
| 4 | 57,445 | 56,980 | 472,222 |
| | | | |
| 3 | 505,039 | 564,724 | 3,680,366 |
| | | | |
| 2 | 505,039 | 564,724 | 3,680,366 |
| | | | |
| 1 | 535,100 | 580,566 | 3,990,327 |
| | | | |
| 0 | 589,931 | 632,520 | 4,109,113 |
+--------+-----------------+-----------------+----------------------+
Table 2: Comparison of Frequency of Calls to PVQ and Scalar
Quantizers in AV1
4. Future Work
Possible future work includes:
o Enable activity masking, which also needs a HVS-tuned quantization
matrix (bandwise QP scalars).
o Adjust the balance between luma and chroma qualities, tuning for
subjective quality.
o Optimize the speed of the PVQ code, adding SIMD.
o RDO with more model-driven decision making, instead of full
transform + quantization.
5. Development Repository
The ongoing work of integrating PVQ into AV1 video codec is located
at the git repository [8].
6. Acknowledgements
Thanks to Tim Terriberry for his proofreading and valuable comments.
Also thanks to Guillaume Martres for his contributions to integrating
PVQ into AV1 during his internship at Mozilla and Thomas Daede for
providing and maintaining the testing infrastructure by way of the
www.arewecompressedyet.com (AWCY) website. [9].
Cho Expires May 18, 2017 [Page 8]
Internet-Draft Applying PVQ Outside Daala November 2016
7. IANA Considerations
This memo includes no request to IANA.
8. References
8.1. Informative References
[I-D.valin-netvc-pvq]
Valin, J., "Pyramid Vector Quantization for Video Coding",
draft-valin-netvc-pvq-00 (work in progress), June 2015.
[Perceptual-VQ]
Valin, JM. and TB. Terriberry, "Perceptual Vector
Quantization for Video Coding", Proceedings of SPIE Visual
Information Processing and Communication , February 2015,
<https://arxiv.org/pdf/1602.05209v1.pdf>.
[PVQ-demo]
Valin, JM., "Daala: Perceptual Vector Quantization (PVQ)",
November 2014, <https://people.xiph.org/~jm/daala/
pvq_demo/>.
[Terriberry_16]
Terriberry, TB., "Perceptually-Driven Video Coding with
the Daala Video Codec", Proceedings SPIE Volume 9971,
Applications of Digital Image Processing XXXIX , September
2016, <https://arxiv.org/pdf/1610.02488.pdf>.
8.2. URIs
[1] https://xiph.org/daala/
[2] https://tools.ietf.org/html/draft-ietf-netvc-testing-03
[3] https://tools.ietf.org/html/draft-ietf-netvc-testing-03
[4] https://github.com/ycho/aom/
commit/2478029a9b6d02ee2ccc9dbafe7809b5ef345814
[5] https://aomedia.googlesource.com/
aom/+/59848c5c797ddb6051e88b283353c7562d3a2c24
[6] https://github.com/ycho/aom/blob/14981eebb4a08f74182cea3c17f7361b
c79cf04f/av1/encoder/pvq_encoder.c#L84
[7] https://github.com/ycho/aom/blob/14981eebb4a08f74182cea3c17f7361b
c79cf04f/av1/encoder/pvq_encoder.c#L763
Cho Expires May 18, 2017 [Page 9]
Internet-Draft Applying PVQ Outside Daala November 2016
[8] https://github.com/ycho/aom/tree/av1_pvq
[9] https://arewecompressedyet.com/
Author's Address
Yushin Cho
Mozilla Corporation
331 E. Evelyn Avenue
Mountain View, CA 94041
USA
Phone: +1 650 903 0800
Email: ycho@mozilla.com
Cho Expires May 18, 2017 [Page 10]