Network Working Group T. Davies
Internet-Draft Cisco
Intended status: Standards Track October 19, 2015
Expires: January 7, 2016
Interpolated reference frames for video coding
draft-davies-netvc-irfvc-00
Abstract
This document describes the use of interpolated reference frames in
video coding in general, and in the Thor video codec in particular.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 7, 2016.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Davies Expires January 7, 2016 [Page 1]
Internet-Draft IRFVC October 2015
Table of Contents
1. Introduction
2. Definitions
  2.1. Requirements Language
  2.2. Terminology
3. The interpolation process
  3.1. Interpolation framework
  3.2. Motion estimation process
  3.3. Complexity considerations
4. Coding using interpolated reference frames
5. Compression performance
6. IANA Considerations
7. Security Considerations
8. Acknowledgements
9. Normative References
Author's Address
1. Introduction
This document describes a method of generating synthetic reference
frames for video coding using a simplified frame interpolation
method. The aim is to create a reference frame that is temporally
co-located with the current frame being predicted, leveraging the
motion information already present in the previously-coded frames,
and removing the need for techniques such as motion vector scaling
in motion vector prediction.
Since the decoder will have to generate the same interpolated
reference frame as the encoder, complexity considerations are a
paramount concern. The interpolation process is therefore a highly
simplified block-matching algorithm and uses only pixel-accurate
motion vectors, for example. Worst-case complexity can be managed by
controlling the number of matches per block, per region and per
frame as well as the total vertical excursion to manage memory
bandwidth.
The method gives most gain in Thor at high quantisation parameter (QP)
levels, i.e. low bitrates. Overall, Bjontegaard delta-rate (BDR)
reductions across the QP range 22-37 are on average 5.2% for a range
of HD test sequences. For higher QP (32-44) the reductions are
larger: 8.8% on average.
Interpolated reference frames are enabled by default in the high
complexity random access (RA) and High Delay B (HDB) configurations
in the Thor repository github.com/cisco/thor.
Section 3 describes the interpolation process, which is based on
a simplified hierarchical motion estimation (HME). Section 4
describes the modifications to the Thor syntax coding processes.
Section 5 provides details of compression performance.
2. Definitions
2.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2.2. Terminology
This document frequently uses the following terms.
MV: Motion vector - a horizontal and vertical vector displacement
(x,y)
ME: Motion Estimation
HME: Hierarchical ME
SAD: Sum of Absolute Differences. A metric defined for a pair of
equal-dimension blocks of numerical values, consisting of the sum
of the absolute differences of the corresponding values at each
location in the blocks
QP: quantisation parameter
BDR: Bjontegaard Delta-Rate
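As an illustration of the SAD definition above, a minimal sketch is given below. Blocks are represented as lists of rows of integers; this representation and the function name are assumptions made for the example only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized 2-D blocks."""
    same_shape = len(block_a) == len(block_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(block_a, block_b)
    )
    if not same_shape:
        raise ValueError("blocks must have equal dimensions")
    # Add up |a - b| over every corresponding pair of values.
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )
```

For example, sad([[1, 2], [3, 4]], [[2, 2], [1, 7]]) evaluates to 1 + 0 + 2 + 3 = 6.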
3. The interpolation process
3.1. Interpolation framework
Consider two frames R0 and R1 and a frame F equidistant in time
between them which is to be interpolated (Figure 1). Image data must
be created for each block in F by combining information from R0 and
R1 using a linear model for the block motion.
______________________________|_______|__________________________ R0
                              |    /\ |
                                  /
                                 /
                                /  mv0
                               /
                              /
________________________|____/__|________________________________ F
                        |   /   |
                           /
                          /
                         /  mv1
                        /
                       /
__________________|___\/__|______________________________________ R1
                  |       |
Figure 1: forward and backward motion pairs for a block
For each block in the frame F there is an associated motion vector
mv0 pointing at a displaced block in R0 and a corresponding motion
vector mv1 which is equal to -mv0 pointing at R1.
Where F is not equidistant from the reference frames the linear model
can simply be scaled appropriately.
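One possible reading of this scaling is sketched below. It assumes d0 and d1 are the positive temporal distances from F to R0 and R1, and that the scaled vector is rounded to integer pixel positions. The function name, the parameters and the rounding rule are assumptions for illustration, not taken from Thor.

```python
def scaled_mv1(mv0, d0, d1):
    """Return mv1 for linear motion, given mv0 pointing from F to R0.

    d0 and d1 are the temporal distances from F to R0 and from F to R1.
    With d0 == d1 this reduces to mv1 = -mv0.
    """
    if d0 <= 0 or d1 <= 0:
        raise ValueError("temporal distances must be positive")
    x, y = mv0
    # Python's round() rounds exact halves to the nearest even integer;
    # a codec would have to fix one rounding rule for encoder and decoder.
    return (round(-x * d1 / d0), round(-y * d1 / d0))
```

With mv0 = (4, -2), equal distances give mv1 = (-4, 2), while d0 = 2 and d1 = 1 give mv1 = (-2, 1).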
If both blocks fall within the reference frames, then the
interpolated block is just the average of the two reference blocks.
At the edges of the frames one of the reference blocks may fall off
the edge; in that case only the other reference block is used.
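The block interpolation rule can be sketched as follows for the equidistant case. Frames are lists of rows of integer samples. Treating a block as "off the edge" when any part of it lies outside the frame, and rounding the average upwards, are assumptions made here; the helper names are hypothetical.

```python
def fetch_block(frame, x, y, size):
    """Return the size x size block at (x, y), or None if it leaves the frame."""
    height, width = len(frame), len(frame[0])
    if x < 0 or y < 0 or x + size > width or y + size > height:
        return None
    return [row[x:x + size] for row in frame[y:y + size]]


def interpolate_block(r0, r1, bx, by, mv0, size=8):
    """Interpolate the block of F at (bx, by) using mv0 and mv1 = -mv0."""
    mvx, mvy = mv0
    b0 = fetch_block(r0, bx + mvx, by + mvy, size)
    b1 = fetch_block(r1, bx - mvx, by - mvy, size)
    if b0 is not None and b1 is not None:
        # Both reference blocks are inside their frames: average them.
        return [[(p + q + 1) >> 1 for p, q in zip(row0, row1)]
                for row0, row1 in zip(b0, b1)]
    # One block falls off the edge: use the other reference only.
    # (Returns None if both do; that case is not covered by the text.)
    return b0 if b0 is not None else b1
```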
3.2. Motion estimation process
Since F does not exist, the motion estimation process consists of
matching blocks B+mv0 in R0 with blocks B+mv1 in R1. A basic block
size of 8x8 is used, but the bulk of the motion estimation is done for
16x16 blocks. A larger basic block size may be preferable for UHD
resolutions. The overall approach is to use hierarchical motion
estimation (HME), as this is amenable to limiting both average and
worst-case complexity.
In the HME scheme each reference frame is down-scaled vertically and
horizontally by a factor of 2, using a (1/2,1/2) filter. This is done
repeatedly to get a series R0(n) and R1(n) of reference frames. Then
motion estimation is done very simply on each resolution layer n, but
using candidates from the next layer (n+1) as well as spatial neighbours.
The block sizes are the same at each layer, so each block at layer
n+1 corresponds to 4 blocks at layer n.
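A minimal sketch of building the series R(n) is shown below. Applying the (1/2,1/2) filter horizontally and vertically and then decimating by 2 amounts to averaging each 2x2 group of pixels. The rounding offset and the dropping of a trailing odd row or column are assumptions, as are the function names.

```python
def downscale_by_2(frame):
    """Halve width and height by averaging each 2x2 group of pixels."""
    height = len(frame) // 2 * 2    # ignore a trailing odd row
    width = len(frame[0]) // 2 * 2  # ignore a trailing odd column
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1] + 2) >> 2
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def build_pyramid(frame, num_layers):
    """Return [R(0), R(1), ...], where R(0) is the full-resolution frame."""
    layers = [frame]
    for _ in range(num_layers - 1):
        layers.append(downscale_by_2(layers[-1]))
    return layers
```

Because the block size is the same at every layer, a vector found at layer n+1 would presumably be doubled before being used as a candidate at layer n.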
For each layer, the ME stages are as follows:
1.  For each 16x16 block in raster order:
    a.  Check if ME can be bypassed.
    b.  If not bypassed, determine candidates from blocks in the
        lower-resolution layer (n+1) and from neighbour blocks in
        raster order.
    c.  Perform an adaptive cross search around each candidate
        vector and determine the best vector.
2.  For each 8x8 block in raster order, find the best merge
    candidate, i.e. choose which MV to use: the original 16x16
    block vector, or one of the 4 neighbouring 16x16 block vectors
    (above, below, left or right).
The majority of blocks bypass ME at step 1a. Here a skip candidate
is generated as:
skipmv = argmin_{mvx in {mv0,mv1,mv2}} sum_{i=0}^{2} |mvx-mvi|
where mv0, mv1 and mv2 are the motion vectors of the blocks above, left
and above-right of the current block. If the cost for this vector is
below a fixed value for each 8x8 sub-block, no further ME is done.
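The skip candidate selection can be sketched as below. The formula does not say which norm |mvx-mvi| denotes; the sum of absolute component differences (L1 distance) is assumed here, and the function names are hypothetical.

```python
def mv_abs_diff(a, b):
    """|a - b| for two motion vectors, taken as the L1 distance (assumed)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def skip_candidate(neighbour_mvs):
    """Return the neighbour MV with the smallest total distance to all of them."""
    return min(
        neighbour_mvs,
        key=lambda mvx: sum(mv_abs_diff(mvx, mvi) for mvi in neighbour_mvs),
    )
```

For neighbours (0,0), (2,0) and (10,10) the totals are 22, 20 and 38, so (2,0) is chosen. The selection behaves like a vector median and discards an outlying neighbour.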
In step 1c, the ranges of the cross search are restricted to just 2
steps (max 8 matches) for each candidate, if the search is not at
the lowest resolution layer. This is because vector candidates from
the lower layer or from neighbours will already be highly accurate by
this point.
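The text does not define the adaptive cross search in detail, so the following is only a sketch under stated assumptions: each step tests the four positions one pixel left, right, above and below the current best vector, and the search stops early when a step brings no improvement. With two steps this gives at most 8 matches, in line with the limit quoted above. The cost callable and all names are hypothetical.

```python
def cross_search(cost, start, max_steps=2):
    """Refine `start` with a small cross search.

    cost: callable mapping an MV (x, y) to a number, lower is better.
    Returns (best_mv, matches), where matches counts the cross positions
    tested and excludes the evaluation of the starting vector.
    """
    best, best_cost = start, cost(start)
    matches = 0
    for _ in range(max_steps):
        centre = best
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (centre[0] + dx, centre[1] + dy)
            cand_cost = cost(cand)
            matches += 1
            if cand_cost < best_cost:
                best, best_cost = cand, cand_cost
        if best == centre:
            break  # no neighbour improved on the centre: stop early
    return best, matches
```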
In step 1, the cost metric is a combination of luma SAD and a fixed
multiple of the sum of absolute motion vector differences between the
vector mvx and the four neighbours mv0, mv1, mv2, mv3 to the left,
right, above and above-right, i.e.
sum_{i=0}^{3} |mvx-mvi|
This helps make the motion estimation process less sensitive to noise
and spurious matches.
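A sketch of the step 1 cost is given below. The SAD is taken between the block at B+mvx in R0 and the block at B-mvx in R1. The value of the fixed multiple is not stated in this document, so mv_weight=4 is a placeholder, and the L1 form of |mvx-mvi| is likewise an assumption.

```python
def luma_sad(r0, r1, bx, by, mv, size):
    """SAD between the block at B+mv in R0 and the block at B-mv in R1."""
    height, width = len(r0), len(r0[0])
    for x0, y0 in ((bx + mv[0], by + mv[1]), (bx - mv[0], by - mv[1])):
        if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
            raise ValueError("displaced block lies outside the frame")
    total = 0
    for y in range(size):
        for x in range(size):
            p0 = r0[by + mv[1] + y][bx + mv[0] + x]
            p1 = r1[by - mv[1] + y][bx - mv[0] + x]
            total += abs(p0 - p1)
    return total


def me_cost(r0, r1, bx, by, mvx, neighbour_mvs, size=16, mv_weight=4):
    """Luma SAD plus a fixed multiple of the MV differences to the neighbours."""
    smoothness = sum(
        abs(mvx[0] - mvi[0]) + abs(mvx[1] - mvi[1]) for mvi in neighbour_mvs
    )
    return luma_sad(r0, r1, bx, by, mvx, size) + mv_weight * smoothness
```

The smoothness term penalises vectors that disagree with their neighbours, which is what damps noisy or spurious matches.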
In step 2 the cost metric is SAD alone.
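The merge decision for one 8x8 block might look like the sketch below. The sad_of callable, which returns the SAD of the 8x8 block for a given vector, is a hypothetical hook. Neighbours that do not exist (for example at a frame edge) are passed as None, which is also an assumption.

```python
def best_merge_mv(sad_of, own_mv, neighbour_mvs):
    """Pick the MV with the lowest SAD for one 8x8 block.

    own_mv: vector of the enclosing 16x16 block.
    neighbour_mvs: vectors of the above, below, left and right 16x16
    blocks, with None for a missing neighbour.
    """
    candidates = [own_mv] + [mv for mv in neighbour_mvs if mv is not None]
    # min() returns the first minimum, so ties favour the block's own vector.
    return min(candidates, key=sad_of)
```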
3.3. Complexity considerations
The ME process is not very sensitive to the selection of candidates,
at least in terms of the impact on coding performance. This might
not hold if the interpolated frames were used directly, but in
practice an interpolated block is only going to be used for
prediction if it is interpolated well. Effort spent refining bad
matches is therefore generally wasted and should be avoided.
This means that the ME process can be heavily truncated. The only
candidates considered are up to three neighbour block candidates and
one from the layer below. Most blocks skip motion estimation and so
require only a single match. For hardware (HW) applications, the
total number of matches would still require a hard limit, as well
as limits on the matches per block and possibly per region.
Vertical motion vector limits could also be imposed to reduce memory
bandwidth costs.
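One way such limits could be enforced is sketched below. The class, its per-block and per-frame limits and the vertical clamp are hypothetical; this document does not specify limit values or a mechanism.

```python
class MatchBudget:
    """Tracks block matches against per-block and per-frame hard limits."""

    def __init__(self, per_block, per_frame):
        self.per_block = per_block
        self.per_frame = per_frame
        self.block_used = 0
        self.frame_used = 0

    def start_block(self):
        """Reset the per-block counter at the start of each block."""
        self.block_used = 0

    def try_spend(self):
        """Reserve one match; return False if either limit is reached."""
        if self.block_used >= self.per_block or self.frame_used >= self.per_frame:
            return False
        self.block_used += 1
        self.frame_used += 1
        return True


def clamp_vertical(mv, max_abs_y):
    """Limit the vertical MV component to [-max_abs_y, max_abs_y]."""
    return (mv[0], max(-max_abs_y, min(max_abs_y, mv[1])))
```

A search loop would call try_spend() before each block match and fall back to the best vector found so far once it returns False.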
4. Coding using interpolated reference frames
In the Thor implementation, when an interpolated reference frame is
used it is inserted at the beginning of the reference pictures list
and is given the same frame number as the current frame. Typically,
use of the interpolated reference frame causes a considerable
increase in uni-prediction, often with no residual to code, and
a reduction in bi-prediction modes. This changes the probability of
the various supermode values used in Thor. In such frames it
therefore makes sense to modify the supermode coding to reflect this,
which contributes a small amount to the coding gains. Full details are in
[Fuld1].
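The reference list handling described above can be pictured with the following sketch. The RefFrame type and the function name are hypothetical and do not correspond to identifiers in the Thor source.

```python
from dataclasses import dataclass


@dataclass
class RefFrame:
    frame_num: int
    interpolated: bool = False


def with_interpolated_reference(ref_list, current_frame_num):
    """Return a new list with the interpolated frame placed first.

    The interpolated frame carries the frame number of the current frame,
    so it is temporally co-located with the frame being coded.
    """
    irf = RefFrame(frame_num=current_frame_num, interpolated=True)
    return [irf] + list(ref_list)
```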
5. Compression performance
Luma PSNR BDR results, in percent, for the standard QP set
(22,27,32,37) are given in Table 1. For high QP (32,36,40,44), the
results are in Table 2. Negative values indicate bitrate reductions.
--------------------------------
Sequence                 BDR (%)
--------------------------------
1920x1080
--------------------------------
Kimono                      -3.5
ParkScene                   -3.1
Cactus                      -4.9
BasketballDrive             -2.1
BQTerrace                   -1.9
ChangeSeats                 -5.8
HeadAndShoulder             -6.6
TelePresence                -6.6
WhiteBoard                  -7.5
--------------------------------
1280x720
--------------------------------
FourPeople                  -7.0
Johnny                      -6.2
KristenAndSara              -7.0
--------------------------------
Average                     -5.2
Table 1: BDR reductions for standard QPs
--------------------------------
Sequence                 BDR (%)
--------------------------------
1920x1080
--------------------------------
Kimono                      -6.6
ParkScene                   -7.0
Cactus                      -8.9
BasketballDrive             -5.5
BQTerrace                   -4.7
ChangeSeats                -12.1
HeadAndShoulder            -10.1
TelePresence               -11.0
WhiteBoard                 -12.4
--------------------------------
1280x720
--------------------------------
FourPeople                  -9.1
Johnny                      -8.0
KristenAndSara              -9.9
--------------------------------
Average                     -8.8
Table 2: BDR reductions for high QPs
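The averages in Tables 1 and 2 can be checked with the short calculation below. The per-sequence values are copied from the tables and held in tenths of a percent so that the arithmetic stays exact; the variable and function names are chosen for this example only.

```python
# BDR values from Tables 1 and 2, in tenths of a percent, in table order.
STANDARD_QP = [-35, -31, -49, -21, -19, -58, -66, -66, -75, -70, -62, -70]
HIGH_QP = [-66, -70, -89, -55, -47, -121, -101, -110, -124, -91, -80, -99]


def mean_tenths(values):
    """Arithmetic mean of the values, rounded to the nearest tenth of a percent."""
    return round(sum(values) / len(values))


print(mean_tenths(STANDARD_QP) / 10)  # average for Table 1, in percent
print(mean_tenths(HIGH_QP) / 10)      # average for Table 2, in percent
```

The sums are -62.2 and -105.3 over 12 sequences, which round to the tabulated averages of -5.2 and -8.8.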
6. IANA Considerations
This document has no IANA considerations.
7. Security Considerations
This document has no security considerations.
8. Acknowledgements
The author would like to thank Arild Fuldseth for assistance with
experimental investigations, and Mo Zanaty for reviewing this
document.
9. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[Fuld1] Fuldseth, A., Bjontegaard, G., and M. Zanaty, "The Thor video
codec", draft-fuldseth-netvc-thor-01, October 2015.
Author's Address
Thomas Davies
Cisco
Feltham
UK
Email: thdavies@cisco.com