The Vault

High Order Integer Transform Design for Video Compression
Research Paper / Jan 2012

> T-CSVT <



1



Abstract—Recent research shows that using a large-size block as the coding unit (CU) and increasing the motion-compensated
prediction (MCP) and transform sizes accordingly can significantly improve the rate-distortion (R-D) performance especially for high-

definition (HD) video coding. The trend of using large size transforms raises the interest in seeking high performance high order

integer transforms with low computation requirements. This paper presents a set of integer transforms with the sizes 4×4, 8×8, 16×16,

and 32×32. They have high energy compaction capability, almost equal L2 norms of the basis vectors, and are fully factorizable, which

guarantees the development of fast algorithms. Compared with the prior work, the proposed integer transforms have lower complexity

and more flexibility for implementation while providing comparable coding performance.


Index Terms—Integer transform, core transform, HEVC, video coding



I. INTRODUCTION

RANSFORM coding is a fundamental part in video compression systems. It de-correlates the samples to be quantized so that

the commonly used scalar quantization over individual values does not cause too much coding efficiency loss in comparison

with vector quantization. Transform coding also compacts the energy of the samples into a few transform coefficients to

facilitate efficient entropy coding.

The performance of a transform is usually evaluated by transform coding gain (TCG) [1], the maximum gain of transform

coding over Pulse-code Modulation (PCM) coding. TCG is determined by transform type, transform size, and the statistics of the

input signal, e.g., correlation. Regardless of input signal’s statistics, larger-size transform provides higher TCG, and Karhunen-

Loeve Transform (KLT) is the optimal transform, of which the basis vectors are derived based on the time- and space-variant

second order statistics of the input signal. Given the original video signal (or its residual by predictive coding) as the input,

where the correlation is relatively high, discrete cosine transform (DCT) [2] has basis vectors very close to KLT and is

considered as a fixed approximation of KLT; the ideal transform size should be the same as the frame size to fully exploit the

spatial correlation among pixels.

Usually, the gain of a real transform coder is smaller than the TCG of the transform it uses. TCG is achieved only if the

quantization and entropy coding are both optimal, which, however, is impossible for a real transform coder. In order to fully

demonstrate the performance that the transform can provide, we should not only pursue high TCG while determining the type and

size of the transform, but also consider the practical limitations of the transform coder, including the typical statistics of the input

signal, the design of quantization and entropy coding, the framework of the video coding system upon which the transform coder

is built, and the complexity restrictions due to mainstream hardware capability. How these limitations influence the transform

design is introduced below.

First, the prevailing video compression systems use block-based hybrid framework, which means each frame is divided into

non-overlapping blocks and each block is the unit where the hybrid predictive coding and transform coding apply. To enable

coding unit (CU) level rate-distortion (R-D) optimization and avoid encoding/decoding delay, the transform size should not be

larger than the CU size. Second, larger transforms generally suffer more from non-optimal entropy coding, as it is much more

difficult to accurately estimate the non-zero transform coefficients’ positions. However, larger transforms benefit the high and

ultra-high definition videos, of which the spatial correlation is very high and homogeneous regions are dominant [3]. Larger size

transforms compact the highly correlated signals into fewer transform coefficients distributed in the low frequency domain. The

positions of these transform coefficients are easily modeled and efficiently entropy coded. Third, larger transform has higher

computational complexity. 2-D order-2N transform requires four times of the arithmetic operations of 2-D order-N transform,

whereas the performance is not improved that much. Last but not the least, integer transforms are preferred to real-valued

transforms. Real-valued transform, such as DCT, is of high computational complexity and can cause the encoder and decoder

mismatch, even if implemented using high precision. Integer transform, such as Integer Cosine Transform (ICT) [4], provides

bit-exact implementation and significant complexity reduction, and, if well designed, achieves almost the same TCG as DCT. In


Manuscript received Sept. 20, 2012.
The authors are with the CTO office, InterDigital Communications LLC, San Diego, CA (e-mail: jie.dong@interdigital.com; yan.ye@interdigital.com).

High Order Integer Transform Design for

Video Compression

Jie Dong, Member, IEEE, and Yan Ye, Member, IEEE

T



> T-CSVT <



2

recent years, integer transform superseded DCT, and has been adopted in modern video coding standards, such as H.264/AVC

[5], VC-1 [6], and HEVC [7].

The core transform design for high efficiency video coding (HEVC) [7], the emerging video coding standard developed by the

Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T/SG16/Q.6/VCEG and ISO/IEC/MPEG, takes all the above

factors into consideration. HEVC is focused on coding high definition (HD) and ultra HD video, in which the area representing

certain content becomes larger than that in lower resolution video. Therefore, compared to existing video coding standards that

primarily use blocks of 16×16 pixels as coding units, HEVC enlarges the CU size up to 2
K
×2

K
(K is a positive integer set in the

sequence parameter set) to save the overhead, such as CU type and motion vector (MV), and scales the prediction and transform

sizes accordingly. The core transform design adopts 2-D order-16 and order-32 integer transforms mainly for smooth regions and

smaller order-4 and order-8 transforms mainly for detail-rich regions.

The trend of using large size transforms raises the interest of seeking high performance high order integer transforms with low

computation requirement with the following desirable features. First, to provide the performance as good as the sub-optimal

transform DCT, the basis vectors should resemble DCT’s basis vectors, including small DCT distortion [8] and orthogonality.

Second, the L
2
norms of the basis vectors are (almost) equal, in order to save the additional scaling process used to correct the

different norms of the basis vectors. Third, the transform matrix has an intrinsic symmetry structure, such that the multiplication

with the transform matrix can be fully factorized to enable fast algorithms.

Finding a high order integer transform satisfying all the conditions above is very challenging, and tremendous research effort

has been dedicated to this topic [3] [9]-[16]. Some work [3] [9]-[11] is more focused on good energy compaction capability, i.e.,

less DCT distortion, and derives the transform matrices by scaling the DCT matrix elements by a factor and then truncating the

real numbers to the nearby integers based on certain constraints, such as equal L
2
norms or orthogonality. This approach is

applicable for designing an arbitrary order integer transform, but cannot maintain the intrinsic symmetry structure in the DCT

matrix that guarantees the fast algorithm. Other work [3] [12]-[16] starts from the intrinsic symmetry structure to ensure full

factorization and fast algorithm; however the intrinsic symmetry structure can impose significant constraint when seeking the

basis vectors with high energy compaction capability. For example, with the proposed symmetry structures in [3], [12], and [13],

no integer transform solution with small DCT distortion can be found. The work in [16] is so far the closest to the ideal integer

transform that satisfies all the desired merits; however the cost is that the magnitudes of the transform matrix elements are very

large, i.e., represented by 14 bits, and the forward order-32 transform cannot be realized even by 32-bit integer arithmetic.

In this paper, we propose approaches to derive order-4, order-8, order-16, and order-32 integer transforms, which satisfy all

the desired merits except orthogonality. The symmetry structure of the transform matrix that guarantees the fast algorithm is first

developed. With this symmetry structure, the elements in the transform matrices are carefully selected, such that the basis vectors

have almost the same L
2
norms and very small DCT distortion, and the transform error introduced by non-orthogonality is

negligible. Four specific integer transforms with the sizes 4×4, 8×8, 16×16, and 32×32 are proposed as the core transform design

for HEVC. Compared with the core transform adopted in HEVC [11] and other contributions [15][16], the proposed integer

transforms have lower complexity and more flexibility for implementation, while providing comparable coding performance.

The remainder of the paper is organized as follows. Section II presents the general matrices of the proposed integer

transforms. In Section III, a specific set of integer transforms are designed based on these general matrices, and proposed as the

core transform for HEVC. Section IV reports the experimental results, followed by the conclusion in Section V.

II. GENERAL MATRICES OF THE PROPOSED INTEGER TRANSFORM

The DCT transform matrix can be fully factorized, which ensures the development of fast algorithm. Many factorization

algorithms have been proposed in the literature, which exploit different symmetry structures of the DCT matrix. Some

algorithms [19] are not every efficient in terms of arithmetic operations, but can be systematically generalized to any high order

DCT. Some algorithms require fewer computations, but are only applicable for certain order DCTs. For example, the fast

algorithm proposed in [18] is only for order-8 and order-16 DCTs. In this paper, the proposed integer transforms retain the

symmetric structures of DCT, such that they can be factorized in the same way. Like DCT, the proposed 2-D integer transform is

separable, which can be accomplished by applying two 1-D integer transforms sequentially. Therefore, in the following, we only

discuss the 1-D integer transform design.

A. Even-Odd Part Decomposition

Denote an order-2N DCT matrix as


( is a positive integer). The basis vectors of


have alternating even

and odd symmetries and thus can be decomposed to even and odd parts, where the even part is identical to


scaled by a

factor



, as shown in (1).



[











] [



] (1)



> T-CSVT <



3

In (1),


, an N×N matrix, is the odd part of


, is the order-N identity matrix and is the opposite diagonal identity

matrix, obtained by flipping the columns of horizontally. This property is kept in the integer transform design, as shown in
(2),

[



] [



] (2)

where is the transform matrix of the order-2N integer transform and its even and odd parts are denoted as and ,
respectively. To construct a larger integer transform in such a recursive process, , the initial 2×2 matrix, is defined as in (3).
Note that the scaling factor is used to ensure the L

2
norms of the basis vectors in are (almost) equal.

[



] (3)

Denote [ ] as the input signal to the order-2N transform and [ ] as the output. The
vector comprising of the even points of , denoted as [ ], is determined only by the even part ;
likewise, the vector comprising of the odd points, [ ], is determined only by the odd part.

As the even part of the transform matrix is constructed in such a recursive way, the problem of integer transform design is

narrowed down to designing the odd part. In order-4, order-8, order-16, and order-32 DCT matrices, the odd parts are

represented by (4) to (7), respectively, in which the values of the matrices’ elements are shown in TABLE I.


[




] (4)



[













] (5)




[





















































]









(6)





[














































































































































































































]


















(7)

TABLE I

VALUES OF THE ELEMENTS IN ORDER-4, ORDER-8, ORDER-16, AND ORDER-32 DCT MATRICES

[ ] [















]

[ ] [



































]

[ ] [


































……

















]

[ ] [



































……

















]





> T-CSVT <



4

In integer transform design, the general matrix of the odd part is borrowed from DCT with the matrix elements’ relative

magnitudes unchanged. A simple way to obtain the odd part of an integer transform is to scale the elements in the same order

DCT matrix by a factor, and truncate the scaled real numbers to the nearby integers based on certain constraints. However, this

approach cannot maintain the intrinsic symmetry structure in the DCT matrix that guarantees the fast algorithm. In Sections II.B,

II.C, and II.D, we propose approaches to deriving the odd parts of integer transforms from order-4 up to order-32, where DCT

matrix’s intrinsic symmetry structure is incorporated. Note that, the odd part of order-4 integer transform does not need further

factorization, since its size is 2×2, as show in (4).

B. Odd Part of Order-8 Integer Transform

The odd part of order-8 DCT, as shown in (5), can be re-written as in (8), only if the elements [ ] are defined as in
the second row of TABLE I [18], and (8) can be factorized to three sparser matrices. The equivalence of (5) and (8) does not

hold, when the real-valued numbers [ ] are replaced with integers. The reason is obvious because (5) becomes an
integer matrix, whereas the first and fourth rows of (8) still have real-valued elements. This means that the integer transform

matrix obtained by scaling [ ] by a factor and rounding cannot be factorized in the same way as DCT matrix.





[
















































]






(8)

In order to keep the factorizable structure in (8) and provide more freedom in searching good matrix elements that compose of

DCT-like basis vectors, we generalize (8), and propose the odd part of order-8 integer transform and its factorized form in (9)

and (10), respectively,

[













] (9)

[













] [













] [













] (10)

where the elements [ ] are all integers. To make resemble


in terms of the relative magnitudes among

matrix elements, the following three constraints should be imposed.



C. Odd Part of Order-16 Integer Transform

The derivation of the odd part of order-16 integer transform is similar to that for the order-8 one. Based on [18], the odd part

of order-16 DCT shown in (6) is re-written as in (11).





[


























































































































]











(11)

Extending (11) to a more general matrix that is also applicable for integer elements, we propose the odd part of order-16 integer

transform and its factorized form in (12) and (13), respectively, where [ ] are all integers.

Relation 1

1. ⁄ ⁄ ⁄ and ⁄ ⁄ ⁄

2. ⁄ ⁄ ⁄

3. and



> T-CSVT <



5



[






















































]










(12)







































































































































































88

88

88

88

88

88

88

88

88

88

88

8

8

88

88

88

8

000000

000000

000000

000000

000000

000000

000000

000000

10010000

01100000

01100000

10010000

00001001

00000110

00000110

00001001

10000010

01000001

00110000

00110000

00001100

00001100

10000010

01000001

000000

000000

000000

0000000

0000000

000000

000000

000000

de

fc

bg

ha

ah

gb

cf

ed

ll

ji

ij

k

k

ji

ij

ll

P



(13)



The following two conditions need to be satisfied to ensure the relative magnitudes among the matrix elements remain similar to

those in a DCT matrix.



D. Odd Part of Order-32 Integer Transform

The derivation of the odd part of order-32 integer transform and its factorization is inspired by Chen’s DCT fast algorithm

[19], as the LLM fast algorithm [18] used in Sections II.B and II.C are only for order-8 and order-16 DCT. Chen’s algorithm is

not the most efficient in terms of arithmetic operations, but provides a systematic approach to developing the fast algorithm for

order-2
K
DCT, where K can be arbitrarily large. In [19], the odd part of an order-N DCT is factorized into ( ) matrices

with the size 16×16 and each matrix can be calculated by a close-form expression. Hence, the odd part of order-32 DCT is

factorized to seven matrices, as expressed by (14).



(14)

In practice, transforming a 1-D 16-point signal by (14) is realized by seven serial stages with each stage accomplishing the

function of one matrix multiplication, and therefore, the latency is very high.

To reduce the latency, we propose to factorize the odd part of an order-32 integer transform into five integer matrices, as
shown in (15),

(15)

where is shown in (16). In (16), and are exactly the same as and defined in [19], since they
are binary matrices; and keep the same sign distribution and relative magnitudes among elements as in and ,
respectively, as long as the following two conditions are satisfied.



can be expressed as the product of three integer matrices, , , and , as shown in (17), and (18)-(24) shows the definitions
of the three matrices.

(17)

[













] (18)

Relation 3

1. √

2. (



) (




)

Relation 2

1.

2. (



) (




)








> T-CSVT <



6

[













] [













] (19)

[













] [













] (20)

[













] [













] (21)

[













] [













] (22)


































































































































































































































































































0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

1

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

1

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

1

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

1

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

0

0

0

0

0

2

1

2

1

1

2

1

2

0

0

0

0

2

1

2

1

1

2

1

2

0

0

1234

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

y

MMMM



(16)





> T-CSVT <



7



























































0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

Y



(23)





























































0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

Z



(24)



and are both binary matrices with only one non-zero element in each row/column, meaning they are used for re-ordering
the 1-D 16-point data being processed in the serial stages (15). is the only effective matrix in (17), and multiplying with can
be realized by multiplying with four 4×4 integer matrices, which in total reduces the arithmetic operations to one fourth.

Actually, can be further factorized into three sparser matrices, such that is fully factorized to seven matrices as for


in

(14). By doing this, the computation requirement is further reduced, but the latency becomes unacceptable for practical

implementation. Therefore, factorizing by (15) is proposed to balance these two aspects. The intermediate numbers
[ ], as in (19) to (22), should satisfy the following two relations, in order to keep the relative
magnitudes of DCT unchanged.

1.

2. (



) (




) (




) (




)

Consequently, the actual integer elements [ ], [ ], [ ], and
[ ] in , , , and , respectively, have the following four relations.



Relation 4

1.

2.

3.

4.



> T-CSVT <



8

III. PROPOSED CORE TRANSFORM DESIGN FOR HEVC

The proposed core transform design for HEVC includes a set of order-4, order-8, order-16, and order-32 integer transforms.

They use the factorizable general transform matrices as presented in Section II, and thus the fast algorithms can be easily

developed. This section introduces how to determine the values of the variables used to represent the general matrices.

To ensure that the energy compaction capability of the integer transform is comparable to DCT, the relative magnitudes

among the transform matrices’ elements should approximate those of DCTs, as specified by Relations 1 to 4 in Sections II.B to

II.D. We first scale the real-valued numbers on the right side of the Relations by certain factors, and then use the scaled values as

the starting points to search the integers in a certain neighborhood. The best set of the integers for the left side of the Relations

achieves the balance of three important aspects: the variance of the L
2
norms of the basis vectors, the deviation from DCT, and

the transform error introduced by non-orthogonality.

Given the transform matrix of an order-N integer transform, denoted as , the transformation process is shown in (25), where
and represent the input and output signals, respectively.


(25)

The L
2
norm of the ith basis vector of is calculated by (26), and the L

2
norms of all the basis vectors are denoted as a vector

[ ]
.

√∑

(26)

For energy conservation, should be normalized by element-wise dividing by a scaling matrix (
, which is an

additional process in comparison to DCT. However, when the L
2
norms are very close or even equal to each other, becomes a

constant matrix, and the element-wise matrix division can be approximated by a scalar matrix division, which can be combined

with the following scalar quantization. Hence, making the L
2
norms almost equal can save the normalization step and is a highly

desired feature in integer transform design.

Deviation from DCT is measured by DCT distortion [8], calculated by (27),














(27)

where denotes the main diagonal of matrix , and is the normalized transform matrix, obtained by (28).





(28)

This criterion reflects the signal energy that leaks out of DCT subbands. A smaller value of DCT distortion means less deviation

from DCT and thus implies less performance degradation compared with DCT.

Orthogonality is a strong condition in designing integer transforms. The magnitudes of the elements in the transform matrix

have to be very large especially for high order transforms to fulfill orthogonality, and can become even larger if other constrains

are imposed at the same time. If orthogonality is not fulfilled, the average energy of the reconstruction error in the spatial domain

is larger than that of the quantization error in the transform domain, and the difference is known as transform error, denoted as


. In this paper, orthogonality is not one of the design constraints, but the elements of the transform matrices are well selected,

such that the transform error
is negligible. According to [3], the transform error of an order-N integer transform can be

calculated by (29),








∑ ∑ | |

(29)

where
is the variance of the input source, and is the error matrix, defined by (30).





(30)

We conducted brute-force search with all the aforementioned aspects taken into account, and the best solution, as shown in

TABLE II, was proposed as the core transform design for HEVC [17]. The specific odd parts, obtained by substituting the

parameter values in TABLE II into the general matrices introduced in Section II, are shown in (31) to (34). Fig. 1 shows the data

flow of the fast algorithm for the forward order-32 integer transform. Since the order-N integer transform is used as the even part

of the order-2N one, as introduced in Section II, its fast algorithm is also shown as part of the fast algorithm of order-2N

transform in the same recursive manner.

[



] (31)



> T-CSVT <



9

[













] (32)



[





















































]









(33)



[












































































































































































































]


















(34)

IV. EXPERIMENTAL RESULTS

The proposed core transform design, as presented in Section III, was integrated into HEVC’s test model HM4.0 [20], using

16-bit integer arithmetic, and tested under the common test conditions [21] and the special test conditions defined by the core

experiment group 10 (CE10) [22]. The common test conditions are mainly focused on HD and ultra HD video sequences; the

mandatory part includes five classes of video sequences in terms of resolutions, i.e., Class A 2560×1600, Class B 1080p, Class C

832×480, Class D 416×240, and Class E 720p. Each sequence is coded by three types of coding structures: all intra (AI), random

access (RA), and low delay (LD), and two configurations of coding tools: high efficiency (HE) and low complexity (LOCO).

With certain coding structure and coding tool configuration, a sequence is coded at four QPs (22, 27, 32, and 37) to generate an

operational R-D curve over a wide range of bit-rate, and the average bit-rate reduction compared with HM4.0, known as BD-rate

[23], is calculated and used to measure the R-D performance of a proposed coding tool. The special test conditions defined by

CE10 require testing the R-D performance at low QPs (1, 5, 9, and 13) and high QPs (36, 42, 47, and 51) to see if there is

unusual behavior at these uncommon test points.

TABLE II

PARAMETER VALUES FOR THE GENERAL TRANSFORM MATRICES PROPOSED AS THE CORE TRANSFORM DESIGN FOR HEVC

Order-4 167 70

Order-8 3 2 5 1 144 99 72 99 1 2

Order-16
60 57 53 46 39 28 18 5

10 -22 24 17

Order-32

7 5

13 5 12

1 5 31 32 23 19 26 21

31 13 28 8 16 30 11 27

8 21 23 31 27 2 32 16

30 19 25 10 29 31 5 13

128 2 4 2





> T-CSVT <



10

With the test conditions LD and HE, the BD-rate of the proposed core transform design, together with the BD-rates of other

CE10-related contributions discussed in JCT-VC, is shown in TABLE III. For the performance under other test conditions,

interested readers are referred to [11], [15], [16], and [17] for detailed data. As can be seen, the performance among all the core

transform designs for HEVC is less than 1% BD bit-rate difference, no matter what the QP is. Similar results can be observed

under other test conditions. Therefore, the features that the existing core transform designs have and the number of arithmetic

operations they require, which are compared in TABLE IV and TABLE V respectively, become important factors in evaluation.

99

99

144

144

99

99

5

5

-12

-12

-12

-12

5

5

55

5

5

5

-5

-5

-5

-5

5

5

5

5

5

-22

22

10

22

22

-10

17

17

-17-60

18

-53

39

53

-39

57

57

-

23

5

5

3

-1

-2

167

-167

-18

60

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

70

70

-

>>1

>>1

46

28

5

28

5

46

-

-

-

-

-

-

-

-

24

24

17

10

10

<<1

<<1

<<1

<<1

<<2

<<2

<<2

<<2

<<2

<<2

<<2

<<2

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

<<1

-

-

-

-

-

-

-

-

7

7

7

7

7

7

7

7

5

5

5

-

-

-

-

-

-

-

-

13

13

13

13

-5

-5

-12

-12

12

12

5

-

-

-

-

-

-

-

-

n0

n1

n2

n3

n4

n5

n6

n7

n8

n9

n10

n11

n12

n13

n14

n15

n31

n30

n29

n28

n27

n26

n25

n24

n23

n22

n21

n20

n19

n18

n17

n16

X1

X2

X3

X4

u0

u8

u16

u24

u4

u20

u12

u28

u10

u22

u14

u30

u2

u18

u26

u6

u5

u27

u21

u11

u13

u19

u29

u3

u1

u31

u17

u15

u9

u23

u25

u7

13

13

13

13

-144

-144

<<7

<<7



Fig. 1 Fast algorithm for the forward order-32 integer transform with the fast algorithms for the forward order-4, order-8, and order-16 integer transforms

embedded.



> T-CSVT <



11

As shown in TABLE IV, all the core transform designs have small DCT distortion, and therefore their performance all

approaches that of the DCT’s, as evidenced in TABLE III. The core transform design adopted in HEVC cannot be fully

factorized, which means it does not have a fast algorithm, and thus the required arithmetic operations are much higher than other

designs especially for high order transforms (see TABLE V). In the other designs, Joshi’s [15] still has higher complexity,

because its basis vectors have quite different L
2
norms, which need to be corrected by an additional scaling process. Furthermore,

although the core transform design in [15] supports implementation by a fast algorithm, it does not have an equivalent realization

using matrix multiplication. This limits the flexibility of its implementation. Alshina’s design [16] has all the desired features.

However, the symmetry structure proposed in [16] is not as efficient as the structure proposed in this paper. As a result, it needs

many more multiplications for high order transforms and the magnitudes of the elements in the transform matrices are larger than

those in the proposed design. No matter what the transform size is, the elements are represented by 14 bits, and the forward

order-32 transform cannot even be implemented using 32-bit integer arithmetic. The core transform design proposed in this paper

achieves a good balance of R-D performance and computation requirement, and allows more flexibility for implementation.

TABLE III

COMPARISONS OF THE EXISTING CORE TRANSFORM DESIGNS IN BD-RATE REDUCTION (%)

Class Sequence
Normal QP Low QP High QP

Joshi’s Alshina’s Proposed Joshi’s Alshina’s Proposed Joshi’s Alshina’s Proposed

B

Kimono 0 0 0.1 0 0.2 0.2 0.3 0.3 0.1

ParkScene 0 0.1 0.1 0.1 0.1 0.1 0 0.4 -0.1

Cactus 0 0 0 0.1 0.1 0.1 -0.1 -0.1 0

BasketballDrive 0 0 0.1 0.3 0 0.1 0 -0.1 0

BQTerrace 0 0 0.1 0.2 0 0.1 -0.2 -0.3 -0.5

C

BasketballDrill -0.1 0.3 0.2 0.2 0.1 0 0.3 0.1 0.2

BQMall 0 -0.1 0 0.2 0 0.1 -0.1 -0.4 0.1

PartyScene 0 0 0.1 0.2 0 0.1 -0.2 0 -0.2

RaceHorses 0 0 0 0.1 0.1 0.1 -0.1 -0.1 -0.3

D

BasketballPass 0 0 0.1 0.3 0 0 1.3 1.4 -0.2

BQSquare -0.1 0 0 0.2 0 0 -0.2 -0.3 0.3

BlowingBubbles -0.1 0 0 0.2 0 0.1 0.1 0.4 -0.6

RaceHorses 0 0 0.1 0.2 0.1 0 0.1 0.1 0.2

E

Vidyo1 0.2 -0.1 0 0.4 0.1 0 1.9 0.6 -0.1

Vidyo3 0.1 0.1 0.1 0.4 0 0.1 0.8 -0.1 -0.8

Vidyo4 0 0.2 0.3 0.4 0.1 0 -0.4 0 -0.6

Average 0 0.03 0.08 0.22 0.06 0.07 0.22 0.19 -0.16


TABLE IV

COMPARISON OF THE FEATURES OFFERED BY EACH EXISTING CORE TRANSFORM DESIGN


Small DCT

distortion

Full

factorization

No scaling

process

Negligible

transform error

Flexible

implementation

HEVC core transform [7] x x x x

Joshi’s [15] x x x

Alshina’s [16] x x x x x

Proposed x x x x x


TABLE V

Numbers of Additions and Multiplications for N-Point 1-D Transform ( N = 4, 8, 16, and 32 )


HEVC core

transform
Joshi’s Alshina’s Proposed

4-point
Addition 8 9 9 9

Multiplication 6 3 3 3

8-point
Addition 28 26 29 31

Multiplication 22 12 11 11

16-point
Addition 100 72 81 93

Multiplication 86 36 31 21

32-point
Addition 372 186 229 279

Multiplication 342 92 87 56





> T-CSVT <



12

V. CONCLUSION

This paper presents a set of integer transforms with the sizes 4×4, 8×8, 16×16, and 32×32. They have high energy compaction

capability and almost equal L
2
norms of the basis vectors, and are factorizable, which guarantees the development of fast

algorithm. Compared with the prior work, the proposed integer transforms have lower complexity and more flexibility for

implementation while providing comparable coding efficiency.

REFERENCES

[1] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[2] N. Ahmed, T. Natarajan, K.R. Rao, “Discrete cosine transfom,” IEEE Transactions on Computers, vol. 23, no. 1, pp. 90-93, Jan. 1974.
[3] J. Dong, K. N. Ngan, C. K. Fong, and W. K. Cham, “2D order-16 integer transforms for HD video coding,” IEEE Trans. Circuits Syst. Video Technol., vol.

19, no. 10, pp. 1463–1474, Oct. 2009.
[4] W. K. Cham, “Development of integer cosine transforms by the principle of dyadic symmetry,” IEE Proc., Part I, vol. 136, no. 4, pp. 276–282, Aug. 1989.
[5] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, Mar. 2005.
[6] S. Srinivasan et al., “Windows Media Video 9: overview and applications,” Signal Process.: Image Commun., pp. 851-875, 2004.
[7] B. Bross, W.-J. Han, G. J. Sullivan, J.-R. Ohm, and T. Wiegand, “High efficiency video coding (HEVC) text specification draft 7”, document JCTVC-

I1003, Joint Collaborative Team on Video Coding, May 2012.

[8] M. Wien and S. Sun, “ICT comparison for adaptive block transforms,” document VCEG-L12, ITU-T SG16/Q6, Jan. 2001.
[9] W. K. Cham and Y. T. Chan, “An order-16 integer cosine transform,” IEEE Trans. Signal Process., vol. 39, no. 5, pp. 1205-1208, May 1991.
[10] P. Chen, Y. Ye, and M. Karczewicz, “Video coding using extended block sizes,” document T09-SG16-C-0123, ITU-T Q.6/SG16, Geneva, Switzerland,

Jan. 2009.

[11] A. Fuldseth, G. Bjøntegaard, and M. Budagavi, “CE10: core transform design for HEVC,” document JCTVC-G495, Joint Collaborative Team on Video
Coding, Nov. 2011.

[12] S. Ma and C.-C. Kuo, “High-definition video coding with supermacroblocks,” in Proc. SPIE Vis. Commun. Image Process., vol. 6508. Jan. 2007, pp.
650816-1–650816-12.

[13] C. K. Fong and W. K. Cham, “Simple order-16 integer transform for video coding,” in Proc. IEEE Int. Conf. Image Process., pp. 161-164, Sept. 2010.
[14] C. K. Fong and W. K. Cham, “LLM integer transform and its fast algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 6, pp. 844-854, Jun.

2012.

[15] R. Joshi, J. Sole, and M. Karczewicz, “CE10: Scaled integer transforms supporting recursive factorization structure,” document JCTVC-G579, Joint
Collaborative Team on Video Coding, Nov. 2011.

[16] E. Alshina et al., “CE10: full factorization core transforms for HEVC,” document JCTVC-G737, Joint Collaborative Team on Video Coding, Nov. 2011.
[17] J. Dong and Y. Ye, “Non-CE10: core transform design for HEVC”, document JCTVC-G272, Joint Collaborative Team on Video Coding, Nov. 2011.
[18] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, “Practical fast 1-D DCT algorithms with 11 multiplications,” in Proc. IEEE Int. Conf. Acoust. Speech

Signal Process., vol. 2. May 1989, pp. 988–991.
[19] W. H. Chen, C. H. Smith, and S. C. Fralick, “A fast computational algorithm for the discrete cosine transform”, IEEE Trans. Commun., vol. 25, no. 9,

pp.1004-1009, Sept. 1977.
[20] https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-4.0/.
[21] F. Bossen, “Common test conditions and software reference configurations”, document JCTVC-F900, Joint Collaborative Team on Video Coding, Jul.

2011.
[22] P. Topiwala, M. Budagavi, R. Joshi, A. Fuldseth, I. Kim, “CE10: Core Transform Design,” document JCTVC-F910, Joint Collaborative Team on Video

Coding, Jul. 2011.

[23] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” document VCEG-M33, ITU-T SG16/Q6, Apr. 2001.