RELATIONSHIP BETWEEN DCTII, DCTVI, AND DSTVII TRANSFORMS
Yuriy A. Reznik
InterDigital Communications
9710 Scranton Rd, San Diego, CA 92121, USA
Email: yreznik@ieee.org
ABSTRACT
Discrete Sine Transfonns of type VII (DSTVII) have recently
received considerable interest in video coding. In this paper,
we show that there exists a direct connection between DST
VII and DCTII transfonns, allowing their joint computation
for certain transfonn sizes. This connection also yields fast
algorithms for constructing DCTVI and DCTVII.
Index Terms KLT, DCTII, DCTVI, DSTVII, factor
izations, video coding.
1. INTRODUCTION
The Discrete Cosine Transfonns of types II and IV are among
most fundamental, well understood, and much appreciated
tools in data compression. The DCTII is used at the core of
standards for image and video compression, such as JPEG,
ITUT H.26xseries, and MPEG 14 standards [1]. The
DCTIV is used in audio coding algorithms, such as ITUT
Rec. G.722.1, MPEG4 AAC, and others [2]. Such transforms
are very well studied, and a number of efficient technique ex
ists for their computation [1, 3, 4, 5, 6, 7].
Much less known are socalled "odd" sinusoidal trans
fonns: Discrete Cosine and Sine Transfonns of types V, VI,
VII, and VIII. Existence of some of such transfonns was dis
covered by A. Jain in 1979 [8]. A complete tabulation was de
veloped in 1985 by Wang and Hunt [9]. However, not much
work has followed. Surveys of related results can be found
in [10, 3].
Recently, DST of types VI and VII have surfaced as useful
tools in image and video coding. In 2010, Han, Saxena, and
Rose have shown that DSTVII produce good approximations
of KarhunenLoeve Transform (KLT) for model of residual
signals after Intraprediction [11]. This was subsequently val
idated in the course of experimental work on ISO/IEC/ITUT
High Efficiency Video Coding (HEVC) standard [12, l3].
The adoption of DSTVII in HEVC has prompted a dis
cussion on the existence of fast algorithms for computing of
such transfonns [12, 14]. This question was addressed in
2011 by Chivukula and Reznik[15], who have established
connection between DSTVII and DFT.
9781479903566/13/$31.00 ©2013 IEEE 5642
This paper offers an alternative solution by establishing a
mapping between DSTVII, DCTVI, and DCTII. This map
ping yields fast algorithms not only for DSTVIIVII, but also
for DCTVINII, as well as possible joint factorizations of
such transfonns. The obtained mapping may also be of inter
est from methodological standpoint, as it suggests additional
connections between DSTVII, DCTVI and KLT of residual
and mixed signals.
The rest of this paper is organized as follows. Section 2
provides definitions. Section 3 establishes mapping between
DCTII, DCTVI and DSTVII transfonns. Section 4 explains
how this mapping can be used to construct fast algorithms.
Discussion and concluding remarks are offered in Section 5.
2. DEFINITIONS
Let N be the length of data sequence. The matrices of Dis
crete Fourier Transfonn (DFT) and Discrete Cosine and Sine
transforms of types II, III, IV, VI, and VII will be defined as
follows:
DFT: [FN ] mn
DCTII: [cy] mn
DCTIII: [cyI] mn
DCTIV: [C'zn mn
DCTVI: [Ct�l] mn =
DCTVII: [Ct�l] mn =
DSTVI: [stI] mn
DSTVII: [SVII] = N mn
e_j2n;;n, m, n E [O, Nl]
cosm(2;�1)7r, m, nE[O, Nl]
(2m+ l)n7r [0 N 1] cos 2N ' m, n E , 7r (2m+l)(n+ l) E [0 Nl] cos 4N ,m, n ,
m(2n+ l)7r [0 N] cos 2N+l ' m, n E ,
(2m+ l)n7r [0 N] cos 2N+l ' m, n E ,
. (m+l)(2n+ l)7r E [0 Nl] Sill 2N+l , m, n , . (2m+l)(n+ l)7r E [0 Nl] Sill 2N+l , m, n ,
In the above definitions, we have intentionally omitted
nonnalization constants (such as y!2/N and Ai = [l/a��
O,
conventionally used in definition of DCTII) as they don't af
fect factorization structures of the transfonns. Subindices N
or N + 1 indicate lengths of the transfonns. We follow Wang
and Hunt's convention of coupling N point DSTVINII with
N + Ipoint DCTVINII [9].
As easily noticed, transfonns of types II and III, as well
as VI and VII are closely related:
(CII)T _ CIII. (CVI )T _ CVII . (SVI)T _ SVII
N  N , N+l  N+l' N  N .
ICASSP2013
x,
XN.2
XN·l
x,
XN+'
XNo2
1                                 1
i 2N+1point OCTII i
, I 1 Yo Yo 1 Xo
y,
y,
YN.2
YN·,
y,
y,
y,
N+1point
OCTVI
N point
DSTVII
X2N•2
YN., X2N•2
Y, x"
Y, x,
Y, x,
x,
r                 
N
(2N;1 )

��i�toCT =ii 1
I ,
� : 1 Xo
x,
x,
g>
.§:
E
�
�
f
2N+1point
OCTII
r ..
: Npoint : :_����V�I_!
2N+1point
OCTII
r:.
i �t;'�I� :
I _______ !
x,
x,
XN(2N+l)3 2N+1point
OCTII
XN(2N+l)3
,
,
,
 YN.2

(a)
r:.
i �t;'�I� :
1 I _______ ! 1
I ' !...:
(b)
Fig. 1. Computing DCTII of composite sizes: (a) split of 2N + Ipoint DCTII into an N + Ipoint DCTVI and Npoint
DSTVI; (b) computation of DCTII of length N(2N + 1)_
3. RELATIONSIDP BETWEEN 2N + IPOINT
DCTII, N + IPOINT DCTVI, AND NPOINT
DSTVII TRANSFORMS
In this section we prove the following statement.
Theorem 1. The following holds:
) (
where Q2N+l is a matrix, such that when applied to a vector
x, it produces the following sign alterations and reordering:
Xi
(  I ) iXN+ Hi
i = O, _ .. , N,
i = O, . . . , N  1,
(2)
and IN and J N are N x N identity and orderreversal matri
ces respectively.
Proof Let us consider a 2N + Ilong input sequence X
Xo, . . . , X2N, and apply DCTII over it:
2N
7f(2n + I)k
L Xn cos 2(2N + 1) , k = 0 , . . . , 2N. n=O
5643
We first look at even output values (k = 2i, i = 0, ... ,N):
�
7f(2n + 1)2i
L Xn cos 2(2N + 1) n=O
�
7f(2n + I)i
L Xn cos 2N + 1 . n=O
We split this sum as follows:
ell � 7f(2n + I)i
X2i  L Xn cos 2N + 1 n=O
2N
� 7f(2n + I)i L Xn cos 2N + 1 n=N+ l
�
7f(2(2N  n) + I)i
L Xn cos 2N + 1 n=N+ l
Nl
( ) � 7f 2n + 1 i L X2N n cos 2N + 1 ' n=O
which implies that
9point OFT
Yo Yo
y, y,
y, y,
y, y,
y, y,
y, y,
y, y,
y, y,
y, y,
(a)
'9:"01"10<:1:11,
r5:pointDCTVIO�5
]
, , , ,
,,: :0 Xo
:
_11 X2
x, ��_H�+,�����������:� �
Xo
:3 �

,

:
4 Xa
x, i..,LfH.�_+_<i"::..:==='''i;__r"'�___:_� x,
: :
x, i���_+_���� __ ���:�'�: � ,
,
x, �4T���������4_,��ir�t+: Xs
x, �4��, �����
,
, ,
, " ,
: sln(t ) : : : ______________________________________ J :

(b)
x,
Fig. 2. Flowgraph of Winograd's factorization of DFT of length 9, and flowgraphs of 9point DCTII, 5 point DCTVI, and
4point DSTVII implied by mappings (1,3).
where XC v I is a DCTVI transform over the first N + 1 ele
ments of input sequence x, and X,CV1 is a DCTVI transform
over the following input:
, _ [ X2Nn, xn  0 ,
if n = 0 , . . . , N  1,
ifn= N.
We now turn our attention to the odd output values (k =
2i + 1, i = 0 , . . . , N  1):
�
7r(2n + 1)(2i + 1)
� Xn cos 2( 2N + 1) n=D
( )i+l � . 7r
(N n)(2i + 1)
1 � X2N n Sill ''''
n=D 2N + 1
We split this sum as follows:
XCII ( )i �
l
. 7r(N n)(2i + 1)
2i+ 1 + 1 � X2N n Sill '2N.,....,'+'1'n=D
(_l)i+l � . 7r
(Nn)(2i+l) � X2Nn Sill 2N + 1 n=N+l
( )i
N
2:
l
. 7r(n + 1) (2i + 1)
1 XNln Sill , 2N +1 n=D
which implies that
where XSVll is an Npoint DSTVII transform over a se
quence
Xn = XN+l+n, n = 0 , . . . , N  1,
5644
'"
VII and XS is an N point DSTVII transform over a sequence
Xn=XNln, n=O, ... , N  l.
By combining all these mappings we arrive at expression (1).
D
We present a flowgraph of the resulting mapping between
DCTII, DCTVI, and DSTVII transforms in Figure 1.a. Only
2N additions, permutations and sign changes are needed to
convert output of DCTVI, and DSTVII into DCTII.
4. FAST ALGORITHMS FOR COMPUTING DCTII,
DCTVI, AND DSTVII
4. 1. Connection to DFT
From (1) it follows that fast computation of DCTVI and DST
VII can be reduced to computing subsets of DCTII. Accord
ing to Heideman [4] it is also known that computing of DCT
II of odd numbers is equivalent to computing samelength
DFT. Considering 2N + Ipoint transforms, we can summa
rize Heideman's result as follows:
eII

H ( [R(F2N+l)]rows D, ... ,N ) H (3) 2N+l  1 ['2s(F )] 2, 2N+l rows N+l, ... ,2N
where R (F2N+d and '2s (F2N+l) denote real and imaginary
parts of the DFT transform matrix of size 2N + 1, and Hl and
H2 are some permutation and signinversion matrices [4].
In combination with (1) this formula shows that an N + 1
point DCTVI, an Npoint DSTVII and an 2N + Ipoint
DCTII can be computed by mapping to a 2N + Ipoint DFT.
Since many algorithms for computing of DFT are readily
available (see e.g. [16]), this automatically leads to to fast
algorithms for computing DCTVI and DSTVII.
2N+1point OCTII 2Npoint OCTII
(a) (b)
Fig. 3. Conceptual illustration of decompositions of (a) 2N + Ipoint DCTII (1) and (b) 2N point DCTII (4).
4.2. Examples of fast algorithms for N = 4
We use Winograd DFT module of length 9 shown in Fig
ure 2.a. This particular factorization comes from [16]. By
using this ftowgraph and mappings (3) and (1) we easily ob
tain 9point DCTI1, 5point DCTVI, and 4point DSTVII.
This is shown in Figure 2.b.
We note that all these algorithms are very efficient in
terms of multiplicative complexity. Thus, obtained 9point
DCTII requires only 8 nontrivial multiplications. In con
trast, the least complex algorithms for computing DCTll of
size 8 (nearest dyadicsize) requires 11 multiplications[7].
The obtained 4point DSTVII is also very efficient: it
uses only 5 multiplications. This factorization is immediately
suitable for implementing an integer approximation of DST
VII transform defined in HEVC standard [13].
Finally, factorization of a 5point DCTVI shown in Fig
ure 2.b needs only 3 real multiplications and 2 shifts (multi
plications by factors 1/2).
4.3. Fast computing of transforms of length 2k N (2N + 1)
It is known that a transform of a composite length N = pq,
where p and q are coprime, can be decomposed into a cas
cade of p qpoint transforms and q ppoint transforms fol
lowed by pq  p  q  1 additions. This class of techniques
is called Prime Factor Algorithms (PFA) [17, 18].
In Figure 1.b, we show how to compute DCTII of length
N(2N + 1). This factorization includes 2N + 1 Npoint DCT
II subtransforms, and additionally N 2N + Ipoint DCT
II transforms, which, in turn include N point DSTVII as
part of their ftowgraph. Hence, a system that implements
and uses Npoint DCTII and DSTVII, can easily compute
an N(2N + 1) transform by reusing them. Same principle
more generally applies to computing transforms of lengths
2k N(2N + 1).
Embedded factorization structures including DSTVII
blocks in ftowgraphs for DCTII can be of interest to hard
ware implementations, as it offers potential for reducing the
area, cost, and power usage of a circuit responsible for com
puting transforms.
5645
S. DISCUSSION AND CONCLUDING REMARKS
We notice that decomposition (1) looks very similar to the
wellknown split of evensized DCTI1 (see, e.g. [3]):
(4)
where P2N is a certain permutation matrix. This split leads
to recursive construction due to reappearance of DCTll in the
upper part of decomposition. In contrast, our decomposition
of 2N + Ipoint DCTI1 (1) does not immediately lead to a
recursion.
In Figure 3 we offer conceptual illustration of both de
compositions (1) and (4). Input data samples are denoted as
YN, ... ,YI,ZO,XI, ... ,XN in a 2N + Ipoint case (a), and
YN, ... , YI, Xl, ... , XN in 2Npoint case (b). It is shown that
the lower (right) portion of DCTI1 transform becomes es
sentially equivalent to DSTVII (or DCTIV) transform over
residual samples Y; = Yi  Xi, while the upper (right) por
tion of DCTI1 transform becomes essentially equivalent to
DCTVI (or DCTII) transform over sums: x; = Xi + Yi,
(i = 1, . . . , N). In the 2N + Ipoint case, the upper trans
form also absorbs the middle sample ZOo
This illustration may be insightful for understanding
meanings of the involved transforms. For instance, in sig
nal processing, it is customary to think of DCTII as an
approximation of KLT for 1st order Markov source with
high correlation coefficient. Decomposition in Figure 3.a
shows that DSTVII, as well as DCTVI (with some permuta
tions and sign changes) can be understood as approximations
of KLT over residual or mixed signals with progressively
increasing distances between samples. Similarly, decom
position in Figure 3.b shows that in case of a 2Nsample
arrangement, it is DCTIV and DCTII that can be understood
as approximations of KLT over residual and mixed signals.
The obtained relationship (1) may also be instrumental in
showing that DSTVIIbased coding of Intraprediction resid
ual is essentially equivalent to performing L shaped DCT
II, where one part of L shape corresponds to boundary pix
els, and the other part absorbs pixels predicted based on this
boundary. A design of directionadaptive transforms based on
similar idea was proposed in [19].
6. REFERENCES
[1] K. R. Rao and P. Yip, Discrete Cosine Transform: Algo
rithms, Advantages, Applications, Academic Press, Boston,
MA,1990.
[2] R. K. Chivukula and Y. A. Reznik, "Efficient implementation
of a class of MDCTIIMDCT filterbanks for speech and audio
coding applications," in IEEE Int. Con! Acoust., Speech, Sig
nal Processing (ICASSP), July 2008, pp. pp. 213216.
[3] V. Britanak, K. R. Rao, and P. Yip, Discrete Cosine and Sine
Transforms: General Properties, Fast Algorithms and Inte
ger Approximations, Academic Press  Elsevier, Oxford, UK,
2007.
[4] M. T. Heideman, "Computation of an oddlength DCT from
a realvalued DFT of the same length," IEEE Trans. Signal
Processing, vol. 40, no. 1, pp. 5461, 1992.
[5] C. W. Kok, "Fast algorithm for computing Discrete Cosine
Transform," IEEE Trans. Signal Processing, vol. 45, no. 3, pp.
757760, 1997.
[6] E. Feig and S. Winograd, "On the multiplicative complexity
of Discrete Cosine Transforms (Corresp.)," IEEE Trans. Infor
mation Theory, vol. IT38, pp. 13871391, 1992.
[7] C. Loeffler, A. Ligtenberg, and G.S. Moschytz, "Practical
fast ID DCT algorithms with 11 multiplications," in IEEE
Int. Con! Acoust., Speech, Signal Processing (ICASSP), May
1989, vol. 2, pp. 988991.
[8] A. K. Jain, "A sinusoidal family of unitary transforms," IEEE
Trans. Pattern Analysis and Machine Intelligence, vol. PAMI
1, no. 4, pp. 356 365, Oct. 1979.
[9] Z. Wang and B. R. Hunt, 'The discrete W transform," Applied
Mathematics and Computation, vol. 16, no. 1, pp. 19  48,
1985.
[10] S. A. Martucci, "Symmetric convolution and the Discrete Sine
and Cosine Transforms," IEEE Trans. Signal Processing, vol.
SP42, pp. 10381051,1994.
[ll] J. Han, A. Saxena, and K. Rose, 'Towards jointly optimal
spatial prediction and adaptive transform in video/image cod
ing;' in IEEE Int. Con! Acoust., Speech, Signal Processing
(ICASSP), March 2010, pp. 726729.
[12] A. Saxena and F. Fernandes, "CE7: Modedependent
DCT/DST for intra prediction in video coding," commitee in
put document JCTVCD033, ISO/IEC/ITUT Joint Collabora
tive Team on Video Coding, Daegu, Korea, January 2011.
[13] B. Bross, WJ. Han, G. Sullivan, J. Ohm, and T. Wiegland,
"High Efficiency Video Coding (HEVC) text specification
draft 6," ITUT and ISO/IEC JCTVCH1003, Feb 2012.
[14] F. Fernandes A. Saxena and Y. Reznik, "Fast transforms for
intrapredictionbased image and video coding;' in Data Com
pression Conference (DCC), March 2013  submitted.
[15] R. K. Chivukula and Y. A. Reznik, "Fast computing of discrete
cosine and sine transforms of types vi and vii," in Applications
of Digital Image Processing XXXIV, A. G. Tescher, Ed., 2011,
vol. 8135 of Proc. SPlE, pp. 813505114.
5646
[16] C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Al
gorithms and Implementation, WileyInterscience, New York,
1985.
[17] P.P.N. Yang and MJ. Narasimha, "Prime factor decompo
sition of the discrete cosine transform," in IEEE Int. Con!
Acoust., Speech, Signal Processing (ICASSP), March 1985, pp.
pp. 772775.
[18] B. G. Lee, "Input and output index mapping for a primefactor
decomposed computation of discrete cosine transform," IEEE
Trans. Acoust., Speech, Signal Processing, vol. 37, no. 2, pp.
237  244, Feb. 1989.
[19] S. S. Tsai CL. Chang, M. Makar and B. Girod, "Direction
adaptive partitioned block transform for color image coding,"
IEEE Transactions on Image Processing, , no. 7, pp. 1740
1755, July 2010.
InterDigital develops mobile technologies that are at the core of devices, networks, and services worldwide. We solve many of the industry's most critical and complex technical challenges, inventing solutions for more efficient broadband networks and a richer multimedia experience years ahead of market deployment.