<< Chapter < Page Chapter >> Page >
The KLT, which is the optimal orthogonal transform for transform coding, requires knowledge of the source statistics that may be unavailable in practice, and eigendecomposition that may be too expensive in practice. This motivates the consideration of other orthogonal transforms, like the DFT, DCT, and DHT. Here we discuss these and analyze the resulting performance when possible. In addition, we discuss practical matters like real-valued DFTs and fast DCT implementations.
  • Goal: Recall that the goal of the optimal orthogonal transform was tominimize the ratio of geometric to arithmetic output variances:
    k = 0 N - 1 σ y k 2 1 / N 1 N k = 0 N - 1 σ y k 2 .
    The ratio [link] attains its maximum value ( = 1 ) when σ y k 2 are equal for all k and takes on much smaller values when the difference between the σ y k 2 (sometimes called the dynamic range of { σ y k 2 } ) is large.
  • Problem with KLT: The KLT, i.e., the orthogonal transform minimizing [link] , is a function of the input signal statistics.Specifically, the KLT equals the eigenvector matrix of the input autocorrelation matrix.Unfortunately, realistic signals are non-stationary, requiring continual KLT redesign if optimality is to be preserved, andeigenvector computation is computationally intensive, especially for large N . Thus, the question becomes: Are there fixed orthogonal transforms that do a good job of minimizing the ratio [link] for “typical” input signals? As we will see, the answer is yes...
  • DFT Intuitions: For the sake of intuition, lets first consider choosing T as an orthogonal DFT matrix.In this case, the coefficient variances { σ y k 2 } would be samples of the power spectrum and the dynamic range of { σ y k 2 } would be determined by the relative input power in different frequency bands.Recalling that asymptotic TC ( N ) performance is determined by SFM x , which has the same geometric-to-arithmetic-average structure as [link] :
    SFM x = exp 1 2 π - π π ln S x ( e j ω ) d ω 1 2 π - π π S x ( e j ω ) d ω = lim N k = 0 N - 1 S ( e j ω k ) 1 / N 1 N k = 0 N - 1 S ( e j ω k ) ω k = 2 π k / N ,
    we might intuit that the DFT is optimal as N . The asymptotic optimality of the DFT can, in fact, be proven(see Jayant&Noll). Of course, we don't have much reason to expect that the DFT wouldbe optimal for finite transform dimension N . Still, for many signals, it performs well.(See [link] .)
  • Other Transforms: The most commonly used orthogonal transform in speech, image, audio,and video coding is the discrete cosine transform (DCT). The excellent performance of the DCT follows from the fact that it isespecially suited to “lowpass” signals, a feature shared by most signals in the previously mentioned applications.Note that there are plenty of signals for which the DCT performs poorly—it just so happens that such signals are not frequentlyencountered in speech, image, audio, or video. We will describe the DCT and provide intuition regarding it's good“lowpass” performance shortly. Like the DFT, the DCT has fast algorithms which make it extremelypractical from an implementation standpoint. Another reasonably popular orthogonal transform, requiring even lessin the way of computation, is the discrete Hadamard transform (DHT), also described below. [link] compares DFT, DCT, DHT, and KLT for various transform lengths N along with asymptotic TC performance. [link] (a) shows gain over PCM when using a lowpass autoregressive (AR) source { x ( n ) } generated from white Gaussian noise { v ( n ) } via:
    X ( z ) = 1 1 - 0 . 8 z - 1 V ( z ) ,
    while [link] (b) shows the gain for highpass { x ( n ) } :
    X ( z ) = 1 1 + 0 . 8 z - 1 V ( z ) .
    See [link] for the power spectra of these two processes.
    This figure is comprised of two cartesian graphs plotting 1/SFMx, KLT, DCT, DFT, and DHT on a graph of horizontal axis N and vertical axis Gain. These graphs all begin with a strong positive slope, and they flatten out  as they continue across the page to the right. The first to flatten out on both graphs is DHT at a vertical value of approximately 2.1.  The rest of the graphs flatten at higher levels. This figure is comprised of two cartesian graphs plotting 1/SFMx, KLT, DCT, DFT, and DHT on a graph of horizontal axis N and vertical axis Gain. These graphs all begin with a strong positive slope, and they flatten out  as they continue across the page to the right. The first to flatten out on both graphs is DHT at a vertical value of approximately 2.1.  The rest of the graphs flatten at higher levels.
    G TC , N for various transforms and various N on an AR(1) lowpass (left) and highpass (right) sources.
    This figure contains two graphs, each with a vertical axis labeled dB. The first graph, (a), contains one curve that begins at the leftmost side of the page at (-3, -5), continues in an increasing fashion to a local maximum at (0, 15), and then symmetrically decreases to (3, -5). The second graph begins at (-3, 14) and decreases shallowly to (0, -5), then increases to (3, 15) in a symmetrical shape to the left side of the graph. This figure contains two graphs, each with a vertical axis labeled dB. The first graph, (a), contains one curve that begins at the leftmost side of the page at (-3, -5), continues in an increasing fashion to a local maximum at (0, 15), and then symmetrically decreases to (3, -5). The second graph begins at (-3, 14) and decreases shallowly to (0, -5), then increases to (3, 15) in a symmetrical shape to the left side of the graph.
    Power spectra of AR(1) sources used in the transform matrix comparisons of [link] .
  • The DHT: The N × N DHT is defined below for power-of-two N :
    H 2 = 1 2 1 1 1 - 1 H 2 N = 1 2 H N H N H N - H N
    Note that H N is orthogonal Caution: outputs of the Matlab command hadamard must be scaled by 1 / N to produce orthogonal DHT matrices! , i.e., H N H N t = I . As an example
    H 4 = 1 2 1 1 1 1 1 - 1 1 - 1 1 1 - 1 - 1 1 - 1 - 1 1
    [link] illustrates DHT basis vectors for the case N = 8 . The primary advantage of the DHT is that its implementationcan be accomplished very efficiently. [link] suggests that the DHT performs nearly as well as the KLT for N = 2 and 4, but its performance falls well short of optimal for larger N .
    This figure is comprised of 8 rows containing line segments ended by small dots either above or below the row. The directions of the lines on each row are as follows. Line one is 8 up arrows. Line two is alternating up and down arrows. Line three is up, up , down down, up, up, down, down. Line four is up down down up up down down up. Line five is up up up up down down down down. Line six is up down up down down up down up. Line seven is up up down down down down up up. Line eight is up down down up down up up down. This figure is comprised of 8 rows containing line segments ended by small dots either above or below the row. The directions of the lines on each row are as follows. Line one is 8 up arrows. Line two is alternating up and down arrows. Line three is up, up , down down, up, up, down, down. Line four is up down down up up down down up. Line five is up up up up down down down down. Line six is up down up down down up down up. Line seven is up up down down down down up up. Line eight is up down down up down up up down.
    8 × 8 DHT basis vectors.
  • The DFT: The normalized Due to the norm-preserving scale factor 1 / N , the DFT definitions above differ from those given in most digitalsignal processing textbooks. DFT from { x n } to { y k } is defined below along with its inverse.
    y k = 1 N n = 0 N - 1 x n e - j 2 π N k n ; k = 0 N - 1 , x n = 1 N k = 0 N - 1 y k e j 2 π N k n ; n = 0 N - 1 .
    The normalized DFT can be represented by a symmetric unitary Outputs of the Matlab command dftmtx must be scaled by 1 / N to produce unitary DFT matrices. matrix W N :
    W N k , n = 1 N e - j 2 π N k n ; k , n = 0 N - 1 .
    By unitary, we mean that W N W N * t = I , where ( · ) * denotes complex conjugation. Note that a unitary matrix is the complex-valued equivalentof an orthogonal matrix. In practice, the N × N DFT is implemented using the fast Fourier transform (FFT), which requires N 2 log 2 N complex multiply/adds when N is a power of two.
  • The Real-Valued DFT: Since we assume real-valued x n , complex-valued DFT outputs y k might seem problematic since transmitting both real and imaginary components would decrease our transmission efficiency.For a real-valued DFT input, however, the DFT outputs exhibit conjugate symmetry which allows us to represent the N complex valued outputs with only N real-valued numbers. More precisely, real-valued DFT input { x n } implies that DFT output { y k } has the property
    y k = y N - k * , k = 1 , 2 , , N / 2 ,
    which implies
    Re { y k } = Re { y N - k } k = 1 , 2 , , N / 2 , Im { y k } = - Im { y N - k } k = 1 , 2 , , N / 2 , Im { y 0 } = Im { y N / 2 } = 0 .
    A good method by which to select non-redundant components of the DFT output is:
    1. Compute complex-valued { y k } using the standard DFT.
    2. Construct real-valued { y k ' } from { y k } as follows:
      y 0 ' = y 0 ( R ) y 1 ' = 2 Im { y 1 } y 2 ' = 2 Re { y 1 } y N - 3 ' = 2 Im { y N / 2 - 1 } y N - 2 ' = 2 Re { y N / 2 - 1 } y N - 1 ' = y N / 2 ( R ) .
    The method above is convenient because (i) it preserves the frequency ordering of the DFT and (ii) it preserves the norm of theDFT output vector, i.e., y ' = y . Using the conjugate symmetry property of DFT outputs, we canwrite the transformation from { y k } from { y k ' } as a matrix operation U N :
    y ' = 1 0 0 0 0 0 0 0 - j / 2 0 0 0 0 j / 2 0 1 / 2 0 0 0 0 1 / 2 0 0 - j / 2 0 0 j / 2 0 0 0 1 / 2 0 0 1 / 2 0 0 0 - j / 2 0 j / 2 0 0 0 0 1 / 2 0 1 / 2 0 0 0 0 0 1 0 0 0 U N Re { y 0 } Re { y 1 } + j Im { y 1 } Re { y 2 } + j Im { y 2 } Re { y N / 2 - 1 } + j Im { y N / 2 - 1 } Re { y N / 2 } Re { y N / 2 - 1 } - j Im { y N / 2 - 1 } Re { y 2 } - j Im { y 2 } Re { y 1 } - j Im { y 1 } y
    The normalization feature guarantees that U N is unitary (which is easily checked by inspection).Then U N W N , the product of two unitary matrices, is also unitary.Since U N W N is actually real-valued (since it takes any real-valued x to a real-valued y ' ) it should be referred to as orthogonal rather than unitary.Henceforth we rename U N W N the real-valued DFT matrix W N r
    W N r := U N W N .
    It is easily checked that the basis vectors of W N r are sampled sines and cosines at the uniformly spaced frequencies { 2 π k / N ; k = 0 , , N / 2 } . [link] gives an illustration of the real-valued DFT basis vectors for the case N = 8 .
    This figure is comprised of 8 rows containing line segments ended by small dots either above or below the row, or on the line. The directions of the lines on each row are as follows. Line one is 8 up arrows. Line two is on, down down down, on, up up up. Line three is up up on down down down on up. Line four is on down on up on down on up. Line five is up on down on up on down on. Line six is on down up down on up down up. Line seven is up down on up down up on down. Line eight is up down up down up down up down. In this figure the lengths of the small line segments differ slightly, and the lengths are dependent on a wavelike shape that the variations of the lines make. This figure is comprised of 8 rows containing line segments ended by small dots either above or below the row, or on the line. The directions of the lines on each row are as follows. Line one is 8 up arrows. Line two is on, down down down, on, up up up. Line three is up up on down down down on up. Line four is on down on up on down on up. Line five is up on down on up on down on. Line six is on down up down on up down up. Line seven is up down on up down up on down. Line eight is up down up down up down up down. In this figure the lengths of the small line segments differ slightly, and the lengths are dependent on a wavelike shape that the variations of the lines make.
    8 × 8 Real-valued DFT basis vectors.

    [dft and real-valued dft for N = 4 ]

    W 4 = 1 2 1 1 1 1 1 - j - 1 j 1 - 1 1 - 1 1 j - 1 - j , x = 3 - 1 4 2 , W 4 x = 4 - 1 / 2 + j 3 / 2 3 - 1 / 2 - j 3 / 2 .
    W 4 r = 1 2 1 1 1 1 0 - 2 0 2 2 0 - 2 0 1 - 1 1 - 1 , x = 3 - 1 4 2 , W 4 r x = 4 3 / 2 - 1 / 2 3 .
    Recall that when N is a power of 2, an N -dimensional complex-valued FFT requires N 2 log 2 N complex multiply/adds. When the input is real, however, an N -dimensional FFT may be computed using N log 2 N real multiply/adds (see Sorensen&Jones&Heideman&Burrus TASSP 87).
  • The DCT: The DCT is defined below along with its inverse
    y k = 2 N α k n = 0 N - 1 x n cos ( 2 n + 1 ) k π 2 N ; k = 0 N - 1 , for α 0 = 1 / 2 , α k 0 = 1 , x n = 2 N k = 0 N - 1 α k y k cos ( 2 n + 1 ) k π 2 N ; n = 0 N - 1 ,
    The DCT can be represented by an orthogonal matrix C N :
    C N k , n = 2 N α k cos ( 2 n + 1 ) k π 2 N ; k , n = 0 N - 1 .
    See [link] for an illustration of DCT basis vectors when N = 8 .
    This figure is comprised of 8 rows containing line segments ended by small dots either above or below the row, or on the line. The directions of the lines on each row are as follows. Line one is 8 up arrows. Line two is up up up up down down down down, although the lengths of the small line segments decrease, then increase in the negative direction to form a smooth transition from lines pointing up to lines pointing down. Line three is up up down down down down up up, and the lines move in length in a wavelike pattern. Line four is up down down down up up up down, again in a wavelike pattern. Line five is up down down up up down down up. Line six is up down up up down down up down. Line seven is up down up down down up down up. Line seven is up down up down up down up down. This figure is comprised of 8 rows containing line segments ended by small dots either above or below the row, or on the line. The directions of the lines on each row are as follows. Line one is 8 up arrows. Line two is up up up up down down down down, although the lengths of the small line segments decrease, then increase in the negative direction to form a smooth transition from lines pointing up to lines pointing down. Line three is up up down down down down up up, and the lines move in length in a wavelike pattern. Line four is up down down down up up up down, again in a wavelike pattern. Line five is up down down up up down down up. Line six is up down up up down down up down. Line seven is up down up down down up down up. Line seven is up down up down up down up down.
    8 × 8 DCT basis vectors.
  • A Fast DCT: There are a number of fast algorithms to compute the DCT.The method presented below is based on the FFT and leads to intuition concerning the good “lowpass” performance of the DCT.
    1. Create a 2 N -length mirrored version of N -length { x n } (see [link] (c)):
      x ¯ n = x n n = 0 , 1 , , N - 1 , x 2 N - 1 - n n = N , N + 1 , , 2 N - 1 .
    2. Compute { y ¯ k } , the 2 N -point DFT of { x ¯ n } :
      y ¯ k = 1 2 N n = 0 2 N - 1 x ¯ n e - j 2 π 2 N k n = 2 N e j π 2 N k n = 0 N - 1 x n cos ( 2 n + 1 ) k π 2 N .
    3. Compute { y k } , the N -point DCT outputs, from { y ¯ k } :
      y k = e - j π 2 N k α k y ¯ k ; k = 0 , 1 , , N - 1 .
    Assuming a real-valued input, the scheme outlined above can be implemented using
    2 N + 2 N log 2 2 N 2 = 2 N ( 1 + log 2 N ) real-valued multiply/adds
    where we are assuming use of the real-FFT described previously.
  • DCT vs. DFT Performance for Lowpass Signals: [link] suggests that DCT and DFT performance both equal KLT performance asymptotically, i.e., as transform dimension N . Indeed, this can be proven (see Jayant&Noll). A more practical question is: How do DCT and DFT performances comparefor finite N ? To answer this question, we will investigate the effects of input datablock length on the DCT and DFT. To start, consider the DFT of an N -length input block { x 0 , , x N - 1 } :
    y k = 1 N n = 0 N - 1 x n e - j 2 π N k n ; k = 0 N - 1 .
    It can be seen that the DFT outputs { X 0 , , X N - 1 } are samples of the discrete-time Fourier transform (DTFT) at frequencies { ω k = 2 π N k ; k = 0 N - 1 } :
    X ( ω ) = 1 N n = 0 N - 1 x n e - j ω n , y k = X 2 π N k .
    Now lets consider a periodic extension of { x 0 , , x N - 1 } which repeats this N -length sequence a total of L times:
    x n ' = 1 L x n N - N L 2 n < N L 2 0 else .
    Above, n N denotes “ n modulo N ” and L is assumed even (see [link] (a)-(b)). Here is the interesting point: the DTFT of the N L -length periodic extension equals the DTFT of the original N -length data block when sampled at the frequencies { ω k } !
    X ' ( ω k ) = 1 N L n = - N L / 2 N L / 2 - 1 x n ' e - j 2 π N k n = 1 N L = - L / 2 L / 2 - 1 m = 0 N - 1 x m + N ' e - j 2 π N k ( m + N ) = 1 L N = - L / 2 L / 2 - 1 m = 0 N - 1 x m e - j 2 π N k ( m + N ) = 1 N m = 0 N - 1 e - j 2 π N k m = X ( ω k ) .
    This implies that the overall spectral shape of X ' ( ω ) will be inherited by the DFT outputs { X 0 , , X N - 1 } . So what is the overall shape of X ' ( ω ) ? Say that { x n } is a lowpass process. Being lowpass, we expect the time-domain sequence { x n } to look relatively “smooth.”If the starting and ending points of the N -block, are different, however, the periodic extension { x n ' } will exhibit time-domain discontinuities (see [link] (b)) that are uncharacteristic of the process { x n } . These discontinuities imply that X ' ( ω ) will contain high-frequency content not present in the power spectrum of the lowpass input process.Based on our previous findings, if artificial high-frequency content exists at X ' ( ω k ) , it must also exist at X ( ω k ) = X k . In conclusion, the periodic extension { x n ' } provides an intuitive explanation of why short-block DFT analysis of lowpass signals oftenseems corrupted by “artificial” high-frequency content. So why is this important?Recall that transform performance is proportional to the dynamic range of transform output variances.If the DFT outputs corresponding to otherwise low spectral energy are artificially increased due to short-window effects,the dynamic range of { σ y k 2 } will decrease, and DFT performance will fall short of optimal.
    This figure contains four graphs, a, b, c, and d, described in the figure title. Each graph contains a common shape, which is a wave-like curve from a trough to a peak, and the area below it drawn down to the horizontal axis, highlighted in blue. Part a only contains one of these shapes. Part b contains one shape highlighted in blue, and three others simply as a dashed outline, with dotted lines indicating an infinite number of shapes going in the positive and negative direction on the horizontal axis. Part c contains two blue shapes, with the second flipped horizontally so that the peaks are facing each other and adjacent. Part d contains the same arrangement as part c, except that the two adjoined shapes are drawn three more times as dashed outlines, again with dotted lines indicating an infinite number of these shape arrangements. This figure contains four graphs, a, b, c, and d, described in the figure title. Each graph contains a common shape, which is a wave-like curve from a trough to a peak, and the area below it drawn down to the horizontal axis, highlighted in blue. Part a only contains one of these shapes. Part b contains one shape highlighted in blue, and three others simply as a dashed outline, with dotted lines indicating an infinite number of shapes going in the positive and negative direction on the horizontal axis. Part c contains two blue shapes, with the second flipped horizontally so that the peaks are facing each other and adjacent. Part d contains the same arrangement as part c, except that the two adjoined shapes are drawn three more times as dashed outlines, again with dotted lines indicating an infinite number of these shape arrangements.
    Illustration of periodic extensions inherent to DFT and DCT: (a) N -length DFT input block, (b) periodic extension inherent to DFT, (c) equivalent 2 N -length DFT-input-block for DCT, (d) periodic extension inherent to DCT.
    Now lets consider the DCT. From derivation of the fast algorithm, we know that the N DCT output magnitudes from length- N input { x 0 , , x N - 1 } are equal to the first N DFT output magnitudes from length- 2 N input { x ¯ n } —a mirrored version of { x 0 , , x N - 1 } . (See [link] (c).) Due to the mirroring effect, the periodic extension of { x ¯ n } will not have the discontinuities present in the periodic extension of { x n } , and so a 2 N -point DFT analysis of { x ¯ n } will not have “artificial” high frequency enhancement.Assuming that the process from which { x n } was extracted is lowpass, the DCT outputs will exhibit large dynamic range, andan improvement over DFT coding performance is expected. This is confirmed by [link] (a).

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, An introduction to source-coding: quantization, dpcm, transform coding, and sub-band coding. OpenStax CNX. Sep 25, 2009 Download for free at http://cnx.org/content/col11121/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'An introduction to source-coding: quantization, dpcm, transform coding, and sub-band coding' conversation and receive update notifications?

Ask