In this module, we give a brief introduction to sub-band coding, its relation to transform coding, and its use in MPEG-style audio coding.
  • Sub-band coding is a popular compression tool used in, for example, MPEG-style audio coding schemes (see [link] ).
    This is a flowchart that will be described from left to right. Beginning on the far left is an arrow pointing to the right, labeled input. This arrow points at a rounded box labeled sub-band analysis. Breaking off downward from the input arrow is a second arrow that points down, then to the right at a rounded box labeled psycho-acoustic model. To the right of the box labeled sub-band analysis is a larger arrow pointing to the right labeled freq. data. This arrow points at another box labeled bit alloc and quantization. The freq. data arrow also breaks off to point down at the aforementioned box, psycho-acoustic model. From the right of the psycho-acoustic model is another arrow pointing back up at the bit alloc and quantization box. To the right of that box is another arrow pointing directly to the right, labeled quant. data. This arrow points at a box labeled stream formatting. To the right of this box is a final arrow pointing to the right, labeled output.
    Simplified MPEG-style audio coding system.
  • [link] illustrates a generic sub-band coder. In short, the input signal is passed through a parallel bank of analysis filters $\{H_i(z)\}$ and the outputs are “downsampled” by a factor of $N$. Downsampling-by-$N$ is a process which passes every $N^{th}$ sample and ignores the rest, effectively decreasing the data rate by a factor of $N$. The downsampled outputs are quantized (using a potentially different number of bits per branch, as in transform coding) for storage or transmission. Downsampling ensures that the number of data samples to store is no larger than the number of data samples entering the coder; in [link], $N$ sub-band outputs are generated for every $N$ system inputs.
    This is a large, complex flowchart which will be described from left to right, as this is the flow of the diagram. The diagram begins with the expression x(n), and from this expression is a line that splits into a series of arrows each pointing to the right at boxes containing the expressions H_0(z), H_1(z), and so on to a final box H_(N-1)(z). From the ends of each of these boxes are more arrows pointing to the right, this time each at an identical circle containing a down arrow and the variable N. To the right of these circles again are a series of arrows, labeled from top to bottom s_0(m), s_1(m), and so on to the final arrow, s_(N-1)(m). These arrows each point at boxes containing the variable Q. To the right of these boxes are another series of arrows pointing to the right, labeled s-tilde_0 (m). There is then a gap in the diagram, followed by a series of identical arrows to those preceding it, with the s-tilde variables. These arrows each point at circles containing an up arrow and the variable N. To the right of these circles are more arrows pointing at boxes containing the labels K_0(z), K_1(z), and so on to a final box containing K_(N-1)(z). Each of these boxes point with arrows to the right at a single circle containing a plus sign. From the plus sign is a final arrow pointing to the right, labeled u(n).
    Sub-band coder/decoder with scalar quantization.
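The analysis half of the coder described above can be sketched in a few lines. This is a minimal illustration, not the filter bank of any particular standard: the function name `analysis_bank`, the two-band averaging/differencing filters, and the zero-padding of samples before x(0) are all assumptions made for the example. Each branch computes s_i(m) = sum over n of h_i(n) x(Nm - n), that is, filtering followed by keeping every N-th output.

```python
def analysis_bank(x, filters, N):
    """Filter x with each analysis filter and keep every N-th output sample.

    x       : list of input samples x(0), x(1), ...
    filters : one impulse response (list of coefficients) per branch
    N       : downsampling factor
    Returns one list per branch holding s_i(m) = sum_n h_i[n] * x(N*m - n).
    Samples before x(0) are treated as zero (an assumption of this sketch).
    """
    num_out = len(x) // N
    outputs = []
    for h in filters:
        branch = []
        for m in range(num_out):
            acc = 0.0
            for n, coeff in enumerate(h):
                k = N * m - n
                if 0 <= k < len(x):
                    acc += coeff * x[k]
            branch.append(acc)
        outputs.append(branch)
    return outputs


# Hypothetical two-band example: a crude lowpass and highpass pair.
N = 2
filters = [[0.5, 0.5], [0.5, -0.5]]
x = [1, 2, 3, 4, 5, 6, 7, 8]
subbands = analysis_bank(x, filters, N)
total_outputs = sum(len(branch) for branch in subbands)
```

Here `total_outputs` equals `len(x)`: the N branches together emit N sub-band samples for every N input samples, which is the data-rate property noted above.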
  • Relationship to Transform Coding:   Conceptually, sub-band coding (SC) is very similar to transform coding (TC). Like TC, SC analyzes a block of input data and produces a set of linearly transformed outputs, now called “sub-band outputs.” Like TC, these transformed outputs are independently quantized in a way that yields coding gain over straightforward PCM. And like TC, it is possible to derive an optimal bit allocation which minimizes reconstruction error variance for a specified average bit rate. In fact, an $N$-band SC system with length-$N$ filters is equivalent to a TC system with $N \times N$ transformation matrix $T$: the decimated convolution operation which defines the $i^{th}$ analysis branch of [link] is identical to an inner product between an $N$-length input block and $t_i^t$, the $i^{th}$ row of $T$. (See [link].)
    This is a two-part figure. part a contains a series of horizontally connected boxes in a single row, labeled h_0, h_1, h_2, h_3 from left to right, followed by a long arrow that points at the expression s_i(m). In a second row of this part of the figure, a series of horizontally connected boxes continues at the same vertical position that the first row's boxes end. These boxes are also labeled h_0, h_1, h_2, and h_3. To the right of these is a short arrow that ends at the same part of the page that the upper row ends, pointing at the variable s_i(m-1) Below these is a final row of the first part of the figure, containing a series of connected boxes that span the entire width of the page. From left to right, the expressions inside the boxes are x(Nm), x(Nm-1), x(Nm-2), x(Nm-3), x(Nm-4), x(Nm-5), x(Nm-6), and x(Nm-7). The second part of the figure is drawn in a similar fashion, except that there is one large box in place of the connected boxes from the first part. The large box in the first row contains the expression t_i^t, and the arrow points at the expression y_i(m). In the second row, the box contains the same expression as the first row, and its arrow points at the expression y_i(m-1). The bottom row contains two connected boxes rather than the eight connected boxes in the first part of this figure. The two boxes contain the expressions x(m) on the left, and x(m-1) on the right.
    Equivalence between (a) $N$-band sub-band coding with length-$N$ filters and (b) $N \times N$ transform coding (shown for $N = 4$). Note: impulse response coefficients $\{h_n\}$ correspond to filter $H_i(z)$.
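The equivalence in the figure can be checked numerically. The sketch below is illustrative only: the 4 × 4 matrix `T` (an unnormalized Hadamard-style matrix), the test signal, and both function names are assumptions chosen for the example. It compares the decimated convolution s_i(m) = sum over n of h_n x(Nm - n), with h taken as row i of T, against the inner product of that row with the block [x(Nm), x(Nm-1), ..., x(Nm-N+1)].

```python
def decimated_filter_output(x, h, N, m):
    """s_i(m) = sum_n h[n] * x(N*m - n), with x(k) = 0 outside the list."""
    total = 0
    for n, coeff in enumerate(h):
        k = N * m - n
        if 0 <= k < len(x):
            total += coeff * x[k]
    return total


def transform_output(x, row, N, m):
    """y_i(m) = inner product of a transform row with the m-th input block."""
    block = [x[N * m - n] if 0 <= N * m - n < len(x) else 0 for n in range(N)]
    return sum(t * v for t, v in zip(row, block))


N = 4
# Hypothetical transform matrix; any N x N matrix would serve here.
T = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8]

matches = all(
    decimated_filter_output(x, T[i], N, m) == transform_output(x, T[i], N, m)
    for i in range(N)
    for m in range(len(x) // N)
)
```

With length-N filters, each output depends on exactly one non-overlapping input block, so `matches` is true for every branch and block. Longer filters would reach back into earlier blocks, which is where SC departs from TC.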
    So what kind of frequency responses characterize the most commonly used transformation matrices? Let's look at the DFT first. For the $i^{th}$ row, we have
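    $$|H_i(\omega)| = \left|\sum_{n=0}^{N-1} e^{-j\frac{2\pi}{N} i n} e^{-j\omega n}\right| = \left|\sum_{n=0}^{N-1} e^{-j\left(\omega + \frac{2\pi}{N} i\right) n}\right| = \left|\frac{\sin\left(\frac{N}{2}\left(\omega + \frac{2\pi i}{N}\right)\right)}{\sin\left(\frac{1}{2}\left(\omega + \frac{2\pi i}{N}\right)\right)}\right|.$$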
    [link] plots these magnitude responses. Note that the $i^{th}$ DFT row acts as a bandpass filter whose response peaks where $\omega + 2\pi i/N$ is a multiple of $2\pi$, i.e., with center frequency $-2\pi i/N$ (modulo $2\pi$), and with stopband attenuation of 6 dB. [link] plots the magnitude responses of DCT filters, where we see that they have even less stopband attenuation.
    This figure is a cartesian graph, plotting the horizontal axis omega of values -3 to 3, and vertical axis dB of values -20 to 0. The figure contains seven disconnected peaks, each approximately one horizontal unit in width, with the exception of the fourth peak, which is nearly two units wide. The vertical values at the waves' peak are the following from left to right: -9, -8, -6, 0, -6, -8, -9. Beyond these curves are a series of dashed peaks of varying heights that are even in width and alignment with the aforementioned solid peaks, but are of different heights as if each peak's different height is drawn over every other peak in the chart.
    Magnitude responses of DFT basis vectors for N = 8 .
    This figure is a cartesian graph, plotting the horizontal axis omega of values -3 to 3, and vertical axis dB of values -20 to 2. The figure contains six disconnected peaks, although the figure is exactly symmetrical about a vertical line at omega=0. The first wave is approximately one unit wide, and reaches a vertical value of -4. The second wave is approximately 1.5 units wide and reaches a vertical value of 0. The third wave is approximately 0.5 units wide and reaches a vertical value of -9. The latter three waves follow the same progression after the reflection of symmetry.
    Magnitude responses of DCT basis vectors for N = 8 .
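The closed form for the DFT-row magnitude response can be sanity-checked against the direct sum. The following is a small sketch using only the Python standard library; the function names and the choice N = 8, i = 2 are assumptions made for illustration. It evaluates the sum of exp(-j(ω + 2πi/N)n) over n = 0, ..., N-1 and compares its magnitude with |sin(Nθ/2) / sin(θ/2)|, where θ = ω + 2πi/N.

```python
import cmath
import math


def dft_row_response(i, N, omega):
    """Direct sum: H_i(omega) = sum_n exp(-j*2*pi*i*n/N) * exp(-j*omega*n)."""
    theta = omega + 2 * math.pi * i / N
    return sum(cmath.exp(-1j * theta * n) for n in range(N))


def dirichlet_magnitude(i, N, omega):
    """Closed form |sin(N*theta/2) / sin(theta/2)|, theta = omega + 2*pi*i/N."""
    theta = omega + 2 * math.pi * i / N
    denom = math.sin(theta / 2)
    if abs(denom) < 1e-12:
        return float(N)  # limiting value where theta is a multiple of 2*pi
    return abs(math.sin(N * theta / 2) / denom)


N, i = 8, 2
test_omegas = [-3.0, -1.0, 0.3, 2.0]
max_gap = max(
    abs(abs(dft_row_response(i, N, w)) - dirichlet_magnitude(i, N, w))
    for w in test_omegas
)

# Peak where theta = 0, and a null at the centre of a neighbouring band.
peak = abs(dft_row_response(i, N, -2 * math.pi * i / N))
null = abs(dft_row_response(i, N, -2 * math.pi * (i + 1) / N))
```

In this sketch `max_gap` is at the level of floating-point rounding, `peak` equals N, and `null` is numerically zero: each DFT row has exact nulls at the centers of the other bands but responds appreciably in between them.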
  • Psycho-acoustic Motivations:   We have seen that $N$-band SC with length-$N$ filters is equivalent to $N \times N$ transform coding. But is transform coding the best technique to use in high-quality audio coders? It turns out that the key to preserving sonic quality under high levels of compression is to shape the reconstruction error so that the ear will not hear it. When we talk about psychoacoustics later in the course, we'll see that the properties of noise tolerated by the ear/brain are most easily described in the frequency domain. Hence, bitrate allocation based on psychoacoustic models is most conveniently performed when SC outputs represent signal components in isolated frequency bands. In other words, instead of allocating fewer bits to sub-band outputs having a smaller effect on reconstruction error variance, we will allocate fewer bits to sub-band outputs having a smaller contribution to perceived reconstruction error. We have seen that length-$N$ DFT and DCT filters give a $2\pi/N$ bandwidth with no better than 6 dB of stopband attenuation. The SC filters required for high-quality audio coding require much better stopband performance, say > 90 dB. It turns out that filters with passband width $2\pi/N$, narrow transition bands, and decent stopband attenuation require impulse response lengths $\gg N$. In $N$-band SC there is no constraint on filter length, unlike $N$-band TC. This is the advantage of SC over TC when it comes to audio coding. (A similar conclusion resulted from our comparison of DPCM and TC of equal dimension $N$; it was reasoned that the longer “effective” input length of DPCM with $N$-length prediction filtering gave performance improvement relative to TC.)
  • To summarize, the key differences between transform and sub-band coding are the following.
    1. SC outputs measure relative signal strength in different frequency bands, while TC outputs might not have a strict bandpass correspondence.
    2. The TC input window length is equal to the number of TC outputs, while the SC input window length is usually much greater than the number of SC outputs (16 × greater in MPEG).
  • At first glance, SC implementation complexity is a valid concern. Recall that in TC, fast $N \times N$ transforms such as the DCT and DFT could be performed using $N \log_2 N$ multiply/adds! Must we give up this computational efficiency for better frequency resolution? Fortunately, the answer is no; clever SC implementations are built around fast DFT or DCT transforms and are very efficient as a result. Fast sub-band coding, in fact, lies at the heart of MPEG audio compression (see ISO/IEC 13818-3).

Source:  OpenStax, An introduction to source-coding: quantization, dpcm, transform coding, and sub-band coding. OpenStax CNX. Sep 25, 2009 Download for free at http://cnx.org/content/col11121/1.2