
Synthesis of voiced speech

Download the file coeff.mat for the following section.

Download the file coeff.mat and load it into the Matlab workspace using the load command. This will load three sets of filter coefficients: $A_1$, $A_2$, and $A_3$ for the vocal tract model in [link] and [link]. Each vector contains coefficients $\{a_1, a_2, \ldots, a_{15}\}$ for an all-pole filter of order 15.

We will now synthesize voiced speech segments for each of these sets of coefficients. First, write a Matlab function x=exciteV(N,Np) that creates a length-N excitation for voiced speech with a pitch period of Np samples. The output vector x should contain a discrete-time impulse train with period Np (e.g. [1 0 0 0 1 0 0] for Np = 4).
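One possible way to write such a function is sketched below. It assumes a row-vector output with the first impulse at sample 1, which matches the example above but is otherwise a choice rather than a requirement of the lab.

```matlab
function x = exciteV(N, Np)
%EXCITEV  Voiced-speech excitation as a unit impulse train.
%   x = exciteV(N, Np) returns a length-N row vector with a 1 at
%   samples 1, 1+Np, 1+2*Np, ... and zeros elsewhere.
%   (Row orientation and starting at sample 1 are assumptions.)
x = zeros(1, N);      % start with all zeros
x(1:Np:N) = 1;        % place one impulse every Np samples
end
```

Saved as exciteV.m, a call such as exciteV(7, 4) reproduces the example vector [1 0 0 0 1 0 0].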

Assuming a sampling frequency of 8 kHz (0.125 ms/sample), create a 40 ms excitation (320 samples) with a pitch period of 8 ms (64 samples), and filter it using [link] for each set of coefficients. For this, you may use the command

s = filter(1,[1 -A],x)

where A is the row vector of filter coefficients (see Matlab's help on filter for details). Plot each of the three filtered signals. Use subplot() and orient tall to place them in the same figure.
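The synthesis and plotting steps might be sketched as follows. The variable names A1, A2, and A3 inside coeff.mat are an assumption based on the description above. If the file is not found, the script falls back to made-up single-resonance filters so that it still runs; those stand-ins are not the lab's data and will not sound like vowels.

```matlab
fs = 8000;                     % sampling frequency in Hz (from the lab text)
N  = round(0.040 * fs);        % 40 ms excitation -> 320 samples
Np = round(0.008 * fs);        % 8 ms pitch period -> 64 samples

x = zeros(1, N);               % impulse-train excitation, built inline
x(1:Np:N) = 1;

if exist('coeff.mat', 'file')
    load('coeff.mat');         % ASSUMED to define A1, A2, A3
    coeffs = {A1(:).', A2(:).', A3(:).'};   % force row vectors
else
    % Stand-ins only (NOT the lab data): one resonance each, padded to order 15
    r  = 0.97;                 % pole radius, arbitrary
    f0 = [500 1000 1500];      % resonance frequencies in Hz, arbitrary
    coeffs = arrayfun(@(fc) [2*r*cos(2*pi*fc/fs), -r^2, zeros(1, 13)], ...
                      f0, 'UniformOutput', false);
end

t = (0:N-1) / fs * 1000;       % time axis in milliseconds
figure;
for i = 1:3
    s = filter(1, [1 -coeffs{i}], x);   % all-pole synthesis filter
    subplot(3, 1, i);
    plot(t, s);
    xlabel('Time (ms)');
    ylabel('Amplitude');
    title(sprintf('Voiced signal, coefficient set %d', i));
end
orient tall;                   % use the full page height when printing
```

Forcing each coefficient vector to a row keeps [1 -A] valid whether the file stores rows or columns.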

We will now compute the frequency response of each of these filters. The frequency response may be obtained by evaluating [link] at points along $z = e^{j\omega}$. Matlab will compute this with the command [H,W]=freqz(1,[1 -A],512), where A is the vector of coefficients. Plot the magnitude of each response versus frequency in Hertz. Use subplot() and orient tall to plot them in the same figure.

The locations of the peaks in the spectrum correspond to the formant frequencies. For each vowel signal, estimate the first three formants (in Hz) and list them in the figure.
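A rough sketch of the frequency-response plots and a simple peak search is given below. freqz returns W in radians per sample, so multiplying by fs/(2π) converts it to Hertz. The peak picker simply flags samples larger than both neighbours; it is an illustrative shortcut, not the lab's prescribed method, and closely spaced formants may merge into one peak. As before, A1, A2, and A3 are assumed variable names, and the fallback filters are made-up stand-ins.

```matlab
fs = 8000;                     % sampling frequency in Hz (from the lab text)

if exist('coeff.mat', 'file')
    load('coeff.mat');         % ASSUMED to define A1, A2, A3
    coeffs = {A1(:).', A2(:).', A3(:).'};
else
    % Stand-ins only (NOT the lab data): one resonance each, padded to order 15
    r  = 0.97;
    f0 = [500 1000 1500];
    coeffs = arrayfun(@(fc) [2*r*cos(2*pi*fc/fs), -r^2, zeros(1, 13)], ...
                      f0, 'UniformOutput', false);
end

figure;
for i = 1:3
    [H, W] = freqz(1, [1 -coeffs{i}], 512);  % W in rad/sample on [0, pi)
    f   = W * fs / (2*pi);                   % frequency axis in Hz
    mag = abs(H);

    % Local maxima: interior samples larger than both neighbours
    isPeak = [false; ...
              mag(2:end-1) > mag(1:end-2) & mag(2:end-1) > mag(3:end); ...
              false];
    fPeaks = f(isPeak);
    k = min(3, numel(fPeaks));               % keep at most three peaks

    subplot(3, 1, i);
    plot(f, mag);
    xlabel('Frequency (Hz)');
    ylabel('|H|');
    title(sprintf('Set %d, peaks near: %s Hz', i, ...
                  mat2str(round(fPeaks(1:k).'))));
end
orient tall;
```

With 512 points over 0 to 4 kHz the grid spacing is about 7.8 Hz, which bounds how precisely a peak can be located this way.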

Now generate the three signals again, but use an excitation which is 1 to 2 seconds long. Listen to the filtered signals using soundsc. Can you hear qualitative differences in the signals? Can you identify the vowel sounds?

Inlab report

Hand in the following:
  • A figure containing the three time-domain plots of the voiced signals.
  • Plots of the frequency responses for the three filters. Make sure to label the frequency axis in units of Hertz.
  • For each of the three filters, list the approximate center frequency of the first three formant peaks.
  • Comment on the audio quality of the synthesized signals.

Linear predictive coding

The filter coefficients provided in the previous section were determined using a technique called linear predictive coding (LPC). LPC is a fundamental component of many speech processing applications, including compression, recognition, and synthesis.

In the following discussion of LPC, we will view the speech signal as a discrete-time random process.

Forward linear prediction

Suppose we have a discrete-time random process $\{\ldots, S_{-1}, S_0, S_1, S_2, \ldots\}$ whose elements have some degree of correlation. The goal of forward linear prediction is to predict the sample $S_n$ using a linear combination of the previous $P$ samples.

$$\hat{S}_n = \sum_{k=1}^{P} a_k S_{n-k}$$

$P$ is called the order of the predictor. We may represent the error of predicting $S_n$ by a random sequence $e_n$.

$$e_n = S_n - \hat{S}_n = S_n - \sum_{k=1}^{P} a_k S_{n-k}$$
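The prediction and error sequences can be illustrated numerically with a small sketch. The coefficients and the test signal here are made up for illustration (a stable order-2 example, not taken from coeff.mat). The signal is generated by an all-pole filter driven by white noise, so predicting it with the same coefficients should return the driving noise as the error.

```matlab
rng(0);                          % reproducible random numbers
a = [1.5 -0.8];                  % hypothetical predictor coefficients, P = 2
w = randn(1, 1000);              % white driving noise
s = filter(1, [1 -a], w);        % s(n) = a1*s(n-1) + a2*s(n-2) + w(n)

% Prediction from the previous P samples: s_hat(n) = sum_k a(k)*s(n-k)
s_hat = filter([0 a], 1, s);

% Prediction error: e(n) = s(n) - s_hat(n)
e = s - s_hat;

% The same error comes from the FIR "inverse" filter with taps [1 -a]
e2 = filter([1 -a], 1, s);

fprintf('max |e - w|  = %g\n', max(abs(e - w)));
fprintf('max |e - e2| = %g\n', max(abs(e - e2)));
```

Both printed differences should be at the level of floating-point rounding, which shows that the error filter [1 -a] undoes the synthesis filter 1/[1 -a] used earlier with the filter command.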

An optimal set of prediction coefficients $a_k$ for [link] may be determined by minimizing the mean-square error $E[e_n^2]$. Note that since the error is generally a function of $n$, the prediction coefficients will also be functions of $n$. To simplify notation, let us first define the following column vectors.





Source:  OpenStax, Purdue digital signal processing labs (ece 438). OpenStax CNX. Sep 14, 2009 Download for free at http://cnx.org/content/col10593/1.4