<< Chapter < Page Chapter >> Page >
Using linear predictive coding to change the voice quality of a source speaker to a target.

Background on linear predictive coding

Linear Predictive Coding (or “LPC”) is a method of predicting a sample of a speech signal based on several previous samples. Similar to the method employed by the cepstrum , we can use the LPC coefficients to separate a speech signal into two parts: the transfer function (which contains the vocal quality) and the excitation (which contains the pitch and the sound). The method of looking at speech as two parts which can be separated is known as the Source Filter Model of Speech .

We can predict that the nth sample in a sequence of speech samples is represented by the weighted sum of the p previous samples:

s ^ = k = 1 p a k s [ n k ]

The number of samples (p) is referred to as the “order” of the LPC. As p approaches infinity, we should be able to predict the nth sample exactly. However, p is usually on the order of ten to twenty, where it can provide an accurate enough representation with a limited cost of computation. The weights on the previous samples (ak) are chosen in order to minimize the squared error between the real sample and its predicted value. Thus, we want the error signal e(n), which is sometimes referred to as the LPC residual, to be as small as possible:

e [ n ] = s [ n ] s ^ [ n ] = s [ n ] k = 1 p a k s [ n k ]

We can take the z-transform of the above equation:

E ( z ) = S ( z ) k = 1 p a k S ( z ) z k = S ( z ) [ 1 k = 1 p a k z k ] = S ( z ) A ( z )

Thus, we can represent the error signal E(z) as the product of our original speech signal S(z) and the transfer function A(z). A(z) represents an all-zero digital filter, where the ak coefficients correspond to the zeros in the filter’s z-plane. Similarly, we can represent our original speech signal S(z) as the product of the error signal E(z) and the transfer function 1 / A(z):

S ( z ) = E ( z ) A ( z )

The transfer function 1/A(z) represents an all-pole digital filter, where the ak coefficients correspond to the poles in the filter’s z-plane. Note that the roots of the A(z) polynomial must all lie within the unit circle to ensure stability of this filter.

The spectrum of the error signal E(z) will have a different structure depending on whether the sound it comes from is voiced or unvoiced. Voiced sounds are produced by vibrations of the vocal cords. Their spectrum is periodic with some fundamental frequency (which corresponds to the pitch). Examples of voiced sounds include all of the vowels. Unvoiced signals, however, do not have a fundamental frequency or a harmonic structure. Instead, they are just white noise.

Lpc in voice conversion

In speech processing, computing the LPC coefficients of a signal gives us its ak values. From here, we can get the filter A(z) as described above. A(z) is the transfer function between the original signal s[n] and the excitation component e[n]. The transfer function of a speech signal is the part dealing with the voice quality: what distinguishes one person’s voice from another. The excitation component of a speech signal is the part dealing with the particular sounds and words that are produced. In the time domain, the excitation and transfer function are convolved to create the output voice signal. As shown in the figure below, we can put the original signal through the filter to get the excitation component. Putting the excitation component through the inverse filter (1 / A(z)) gives us the original signal back.

A voice conversion algorithm

Using Linear Predictive Coding to separate the two parts of a speech signal: transfer function and excitation.

We can perform voice conversion by replacing the excitation component from the given speaker with a new one. Since we are still using the same transfer function A(z), the resulting speech sample will have the same voice quality as the original. However, since we are using a different excitation component, the resulting speech sample will have the same sounds as the new speaker.


In speech processing, a process called pre-emphasis is applied to the input signal before the LPC analysis. During the reconstruction following the LPC analysis, a de-emphasis process is applied to the signal to reverse the effects of pre-emphasis.

Pre- and de- emphasis are necessary because, in the spectrum of a human speech signal, the energy in the signal decreases as the frequency increases. Pre-emphasis increases the energy in parts of the signal by an amount inversely proportional to its frequency. Thus, as the frequency increases, pre-emphasis raises the energy of the speech signal by an increasing amount. This process therefore serves to flatten the signal so that the resulting spectrum consists of formants of similar heights. (Formants are the highly visible resonances or peaks in the spectrum of the speech signal, where most of the energy is concentrated.) The flatter spectrum allows the LPC analysis to more accurately model the speech segment. Without pre-emphasis, the linear prediction would incorrectly focus on the lower-frequency components of speech, losing important information about certain sounds.


Deng, Li and Douglas O”Shaughnessy. Speech Processing: A Dynamic and Optimization-Oriented Approach. Marcel Dekker, Inc: New York. 2003.

Gold, Ben and Nelson Morgan. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. John Wiley and Sons, Inc: New York. 2000.

Lemmetty, Sami. Review of Speech Synthesis Technology. (Master’s Thesis: Helsinki University of Technology) March 1999. (External Link) .

Markel, J.D. and A.H. Gray, Jr. Linear Predition of Speech. Springer-Verlag: Berlin. 1976.

Questions & Answers

Is there any normative that regulates the use of silver nanoparticles?
Damian Reply
what king of growth are you checking .?
What fields keep nano created devices from performing or assimulating ? Magnetic fields ? Are do they assimilate ?
Stoney Reply
why we need to study biomolecules, molecular biology in nanotechnology?
Adin Reply
yes I'm doing my masters in nanotechnology, we are being studying all these domains as well..
what school?
biomolecules are e building blocks of every organics and inorganic materials.
anyone know any internet site where one can find nanotechnology papers?
Damian Reply
sciencedirect big data base
Introduction about quantum dots in nanotechnology
Praveena Reply
what does nano mean?
Anassong Reply
nano basically means 10^(-9). nanometer is a unit to measure length.
do you think it's worthwhile in the long term to study the effects and possibilities of nanotechnology on viral treatment?
Damian Reply
absolutely yes
how to know photocatalytic properties of tio2 nanoparticles...what to do now
Akash Reply
it is a goid question and i want to know the answer as well
characteristics of micro business
for teaching engĺish at school how nano technology help us
Do somebody tell me a best nano engineering book for beginners?
s. Reply
there is no specific books for beginners but there is book called principle of nanotechnology
what is fullerene does it is used to make bukky balls
Devang Reply
are you nano engineer ?
fullerene is a bucky ball aka Carbon 60 molecule. It was name by the architect Fuller. He design the geodesic dome. it resembles a soccer ball.
what is the actual application of fullerenes nowadays?
That is a great question Damian. best way to answer that question is to Google it. there are hundreds of applications for buck minister fullerenes, from medical to aerospace. you can also find plenty of research papers that will give you great detail on the potential applications of fullerenes.
what is the Synthesis, properties,and applications of carbon nano chemistry
Abhijith Reply
Mostly, they use nano carbon for electronics and for materials to be strengthened.
is Bucky paper clear?
carbon nanotubes has various application in fuel cells membrane, current research on cancer drug,and in electronics MEMS and NEMS etc
so some one know about replacing silicon atom with phosphorous in semiconductors device?
s. Reply
Yeah, it is a pain to say the least. You basically have to heat the substarte up to around 1000 degrees celcius then pass phosphene gas over top of it, which is explosive and toxic by the way, under very low pressure.
Do you know which machine is used to that process?
how to fabricate graphene ink ?
for screen printed electrodes ?
What is lattice structure?
s. Reply
of graphene you mean?
or in general
in general
Graphene has a hexagonal structure
On having this app for quite a bit time, Haven't realised there's a chat room in it.
what is biological synthesis of nanoparticles
Sanket Reply
how did you get the value of 2000N.What calculations are needed to arrive at it
Smarajit Reply
Privacy Information Security Software Version 1.1a
Got questions? Join the online conversation and get instant answers!
Jobilize.com Reply

Get the best Algebra and trigonometry course in your pocket!

Source:  OpenStax, Methods for voice conversion. OpenStax CNX. Dec 21, 2004 Download for free at http://cnx.org/content/col10252/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Methods for voice conversion' conversation and receive update notifications?