An algorithm for modifying the pitch of a solo human voice in the frequency domain.


As with the time-domain techniques for pitch shifting, we process one window of the signal at a time, with the window length and the hop size between windows specified as inputs to the algorithm. We multiply each window of the signal by a Hann (Hanning) window and then take its FFT. Because the FFT of a real signal is conjugate symmetric, we need only work with half of the coefficients; the discarded half can be recreated at the end by complex conjugation.
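The analysis step might be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the function name `analyze_frames`, the 8 kHz sample rate, the 220 Hz test tone, and the window and hop sizes are all hypothetical choices, not values from the original project.

```python
import numpy as np

def analyze_frames(x, win_len, hop):
    """Return the half spectrum (bins 0..win_len//2) of each Hann-windowed frame."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    spectra = np.empty((n_frames, win_len // 2 + 1), dtype=complex)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + win_len] * window
        spectra[i] = np.fft.rfft(frame)   # keeps only the non-redundant half
    return spectra

# Assumed test input: one second of a 220 Hz sine sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t)
spectra = analyze_frames(x, win_len=1024, hop=256)

# Check the redundancy claim on the first frame: the upper half of the full FFT
# is the mirrored complex conjugate of bins 1..511 of the kept half.
full = np.fft.fft(x[:1024] * np.hanning(1024))
mirrored = np.conj(spectra[0][1:512][::-1])
```

Here `full[513:]` and `mirrored` agree to within rounding, which is why only 513 of the 1024 coefficients need to be processed.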

Modified phase vocoder


First, we need to identify peaks in the spectrum. For computational simplicity, we define a peak as any FFT bin whose magnitude is greater than the magnitudes of its two nearest neighbors on either side. We then assume that the area around a peak (as far away as halfway to the next peak) is part of the peak, or is in the peak's region of influence. Thus, wherever we shift the peak, this region moves with it. To determine how far each peak must be shifted (the amount is different for every peak), we must identify the frequency of the underlying sinusoid that caused it. We do this by fitting a parabola to the peak bin and its neighbor on either side, which involves solving three linear equations (using a matrix multiply). We then find the vertex of this parabola and take that point to be the frequency of the peak. Each peak is then shifted to a fixed multiple of its current frequency, so lower-frequency peaks do not shift as far as higher-frequency peaks. That multiple is the ratio of the target frequency to the detected frequency.
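A minimal sketch of the peak picking, the region assignment, and the parabolic fit is given below. The function names are hypothetical. Two choices are assumptions rather than details from the text: the parabola is fitted to raw magnitudes, and the closed-form vertex formula replaces the matrix solve (it is the solution of the same three linear equations). The synthetic magnitude list exists only to exercise the code.

```python
def find_peaks(mag):
    """Bins whose magnitude exceeds that of the two nearest neighbors on each side."""
    return [k for k in range(2, len(mag) - 2)
            if all(mag[k] > mag[k + d] for d in (-2, -1, 1, 2))]

def region_bounds(peaks, n_bins):
    """Split the spectrum halfway between adjacent peaks; returns (lo, hi), hi exclusive."""
    if not peaks:
        return []
    edges = [0] + [(p + q + 1) // 2 for p, q in zip(peaks, peaks[1:])] + [n_bins]
    return list(zip(edges[:-1], edges[1:]))

def parabolic_vertex(mag, k):
    """Fractional bin of the vertex of the parabola through bins k-1, k, k+1."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    # For y = p*x^2 + q*x + r sampled at x = -1, 0, 1:
    # p = (a - 2b + c) / 2, q = (c - a) / 2, and the vertex is at x = -q / (2p).
    return k + 0.5 * (a - c) / (a - 2 * b + c)

# Synthetic magnitudes: two parabolic bumps centered at bins 10.3 and 30.6.
mag = [max(0.0, 5 - (k - 10.3) ** 2) + max(0.0, 3 - (k - 30.6) ** 2)
       for k in range(40)]
peaks = find_peaks(mag)
bounds = region_bounds(peaks, len(mag))
centers = [parabolic_vertex(mag, k) for k in peaks]

# A peak detected at fractional bin c moves by (ratio - 1) * c bins,
# where ratio = target frequency / detected frequency (1.05 is illustrative).
ratio = 1.05
shifts = [(ratio - 1) * c for c in centers]
```

Because the shift is proportional to the peak's own position, the second peak here moves about three times as far as the first.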

Partial spectrum of original signal

Note the clear peaks, but note also that they are not single points but rather slightly spread over several bins.

Because the number of bins by which a peak must shift is unlikely to be an integer, and the FFT bins are discrete, we use linear interpolation to determine the values to assign to the bins that the peak and its surrounding region land on. A more sophisticated interpolation method could work as well, but it would add a great deal of complexity to what is already a very expensive algorithm. We then add the interpolated peak into its new location in the spectrum and subtract the original peak from its original location (in effect cutting and pasting it rather than just copying it). If a shift would move any bins past either end of the half spectrum, that is, below bin zero or above the last bin, they should be treated as having moved into negative frequencies and reflected back into the positive frequencies with a complex conjugation, since the signal is real.
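One way to sketch the fractional shift is shown below. The function `shift_region` and its arguments are hypothetical. It assumes the complex bin values are interpolated directly, and it writes every region into an output spectrum that starts at zero, which has the same effect as adding each peak at its new location and subtracting it from the old one.

```python
import math

def shift_region(spec, out, lo, hi, shift):
    """Add bins lo..hi-1 of `spec`, moved by `shift` (fractional) bins, into `out`."""
    last = len(spec) - 1                       # index of the highest kept bin

    def sample(pos):
        # Linear interpolation between the two bins around `pos`;
        # bins outside this peak's region contribute zero.
        i = math.floor(pos)
        frac = pos - i
        left = spec[i] if lo <= i < hi else 0j
        right = spec[i + 1] if lo <= i + 1 < hi else 0j
        return (1 - frac) * left + frac * right

    for j in range(math.floor(lo + shift), math.ceil(hi - 1 + shift) + 1):
        value = sample(j - shift)
        dest = j
        if dest < 0:                           # fell below bin zero: reflect
            dest, value = -dest, value.conjugate()
        elif dest > last:                      # fell above the last bin: reflect
            dest, value = 2 * last - dest, value.conjugate()
        if 0 <= dest <= last:
            out[dest] += value

# A single-bin "peak" at bin 5, moved up by 1.5 bins, splits between bins 6 and 7.
spec = [0j] * 16
spec[5] = 2 + 0j
out = [0j] * 16
shift_region(spec, out, lo=3, hi=8, shift=1.5)

# A bin pushed past the top of a 9-bin half spectrum is reflected and conjugated.
spec2 = [0j] * 9
spec2[7] = 1j
out2 = [0j] * 9
shift_region(spec2, out2, lo=6, hi=9, shift=2.0)
```

In the first example the value 2 at bin 5 becomes 1 at bin 6 and 1 at bin 7. In the second, bin 7 would land on bin 9, so it is folded back to bin 7 as the conjugate `-1j`.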

Finally, we must adjust the phase of the peak and its surrounding region to account for the change we have made to its frequency. We multiply by a phasor e^(j*dw*h), where dw is delta omega, the change in frequency, and h is the hop size between windows. We apply this phasor to all the bins in the region around the peak, which preserves the phase relationships of the original signal within each peak and keeps the phase as coherent as possible from frame to frame. One last difficulty is that these phasors must be accumulated from one frame to the next. This requires tracking peaks across frames, since every peak has a different dw and thus a different phasor. One simple way of dealing with this is to look up, in the previous frame, the region that contains the bin where the current peak is located, assume that the peak influencing that region is the same peak as the current one, and accumulate the phasor accordingly. This works under the assumption that audio signals produced by a singer change frequency smoothly, so a peak cannot have moved far from one frame to the next, as the time difference is very small.
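The accumulation could be sketched as below. Several details are assumptions for illustration: the function name `accumulate_phasors`, the conversion of a shift in bins to dw in radians per sample (dw = 2*pi*shift/N for an N-point FFT), and the choice to store the accumulated phasor per analysis bin so that the previous frame can be looked up at the current peak's bin.

```python
import cmath
import math

def accumulate_phasors(peaks, bounds, shifts, prev_phasor, n_fft, hop):
    """Per-bin accumulated phasors for the current frame.

    prev_phasor[k] is the accumulated phasor applied to bin k in the previous frame.
    """
    phasor = [1 + 0j] * len(prev_phasor)
    for peak, (lo, hi), shift in zip(peaks, bounds, shifts):
        dw = 2 * math.pi * shift / n_fft       # change in frequency, rad/sample
        # Assume the peak that owned this bin last frame is the same peak now.
        rot = prev_phasor[peak] * cmath.exp(1j * dw * hop)
        for k in range(lo, hi):                # same phasor for the whole region
            phasor[k] = rot
    return phasor

# One peak shifted by 1 bin with N = 1024 and h = 256: each frame adds pi/2.
start = [1 + 0j] * 8
frame1 = accumulate_phasors([3], [(0, 8)], [1.0], start, n_fft=1024, hop=256)
# In the next frame the peak has drifted to bin 4, still inside the old region.
frame2 = accumulate_phasors([4], [(0, 8)], [1.0], frame1, n_fft=1024, hop=256)
```

Each bin of a region would then be multiplied by its entry in the returned list before the inverse transform. After two frames the accumulated rotation in this example is pi, so `frame2[4]` is approximately -1.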

Partial spectrum of shifted signal

Direct comparison with the original spectrum is difficult, but the peaks have been shifted. The first peak moved by only a few bins, about 5% of its original frequency, while the higher peaks moved by many more bins, which is the same 5% of their own original frequencies.

Once the phase has been adjusted, the second half of the FFT can be recreated by complex conjugation. Taking the inverse FFT then gives the samples for this window of the output signal. By overlapping and adding these windows in the same manner in which they were analyzed, we create an output signal corresponding to a pitch-corrected version of the input.
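The resynthesis step might look like the following sketch, which passes the spectra through unmodified so the round trip can be checked. The function name `overlap_add`, the periodic Hann window, the hop of one quarter of the window, and the division by the window overlap sum are assumptions added for the demonstration, not details from the text.

```python
import numpy as np

def overlap_add(spectra, win_len, hop):
    """Inverse-transform each half spectrum and overlap-add at the analysis hop."""
    y = np.zeros((len(spectra) - 1) * hop + win_len)
    for i, half in enumerate(spectra):
        # irfft rebuilds the conjugate-symmetric second half before inverting.
        y[i * hop : i * hop + win_len] += np.fft.irfft(half, n=win_len)
    return y

win_len, hop = 512, 128
window = np.hanning(win_len + 1)[:-1]          # periodic Hann window
x = np.random.default_rng(0).standard_normal(4096)

n_frames = 1 + (len(x) - win_len) // hop
spectra = [np.fft.rfft(x[i * hop : i * hop + win_len] * window)
           for i in range(n_frames)]
y = overlap_add(spectra, win_len, hop)

# With this window and hop, four overlapping windows always sum to 2,
# so away from the edges the output is simply twice the input.
interior = slice(win_len, len(y) - win_len)
```

In a full implementation, the peak shifting and phase adjustment would be applied to each entry of `spectra` before calling `overlap_add`.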

Limitations of the modified phase vocoder

There are three key problems with this approach. First, it is painfully slow. Taking the transform and fitting many parabolas within every window is extremely computationally expensive. Second, the formants of the singer (seen in the Fourier domain as the spectral envelope) are stretched or compressed depending on the direction of frequency shift. In reality, a singer's formants should not change when singing a higher or lower note. For small shifts, this will not be terribly noticeable, but for large shifts it will become problematic and very detectable. Finally, even with the phase correction, the output of the algorithm still sounds "phasy". The overlapping windows interfere constructively and destructively to create an effect somewhat like reverberation in a concert hall. The output signal seems to have less presence, or to be more distant from the microphone than the input signal.


However, the frequency-domain approach does have a few advantages over the time-domain approaches. First, it deals well with noisy signals, which can throw off time-domain techniques. Second, it can handle larger pitch shifts. For instance, if you wanted to decrease the frequency by a very large amount, the period could become so long that, in PSOLA, the data added at each new pitch marker would no longer overlap with its neighbors, resulting in a very choppy and unacceptable signal. The frequency-domain approach has no problem with arbitrarily large shifts, as long as you don't mind the formant shifting that accompanies them. There are ways to try to restore the original formants after processing with an algorithm such as this, which would be fertile ground for further exploration. Finally, this algorithm has no difficulty handling polyphonic signals. It could be used to shift the pitch of a track from a CD, or of two voices or instruments in harmony. The time-domain algorithms cannot handle anything but a monophonic input, because they require a single dominant fundamental frequency.

Clearly, there are pros and cons to this algorithm. Given its complexity, and the huge difference in the time it takes to process a sample compared with a time-domain algorithm, we have concluded that one is better off sticking with a time-domain approach to pitch shifting, such as PSOLA, unless the signal is exceptionally noisy, extremely large pitch shifts are required, or the source material is polyphonic.

Source:  OpenStax, Ece 301 projects fall 2003. OpenStax CNX. Jan 22, 2004 Download for free at http://cnx.org/content/col10223/1.5