An algorithm for modifying the pitch of a solo human voice in the frequency domain.

Introduction

As with the time domain techniques for pitch shifting, we deal with one window of the signal at a time, with the window length and the hop size between windows specified as inputs to the algorithm. We multiply each window of samples by a Hann window and then take its FFT. Since the FFT of a real signal is redundant, we need only work with half of the coefficients; the discarded half can be recreated at the end by complex conjugation.
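The framing step described above might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `analyze_frames` and its parameters are hypothetical, and NumPy's `np.hanning` and `np.fft.rfft` stand in for the window and the half-spectrum FFT.

```python
import numpy as np

def analyze_frames(x, win_len, hop):
    """Yield the half-spectrum of each Hann-windowed frame of x.

    x       : 1-D real signal
    win_len : window length in samples (assumed input to the algorithm)
    hop     : hop size between consecutive windows in samples
    """
    window = np.hanning(win_len)
    for start in range(0, len(x) - win_len + 1, hop):
        frame = x[start:start + win_len] * window
        # rfft keeps only the non-redundant half: win_len // 2 + 1 bins
        yield np.fft.rfft(frame)
```

`np.fft.irfft(X, n=win_len)` rebuilds the real frame from such a half-spectrum, which is the "recreate by complex conjugation" step done implicitly.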

Modified phase vocoder

Method

First, we need to identify peaks in the spectrum. For computational simplicity, we define a peak as any FFT bin whose magnitude is greater than the magnitudes of its two nearest neighbors on either side. We then assume that the area around a peak, extending as far as halfway to the next peak, is part of that peak; that is, it lies in the peak's region of influence. Wherever we shift the peak, this region moves with it. The amount by which a peak must be shifted is different for every peak, so to determine it we must identify the frequency of the underlying sinusoid that caused the peak. We do this by fitting a parabola to the peak bin and its one neighbor on either side, which involves solving three linear equations (using a matrix multiply). We then find the vertex of this parabola and take that point to be the frequency of the peak. Each peak is shifted to a multiple of its current location, so lower frequency peaks do not shift as far as higher frequency peaks. The multiplying factor is the ratio of the target frequency to the detected frequency.
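The peak picking and parabolic fit can be sketched like this. The function names are hypothetical, and placing the three fitted points at offsets -1, 0, +1 around the peak bin is an assumption made here to keep the linear system well conditioned; the text only says that three linear equations are solved.

```python
import numpy as np

def find_peaks(mag):
    """Bins whose magnitude exceeds the two nearest neighbors on each side."""
    k = np.arange(2, len(mag) - 2)
    is_peak = ((mag[k] > mag[k - 1]) & (mag[k] > mag[k - 2]) &
               (mag[k] > mag[k + 1]) & (mag[k] > mag[k + 2]))
    return k[is_peak]

def refine_peak(mag, k):
    """Fit y = a*x^2 + b*x + c through bins k-1, k, k+1.

    Returns the vertex position as a fractional bin index.
    """
    xs = np.array([-1.0, 0.0, 1.0])            # offsets relative to bin k
    A = np.column_stack([xs ** 2, xs, np.ones(3)])
    a, b, _ = np.linalg.solve(A, mag[k - 1:k + 2])
    return k - b / (2.0 * a)

def region_bounds(peaks, n_bins):
    """Split the spectrum halfway between adjacent peaks (peaks non-empty).

    Returns inclusive (starts, ends) arrays, one entry per peak.
    """
    mids = (peaks[:-1] + peaks[1:]) // 2
    starts = np.concatenate(([0], mids + 1))
    ends = np.concatenate((mids, [n_bins - 1]))
    return starts, ends
```

Given a fractional peak position `p` and a ratio `r` of target to detected frequency, the peak would move by `p * (r - 1)` bins, which is why higher peaks travel farther.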

Partial spectrum of original signal

Note the clear peaks, but note also that they are not single points but rather slightly spread over several bins.

Since the number of bins by which we need to shift a peak is unlikely to be an integer, and the FFT bins are discrete, we use linear interpolation to determine the values to assign to the bins that the peak and its surrounding region shift into. A more sophisticated method of interpolation could work as well, but it would add a great deal of complexity to what is already a very expensive algorithm. We then add the peak, with its interpolated values, into its new location in the spectrum and subtract the values of the original peak from the original location (thus cutting and pasting it, in a sense, rather than just copying it). If a shift would cause any bins to move beyond either end of the half-spectrum, those bins should be assumed to have moved into negative frequencies and should therefore be reflected back into the positive frequencies with a complex conjugation, since the signal is real.
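A rough sketch of the cut-and-paste step for a single region follows. The name `shift_region` is hypothetical. Interpolating the complex bin values directly (real and imaginary parts together) and mirroring about the last bin, which assumes an even frame length, are assumptions of this sketch and are not spelled out in the text.

```python
import numpy as np

def shift_region(X, out, lo, hi, delta):
    """Move bins lo..hi of half-spectrum X by delta (fractional) bins.

    X   : original half-spectrum (read only)
    out : complex working copy of the spectrum, modified in place
    """
    n = len(X)
    region = np.zeros(n, dtype=complex)
    region[lo:hi + 1] = X[lo:hi + 1]
    out[lo:hi + 1] -= X[lo:hi + 1]              # cut from the old location
    first = int(np.floor(lo + delta))
    last = int(np.ceil(hi + delta))
    for m in range(first, last + 1):
        src = m - delta                          # fractional source position
        i = int(np.floor(src))
        frac = src - i
        left = region[i] if 0 <= i < n else 0.0
        right = region[i + 1] if 0 <= i + 1 < n else 0.0
        val = (1.0 - frac) * left + frac * right # linear interpolation
        if m < 0:                                # fell below bin 0
            m, val = -m, np.conj(val)
        elif m > n - 1:                          # fell past the last bin
            m, val = 2 * (n - 1) - m, np.conj(val)
        if 0 <= m < n:
            out[m] += val                        # paste at the new location
```

Calling this once per region, always reading from the untouched `X` and writing into one shared `out`, gives the shifted spectrum for the frame.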

Finally, we must adjust the phase of the peak and its surrounding region to account for the changes we have made to its frequency. We multiply by a phasor e^(j*dw*h), where dw is delta omega, the change in frequency, and h is the hop size between windows. We apply this phasor to all the bins in the region around the peak, which preserves the phase relationships of the original signal within each peak and keeps the phase as coherent as possible from frame to frame. One last difficulty is that these phasors must be accumulated from one frame to the next. Since every peak has a different dw, and thus a different phasor, this requires tracking peaks across frames. One simple way of dealing with this is to look up, in the previous frame, the region that contains the bin where the current peak is located, assume that the peak influencing that region is the same peak as the current one, and accumulate the phasor accordingly. This works under the assumption that audio signals produced by a singer change frequency smoothly, so a peak cannot have moved far from one frame to the next, as the time difference is very small.
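The phasor accumulation might look like the sketch below. It assumes dw is expressed in radians per sample, so a shift of d bins in an FFT of length `n_fft` gives dw = 2*pi*d/n_fft, and it stores the accumulated rotation per analysis bin so that the previous frame can be looked up at the current peak's bin. The function name and argument layout are hypothetical.

```python
import numpy as np

def accumulate_phase(prev_phase, peaks, starts, ends, shift_bins, n_fft, hop):
    """Return the accumulated phase rotation (radians) for each bin.

    prev_phase : per-bin accumulated rotation from the previous frame
    peaks      : peak bin indices in the current frame
    starts/ends: inclusive region bounds for each peak
    shift_bins : shift of each peak in (fractional) bins
    """
    phase = np.zeros_like(prev_phase)
    for k, lo, hi, d in zip(peaks, starts, ends, shift_bins):
        dw = 2.0 * np.pi * d / n_fft             # change in frequency
        # Inherit the rotation of whatever region covered bin k last frame,
        # then advance it by one hop at the new frequency offset.
        phase[lo:hi + 1] = prev_phase[k] + dw * hop
    return phase
```

Each region's bins would then be multiplied by `np.exp(1j * phase[k])` for that region's peak bin `k`, and `phase` is carried over as `prev_phase` for the next frame.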

Partial spectrum of shifted signal

Direct comparison with the original spectrum is difficult, but the peaks have been shifted. The first peak was shifted by only a few bins, corresponding to about 5% of its original frequency, while the higher peaks were shifted by many more bins, likewise corresponding to 5% of their original frequencies.

Once the phase has been adjusted, the second half of the FFT can be recreated by complex conjugation. Taking the inverse FFT then yields the samples for this window of the output signal. By overlapping and adding these windows in the same manner in which they were analyzed, we create an output signal corresponding to a pitch-corrected version of the input.
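The resynthesis step can be sketched as a plain overlap-add. The function name is hypothetical, and any amplitude normalization for the chosen window and hop size is left out; whether one is needed depends on the analysis setup.

```python
import numpy as np

def overlap_add(half_spectra, win_len, hop):
    """Rebuild a signal from a list of per-frame half-spectra.

    half_spectra : sequence of arrays of length win_len // 2 + 1
    win_len, hop : same values that were used for analysis
    """
    n_frames = len(half_spectra)
    y = np.zeros((n_frames - 1) * hop + win_len)
    for i, X in enumerate(half_spectra):
        # irfft restores the conjugate half and returns a real frame
        y[i * hop:i * hop + win_len] += np.fft.irfft(X, n=win_len)
    return y
```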

Limitations of the modified phase vocoder

There are three key problems with this approach. First, it is painfully slow. Taking the transform and fitting many parabolas within every window is extremely computationally expensive. Second, the formants of the singer (seen in the Fourier domain as the spectral envelope) are stretched or compressed depending on the direction of frequency shift. In reality, a singer's formants should not change when singing a higher or lower note. For small shifts, this will not be terribly noticeable, but for large shifts it will become problematic and very detectable. Finally, even with the phase correction, the output of the algorithm still sounds "phasy". The overlapping windows interfere constructively and destructively to create an effect somewhat like reverberation in a concert hall. The output signal seems to have less presence, or to be more distant from the microphone than the input signal.

Advantages

However, the frequency domain approach does have a few advantages over the time domain approaches. First, it deals well with noisy signals, which can throw off time domain techniques. Also, it can handle larger pitch shifts than time domain approaches. For instance, if you wanted to decrease the frequency by a very large amount, the period could become long enough that in PSOLA, the data that you were adding at each new pitch marker did not overlap with the other data, resulting in a very choppy and unacceptable signal. The frequency domain approach would have no problems with arbitrarily large shifts, as long as you don't mind the formant shifting that will accompany it. However, there are ways to try to restore the original formants after processing with an algorithm such as this, which would be fertile ground for further exploration. Finally, this algorithm has no difficulty in handling polyphonic signals. It could be used to shift the pitch of a track from a CD, or two voices or instruments in harmony. The time domain algorithms cannot handle anything but a monophonic input, because they require that there be a single dominant fundamental frequency.

Clearly, there are pros and cons to this algorithm. Given its complexity, and the huge difference between the time it takes to process a sample with this algorithm and with a time domain algorithm, we have concluded that unless the signal is exceptionally noisy, extremely large pitch shifts are required, or the source material is polyphonic, one is better off sticking with a time domain approach to pitch shifting, such as PSOLA.

Source:  OpenStax, Ece 301 projects fall 2003. OpenStax CNX. Jan 22, 2004 Download for free at http://cnx.org/content/col10223/1.5