An algorithm for modifying the pitch of a solo human voice in the frequency domain.

Introduction

As with the time domain techniques for pitch shifting, we process one window of the signal at a time, with the window length and the hop size between windows specified as inputs to the algorithm. We multiply each window by a Hanning window and then take its FFT. Since the FFT of a real signal is conjugate symmetric, half of the coefficients are redundant and we need only work with the other half. The discarded coefficients can be recreated at the end by complex conjugation.
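The framing step can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`hann`, `half_spectrum`, `full_spectrum`) are hypothetical, the window is the periodic form of the Hanning window, and a naive DFT stands in for a real FFT routine so that the example needs nothing beyond the standard library.

```python
import cmath
import math

def hann(n):
    """Periodic Hanning window of length n (assumed form of the window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def half_spectrum(frame):
    """Naive DFT of a real frame, keeping only bins 0..N/2.

    A real implementation would call an FFT routine; the O(N^2) sum here
    is only for illustration.
    """
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]

def full_spectrum(half, n):
    """Rebuild all n bins from bins 0..N/2 using X[n - k] = conj(X[k])."""
    mirrored = [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return list(half) + mirrored

# One 16-sample window of a sinusoid that sits exactly on bin 3.
n = 16
signal = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
windowed = [s * w for s, w in zip(signal, hann(n))]
half = half_spectrum(windowed)
rebuilt = full_spectrum(half, n)
```

Only `half` would be modified by the rest of the algorithm; `full_spectrum` is applied once, just before the inverse transform.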

Modified phase vocoder

Method

First, we need to identify peaks in the spectrum. For computational simplicity, we define a peak as any FFT bin whose magnitude is greater than the magnitudes of its two nearest neighbors on each side. We then assume that the area around a peak (as far away as halfway to the next peak) is part of the peak, or is in the peak's region of influence. Thus, wherever we shift the peak, this region moves with it. To figure out how far the peak must be shifted (the amount is different for every peak), we must identify the frequency of the underlying sinusoid that caused it. We do this by fitting a parabola to the peak bin and its neighbor on either side. This involves solving three linear equations for the three parabola coefficients (using a matrix multiply). We then find the vertex of this parabola and take that point to be the frequency of the peak. We then want to shift that peak to some multiple of its current location, so lower frequency peaks will not shift as far as higher frequency peaks. That multiple is the ratio of the target frequency to the detected frequency.
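The peak search and parabola fit can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than part of the description above: the parabola is fitted to the raw magnitudes (not log magnitudes), and the closed-form vertex formula replaces the explicit three-equation matrix solve, to which it is algebraically equivalent.

```python
def find_peaks(mag):
    """Bins whose magnitude exceeds the two nearest neighbors on each side."""
    return [
        k for k in range(2, len(mag) - 2)
        if all(mag[k] > mag[k + d] for d in (-2, -1, 1, 2))
    ]

def refine_peak(mag, k):
    """Fractional bin of the vertex of the parabola through bins k-1, k, k+1."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    denom = a - 2 * b + c
    if denom == 0:  # flat top; cannot happen for a strict peak, kept as a guard
        return float(k)
    return k + 0.5 * (a - c) / denom

def shift_in_bins(position, ratio):
    """Distance a peak must move when every frequency is scaled by ratio."""
    return position * (ratio - 1.0)

# Small made-up magnitude spectrum with one peak near bin 3.
mag = [0.0, 1.0, 3.0, 7.0, 5.0, 1.0, 0.0, 0.0]
peaks = find_peaks(mag)
position = refine_peak(mag, peaks[0])
shift = shift_in_bins(position, 1.05)  # ratio = target / detected frequency
```

Because the shift is proportional to the refined position, a peak at bin 30 moves ten times as far as one at bin 3 for the same ratio.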

Partial spectrum of original signal

Note the clear peaks, but note also that they are not single points but rather slightly spread over several bins.

Since the number of bins by which we need to shift a peak is unlikely to be an integer, and the FFT bins are discrete, we use linear interpolation to determine the values to assign to the bins that the peak and its surrounding region shift into. A more sophisticated method of interpolation could work as well, but it would add a great deal of complexity to what is already a very expensive algorithm. We then add the peak with the interpolated values into its new location in the spectrum and subtract the values of the original peak from the original location (thus cutting and pasting it, in a sense, rather than just copying it). If a shift would move any bins beyond either end of the retained half of the FFT, those bins should be assumed to have moved into negative frequencies, and should therefore be reflected back into the positive frequencies with a complex conjugation, since the signal is real.
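One possible reading of the cut-and-paste step is sketched below. The function `shift_region` is hypothetical, and several choices are assumptions: the complex bin values themselves are interpolated (rather than magnitude and phase separately), bins outside the region count as zero during interpolation, and the half spectrum is taken to run from bin 0 to the Nyquist bin so that reflection happens about those two ends.

```python
import math

def shift_region(spectrum, lo, hi, shift):
    """Cut bins lo..hi (inclusive) and paste them `shift` bins away.

    `spectrum` is the half spectrum as a list of complex numbers and
    `shift` may be fractional. Returns a new list.
    """
    last = len(spectrum) - 1
    out = list(spectrum)

    # Cut: subtract the original region from its original location.
    for k in range(lo, hi + 1):
        out[k] -= spectrum[k]

    def region(k):
        # Bins outside the region of influence contribute nothing.
        return spectrum[k] if lo <= k <= hi else 0j

    # Paste: each destination bin reads from a fractional source position.
    for dest in range(math.floor(lo + shift), math.ceil(hi + shift) + 1):
        src = dest - shift
        k0 = math.floor(src)
        frac = src - k0
        value = (1 - frac) * region(k0) + frac * region(k0 + 1)
        if dest < 0:            # fell below bin 0: reflect and conjugate
            dest, value = -dest, value.conjugate()
        elif dest > last:       # fell above the last bin: reflect and conjugate
            dest, value = 2 * last - dest, value.conjugate()
        if 0 <= dest <= last:
            out[dest] += value
    return out

spec = [0j, 0j, 1 + 0j, 2 + 0j, 1 + 0j, 0j, 0j, 0j, 0j]
moved_whole = shift_region(spec, 2, 4, 2)    # integer shift: plain copy
moved_half = shift_region(spec, 2, 4, 0.5)   # fractional shift: interpolated
```

With an integer shift the region is moved unchanged. With a fractional shift each original value is split between two neighboring destination bins, so the sum of the bins is preserved while the peak is smeared slightly.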

Finally, we must adjust the phase of the peak and its surrounding region to account for the changes we have made to its frequency. We multiply by a phasor of e^(j*dw*h), where dw is delta omega, the change in frequency, and h is the hop size between windows. We apply this phasor to all the bins in the area around the peak, which preserves the phase relationships in the original signal for each peak and ensures maximum frame to frame phase coherence. One last difficulty is that these phasors must be accumulated from one frame to the next. This requires tracking peaks across frames, since every peak will have a different dw, and thus a different phasor. One simple way of dealing with this is to look up the region from the previous frame at the bin where the current peak is located, assume that the peak that influenced that region is the same peak as the current one, and accumulate the phasor accordingly. This works under the assumption that audio signals produced by a singer must change frequency smoothly, so a peak cannot have moved far from one frame to the next, as the time difference is very small.
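The phase adjustment can be sketched as below. The names `frame_phasor` and `update_region` and the per-bin list `prev_phasors` are hypothetical. We assume dw is measured in radians per sample, so that a shift of s bins in an N-point FFT gives dw = 2*pi*s/N, and that the lookup into the previous frame is indexed by the bin where the peak was detected.

```python
import cmath
import math

def frame_phasor(shift_bins, n_fft, hop):
    """Phasor e^(j*dw*h) for a peak moved by shift_bins (may be fractional)."""
    dw = 2 * math.pi * shift_bins / n_fft  # change in frequency, radians/sample
    return cmath.exp(1j * dw * hop)

def update_region(spectrum, lo, hi, peak_bin, shift_bins, n_fft, hop, prev_phasors):
    """Rotate bins lo..hi by the phasor accumulated for this peak.

    prev_phasors[k] is the accumulated phasor applied to bin k in the
    previous frame (1 everywhere before the first frame). The current peak
    is assumed to be the one that owned bin peak_bin in that frame.
    """
    accumulated = prev_phasors[peak_bin] * frame_phasor(shift_bins, n_fft, hop)
    rotated = list(spectrum)
    new_phasors = list(prev_phasors)
    for k in range(lo, hi + 1):
        rotated[k] = spectrum[k] * accumulated
        new_phasors[k] = accumulated  # remembered for the next frame's lookup
    return rotated, new_phasors

# Two consecutive frames with the same peak shifted by one bin
# (N = 1024, hop = 256, so each frame adds a quarter turn).
spectrum = [1 + 0j] * 8
phasors = [1 + 0j] * 8
frame1, phasors = update_region(spectrum, 2, 4, 3, 1, 1024, 256, phasors)
frame2, phasors = update_region(spectrum, 2, 4, 3, 1, 1024, 256, phasors)
```

Every bin in the region receives the same rotation, so the phase relationships inside the region are untouched. Only the region as a whole advances from frame to frame.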

Partial spectrum of shifted signal

Direct comparison is difficult, but the peaks have been shifted. The first peak was shifted by only a few bins, corresponding to about 5% of its original frequency, while the higher peaks were shifted by many more bins, each also corresponding to about 5% of its original frequency.

Once the phase has been adjusted, the second half of the FFT can be recreated by complex conjugation. Taking an inverse FFT then gives the values for this window of the output signal. By overlapping and adding these windows in the same manner in which they were analyzed, we create an output signal corresponding to a pitch-corrected version of the input.
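The final overlap-add can be sketched as follows, assuming each frame has already been brought back to the time domain by the inverse FFT. The function name `overlap_add` is hypothetical, and the example uses a periodic Hanning window with a hop of half the window length, one combination for which the overlapped windows sum to a constant.

```python
import math

def overlap_add(frames, hop):
    """Sum equal-length time-domain frames spaced hop samples apart."""
    n = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        start = i * hop
        for j, sample in enumerate(frame):
            out[start + j] += sample
    return out

# Three Hanning-windowed frames of a constant signal equal to 1.
n, hop = 8, 4
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
frames = [list(window) for _ in range(3)]
output = overlap_add(frames, hop)
```

Away from the two ends, where fewer windows overlap, the output returns to the constant value 1, so the analysis window leaves no amplitude ripple when the spectrum is left unmodified.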

Limitations of the modified phase vocoder

There are three key problems with this approach. First, it is painfully slow. Taking the transform and fitting many parabolas within every window is extremely computationally expensive. Second, the formants of the singer (seen in the Fourier domain as the spectral envelope) are stretched or compressed depending on the direction of frequency shift. In reality, a singer's formants should not change when singing a higher or lower note. For small shifts, this will not be terribly noticeable, but for large shifts it will become problematic and very detectable. Finally, even with the phase correction, the output of the algorithm still sounds "phasy". The overlapping windows interfere constructively and destructively to create an effect somewhat like reverberation in a concert hall. The output signal seems to have less presence, or to be more distant from the microphone than the input signal.

Advantages

However, the frequency domain approach does have a few advantages over the time domain approaches. First, it deals well with noisy signals, which can throw off time domain techniques. Also, it can handle larger pitch shifts than time domain approaches. For instance, if you wanted to decrease the frequency by a very large amount, the period could become so long that in PSOLA the data added at each new pitch marker would no longer overlap with its neighbors, resulting in a very choppy and unacceptable signal. The frequency domain approach has no problem with arbitrarily large shifts, as long as you don't mind the formant shifting that accompanies them. However, there are ways to try to restore the original formants after processing with an algorithm such as this, which would be fertile ground for further exploration. Finally, this algorithm has no difficulty in handling polyphonic signals. It could be used to shift the pitch of a track from a CD, or of two voices or instruments in harmony. The time domain algorithms cannot handle anything but a monophonic input, because they require that there be a single dominant fundamental frequency.

Clearly, there are pros and cons to this algorithm. Given its complexity, and the huge difference between the time it takes to process a sample with this algorithm and with a time domain algorithm, we have concluded that unless the signal is exceptionally noisy, extremely large pitch shifts are required, or the source material is polyphonic, one is better off sticking with a time domain approach for pitch shifting, such as PSOLA.

Source:  OpenStax, Ece 301 projects fall 2003. OpenStax CNX. Jan 22, 2004 Download for free at http://cnx.org/content/col10223/1.5