
Filtering and resampling

After the song data is imported, the signal is resampled to 8000 samples per second in order to reduce the number of columns in the spectrogram. This speeds up later computations while still leaving enough resolution in the data for accurate results.

The data is then high-pass filtered using a 30th-order filter with a cutoff frequency around 2 kHz (half the bandwidth of the resampled signal). Filtering is used because the higher frequencies in a song are more distinctive of that particular song. The bass, however, tends to overshadow these frequencies, so the filter is used to make the fingerprint include more high-frequency points. Testing has shown that the algorithm has a much easier time distinguishing songs after they have been high-pass filtered.
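As a rough illustration of this step, the sketch below designs a 30th-order high-pass filter with a 2 kHz cutoff for a signal sampled at 8000 samples per second. The text does not say what kind of filter was used, so the FIR windowed-sinc design, the Hamming window, and the function names `highpass_fir`, `fir_filter`, and `gain_at` are all assumptions made for this example.

```python
import math

def highpass_fir(order=30, cutoff_hz=2000.0, fs=8000.0):
    """Design a high-pass FIR filter by spectral inversion of a
    Hamming-windowed sinc low-pass (assumed design, not from the source)."""
    mid = order // 2                  # centre tap; order must be even
    fc = cutoff_hz / fs               # cutoff in cycles per sample
    taps = []
    for n in range(order + 1):        # an order-30 FIR filter has 31 taps
        k = n - mid
        # Ideal low-pass impulse response, with the k == 0 limit handled
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)  # Hamming
        taps.append(ideal * window)
    total = sum(taps)
    # Normalise the low-pass to unity DC gain, then subtract it from a
    # delayed unit impulse to obtain the high-pass response
    taps = [-t / total for t in taps]
    taps[mid] += 1.0
    return taps

def fir_filter(signal, taps):
    """Causal direct-form FIR filtering with zero initial state."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out

def gain_at(taps, freq_hz, fs=8000.0):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)

taps = highpass_fir()
filtered = fir_filter([1.0] * 100, taps)  # a constant (0 Hz) input is rejected
```

Calling `gain_at(taps, 500)` and `gain_at(taps, 3500)` shows the intended behavior: bass frequencies are strongly attenuated while frequencies well above the cutoff pass almost unchanged.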

The spectrogram

The spectrogram of the signal is then taken in order to view the frequencies present in each time slice. The spectrogram below is from a 10-second noisy recording.

The effect of the high-pass filter is clearly visible in the spectrogram. However, local maxima in the low frequencies still exist and will still show up in the fingerprint.

Each vertical time slice of the spectrogram is then analyzed for prominent local maxima, as described in the next section.
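A spectrogram of this kind can be sketched as a sequence of windowed discrete Fourier transforms, one per time slice. The frame length of 64 samples, the hop of 32 samples, the Hann window, and the function name `spectrogram` are assumptions chosen for illustration; the text does not give the values actually used. A direct DFT is used here for clarity rather than speed.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of time slices. Each slice holds the magnitudes of
    frequency bins 0 .. frame_len // 2 for one windowed frame."""
    # Hann window to reduce spectral leakage between bins
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    slices = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        column = []
        for k in range(n_bins):
            # Direct DFT of the frame at bin k
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            column.append(abs(acc))
        slices.append(column)
    return slices

# A 1000 Hz test tone sampled at 8000 samples per second
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
spec = spectrogram(tone)
# Bin spacing is fs / frame_len = 125 Hz, so the tone falls in bin 8
peak_bins = [max(range(len(col)), key=lambda k: col[k]) for col in spec]
```

Each entry of `spec` corresponds to one vertical time slice, and `peak_bins` records the strongest frequency bin in each slice.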

Finding the local maxima

In the first time slice, the five greatest local maxima are stored as points in the fingerprint. Then a threshold is created by convolving these five maxima with a Gaussian curve, creating a different value for the threshold at each frequency. An example threshold is shown in the figure below. The threshold is used to spread out the data stored in the fingerprint, since peaks that are close in time and frequency are stored as one point.

The initial threshold, formed by convolving the peaks in the first time slice with a Gaussian curve.
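The construction of the initial threshold can be sketched as follows. Convolving a set of isolated peaks with a Gaussian is equivalent to summing one Gaussian per peak, scaled by that peak's amplitude, which is what the code does. The Gaussian width `sigma` (in frequency bins) and the function names are assumptions, since the text does not specify them.

```python
import math

def local_maxima(column):
    """Indices of bins that are strictly greater than both neighbours."""
    return [k for k in range(1, len(column) - 1)
            if column[k] > column[k - 1] and column[k] > column[k + 1]]

def top_peaks(column, count=5):
    """The `count` greatest local maxima, returned in order of frequency."""
    ranked = sorted(local_maxima(column), key=lambda k: column[k], reverse=True)
    return sorted(ranked[:count])

def initial_threshold(column, peaks, sigma=3.0):
    """Sum of Gaussians centred on each peak and scaled by its amplitude.
    sigma is an assumed width in bins, not a value from the source."""
    threshold = [0.0] * len(column)
    for p in peaks:
        for f in range(len(column)):
            threshold[f] += column[p] * math.exp(-((f - p) ** 2) / (2 * sigma ** 2))
    return threshold

# A toy first time slice with six local maxima; only the five largest are kept
first_slice = [0, 1, 0, 3, 0, 2, 0, 5, 0, 4, 0, 6, 0]
peaks = top_peaks(first_slice)
threshold = initial_threshold(first_slice, peaks)
```

Because each Gaussian extends to neighbouring bins, the threshold near a stored peak is high, which discourages storing another point close to it in frequency.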

For each of the remaining time slices, up to five local maxima above the threshold are added to the fingerprint. If there are more than five such maxima, the five greatest in amplitude are chosen. The threshold is then updated by adding new Gaussian curves centered at the frequencies of the newly found peaks. Finally, the threshold is scaled down so that it decays exponentially over time. The following figure shows how the threshold changes over time.

Whenever a new peak is found, the threshold increases around that peak’s frequency; it then decays exponentially over time.

The final list of the times and frequencies of the local maxima above the threshold is returned as the song’s fingerprint.
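The whole peak-picking procedure can be put together in a short sketch. It takes a spectrogram as a list of time slices and returns (time slice, frequency bin) pairs. Starting from a zero threshold makes the first slice behave as described, since every local maximum is then above the threshold. The Gaussian width `sigma`, the per-slice decay factor `decay`, and the function name `find_fingerprint` are assumptions made for illustration.

```python
import math

def find_fingerprint(slices, max_peaks=5, sigma=3.0, decay=0.9):
    """Return a list of (time_index, frequency_bin) points.
    sigma and decay are assumed values, not taken from the source."""
    n_bins = len(slices[0])
    threshold = [0.0] * n_bins
    fingerprint = []
    for t, column in enumerate(slices):
        # Local maxima that also exceed the current threshold
        candidates = [k for k in range(1, n_bins - 1)
                      if column[k] > column[k - 1]
                      and column[k] > column[k + 1]
                      and column[k] > threshold[k]]
        # Keep at most max_peaks, preferring the greatest amplitudes
        candidates.sort(key=lambda k: column[k], reverse=True)
        for k in candidates[:max_peaks]:
            fingerprint.append((t, k))
            # Raise the threshold with a Gaussian centred on the new peak
            for f in range(n_bins):
                threshold[f] += column[k] * math.exp(-((f - k) ** 2) / (2 * sigma ** 2))
        # Exponential decay: scale the threshold down once per time slice
        threshold = [v * decay for v in threshold]
    return fingerprint

# Three identical slices with a single peak at bin 4
slice_with_peak = [0, 1, 2, 3, 10, 3, 2, 1, 0]
points = find_fingerprint([slice_with_peak] * 3)
```

In this toy input the peak is stored in the first two slices, after which the accumulated threshold exceeds its amplitude and the peak is skipped until the threshold has decayed again. Indices can be converted to seconds and hertz using the hop size and bin spacing of the spectrogram.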

Source:  OpenStax, Digital song analysis using frequency analysis. OpenStax CNX. Dec 19, 2009 Download for free at http://cnx.org/content/col11148/1.1