Onset detection

Onset detection is the detection of attacks, that is, the start of each note. When a song's waveform is plotted against time, the easily identifiable spikes mark the attack of each new note. We exploit this property to divide the song into a sequence of individual notes. (Note that in some songs the notes do not all move in sequence, e.g. a chord is sustained while the melody moves through many different notes. This problem is discussed later.)

Input signal "baa baa black sheep"

At first sight, the note attacks are clearly marked by large spikes. Although these spikes are easy to identify visually, a closer look shows that the signal fluctuates rapidly, meaning that each note contains a lot of high-frequency content. Thus, to identify the peak of each note, we can't simply look for every high point or for points where the signal rises sharply. We have to do some pre-processing and filtering to identify the edges computationally.

Zooming in on one note

We decided to use a low-pass filter to smooth out the sharp changes in the song, so that all that is left are smooth increases and decreases, where each rise and fall corresponds to a single note. First, we made the length of the filter (in samples) proportional to the sampling frequency, which fixes its length in time. Then we took a simple boxcar filter and convolved it with the square (i.e. the power) of our signal. In the end we convolved the signal with the boxcar three times over, which is the same as convolving it once with three boxcars convolved together. Visually, this was our filter:

Thrice convolved boxcar

And here’s our smoothed signal superimposed on our original file:

Original signal with smoothed curve
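
The smoothing step described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' MATLAB code: the function names and the window_seconds parameter are hypothetical, and the source only says that the filter length is proportional to the sampling frequency. The direct convolution here is slow on real audio, so a practical version would use an optimized routine.

```python
def convolve(a, b):
    """Full linear convolution of two sequences (direct O(len(a)*len(b)) sum)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def thrice_boxcar(width):
    """Three boxcars of `width` samples convolved together, scaled to unit sum."""
    box = [1.0] * width
    kernel = convolve(convolve(box, box), box)
    total = sum(kernel)
    return [k / total for k in kernel]


def smooth_power(signal, fs, window_seconds=0.05):
    """Smooth the squared signal with a thrice-convolved boxcar.

    window_seconds is an assumed value chosen for illustration; tying the
    boxcar width to fs keeps the filter length fixed in time.
    """
    width = max(1, int(round(window_seconds * fs)))
    kernel = thrice_boxcar(width)
    power = [s * s for s in signal]
    full = convolve(power, kernel)
    # Drop the filter delay so the output lines up with the input samples.
    delay = (len(kernel) - 1) // 2
    return full[delay:delay + len(signal)]
```

Because convolution is associative, filtering once with the combined kernel gives the same result as filtering three times with a single boxcar, and it only requires one pass over the signal.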

We then use MATLAB's findpeaks function. We tuned parameters such as the minimum peak height and the minimum distance between peaks so that small ripples within a single attack were not counted as separate notes. The algorithm then takes the distance between consecutive peaks as the length of each note.
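
To make the roles of the two tuning parameters concrete, here is a simplified stand-in for the peak-picking step. It is not MATLAB's findpeaks, and the names find_peaks, min_height, min_distance and note_lengths are chosen for illustration. It keeps local maxima above a height threshold and, where two peaks are closer than the minimum distance, keeps the taller one.

```python
def find_peaks(envelope, min_height, min_distance):
    """Indices of local maxima that are at least `min_height` tall and at
    least `min_distance` samples apart (the taller peak wins a conflict)."""
    candidates = [
        i for i in range(1, len(envelope) - 1)
        if envelope[i] >= min_height
        and envelope[i] > envelope[i - 1]
        and envelope[i] >= envelope[i + 1]
    ]
    # Visit candidates from tallest to shortest, keeping one only if it is
    # far enough from every peak already kept.
    kept = []
    for i in sorted(candidates, key=lambda k: envelope[k], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def note_lengths(peaks):
    """Distances between consecutive peaks, in samples."""
    return [b - a for a, b in zip(peaks, peaks[1:])]


# Toy envelope: the ripple at index 3 sits too close to the peak at index 1,
# and the bump at index 10 is below the height threshold.
envelope = [0, 1, 0.2, 0.9, 0, 0, 0, 2, 0, 0, 0.05, 0, 3, 0]
peaks = find_peaks(envelope, min_height=0.5, min_distance=3)
lengths = note_lengths(peaks)
```

In this toy example the ripple and the small bump are both rejected, leaving peaks at indices 1, 7 and 12 and note lengths of 6 and 5 samples.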

A second part of the algorithm calculates the beats per minute (bpm) and attempts to characterize each note in terms of beats, i.e. as a half note, quarter note, eighth note, etc. It does this by finding the smallest interval, which corresponds to the shortest note, and assuming that all note durations are multiples of this length. We assume the bpm of our song lies between 100 bpm and 200 bpm; without this constraint the tempo is ambiguous, since a quarter note at 80 bpm lasts exactly as long as a half note at 160 bpm. Because the measured note lengths are not perfect multiples of each other, an adjustable tolerance determines whether two durations correspond to the same note value. This mapping works very well for synthesized music, but it is not practical for live recordings or heavily rubato music, where the length of each beat is not fixed.
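
The tempo and note-value mapping can be sketched as follows. This is one possible reading of the description above, not the authors' implementation: the function name quantize_notes is hypothetical, the explicit tolerance is replaced by rounding to the nearest multiple, and the averaging step used to refine the unit length is an assumption.

```python
def quantize_notes(durations, fs, bpm_range=(100.0, 200.0)):
    """Estimate the tempo and express each note duration in beats.

    durations: note lengths in samples; fs: sampling frequency in Hz.
    Returns (bpm, notes), where notes are measured in beats.
    """
    shortest = min(durations)
    # Snap every duration to the nearest whole multiple of the shortest note.
    multiples = [max(1, round(d / shortest)) for d in durations]
    # Refine the unit length by averaging over all notes (an assumption).
    unit = sum(durations) / sum(multiples)
    # Treat the unit as one beat, then halve or double the tempo until it
    # falls inside the assumed range.
    beats_per_unit = 1.0
    bpm = 60.0 * fs / unit
    low, high = bpm_range
    while bpm >= high:
        bpm /= 2.0
        beats_per_unit /= 2.0
    while bpm < low:
        bpm *= 2.0
        beats_per_unit *= 2.0
    notes = [m * beats_per_unit for m in multiples]
    return bpm, notes
```

For example, a shortest note of 11025 samples at 44.1 kHz lasts 0.25 s, which would be 240 bpm if it were one beat. Halving brings the tempo to 120 bpm, so the shortest note is labeled 0.5 beats and a note twice as long is labeled 1 beat.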

Below is a sample output of our note divider function, where data1 is a synthesized audio file of Hot Cross Buns, fs is the sampling frequency (44.1 kHz), bpm is the estimated beats per minute, notes is the estimated length of each note in beats, and notetimes is the length of each note in samples:

[bpm, notes, notetimes] = notedivider(data1, fs)

bpm = 120.8288

notes    notetimes
1.000    22754
1.000    21846
2.000    43401
1.000    22925
1.000    21847
2.000    43819
0.500    10987
0.500    11054
0.500    10563
0.500    11136
0.500    10793
0.500    11321
0.500    10803
0.500    10938
1.000    23082
1.000    21847
3.000    61401

Gain compression

In order to make our detection algorithm robust, we performed gain compression on the signal before processing it. Compression flattens large spikes and amplifies the rest of the signal. It therefore allows us to "clean up" the signal, especially when the pianist plays some notes softly and others loudly.

Below is our algorithm. Here x[n] is our input signal, y[n] is the compressed signal, and lambda is a parameter greater than 1; increasing it equalizes the note heights more and more. For our implementation we used a lambda value of 2.

Compression of a signal

After compression, we smooth the signal using the thrice-convolved boxcar filter. This produces a single smooth curve that traces the envelope of the signal, which makes it easier to locate the peak of each note and hence to detect its onset.
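
One simple compressor with the properties described above is a power law, y[n] = sign(x[n]) |x[n]|^(1/lambda), applied to a signal normalized to a peak magnitude of 1. The sketch below uses that form purely as an assumed stand-in for illustration; it is not necessarily the formula the authors used, and the function name compress is hypothetical.

```python
def compress(signal, lam=2.0):
    """Power-law gain compression (an assumed form, for illustration only).

    The signal is normalized to a peak magnitude of 1, then each sample's
    magnitude is raised to the power 1/lam while its sign is preserved.
    """
    if lam <= 1.0:
        raise ValueError("lam must be greater than 1")
    peak = max(abs(s) for s in signal)
    if peak == 0.0:
        return [0.0 for _ in signal]
    out = []
    for s in signal:
        magnitude = (abs(s) / peak) ** (1.0 / lam)
        out.append(magnitude if s >= 0 else -magnitude)
    return out


# A loud note (1.0) and a soft note (0.04) differ by a factor of 25 before
# compression and by a factor of about 5 after it with lam = 2.
compressed = compress([0.0, 0.25, -1.0, 0.04], lam=2.0)
```

With this form, larger values of lam push every nonzero sample closer to magnitude 1, which matches the described effect of equalizing note heights.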

Source:  OpenStax, Elec 301 projects fall 2013. OpenStax CNX. Sep 14, 2014 Download for free at http://legacy.cnx.org/content/col11709/1.1