5.1 Onset detection


Onset detection

Onset detection is the detection of attacks, i.e. the start of each note. When a song is plotted over time, the easily identifiable spikes represent the attack of a new note. We exploit this property to divide the song into individual notes in sequence. (Note that in some songs the notes do not all move in sequence, e.g. a chord is sustained while the melody moves through many different notes. This problem is discussed later.)

Input signal "baa baa black sheep"

At first sight, note attacks are clearly marked by large spikes. Although these spikes are easy to identify visually, a closer look shows that the signal fluctuates rapidly, meaning that each note contains a lot of high-frequency content. Thus, to identify the peak of each note, we can’t simply look for all the high points or for the points where the signal rises sharply. We have to do some pre-processing and filtering to identify the edges computationally.

Zooming in on one note

We tried a low-pass filter to smooth out the sharp changes in the song, so that all that is left are smooth increases and decreases, where each rise and fall corresponds to a single note. First, we made the length of the filter proportional to the sampling frequency, which fixes its duration in time. Then we took a simple boxcar filter and convolved it with the square (i.e. the power) of our signal. In the end we convolved the signal with the boxcar three times over, which is the same as convolving it once with three boxcars convolved together. Visually, this was our filter:

Thrice convolved boxcar

And here’s our smoothed signal superimposed on our original file:

Original signal with smoothed curve
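The smoothing step can be sketched in a few lines. This is a minimal Python illustration rather than our MATLAB code: the function names, the 1 kHz sampling rate, and the 10 ms boxcar length are assumptions chosen for the example, and the pure-Python convolution is only practical for short test signals.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def thrice_boxcar(fs, seconds):
    """Unit-area kernel: three identical boxcars convolved together."""
    # Tying the length to fs fixes the filter's duration in time.
    n = max(1, int(round(fs * seconds)))
    box = [1.0 / n] * n
    return convolve(convolve(box, box), box)


def smooth_power(signal, kernel):
    """Square the signal (its power), then low-pass it with the kernel."""
    power = [s * s for s in signal]
    return convolve(power, kernel)


# Assumed toy values: 1 kHz sampling rate, 10 ms boxcar.
kernel = thrice_boxcar(fs=1000, seconds=0.01)
# A rapidly alternating burst surrounded by silence.
signal = [0.0] * 50 + [1.0, -1.0] * 25 + [0.0] * 50
envelope = smooth_power(signal, kernel)
```

Because convolution is associative, filtering once with this kernel gives the same result as filtering three times with the single boxcar. The kernel sums to 1, so the smoothed curve stays on the same scale as the signal power.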

We then use MATLAB’s “findpeaks” function. We tuned parameters such as the minimum peak height and the minimum distance between peaks to ensure that small ripples within an attack did not get counted as separate peaks. The algorithm then takes the distances between consecutive peaks as the lengths of the notes.
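The role of the two parameters can be illustrated with a simplified peak picker. The sketch below is a stand-in written in Python, not MATLAB's findpeaks and not necessarily equivalent to it; the function names and the toy envelope are assumptions made for the example.

```python
def find_peaks_simple(env, min_height, min_distance):
    """Local maxima at least min_height tall and min_distance samples apart."""
    candidates = [
        i for i in range(1, len(env) - 1)
        if env[i] >= min_height and env[i] > env[i - 1] and env[i] >= env[i + 1]
    ]
    kept = []
    # Visit the tallest candidates first, so a small ripple next to a real
    # attack is the one that gets discarded.
    for i in sorted(candidates, key=lambda k: env[k], reverse=True):
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return sorted(kept)


def note_lengths(peaks):
    """Distances between consecutive peaks, in samples."""
    return [b - a for a, b in zip(peaks, peaks[1:])]


# Toy envelope: an attack at index 2 with a ripple at index 4,
# a second attack at index 9, and a bump at index 12 that is too small.
env = [0, 1, 5, 1, 4.5, 1, 0, 0, 2, 6, 2, 0, 0.5, 0.2, 0]
peaks = find_peaks_simple(env, min_height=1.0, min_distance=3)
lengths = note_lengths(peaks)
```

Here the ripple at index 4 is rejected by the distance rule and the bump at index 12 by the height rule, leaving peaks at indices 2 and 9 and a single note length of 7 samples.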

A second part of the algorithm calculates the beats per minute and attempts to characterize each note in terms of beats, i.e. as a half note, quarter note, eighth note, etc. It does this by finding the smallest interval, which corresponds to the shortest note, and assuming that all note durations are multiples of this length. We assume the bpm of our song lies between 100 and 200 bpm; without this constraint the result is ambiguous, since a quarter note at 80 bpm lasts as long as a half note at 160 bpm. A tolerance (which can be varied) determines whether two note durations correspond to the same note value, as the measured lengths are not perfect multiples of each other. This mapping works very well for synthesized music, but is not practical for live recordings or heavily rubato music, where the length of each beat is not fixed.
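One way to picture this step is the Python sketch below. It is an illustration under stated assumptions, not our notedivider code: the function name, the relative tolerance rule, the halving and doubling used to bring the tempo into the 100 to 200 bpm range, and the sample intervals are all hypothetical.

```python
def characterize(intervals, fs, lo=100.0, hi=200.0, tolerance=0.2):
    """Estimate bpm and express each interval (in samples) in beats."""
    shortest = min(intervals)

    # Express every interval as an integer multiple of the shortest one.
    multiples = []
    for d in intervals:
        ratio = d / shortest
        m = max(1, round(ratio))
        if abs(ratio - m) > tolerance * m:
            raise ValueError(f"interval {d} is not close to a multiple")
        multiples.append(m)

    # Start by treating the shortest note as one beat, then halve or double
    # the tempo until it falls in [lo, hi), rescaling the beat value to match.
    bpm = 60.0 * fs / shortest
    beats_per_unit = 1.0
    while bpm >= hi:
        bpm /= 2.0
        beats_per_unit /= 2.0
    while bpm < lo:
        bpm *= 2.0
        beats_per_unit *= 2.0

    return bpm, [m * beats_per_unit for m in multiples]


# Hypothetical note lengths in samples at 44.1 kHz.
bpm, beats = characterize([22050, 22000, 44200, 11025, 11000], fs=44100)
```

With these intervals the shortest note alone would imply about 240 bpm, so the tempo is halved to about 120 bpm and the shortest note becomes half a beat.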

Below is a sample output of our note divider function, where data1 is a synthesized audio file of Hot Cross Buns, fs is the sampling frequency (44.1 kHz), bpm is the estimated beats per minute, notes is the estimated length of each note in beats, and notetimes is the length of each note in samples:

[bpm, notes, notetimes] = notedivider(data1, fs)

bpm	notes (beats)	notetimes (samples)
120.8288	1.000	22754
	1.000	21846
	2.000	43401
	1.000	22925
	1.000	21847
	2.000	43819
	0.500	10987
	0.500	11054
	0.500	10563
	0.500	11136
	0.500	10793
	0.500	11321
	0.500	10803
	0.500	10938
	1.000	23082
	1.000	21847
	3.000	61401
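As a consistency check on this output, the reported bpm can be converted to samples per beat and compared with the measured note times. The short Python sketch below does this; rounding to the nearest half beat is an assumption made for the check, standing in for the adjustable tolerance described earlier.

```python
fs = 44100          # sampling frequency in Hz
bpm = 120.8288      # reported tempo

# Values copied from the sample output above.
notetimes = [22754, 21846, 43401, 22925, 21847, 43819, 10987, 11054, 10563,
             11136, 10793, 11321, 10803, 10938, 23082, 21847, 61401]
notes = [1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 0.5, 0.5, 0.5,
         0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 3.0]

# One beat lasts 60 / bpm seconds, i.e. 60 * fs / bpm samples.
samples_per_beat = 60.0 * fs / bpm

# Round each measured length to the nearest half beat (assumed grid).
quantized = [round(2 * t / samples_per_beat) / 2 for t in notetimes]
```

One beat comes out at roughly 21,899 samples, and every quantized value matches the notes column, including the last note, which measures about 2.8 beats and is reported as 3.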

Gain compression

In order to make our detection algorithm robust, we performed gain compression on the signal before processing it. Compression flattens large spikes and amplifies the rest of the signal. It therefore allows us to "clean up" the signal, especially when the pianist plays some notes softly and others loudly.

In our algorithm, x[n] is the input signal, y[n] is the compressed signal, and lambda is a parameter greater than 1. Increasing lambda equalizes the note heights more and more. For our implementation we used a lambda value of 2.
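To make the effect of lambda concrete, the Python sketch below uses a power-law compressor, y[n] = sign(x[n]) * |x[n]|^(1/lambda), applied after normalizing the peak to 1. This particular form is an assumption chosen because it matches the behaviour described here (it may differ from the formula we actually used), and the function name and sample values are hypothetical.

```python
import math


def compress(x, lam=2.0):
    """Assumed power-law gain compression: sign(x) * |x|**(1/lam)."""
    peak = max(abs(v) for v in x)
    if peak == 0:
        return [0.0 for _ in x]
    # After normalization |v| <= 1, so raising to 1/lam (with lam > 1)
    # lifts small values toward 1 while leaving the peak at 1.
    return [math.copysign((abs(v) / peak) ** (1.0 / lam), v) for v in x]


# A loud sample, a medium one, a soft one, and silence.
y = compress([1.0, -0.25, 0.04, 0.0], lam=2.0)
```

With lambda = 2 the loud-to-soft ratio shrinks from 25 (1.0 versus 0.04) to about 5 (1.0 versus 0.2), and a larger lambda would narrow it further.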

After compression, we smooth the signal using a thrice-convolved boxcar filter. This produces a single continuous curve that outlines the envelope of the signal, which makes it easier to locate the peak of each note and thus to detect its onset.


Source: OpenStax, Elec 301 projects fall 2013. OpenStax CNX. Sep 14, 2014 Download for free at http://legacy.cnx.org/content/col11709/1.1
