Harmonic pitch class profile
The key to note recognition is analyzing the frequency content of the audio clip. If the clip consists of one or several individual notes, the FFT of the audio should show several "peaks". HPCP provides a way to pin down the note(s) by analyzing these peaks in the FFT plot.
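To make the idea of spectral "peaks" concrete, here is a minimal sketch that builds a synthetic clip and picks the local maxima of its magnitude spectrum. It is an illustration under assumptions, not the project's code: the helper names `magnitude_spectrum` and `find_peaks`, the threshold of 10% of the largest magnitude, and the test signal are all hypothetical, and a naive DFT stands in for a real FFT so that the sketch has no dependencies.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT.

    A real implementation would use an FFT; the result is the same.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def find_peaks(mags, sample_rate, n_samples, threshold_ratio=0.1):
    """Return (frequency_hz, magnitude) for local maxima above a threshold."""
    threshold = threshold_ratio * max(mags)  # assumed cutoff, tune as needed
    peaks = []
    for k in range(1, len(mags) - 1):
        is_local_max = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_local_max and mags[k] > threshold:
            peaks.append((k * sample_rate / n_samples, mags[k]))
    return peaks


# Synthetic clip: a 440 Hz tone plus a weaker 880 Hz tone.
sample_rate = 4096
n_samples = 512
samples = [
    math.sin(2 * math.pi * 440 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 880 * t / sample_rate)
    for t in range(n_samples)
]

peaks = find_peaks(magnitude_spectrum(samples), sample_rate, n_samples)
for freq, mag in peaks:
    print(f"{freq:.1f} Hz  magnitude {mag:.1f}")
```

Both test frequencies fall exactly on a DFT bin here (the bin spacing is 8 Hz), so the peaks are clean. Real recordings smear energy across neighbouring bins, which is one reason HPCP compares peaks to notes with a smooth weighting function rather than an exact match.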
Big picture
The HPCP algorithm returns a vector of length 12, with one element for each of the 12 notes within an octave. The elements are normalized at the end, so that each one represents the likelihood that the corresponding note is actually in the audio. The values are obtained as follows:
$$\mathrm{HPCP}(n) = \sum_{i=1}^{nPeaks} w(n, f_i)\, a_i^2$$
where n = 1, ..., 12, a_i is the linear magnitude of the ith peak, and f_i is the frequency of the ith peak. The index i runs from 1 to nPeaks, where nPeaks is the number of spectral peaks that we consider, and w(n, f_i) is the weight of the frequency f_i for note n.
Basically, the note represented by the integer n is compared with every peak in the FFT. The weighting function measures the similarity between the note and the peak. This weight is then multiplied by the square of the amplitude of the peak. We do this for every peak and add all the products up to get the "likelihood" that note n is compatible with the FFT graph. We repeat the same operations for all 12 values of n, and the HPCP vector is complete.
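The accumulation described above can be sketched in a few lines. This is a hedged illustration: `hpcp_raw` and `nearest_class_weight` are hypothetical names, the peak list is made up for the example, and the stand-in weight simply returns 1 when a peak's nearest note matches n and 0 otherwise. The sketch also assumes that n = 1 stands for C, which the text does not specify.

```python
import math

C4_HZ = 261.6256  # assumed reference: n = 1 is taken to be C


def nearest_class_weight(n, freq_hz):
    """Stand-in weight: 1 if the peak's nearest note is note n, else 0."""
    semitones_from_c = round(12 * math.log2(freq_hz / C4_HZ))
    return 1.0 if semitones_from_c % 12 == n - 1 else 0.0


def hpcp_raw(peaks, weight):
    """For n = 1..12, sum weight(n, f_i) * a_i**2 over all peaks.

    peaks is a list of (frequency_hz, linear_magnitude) pairs.
    """
    return [
        sum(weight(n, f) * a ** 2 for f, a in peaks)
        for n in range(1, 13)
    ]


# Illustrative peaks: C4, its octave C5, and E4 (values are made up).
example_peaks = [(261.63, 1.0), (523.25, 0.5), (329.63, 0.8)]
raw = hpcp_raw(example_peaks, nearest_class_weight)
print(raw)
```

The two C peaks land in the same element because the octave is folded away, so that element holds 1.0² + 0.5², while the E peak contributes 0.8² to its own element. Passing the weight in as a function keeps the summation separate from the choice of weighting.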
Weighting function
The weighting function mentioned above is determined by the following three steps.
STEP ONE:
STEP TWO:
STEP THREE:
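The three steps can be sketched in code. The version below follows a formulation commonly given in the HPCP literature: compute the center frequency of note n, measure the distance in semitones from the peak to that center after folding by whole octaves, and apply a cos² window to that distance. That this matches the steps above exactly is an assumption, as are the function name `hpcp_weight`, the reference frequency (C4, so that n = 1 is C), and the window width of 4/3 semitone.

```python
import math


def hpcp_weight(n, freq_hz, f_ref=261.6256, window=4.0 / 3.0):
    """Weight of a peak at freq_hz for note n (1..12); all defaults assumed.

    f_ref  : frequency assigned to n = 1 (here C4, an indexing assumption)
    window : full width of the cos^2 window, in semitones
    """
    # Step one: center frequency of note n.
    f_n = f_ref * 2.0 ** ((n - 1) / 12.0)

    # Step two: distance in semitones, folded to the nearest octave so that
    # a peak one or more octaves away from f_n has distance 0.
    d = 12.0 * math.log2(freq_hz / f_n)
    d -= 12.0 * round(d / 12.0)

    # Step three: cos^2 window, zero outside half the window width.
    half_width = 0.5 * window
    if abs(d) > half_width:
        return 0.0
    return math.cos(math.pi / 2.0 * d / half_width) ** 2


print(hpcp_weight(1, 261.6256))   # peak exactly on C4
print(hpcp_weight(1, 523.2512))   # same note, one octave up
print(hpcp_weight(2, 261.6256))   # C4 peak tested against the next note
```

With this window a peak contributes fully to the note it sits on, partially to a note it is slightly detuned from, and nothing to notes a full semitone away. A wider window tolerates more detuning at the cost of leaking energy into neighbouring notes.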
Normalization
After we have obtained the raw HPCP results, we divide every element by the largest one, so that the biggest term becomes 1. Now we have a vector of length 12 in which each element is between 0 and 1 and represents the "possibility" that the corresponding note is in that audio clip.
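A minimal sketch of this normalization step follows. The name `normalize_hpcp` is hypothetical, and returning all zeros for a silent clip is an assumption made here to avoid dividing by zero.

```python
def normalize_hpcp(raw):
    """Scale a raw HPCP vector so that its largest element becomes 1."""
    largest = max(raw)
    if largest == 0:
        # Assumed behaviour for a clip with no spectral peaks at all.
        return [0.0 for _ in raw]
    return [value / largest for value in raw]


print(normalize_hpcp([0.0, 2.0, 4.0, 1.0]))
```

Because every term in the raw sum is a non-negative weight times a squared amplitude, dividing by the maximum is enough to bring every element into the range 0 to 1.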
Example
Here is an HPCP vector for a C Major chord:
Here is the input file:
Octave detection
One problem with the HPCP algorithm is that it ignores what octave the original note was in. We have written a function findOctave to rectify this.
First off, we exploit the fact that a note's FFT has the same shape regardless of its pitch: spikes at the note's fundamental frequency and at all its integer multiples. So an A4 has its first spike at 440 Hz, and further spikes at 880 Hz, 1320 Hz, etc. Our HPCP will identify this note as an A. To find out which octave it is in, we just look at the location of the lowest spike, because all other spikes are multiples of this one frequency. Here we find it at 440 Hz, which places the note in octave 4.
This process is repeated for every pitch detected by the HPCP, so for the C Major chord above, it'd look for the harmonics of C and find the lowest at about 262 Hz (C4), then E at about 330 Hz (E4), and G at 392 Hz (G4).
This method is fast and accurate. However, there is one limitation: if there are multiple notes of the same pitch class but in different octaves (e.g. C4 and C5 and C6), their spectra overlap and the algorithm detects only the lowest note, C4.
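The lowest-spike idea can be sketched as follows. This is not the actual findOctave function, whose code is not shown here; `find_octave_sketch` and `nearest_midi` are hypothetical names, and the sketch assumes A4 = 440 Hz tuning, MIDI note numbering (note 60 is C4), and a ready-made list of peak frequencies.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_midi(freq_hz):
    """Nearest MIDI note number, assuming A4 = 440 Hz is note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def find_octave_sketch(peak_freqs, pitch_class):
    """Octave of the lowest peak whose nearest note is pitch_class.

    Returns None if no peak matches the pitch class.
    """
    target = NOTE_NAMES.index(pitch_class)
    matches = [f for f in peak_freqs if nearest_midi(f) % 12 == target]
    if not matches:
        return None
    # MIDI note 60 is C4, so the octave number is midi // 12 - 1.
    return nearest_midi(min(matches)) // 12 - 1


# Illustrative peaks for a C major chord (C4, E4, G4 plus two harmonics each).
chord_peaks = [261.63, 523.25, 784.88,
               329.63, 659.26, 988.88,
               392.00, 784.00, 1176.00]

for name in ("C", "E", "G"):
    print(name, find_octave_sketch(chord_peaks, name))
```

The limitation described above shows up directly in this sketch: if C4 and C5 are played together, every peak of C5 is also a harmonic of C4, so taking the minimum can only report octave 4.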