
Design assumptions

Due to the complexity and proliferation of musical notations, our music notation analysis focuses on a few elementary and widely used musical symbols, providing the foundation for further research.

  • Perfect Image Orientation: The music notation analysis assumes a preprocessing stage that scales the image of the sheet music and corrects its orientation.
  • Time Signature Independence: Music speed depends on both the time signature (e.g. 4/4 time) and the tempo (e.g. Moderato). The current implementation assumes Moderato in 4/4 time at 120 beats per minute.
  • Key Signature Independence: The program assumes a preprocessing stage that determines the key signature, which raises or lowers the affected notes by a half step. In this design we assume C major.
  • Treble Clef: The program assumes a preprocessing stage that has identified the clef as a treble clef.
  • Common Notations: Notation matching covers whole, half, quarter, and eighth notes, rests, sharps, and flats.
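
The tempo and key-signature assumptions above can be made concrete with a short sketch. It is illustrative only: the names `BEATS`, `note_duration_seconds`, and `shift_half_steps` are hypothetical, the quarter note is taken as one beat (as in 4/4 time), and the half-step shift assumes equal temperament, which the text does not specify.

```python
# Assumed default from the design assumptions: 120 beats per minute in 4/4 time,
# where one beat is a quarter note.
BPM = 120

# Note lengths in beats (hypothetical lookup table, quarter note = 1 beat).
BEATS = {"whole": 4.0, "half": 2.0, "quarter": 1.0, "eighth": 0.5}


def note_duration_seconds(kind, bpm=BPM):
    """Return how long a note of the given kind lasts at the given tempo."""
    return BEATS[kind] * 60.0 / bpm


def shift_half_steps(freq_hz, half_steps):
    """Shift a frequency by a number of half steps (assumes equal temperament)."""
    return freq_hz * 2.0 ** (half_steps / 12.0)


quarter = note_duration_seconds("quarter")   # 0.5 seconds at 120 BPM
whole = note_duration_seconds("whole")       # 2.0 seconds at 120 BPM
sharpened = shift_half_steps(440.0, +1)      # one half step above 440 Hz
```

Under these assumptions a different tempo changes only the `bpm` argument, and a key signature other than C major would apply `shift_half_steps` with +1 or -1 to the affected notes.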

Cross-correlation matched filter algorithm:

  • Filters are used to remove unwanted frequency components from a signal, in order to sift out components of interest. In this case we want to determine which segments of a sheet of music have frequency content similar to that of an image of a certain note. In signal processing, a matched filter detects the presence of a known signal in a template signal (here, the signal being searched). Matched filtering is widely used in communications to decide which of several signals is present (e.g. the representation of a one vs. the representation of a zero when transferring bits). In this case we want to determine what kinds of notes are present in our image and where they are located.
  • There are a number of ways to do matched filtering, one of which is to compute the cross-correlation between the signal you’re looking for and a template signal (the signal in which you’re searching for your signal of interest). Performing a cross-correlation is essentially the same as performing a convolution, but without first “flipping” one of the signals: $y[n] = \sum_{k} h[k+n]\, x[k]$
    where $y[n]$ denotes the matched output, $h[n]$ denotes the template image, and $x[n]$ denotes the image of the note.
  • Both the image of a certain note and the segment of sheet music are first inverted and binarized (black pixels = 1, white pixels = 0, with each pixel converted to one or the other by thresholding). Performing the 2D cross-correlation of the two yields a matrix whose maximum entries correspond to the locations on the template image where the similarity between the image of the note and that section of the template image (of the same dimensions) is at a maximum. Hence, by filtering with different note images (e.g. half, quarter, whole) we can determine not only the location of each note relative to the bars (and therefore its type: A, B, C, etc.) but also the length of the note (half, quarter, whole, etc.).
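
The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the original implementation: the function names, the threshold of 128, and the tiny test images are all hypothetical, and only positions where the note image fits entirely inside the sheet are scored.

```python
def binarize_inverted(gray, threshold=128):
    """Map dark pixels (below the assumed threshold) to 1 and light pixels to 0."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def cross_correlate_2d(sheet, note):
    """Slide `note` over `sheet` without flipping and sum the overlapping products.

    The entry at (r, c) is the 2D analogue of y[n] = sum_k h[k+n] x[k],
    with the sheet as h and the note image as x.
    """
    sh, sw = len(sheet), len(sheet[0])
    nh, nw = len(note), len(note[0])
    scores = []
    for r in range(sh - nh + 1):
        row = []
        for c in range(sw - nw + 1):
            row.append(sum(sheet[r + i][c + j] * note[i][j]
                           for i in range(nh) for j in range(nw)))
        scores.append(row)
    return scores


def peak_locations(scores):
    """Return every (row, col) position whose score equals the maximum."""
    best = max(max(row) for row in scores)
    return [(r, c) for r, row in enumerate(scores)
            for c, v in enumerate(row) if v == best]


# Hypothetical 4x5 grayscale "sheet" with one dark 2x2 blob (0 = black, 255 = white).
sheet_gray = [
    [255, 255, 255, 255, 255],
    [255, 255,   0,   0, 255],
    [255, 255,   0,   0, 255],
    [255, 255, 255, 255, 255],
]
# Hypothetical note image, already inverted and binarized: a filled 2x2 head.
note = [[1, 1],
        [1, 1]]

scores = cross_correlate_2d(binarize_inverted(sheet_gray), note)
peaks = peak_locations(scores)  # top-left corner of the best match: [(1, 2)]
```

Because this sketch uses the raw, unnormalized correlation, any sufficiently large black region would also produce a high score; a practical detector would likely need a threshold on the peak value or some form of normalization, neither of which is specified in the text.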


Source:  OpenStax, Optical character recognition. OpenStax CNX. Apr 15, 2011 Download for free at http://cnx.org/content/col11296/1.1