This section describes the course tutorial supplied by the professor.

Vi – class tutorial

This section is based on the class tutorial found here: (External Link).

For this tutorial, we analyze the sequences shown in figure 6.1, which shows two recordings of Nicholas saying the word “two”. We denote these sequences as two1 and two2.

Figure 6.1: Two recordings of Nicholas saying the word “two”.

We first compute the L2 norm of the difference of two signals as shown in (6.1).

$$ \| f_1 - f_2 \| = \sum_{i}^{\min(N_1, N_2)} \left( f_1(i) - f_2(i) \right)^2 \tag{6.1} $$

We naively cut off the comparison of the two data sequences when the shorter signal ends. The norm of the difference between these two sequences is approximately 15.4. To judge whether this value is large, we compute the energy in the individual signals. The energies of two1 and two2 are approximately 12.0 and 9.3, respectively. The norm of the difference therefore exceeds 100% of the energy in each individual signal. This is very large for two signals that produce the same sound (where “same” here means that a human interprets both signals as having the same meaning).
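The comparison above can be sketched in a few lines of Python. This is a minimal sketch rather than the tutorial's own code: the function names `truncated_sq_diff` and `energy` are hypothetical, and the short lists only stand in for the real recordings two1 and two2, so the numbers differ from the 15.4, 12.0, and 9.3 quoted above. It follows (6.1) as written, summing squared differences over the overlapping samples without taking a square root.

```python
def truncated_sq_diff(f1, f2):
    """Sum of squared differences over the overlapping samples, as in (6.1)."""
    n = min(len(f1), len(f2))  # stop when the shorter signal ends
    return sum((f1[i] - f2[i]) ** 2 for i in range(n))


def energy(f):
    """Energy of a signal, taken here as the sum of its squared samples."""
    return sum(x * x for x in f)


# Hypothetical toy sequences standing in for the recordings two1 and two2.
two1 = [0.0, 0.5, -0.5, 0.25, 0.0]
two2 = [0.0, -0.5, 0.5, 0.0]

diff = truncated_sq_diff(two1, two2)
e1 = energy(two1)
e2 = energy(two2)

print("difference:", diff)
print("energy of two1:", e1)
print("energy of two2:", e2)
```

In this toy case the two sequences have similar shapes but opposite phase, so the difference is larger than the energy of either signal, which mirrors the situation described for the real recordings.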

We now compare the first “two” sequence to a scaled copy of itself. Figure 6.2 shows two sequences: two1 and two3, where two3 = 5 * two1. Note the difference in the values on the y axes. As one can see, the difference between the signals is large (as was expected).

Figure 6.2: Plots showing a sequence “two” stated by Nicholas, that signal multiplied by 5, and the difference of the two.

Were two1 and two3 different recordings of the same person saying the word “two”, we could first make the sequences comparable by normalizing the amplitude of the two sequences. As suggested in the tutorial, we could normalize by the maximum value in the signal. This is done according to the formula shown in (6.2).

$$ \text{normalized data} = \frac{\text{data}}{\max(\text{data})} \tag{6.2} $$

In this case the procedure works perfectly: the L2 norm of the difference between the normalized two1 and the normalized two3 is 0. However, it only works because one signal is exactly a multiple of the other. If the signals were slightly misaligned, or if noise were added to the signal, then the energy in the difference signal would again be on the order of the energy in the signal itself. It would not take much noise to corrupt this procedure. If two3 equaled 5*two1 at all points except the maximum, and that point were corrupted such that it equaled 2*5*two1 there, then the average ratio between the normalized two1 and the normalized two3 would be approximately 2.
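The fragility of maximum-value normalization can be illustrated with a small Python sketch. The helper `normalize_by_max` and the toy samples are assumptions made for illustration, not the tutorial's data; the sketch also assumes the largest sample is positive, since (6.2) divides by max(data) rather than by the largest absolute value.

```python
def normalize_by_max(data):
    """Divide every sample by the largest sample value, as in (6.2)."""
    peak = max(data)  # assumes the largest sample is positive
    return [x / peak for x in data]


# Hypothetical toy sequence standing in for two1; two3 is an exact multiple.
two1 = [0.125, 0.5, -0.25, 0.25]
two3 = [5 * x for x in two1]

n1 = normalize_by_max(two1)
n3 = normalize_by_max(two3)
clean_diff = sum((a - b) ** 2 for a, b in zip(n1, n3))

# Corrupt only the peak sample so that it equals 2 * 5 * two1 at that point.
peak_index = two1.index(max(two1))
two3_bad = list(two3)
two3_bad[peak_index] = 2 * two3[peak_index]
n3_bad = normalize_by_max(two3_bad)

# Ratio of normalized two1 to normalized corrupted two3, away from the peak.
ratios = [n1[i] / n3_bad[i] for i in range(len(n1)) if i != peak_index]

print("difference for exact multiple:", clean_diff)
print("ratios after corrupting the peak:", ratios)
```

With an exact multiple the difference vanishes, but a single corrupted peak rescales every other sample of the normalized signal by a factor of 2.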

A more robust normalization procedure is to normalize by the energy in the signal. This is done according to the formula shown in (6.3); the 2 subscript denotes that the 2 norm is used.

$$ \text{normalized data} = \frac{\text{data}}{\| \text{data} \|_2} \tag{6.3} $$

Though this procedure does not make the comparison robust to alignment issues, it does make the comparison slightly more robust to spurious noise, as long as that noise has zero temporal mean. Again, in our example where no noise is added to the system and the signals are perfectly aligned, the L2 norm of the difference between the normalized two1 and the normalized two3 is 0.
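Normalization by the 2 norm, as in (6.3), can be sketched the same way. The names `l2_norm` and `normalize_by_norm` and the toy samples are hypothetical; the sketch only checks the noise-free, perfectly aligned case described above and makes no claim about how much noise the procedure tolerates.

```python
import math


def l2_norm(data):
    """Square root of the sum of squared samples."""
    return math.sqrt(sum(x * x for x in data))


def normalize_by_norm(data):
    """Divide every sample by the 2 norm of the signal, as in (6.3)."""
    norm = l2_norm(data)
    return [x / norm for x in data]


# Hypothetical toy sequence standing in for two1; two3 is an exact multiple.
two1 = [1.0, -2.0, 2.0, 4.0]
two3 = [5 * x for x in two1]

n1 = normalize_by_norm(two1)
n3 = normalize_by_norm(two3)

# Norm of the difference between the two normalized signals.
diff_norm = l2_norm([a - b for a, b in zip(n1, n3)])

print("norm of normalized two1:", l2_norm(n1))
print("norm of normalized two3:", l2_norm(n3))
print("norm of the difference:", diff_norm)
```

Both normalized signals have unit norm, so a pure change in amplitude no longer contributes to the difference.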

Comparing the norms as performed above is interesting because it reveals just how adaptable the human brain is. The same phrase uttered by the same person, while varying the contraction of the diaphragm, the contraction of the intercostal muscles, the spectrum emitted by the vocal cords (changing the pitch), and the shape of the respiratory tract (e.g. the shape of the mouth), is easily interpreted by the human brain as having the same meaning.

For a computer to perform similarly, we will need more sophisticated processing than a comparison of norms.





Source:  OpenStax, Analysis of speech signal spectrums using the l2 norm. OpenStax CNX. Dec 12, 2009 Download for free at http://cnx.org/content/col11143/1.2