D/A and output drive

The output of the synthesizer required a digital-to-analog converter (D/A), an anti-aliasing filter, and an amplifier. All of these components needed to be integrated onto the synthesis device to keep the cost down. We chose to create a current-summing, sign/magnitude 8-bit D/A. With this output it was easy to integrate an amplifier circuit on the device such that we effectively got twice the rail voltage to drive the speaker. At this point you should note that we didn't integrate the anti-aliasing filter onto the device, nor did we include one externally. The first time I told Richard that I was not going to use an anti-aliasing filter in the design, he got a bit irritable and accused me of not understanding how a digital signal processing system worked. It took quite a bit of discussion, and then a demonstration, to show that I knew what I was doing. I had decided that the anti-aliasing filter came free with a cheap speaker.
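To make the sign/magnitude format concrete, here is a minimal sketch in C of how a signed audio sample could be split into the sign bit and 7-bit magnitude that a current-summing D/A of this kind consumes. The function name, the 16-bit input width, and the scaling are my assumptions for illustration; this is not the actual Speak N Spell circuitry or firmware.

    #include <stdint.h>

    /* Hypothetical encoder: split a signed sample into sign/magnitude
     * form for an 8-bit D/A. Bit 7 carries the sign; bits 6..0 carry
     * the magnitude. Input width and scaling are assumptions. */
    uint8_t to_sign_magnitude(int16_t sample)
    {
        int scaled = sample / 256;               /* scale to roughly [-128, 127] */
        int mag = (scaled < 0) ? -scaled : scaled;
        if (mag > 127)                           /* clamp the -128 case to 7 bits */
            mag = 127;
        uint8_t sign = (scaled < 0) ? 0x80 : 0x00;
        return sign | (uint8_t)mag;
    }

One appeal of sign/magnitude in a current-summing converter is that each magnitude bit can gate a binary-weighted current source while the sign bit simply steers the polarity of the summed current.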

The speaker I chose was a $0.50 two-inch speaker with a frequency bandwidth of 300 Hz to 3.3 kHz. It also had a very nice roll-off at 3.3 kHz, which served as the anti-aliasing filter. In my discussions with the speaker vendor, I told them that if they ever sent me a speaker with a better frequency response than the specification, I would send it back. I truly needed all of the "features" of a cheap speaker to make the system cost effective.

Word spelling

Along with the speech data we stored the correct spelling of the word being spoken. Even this aspect of the data was compressed: because the alphabet could be completely captured in six bits rather than the normal 8-bit ASCII format, we chose to do just that. Saving memory was important to us.
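As a sketch of the idea, the C routine below packs a word's letters into consecutive 6-bit codes, fitting four characters into every three bytes instead of four. The specific code assignment (A = 1 through Z = 26, with 0 as a terminator) is an assumption for illustration; the actual table TI used isn't described here.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical 6-bit packer: four letters fit in three bytes,
     * a 25% saving over 8-bit ASCII. Returns bytes written. */
    size_t pack6(const char *word, uint8_t *out)
    {
        uint32_t acc = 0;   /* bit accumulator */
        int bits = 0;       /* bits currently held in acc */
        size_t n = 0;       /* bytes written */

        for (const char *p = word; ; p++) {
            uint32_t code = (*p >= 'A' && *p <= 'Z')
                          ? (uint32_t)(*p - 'A' + 1)   /* A=1 .. Z=26 (assumed) */
                          : 0;                         /* 0 terminates the word */
            acc = (acc << 6) | code;
            bits += 6;
            while (bits >= 8) {
                bits -= 8;
                out[n++] = (uint8_t)(acc >> bits);
                acc &= (1u << bits) - 1;   /* drop the emitted bits */
            }
            if (code == 0)
                break;
        }
        if (bits > 0)
            out[n++] = (uint8_t)(acc << (8 - bits));   /* flush, zero-padded */
        return n;
    }

For example, "CAT" plus its terminator is four 6-bit codes, so it packs into exactly three bytes rather than the four that ASCII storage would need.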

Problems with accents

As I suggested earlier, there were many issues that resulted from the compromises we made and from the professional speaker we chose to pronounce the words.

Four, for

The pronunciations of the words became problematic. TI had a very diverse engineering team. We came from all parts of the US, not to mention the non-US members, and all of us had a different "correct English dialect". I was from the Midwest (the actual correct dialect), Richard was from Louisiana, Larry was from Arkansas, and Paul was from East Texas. What's worse, we were doing the engineering in Lubbock, Texas (in the Texas panhandle). Among us we had five different dialects of the English language (I use "dialect" here perhaps improperly, as it was actually the accents that differed, mostly). Then we had to determine whether the product would use an American or a British accent. Our team had expanded by this point to include linguists, marketing communications experts, and educators. The decision was simple: we would use the standard broadcasting accent used in the US (yes, the Midwestern accent) and would choose a dictionary that would be our final authority. Once the decision was made, it became easy to overcome disagreements on the proper pronunciation of words.

This brings me to an interesting discussion Paul and I had over the word "four". Paul was raised in a region where the pronunciations of the words "for" and "four" were different. He came to me to point out that the Speak N Spell mispronounced the number, which he pronounced "fo-wer". I replied that it was correctly pronounced. He let me know that it was pronounced differently from the word "for" and that we needed to fix it. As was our practice, I pulled out the dictionary, looked up the two words, and pointed out that both had the same phonetic spelling. Paul's reply was simple: "Well, the dictionary is wrong." Before we think badly of Paul, let me explain that we had the same issues with words I grew up with, along with words Richard and Larry grew up with. The dictionary was constantly being used to guarantee we had consistent word pronunciations in the product.

Female versus male voice

As I stated earlier, the LPC-10 implementation we used did not synthesize female voices well at all. The technical reason, as Richard pointed out, was that the spectral lines of a female voice are spread further apart than those of a male voice. As a result, some of the formants (the envelope made up of the peaks of the spectral lines) of the voiced sounds could be missed. Over the years, when I presented this aspect of the Speak N Spell development, I would tell the story with a bit of a violation of the theory, but it is always remembered by the audience. Here is how I told it:

“Because the pitch of a woman’s voice is higher than a man’s voice, the spectral lines are further apart in the frequency domain. Another way of looking at it is that there isn’t as much information in a woman’s voice as there is in a man’s voice, given a constrained bandwidth. Perhaps that is why women talk more than men – trying to make up for the lack of information in their voices.”

This explanation always seemed to irritate the women in the audience and caused a bit of laughter among the men. Although it doesn't accurately explain why we couldn't do female voices, the people in the audience always remembered it.
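Here is a rough worked example of the spacing argument Richard actually made. The spectral lines of a voiced sound sit at multiples of the pitch (the fundamental frequency), so a higher pitch means the lines are spaced further apart and fewer of them fall under any given formant peak. The pitch values and the formant band in this C sketch are illustrative assumptions, not measurements from our recordings:

    #include <stdio.h>

    /* Illustrative only: count how many pitch harmonics land inside
     * a narrow formant band for assumed male vs. female pitches. */
    static int harmonics_in_band(double f0, double lo, double hi)
    {
        int count = 0;
        for (double f = f0; f <= hi; f += f0)
            if (f >= lo)
                count++;
        return count;
    }

    int main(void)
    {
        /* Assumed values: ~120 Hz male pitch, ~220 Hz female pitch,
         * and a formant peak spanning roughly 450-550 Hz. */
        printf("male:   %d harmonics in band\n",
               harmonics_in_band(120.0, 450.0, 550.0));
        printf("female: %d harmonics in band\n",
               harmonics_in_band(220.0, 450.0, 550.0));
        return 0;
    }

With these assumed numbers, one spectral line lands inside the formant band at the male pitch and none at all at the female pitch, so an analysis that can only see the spectral lines may miss the female formant peak entirely.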

Conclusion

This section has given a good overview of the LPC-10 speech synthesizer we used in the Speak N Spell, along with several of the issues we had to overcome with it. In a later chapter I'll talk further about some of the issues we faced with a product that talks.

Source: OpenStax, The speak n spell. OpenStax CNX. Jan 31, 2014. Download for free at http://cnx.org/content/col11501/1.5
