In 1838, Samuel Morse was struggling with the problem of designing an efficient code for transmitting information over telegraph lines. He reasoned that an efficient code would use short code words for common letters and long code words for uncommon letters. (Can you see the profit motive at work?) In order to turn this principle into workable practice, Morse rummaged around in the composition trays for typeface in a printshop. He discovered that typesetters use many more e's than s's. He then formed a table that showed the relative frequency with which each letter was used. His ingenious, variable-length Morse code assigned short codes to likely letters (like “dot” for e) and long codes to unlikely letters (like “dash dash dot dot” for z). We now know that Morse came within about 15% of the theoretical minimum for the average code word length for English language text.
A Variation on Morse's Experiment. In order to set the stage for our study of efficient source codes, let's run a variation on Morse's experiment to see if we can independently arrive at a way of designing codes. Instead of giving ourselves a composition tray, let's start with a communication source that generates five symbols, or letters. We run the source for 100 transmissions and observe the following numbers of transmissions for each symbol:
We will assume that these “source statistics” are typical, meaning that 1000 transmissions would yield 500 occurrences of the most common symbol, and so on.
The most primitive binary code we could build for our source would use three bits for each symbol:
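The original code table is not reproduced here, so the following sketch uses assumed symbol names S1 through S5 and an illustrative assignment of 3-bit patterns; only the structure of a fixed-length code is the point:

```python
# Illustrative fixed-length (3-bit) code for a 5-symbol source.
# The symbol names S1..S5 and the specific bit patterns are assumptions,
# not the table from the original text.
symbols = ["S1", "S2", "S3", "S4", "S5"]
code = {s: format(i, "03b") for i, s in enumerate(symbols)}

# Five of the eight possible 3-bit words are used; the rest are "illegal".
unused = 2**3 - len(code)
print(code)
print(unused)  # 3 code words correspond to no source symbol
```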
This code is inefficient in two ways. First, it leaves three illegal code words that correspond to no source symbol. Second, it uses the same code word length for an unlikely symbol as it does for a likely symbol. The first defect we can correct by concatenating consecutive symbols into symbol blocks, or composite symbols. If we form a composite symbol by concatenating N source symbols, then the number of such composite symbols that can be generated is 5^N. The binary code for these composite symbols must contain B_N binary digits, where

2^(B_N - 1) < 5^N ≤ 2^(B_N), that is, B_N ≅ N log2 5.
The number of bits per source symbol is then B_N / N ≅ log2 5 ≈ 2.32.
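The block-coding arithmetic can be checked numerically. This sketch computes the code word length needed for blocks of N symbols (the smallest whole number of bits that can index all 5^N composite symbols) and the resulting bits per source symbol, which approaches log2 5 ≈ 2.32:

```python
import math

def bits_per_symbol(n_block, alphabet=5):
    """Bits per source symbol when coding blocks of n_block symbols.

    We need b bits such that 2**b >= alphabet**n_block,
    i.e. b = ceil(n_block * log2(alphabet)).
    """
    b = math.ceil(n_block * math.log2(alphabet))
    return b / n_block

for n in (1, 3, 31):
    print(n, bits_per_symbol(n))
# bits/symbol falls from 3.0 (n=1) toward log2(5) ≈ 2.32
```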
This scheme improves on the best variable-length code of Table 2 from "Binary Codes: From Symbols to Binary Codes" by 0.08 bits/symbol.
Now let's reason, as Morse did, that an efficient code would use short codes for likely symbols and long codes for unlikely symbols. Let's pick code #5 from Table 2 of "Binary Codes: From Symbols to Binary Codes" for this purpose:
This is a variable-length code. If we use this code on the 100 symbols that generated our source statistics, the average number of bits/symbol is the frequency-weighted sum of the code word lengths.
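The frequency table and code #5 itself are not reproduced above, so the following sketch uses hypothetical counts for the 100 transmissions and an illustrative prefix code with code word lengths 1, 2, 3, 4, 4; only the averaging procedure is the point:

```python
# Hypothetical data: counts per symbol over 100 transmissions, and an
# illustrative variable-length prefix code (1, 01, 001, 0001, 0000).
# These numbers are assumptions, not the tables from the original text.
counts = {"S1": 50, "S2": 25, "S3": 13, "S4": 6, "S5": 6}
code = {"S1": "1", "S2": "01", "S3": "001", "S4": "0001", "S5": "0000"}

# Average bits/symbol = (sum over symbols of count * code word length) / total.
total_bits = sum(counts[s] * len(code[s]) for s in counts)
avg = total_bits / sum(counts.values())
print(avg)  # → 1.87 for these assumed counts
```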