Desurvire E. Classical and Quantum Information Theory: An Introduction for the Telecom Scientist


8 Information coding

This chapter is about coding information, which is the art of packaging and formatting information into meaningful codewords. Such codewords are meant to be recognized by computing machines for efficient processing or by human beings for practical understanding. The number of possible codes and corresponding codewords is infinite, just like the number of events to which information can be associated, in Shannon's meaning. This is the point where information theory will start revealing its elegance and power. We will learn that codes can be characterized by a certain efficiency, which implies that some codes are more efficient than others. This will lead us to a description of the first of Shannon's theorems, concerning source coding. As we shall see, coding is a rich subject, with many practical consequences and applications, in particular in the way we efficiently communicate information. We will start our exploration of information coding with numbers and then with language, which conveys some background and flavor as a preparation to approach the more formal theory leading to the abstract concept of code optimality.

8.1 Coding numbers

Consider a source made of N different events. We can label the events through a set of numbers ranging from 1 to N, which constitute a basic source code. This code represents one out of N! different possibilities. In the code, each of the numbers represents a codeword. One refers to the list of codewords, here {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ..., N}, as a dictionary.

We notice here that our N codewords use decimal numbers. In fact, these codewords are generated by some unique combinations of characters, as selected from a smaller dictionary, here {1, 2, 3, 4, 5, 6, 7, 8, 9, 0}. This smallest dictionary is referred to as a codeword alphabet. In Roman antiquity, one would have used instead the alphabet {I, II, III, IV, V, VI, VII, VIII, IX, X, L, C, D, M}, which corresponds to Roman numerals. As we know from school, the corresponding codewords are formed according to certain rules (can we recognize in MDCCLXXXIX the date of the French Revolution?). Despite the oddity of their code rules, it is noteworthy that Roman numerals are still in use, for instance to represent the hours on clock dials, number pages in book prefaces, express copyright dates, enumerate cases in mathematical descriptions, or count series in games, such as US football. It is also interesting to note that the Roman-numeral system was in fact inspired by the Greek system in use in 400 BC.

Footnote (Roman numerals): The correspondence is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100, 500, 1000} ≡ {I, II, III, IV, V, VI, VII, VIII, IX, X, L, C, D, M}, which does not include any character for zero. Note the subtractive rule, e.g., 9 ≡ IX.

The ten-character dictionary of our decimal system, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, was first used by the Hindus in 400 BC, and then transmitted later to the West by the Arabs, hence the misnomer Arabic numerals, which should rather be Hindu-Arabic numerals. The

introduction of a 0 character made it possible to greatly simplify numerals. Indeed, with

only two- or three-character codewords (00–99 or 000–999), up to 100 or 1000 numerals

can be generated. The hexadecimal system is based on the 16-character alphabet

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}.

The advantage of the hexadecimal system is that the codeword length is shorter than

in the decimal case, at the expense of using a greater number of alphabet characters. The

drawback of both decimal and hexadecimal codes is that they are based on relatively

large alphabets. In practice, such characters may not be simple to generate, write, or

faithfully interpret, as we all know from handwriting experience. Such characters need,

in turn, to be coded through a simpler alphabet. The most basic code (and number

representation) corresponds to the binary system, which uses the two-character {0, 1}

alphabet.

A single-character, binary codeword is referred to as a bit, short for binary

digit. In the following, we will equivalently refer to codewords as “numbers.”

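Binary and decimal numbers are conventionally written as the ordered character sequences $B = \ldots b_3 b_2 b_1 b_0$ and $D = \ldots d_3 d_2 d_1 d_0$, respectively. The conversion between decimal and binary numbers is given by the following power expansion:

$$\ldots d_3 d_2 d_1 d_0 \equiv \ldots + d_3 \times 10^3 + d_2 \times 10^2 + d_1 \times 10^1 + d_0 \times 10^0 = \ldots + b_3 \times 2^3 + b_2 \times 2^2 + b_1 \times 2^1 + b_0 \times 2^0 \equiv \ldots b_3 b_2 b_1 b_0. \qquad (8.1)$$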

Footnote (Roman numerals, continued): 9 is thus written IX, and not VIIII. To generate numbers greater than 10 ≡ X, the rule is as follows: 11 ≡ XI, 12 ≡ XII, ..., 18 ≡ XVIII, 19 ≡ XIX, and 20 ≡ XX. Then 30 ≡ XXX, 31 ≡ XXXI, ..., up to 39 ≡ XXXIX. Because of the absence of a zero, the powers of ten would be character-consuming should they have to be repeated by as many X. To make numbers more compact, the Romans chose to represent 50, 100, 500, and 1000 by the symbols L, C, D, and M, respectively, with the subtractive rule 40 ≡ XL, 400 ≡ CD, 90 ≡ XC, 900 ≡ CM. Thus, 2006 is represented by MMVI, while 1999 is represented by MCMXCIX. There are additional rules for representing greater numbers, see for instance: http://en.wikipedia.org/wiki/Roman_numerals#IIII_or_IV.3F, and http://ostermiller.org/calc/roman.html.

Footnote: For more on the numeral systems used by different civilizations through history, see http://en.wikipedia.org/wiki/Arabic_numerals.

Footnote: See http://en.wikipedia.org/wiki/Greek_numerals.

Footnote: The unary system uses a single-character alphabet {1}. Numbers are represented by this character's repetition, starting from zero: 0 = 1, 1 = 11, 2 = 111, 3 = 1111, etc. Albeit not practical, such a code can have interesting applications, for instance, in Turing machines (see Chapter 7).

Thus, the decimal number $D = 3 = 1 \times 2^1 + 1 \times 2^0$ corresponds to the binary number B = 11, and the decimal $D = 1539 = 1 \times 2^{10} + 1 \times 2^{9} + 0 \times 2^{8} + \cdots + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0}$ corresponds to the binary B = 11 000 000 011, for instance. It is easy to establish that the maximum decimal value for an n-bit binary number is $D = 2^n - 1$. For instance, $D = 7 = 2^3 - 1$ corresponds to B = 111, and $D = 15 = 2^4 - 1$ corresponds to B = 1111, which represent the maximum decimal values for 3-bit and 4-bit binary numbers, respectively.
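The power expansion can be tried out with a few lines of Python. This is only a minimal sketch: the function names `to_binary` and `from_binary` are illustrative choices, not notation from the text.

```python
def to_binary(d: int) -> str:
    """Return the binary codeword of a non-negative decimal integer."""
    if d < 0:
        raise ValueError("only non-negative integers are handled")
    if d == 0:
        return "0"
    bits = []
    while d > 0:
        bits.append(str(d % 2))  # lowest-weight bit comes out first
        d //= 2
    return "".join(reversed(bits))


def from_binary(b: str) -> int:
    """Evaluate the power expansion: sum of b_k * 2**k over all bit positions k."""
    return sum(int(bit) * 2**k for k, bit in enumerate(reversed(b)))


print(to_binary(3))          # 11
print(to_binary(1539))       # 11000000011
print(from_binary("1111"))   # 15, i.e. 2**4 - 1
```

The largest n-bit value is obtained when every bit is 1, which is why `from_binary("1" * n)` equals `2**n - 1`.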

For practical handling, long binary numbers are usually split into subgroups of eight bits, which are called bytes. The byte itself can be divided into two subgroups of four bits. Since B = 1111 is equal to D = 15, one can represent a byte through a two-character codeword based on the alphabet {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}, which corresponds to the hexadecimal representation. For instance, the hexadecimal H = 7A corresponds to the binary B = 01111010 and the decimal D = 122. The hexadecimal system is just a convenient way of representing binary numbers through a 16-character alphabet. It is also a base-16 representation, which immediately translates into the binary system by blocks of four bits. A byte thus covers the decimal-number range from 0 to $2^8 - 1 = 255$, which is also conveniently represented by hexadecimal numbers from 00 to FF. Note that with the zero, the number of symbols that can be represented by n bits is actually $2^n$.
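The block-of-four-bits translation can be sketched as follows. The helper name `byte_to_hex` is hypothetical and chosen here for illustration only.

```python
HEX_ALPHABET = "0123456789ABCDEF"


def byte_to_hex(bits: str) -> str:
    """Translate an eight-bit codeword into two hexadecimal characters."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight binary characters")
    high, low = bits[:4], bits[4:]  # the two four-bit subgroups
    return HEX_ALPHABET[int(high, 2)] + HEX_ALPHABET[int(low, 2)]


print(byte_to_hex("01111010"))  # 7A
print(int("7A", 16))            # 122
print(byte_to_hex("11111111"))  # FF
```

Each four-bit subgroup is converted on its own, which is what makes the hexadecimal form a direct shorthand for the binary one.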

8.2 Coding language

In Chapter 4, we have analyzed the entropy in language. Languages, especially the ones we can't read, can be viewed as random sources of alphabet characters. As we have seen, alphabet characters do not have the same probability of occurrence, because words make preferential use of certain letters, like E, T, A, O, N, ... in English. The probability distribution of most languages' alphabets is exponential (see Fig. 4.3). The distribution varies according to geographic derivatives, and the type of communication (e.g., informal, technical, literary). The language dictionary slowly evolves through generations, as new words are introduced, and older ones are abandoned. Even the alphabet somewhat evolves, for instance with the new character @, which became much more common with the Internet generation. In this section, we shall consider the issue of coding language by substituting the conventional A–Z alphabet with a decimal, and then a binary coding system.

A first coding approach could consist in attributing a decimal number to each alphabet character: for instance, A = 1, B = 2, and so on, up to Z = 26. This requires 26 codewords with two decimal symbols varying from 0 to 9, i.e., going from 00 to 99. This makes 100 coding possibilities, leaving extra room for 100 − 26 = 74 other alphabetical or alphanumerical symbols. This reserve would be adequate to code other symbols, such as lower-case a–z characters, characters with accents and alterations (à, ç, ñ, ...), space and punctuation characters, parentheses and quotes, math operators and symbols (+, −, ×, ÷, =, >, <), the decimals 0–9, and other special characters (∗, %, #, §, &, ‘, ∼, ∧, ◦, _, |, /, \, [, ], {, }, $, €, £, @, ...). This whole character bank forms our computer-keyboard alphabet, to be completed with several other computer commands. It can, thus, be theoretically coded through 100 decimal numbers (00 to 99). But such a decimal coding has never been of any practical use, except in the early times of cryptography, with the unique character–decimal correspondence representing the “secret code.”

A second and more powerful coding approach consists in converting the whole keyboard alphabet into binary codewords. The advantage is that the latter can be processed by computers, without any other form of encoding. Since we have $2^7 = 128$, we observe that codewords of only seven-bit length are more than sufficient to cover a full keyboard-symbol alphabet. This is the reason why most computers use ASCII, a standard code invented in 1961 and based on seven-bit codewords. Note that ASCII is not the only possible code for computers: indeed, EBCDIC is another standard based on eight-bit codewords, or bytes. The extra bit of EBCDIC makes it possible to code twice as many alphabet characters as in ASCII, namely $2^8 = 256$ codewords. In the 1970s, ASCII was, in fact, extended to eight-bit codewords, which, in particular, makes it possible to include all language-specific characters. The correspondence table between ASCII and keyboard characters can be found on the Internet. Table 8.1 illustrates the correspondence for the most commonly used keyboard characters. For instance, the letter A is coded as 1000001, the letter b is coded as 1100010, and the character @ is coded as 1000000. We observe from the table that all alphabetical characters or letters (lower or upper case) begin with 1 as the leftmost (highest-weight or seventh) bit. We also observe that all upper-case letters have 0 in the second leftmost (or sixth) bit position, while lower-case letters have 1 instead. Thus, the 2 × 26 = 52 upper and lower-case letter alphabet is actually using six-bit codewords, while the 26-letter alphabet is using five-bit codewords.
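These observations about the ASCII bit pattern are easy to check with Python's built-in `ord` and `format`. The helper name `ascii7` is an illustrative choice, and the sketch covers only the seven-bit range discussed here.

```python
import string


def ascii7(ch: str) -> str:
    """Return the seven-bit ASCII codeword of a single character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a seven-bit ASCII character")
    return format(code, "07b")


for ch in "Ab@":
    print(ch, ascii7(ch))  # A 1000001, b 1100010, @ 1000000

# Seventh (leftmost) bit is 1 for every letter.
print(all(ascii7(c)[0] == "1" for c in string.ascii_letters))

# Sixth bit is 0 for upper case and 1 for lower case.
print(all(ascii7(c)[1] == "0" for c in string.ascii_uppercase))
print(all(ascii7(c)[1] == "1" for c in string.ascii_lowercase))
```

Because upper and lower case differ only in the sixth bit, the five lowest bits alone identify which of the 26 letters is meant.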

Footnote: For a detailed introduction to early secret codes and cryptography, see for instance E. Desurvire, Wiley Survival Guide in Global Telecommunications, Broadband Access, Optical Components and Networks, and Cryptography (New York: J. Wiley & Sons, 2004).

Footnote: ASCII stands for American Standard Code for Information Interchange.

Footnote: EBCDIC stands for Extended Binary Coded Decimal Interchange Code.

Footnote: See, for instance, http://–wikipedia.org/wiki/ASCII.

Table 8.1 ASCII code table for common keyboard characters (extract).

Bit 7 →        0    0    0      0    1    1    1    1
Bit 6 →        0    0    1      1    0    0    1    1
Bit 5 →        0    1    0      1    0    1    0    1
Bits 4 3 2 1
0 0 0 0                  space  0    @    P    `    p
0 0 0 1                  !      1    A    Q    a    q
0 0 1 0                  "      2    B    R    b    r
0 0 1 1                  #      3    C    S    c    s
0 1 0 0                  $      4    D    T    d    t
0 1 0 1                  %      5    E    U    e    u
0 1 1 0                  &      6    F    V    f    v
0 1 1 1                  '      7    G    W    g    w
1 0 0 0                  (      8    H    X    h    x
1 0 0 1                  )      9    I    Y    i    y
1 0 1 0                  *      :    J    Z    j    z
1 0 1 1                  +      ;    K    [    k    {
1 1 0 0                  ,      <    L    \    l    |
1 1 0 1                  -      =    M    ]    m    }
1 1 1 0                  .      >    N    ^    n    ~
1 1 1 1                  /      ?    O    _    o

A five-bit codeword can cover $2^5 = 32$ characters, which is apparently not sufficient for representing the 26 letters and the 10 numbers, unless numbers can be written as words (e.g., 3 = three). To increase the possibilities of a five-bit code, a trick is to introduce two shift characters, one to announce a shift from letters to figures (concerning codewords to follow) and the other for the reverse operation. The introduction of these two shift characters, thus, virtually permits one to re-use the set of 26 letter codewords as another set of 26 figure codewords, including numbers and punctuation, leaving four extra symbols to use for space, carriage return, line feed, and blank. This is the principle of the Baudot code, now better known as International Alphabet IA2 and still in use in telex machines.
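The shift-character trick can be illustrated with a toy five-bit code. Everything in this sketch is an assumption made for illustration: the code numbers, the choice of 26 "figure" characters, and the names `LTRS`, `FIGS`, `encode`, and `decode` do not reproduce the actual Baudot or IA2 assignment.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
FIGURES = "0123456789.,?:;-/\"'()+=!&%"   # hypothetical figure set, 26 characters
assert len(LETTERS) == len(FIGURES) == 26

LTRS, FIGS = 27, 28                       # the two shift characters (toy values)
TABLES = {LTRS: LETTERS, FIGS: FIGURES}
COMMON = {" ": 29, "\r": 30, "\n": 31, "\0": 0}   # space, CR, LF, blank
COMMON_BY_CODE = {code: ch for ch, code in COMMON.items()}


def encode(text):
    """Encode text as a list of five-bit code numbers (0..31)."""
    codes, shift = [], LTRS               # assume transmission starts in letter mode
    for ch in text:
        if ch in COMMON:
            codes.append(COMMON[ch])
            continue
        if ch not in TABLES[shift]:
            other = FIGS if shift == LTRS else LTRS
            if ch not in TABLES[other]:
                raise ValueError(f"cannot encode {ch!r}")
            shift = other
            codes.append(shift)           # announce the change of table
        codes.append(TABLES[shift].index(ch) + 1)
    return codes


def decode(codes):
    """Invert encode(), tracking the current shift state."""
    out, shift = [], LTRS
    for code in codes:
        if code in TABLES:
            shift = code
        elif code in COMMON_BY_CODE:
            out.append(COMMON_BY_CODE[code])
        else:
            out.append(TABLES[shift][code - 1])
    return "".join(out)


message = "ROOM 42 AT 9:30"
codes = encode(message)
print(codes)
print(decode(codes) == message)
```

The price of the trick is visible in the code list: each change between letters and figures costs one extra five-bit codeword.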

Consider now the code-source entropy. If all ASCII codewords were equally likely (having a uniform probability distribution), the corresponding entropy would be $H = \log_2 128 = \log_2 2^7 \equiv 7$ bit/symbol, which is precisely the codeword length. In the case where the codeword length matches the entropy of the code source, it is said that the code is optimal. This important concept of code optimality will be met repeatedly throughout this chapter.

But, as we are aware, language characters do not have a uniform probability, and,

therefore, the source entropy must be less than the above maximum (7 bit/symbol). For

ordinary text ﬁles, the most likely symbols are spaces and lower-case letters, which follow

an exponential distribution. In Chapter 4, we established that the plain 26-character

English alphabet (A–Z) has an entropy of 4.185 bit/symbol (1982 survey). We would

then expect the entropy of the ASCII source to be somewhat greater than this value,

considering the greater diversity of characters, but substantially less than 7 bit/symbol,

because of the nonuniformity of the code source. Thus, as applied to language, ASCII

can be regarded as being a nonoptimal code, since its codeword length is greater than

the actual source entropy! For computer ﬁles, like tabulated data, source programs, or

HTML Internet pages, however, the character statistics are quite different from that of

the English language, and the corresponding probability distribution is somewhat closer

to uniform. In this respect, ASCII is closer to code optimality.
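The gap between the seven-bit codeword length and the actual source entropy can be observed on any sample of text. The sketch below is only indicative: the sample sentence is arbitrary, and a short sample gives a rough estimate rather than the language entropy quoted above.

```python
import math
from collections import Counter


def entropy_bits(probabilities):
    """Shannon entropy in bit/symbol of a probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(text):
    """Entropy of the character frequencies observed in text."""
    counts = Counter(text)
    total = len(text)
    return entropy_bits(c / total for c in counts.values())


uniform = [1 / 128] * 128                 # all ASCII codewords equally likely
print(entropy_bits(uniform))              # 7.0

sample = "the quick brown fox jumps over the lazy dog and then rests in the shade"
print(empirical_entropy(sample) < 7.0)    # True: fewer than 7 bit/symbol are needed
```

The ratio of the measured entropy to the fixed length of 7 gives a first feeling for how far a fixed-length code is from optimality on a given source.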

Once a codeword length has been fixed, and is, by definition, the same for all codewords, there is no possibility of further optimizing the code. On the other hand, code optimization seeks optimal codes that are based on variable-length codewords. We will look at the issue of code optimization in the next two sections, starting with an analysis of the Morse code.

8.3 The Morse code

Another approach for coding language is to use codewords with variable lengths. The

rationale is to make the length of the most frequently-used characters the shortest

possible, and the reverse for the least frequently used ones. With this approach, the

average codeword length is shorter than that of a ﬁxed-length code, and this will bring

us closer to coding optimality.

The Morse code is a historical illustration of the above concept. Such a code has been widely used in pre-computer ages for military communications (from the American Civil War to the First World War), for the early beginnings of the public telegraph, which, as a true revolution of the time, brought the telegram, and for maritime communications and safety. Today, its use is restricted to nostalgic amateur groups. The Morse code is a binary-like or pseudo-binary code based on the two character values dit = • and da = − (or dot and dash, respectively). While everybody knows the meaning of SOS, fewer people know that it actually means “Save Our Souls,” and maybe even fewer people know the Morse transcription:

••• / −−− / ••• (dit dit dit / da da da / dit dit dit),

as repeated several times.

As symbolized above by the slash, each Morse codeword must in fact be separated by

short pauses. Such pauses are meant for unambiguous identiﬁcation of the codewords.

This is because Morse messages are meant to be generated, to be written, and to be read

in real time by human operators, and not by a machine.

Table 8.2 shows the “Continental” international correspondence of the 44 Morse symbols with alphanumerical characters and various punctuation signs. The list is completed with another five symbols for messaging commands. Note that there is no Morse symbol for “space,” for consideration of economy. Morse messages make sense without spaces, just like HELLOHOWAREYOU or HAPPYBIRTHDAYTOYOU. It does not preclude that the sending operator may use the “wait” symbol once in a while, to take a breath or if he is accidentally interrupted whilst broadcasting a message. Yet full texts can be coded with all punctuation marks (except the exclamation point, !), which makes the Morse code very complete as a communication means. As one observes from Table 8.2, the shortest Morse symbols are attributed to the most common letters in European languages, i.e., E, T, A, N, I, ..., while the longest symbols are attributed to the least frequent letters, such as J, Q, X, Y, Z. In this way, operators save lots of time when generating or writing down Morse codewords.

Footnote: The first telegram was sent from Baltimore to Washington, DC over electrical wires by Morse in 1844, see http://en.wikipedia.org/wiki/Electrical_telegraph.

Footnote: According to international maritime safety regulations, ships at sea no longer need to be equipped with Morse-based alarm systems with SOS signaling as in the past. Since 1999, indeed, the regulatory alternative is now the Global Maritime Distress and Safety System (GMDSS), which uses satellite and other communication principles.

Footnote: The Morse code still has fans world-wide, who collect and use old machines and even organize High Speed Telegraphy Championships.

Table 8.2 Correspondence of the Continental International Morse code with alphanumerical characters, punctuation, and other command characters. The nine letters most frequently used in European languages are placed at the left.

Frequent letters   Other letters   Punctuation   Numbers     Commands
E  •               B  −•••         .  •−•−•−     0  −−−−−    call T  −•−
T  −               C  −•−•         ,  −−••−−     1  •−−−−    error   •••••••
A  •−              D  −••          ?  ••−−••     2  ••−−−    wait    •−•••
N  −•              F  ••−•         :  −−−•••     3  •••−−    end M   •−•−•
I  ••              G  −−•          ;  −•−•−•     4  ••••−    end B   •••−•−
M  −−              H  ••••         -  −•••−      5  •••••
S  •••             J  •−−−         /  −••−•      6  −••••
O  −−−             K  −•−          "  •−••−•     7  −−•••
R  •−•             L  •−••                       8  −−−••
                   P  •−−•                       9  −−−−•
                   Q  −−•−
                   U  ••−
                   V  •••−
                   W  •−−
                   X  −••−
                   Y  −•−−
                   Z  −−••

call T = call to transmit, end M = end message, end B = end broadcasting.

Another trick in the Morse code is that letter symbols take a maximum of four dit/da

characters, while number symbols are exactly ﬁve characters long. This makes it easier for

operators to distinguish between letters and numbers, and avoids any risk of confusion for

numbers (mistakes in numbers having potentially more important consequences, unlike

with letters, which can be intuitively corrected, or whose mistakes are immediately

noticeable). The Morse code has proven quite efﬁcient for rapid messaging between

“human entities” having limited telecommunications equipment. Certain civilian and

military boats still carry on-board Morse machines as light guns: in adverse conditions

when the radio is down because of power failure or enemy scrambling, a point-to-point

and “radio-silent” Morse communication by day or by night may be the only solution.

And even a small piece of mirror with the sun or a ﬂashlight works very well to

communicate over distances of kilometers, and can be included as part of any survivor’s

equipment, for vital SOS messaging.

The Morse code is, thus, a first example of a variable-length code. Since codewords get shorter as the symbol frequency increases, we should expect that the entropy of the Morse-code source is considerably smaller than that of an ASCII code reduced to the same A–Z letters. In fact, the entropy analysis of the Morse code is not as straightforward as it may first appear. Earlier, I referred to the code as being pseudo-binary, even if it uses only two characters, which provided a hint. Indeed, the code makes use of short pauses or blanks between two codewords, without which the code would be unintelligible. For instance, the beginning message HELLO

•••• / • / •−•• / •−•• / −−−

transmitted without blanks would look like

••••••−•••−••−−−,

which from Table 8.2 can be interpreted in several different ways (e.g., 5LRJ or SVEFAM or EEEEEETIA2, etc.). This illustrates the property that, without such blanks, the Morse code is not uniquely decodable and is useless (say, except for mere SOS purposes). This notion of unique decodability will be further addressed in the next section. Here, we shall analyze what these information-less, but indispensable blanks represent in terms of code entropy.
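The ambiguity of a blank-free Morse stream can be checked mechanically. In this sketch the dit and da are written as the ASCII characters `.` and `-`, the letter and number codewords are those of Table 8.2, and the function names are illustrative assumptions.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def encode(message, blank=""):
    """Concatenate Morse codewords, separated by the given blank marker."""
    return blank.join(MORSE[ch] for ch in message)


def count_decodings(signal):
    """Count the ways a blank-free signal splits into valid codewords."""
    ways = [0] * (len(signal) + 1)
    ways[0] = 1                            # the empty prefix has one reading
    for end in range(1, len(signal) + 1):
        for code in MORSE.values():
            if signal[:end].endswith(code):
                ways[end] += ways[end - len(code)]
    return ways[-1]


signal = encode("HELLO")
print(signal)                                       # ......-...-..---
print(encode("SVEFAM") == signal)                   # True
print(encode("EEEEEETIA2") == signal)               # True
print(count_decodings(signal) > 3)                  # True: many readings exist
print(encode("HELLO", blank="/"))                   # ...././.-../.-../---
```

With the blank marker kept, splitting on `/` recovers exactly one codeword sequence, which is the role the pauses play for a human operator.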

The idea to begin our analysis is to look at the blanks (/) as representing an extra

symbol character in the Morse code, which is systematically present at the end of any

codeword. Thus dit/ and da/ actually form digrams (two-character symbols) as opposed

to monograms (single-character symbols). Two possibilities exist for introducing this

extra blank symbol.

The ﬁrst possibility is to set the blank to a binary-code (dit/da) value, which must

meet two requirements: (a) it is not already taken by any Morse code symbol, and (b) the

concatenation of the blank to any Morse codeword, forming the new “digram” codeword,

should be uniquely decodable. Referring to Table 8.2, the smallest binary symbol for

“blank” should be −−−−−−. With such a convention, whenever one hears six das,

it is deﬁnitely a blank without ambiguity or error, and one knows for sure which other

conventional Morse codeword precedes or follows. The detection of −−−−−− as

a new symbol is also an indication that blanks are now being coded! But we now have

a tax to pay: the minimum symbol size of this new Morse code is 7 (digram symbols

“E-blank” and “T-blank”), and the maximum size is 13 (digram symbol “error-blank”).

We shall therefore discard this effective, but poorly economical approach.

The second possibility is to convert the pseudo-binary Morse code into a ternary one. In base three, the characters are 0, 1, and 2, which are called trits (for ternary digits). A single trit, thus, codes three numbers, and n trits make up $3^n$ coding possibilities. Our extended Morse code having 49 + 1 = 50 symbols, we see that n = 4 trits are required, although $3^4 = 81$ is far in excess of what we actually need. Here, we shall not attempt to optimize the length of this ternary coding system, but only to use the property offered by a third alphabet character to represent blanks uniquely. Setting the convention da = 0, dit = 1, we can only have blank = 2. The ternary codeword is simply generated by appending 2 at the end of the binary Morse codeword. The first two columns in Table 8.3 show the ternary codeword correspondence with the A–Z letters of the Morse code. For

instance, R = •−• becomes 1012 in the proposed ternary representation. Actually, this alternative Morse code is not different from the conventional one, if one conceives of the last character 2 in each codeword as another sound, which is different from dit or da (say, do or du). For the purposes of entropy analysis, we shall consider here only the symbols x corresponding to A–Z, for which we know the probability distribution, p(x), taking the English-language PDF described in Chapter 4. It is easily established that the PDF of the trit codewords is the same as that of the bit (or conventional Morse) codewords.

Footnote: To my knowledge, the following (including in the next section) constitutes an original information-theory analysis of the Morse code.

Footnote: Likewise, in the base-4 or quaternary system, the characters 0, 1, 2, and 3 are called quads.

Table 8.3 Ternary representation of the Morse-code letters into trit codewords (CW), with the introduction of a character, 2, to signal the blank immediately following conventional Morse symbols (• = 1, − = 0, blank = 2), as shown in the first two columns. Column 3 shows the source probability distribution p(x), which is the same as used in Fig. 4.5 for English-language reference (1982 survey). Columns 4 and 5 show the detailed calculation of the bit/symbol ($H_2$) and trit/symbol ($H_3$) entropy, using base-2 and base-3 logarithms, respectively. Column 6 shows the codeword length l(x) associated with each trit symbol, and Column 7 shows the calculation of the mean codeword length L (effective code entropy). The last two columns represent the same as Columns 6 and 7, but with a different coding solution with codeword length l′(x) yielding the mean L′ (see text for description). The calculation results (source entropy, effective code entropy, and coding efficiency) are shown at bottom.

Morse        Morse                                     CW length   Mean CW length   Other CW length   Mean CW length
symbol (x)   trit CW   p(x)    −p log2(p)  −p log3(p)  l(x)        l(x)p(x)         l′(x)             l′(x)p(x)
E            12        0.127   0.378       0.239       2           0.254            2                 0.254
T            02        0.091   0.314       0.198       2           0.181            2                 0.181
A            102       0.082   0.295       0.186       3           0.245            2                 0.164
O            0002      0.075   0.280       0.177       4           0.299            2                 0.150
I            112       0.070   0.268       0.169       3           0.209            2                 0.140
N            012       0.067   0.261       0.165       3           0.200            2                 0.134
S            1112      0.063   0.251       0.158       4           0.251            3                 0.188
H            11112     0.061   0.246       0.155       5           0.304            3                 0.182
R            1012      0.060   0.243       0.153       4           0.239            3                 0.179
D            0112      0.043   0.195       0.123       4           0.171            3                 0.129
L            10112     0.040   0.185       0.117       5           0.199            3                 0.120
C            01012     0.028   0.144       0.091       5           0.140            3                 0.084
U            1102      0.028   0.144       0.091       4           0.112            4                 0.112
M            002       0.024   0.129       0.081       3           0.072            4                 0.096
W            1002      0.024   0.129       0.081       4           0.096            4                 0.096
F            11012     0.022   0.121       0.076       5           0.110            4                 0.088
G            0012      0.020   0.113       0.071       4           0.080            4                 0.080
Y            01002     0.020   0.113       0.071       5           0.100            4                 0.080
P            10012     0.019   0.108       0.068       5           0.095            5                 0.095
B            01112     0.015   0.091       0.057       5           0.075            5                 0.075
V            11102     0.010   0.066       0.042       5           0.050            5                 0.050
K            0102      0.008   0.056       0.035       4           0.032            5                 0.040
J            10002     0.002   0.018       0.011       5           0.010            5                 0.010
X            01102     0.002   0.018       0.011       5           0.010            5                 0.010
Q            00102     0.001   0.010       0.006       5           0.005            5                 0.005
Z            00112     0.001   0.010       0.006       5           0.005            5                 0.005

Sum of p(x) = 1.000
Entropy (source): H_2 = 4.185 bit/symbol, H_3 = 2.640 trit/symbol
Mean codeword length (code): L = 3.544 trit/symbol, L′ = 2.744 trit/symbol
Coding efficiency: 74.49% (with L), 96.23% (with L′)
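The ternary convention (dit = 1, da = 0, blank = 2) can be sketched as a small encoder and decoder. The function names are illustrative, and the Morse letter codewords are written with the ASCII characters `.` and `-`.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}
TRIT = {".": "1", "-": "0"}               # dit = 1, da = 0


def to_trits(letter):
    """Ternary codeword: the Morse codeword followed by the blank trit 2."""
    return "".join(TRIT[c] for c in MORSE[letter]) + "2"


def encode(message):
    return "".join(to_trits(ch) for ch in message)


def decode(trits):
    """Split the stream at every 2 and look each codeword up."""
    inverse = {to_trits(ch): ch for ch in MORSE}
    out, word = [], ""
    for t in trits:
        word += t
        if t == "2":                      # the blank trit closes a codeword
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("stream ends in the middle of a codeword")
    return "".join(out)


print(to_trits("R"))                      # 1012
print(decode(encode("HELLO")))            # HELLO
```

Since the trit 2 never occurs inside a codeword, every stream splits in exactly one way, which is the unique decodability that the blank-free binary stream lacked.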

Table 8.3 also shows the source entropy in either base-2 (bit/symbol) or base-3 (trit/symbol) logarithms. By convention, entropy in logarithm base M will be called here $H_M$. Consistently, it is defined according to:

$$H_M = -\sum_{x \in X} p(x) \log_M p(x). \qquad (8.2)$$

Recalling that $\log_M x = \ln x / \ln M$, where ln is the natural logarithm, the relation between base-M entropy and conventional (base-2) entropy is the following:

$$\ln(M)\, H_M = \ln(2)\, H_2. \qquad (8.3)$$
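As a quick numerical check of Eq. (8.3), the conversion can be worked out for M = 3, using the English-language value of 4.185 bit/symbol quoted earlier in this chapter. The rounding to four significant digits is an assumption of this sketch.

```latex
H_3 = \frac{\ln 2}{\ln 3}\, H_2 \approx 0.6309 \times 4.185 \approx 2.640 \ \text{trit/symbol}
```

The factor ln 2 / ln 3 is smaller than one because a trit can carry more information than a bit, so fewer trits than bits are needed per symbol.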

Looking at Table 8.3, the calculations for base-2 and base-3 source entropies yield $H_2 = 4.185$ bit/symbol (English-language entropy) and $H_3 = 2.640$ trit/symbol, respectively. The next step in our analysis is to define a way to measure how efficient a given code is in using the most concise codewords, regardless of the logarithmic base. This issue is addressed in the next section, which will also use our new Morse code by way of an illustrative example.

8.4 Mean code length and coding efﬁciency

Let's introduce the mean codeword length (also called expected length) according to the definition

$$L(X) = \sum_{x \in X} p(x)\, l(x), \qquad (8.4)$$

where l(x) is the codeword length corresponding to symbol x from source X. With a binary code, the unit of L is bit/symbol. This defines the mean or expected codeword length as effective code entropy.
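Definition (8.4) can be evaluated directly for the ternary Morse code. The sketch below uses the three-decimal probabilities of Table 8.3, so its results agree with the tabulated values only to within rounding. The names `TABLE`, `entropy`, and `mean_length` are illustrative.

```python
import math

# letter: (p(x) rounded as in Table 8.3, Morse codeword with . = dit, - = da)
TABLE = {
    "E": (0.127, "."), "T": (0.091, "-"), "A": (0.082, ".-"), "O": (0.075, "---"),
    "I": (0.070, ".."), "N": (0.067, "-."), "S": (0.063, "..."), "H": (0.061, "...."),
    "R": (0.060, ".-."), "D": (0.043, "-.."), "L": (0.040, ".-.."), "C": (0.028, "-.-."),
    "U": (0.028, "..-"), "M": (0.024, "--"), "W": (0.024, ".--"), "F": (0.022, "..-."),
    "G": (0.020, "--."), "Y": (0.020, "-.--"), "P": (0.019, ".--."), "B": (0.015, "-..."),
    "V": (0.010, "...-"), "K": (0.008, "-.-"), "J": (0.002, ".---"), "X": (0.002, "-..-"),
    "Q": (0.001, "--.-"), "Z": (0.001, "--.."),
}


def entropy(probabilities, base):
    """Source entropy in the given logarithm base."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


def mean_length(table):
    """Mean codeword length: each trit codeword is the Morse codeword plus one blank trit."""
    return sum(p * (len(code) + 1) for p, code in table.values())


probs = [p for p, _ in TABLE.values()]
H3 = entropy(probs, 3)          # trit/symbol, close to the tabulated source entropy
L = mean_length(TABLE)          # trit/symbol, close to the tabulated mean length
print(round(H3, 2), round(L, 2), round(H3 / L, 2))
```

The last printed number is the ratio of source entropy to mean codeword length, which is the quantity the following paragraphs use to compare codes.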

We shall, again, use the ternary Morse code as an illustrative example. In this case,

L is in units of trit/symbol. Columns 6 and 7 in Table 8.3 detail the calculation of

the mean codeword length, as based on the above deﬁnition, the ternary codewords

previously introduced (Column 2) and the distribution p(x). As Table 8.3 shows,

the mean codeword length is L = 3.544 trit/symbol. Since L has the dimensions of

entropy, we can compare it with the codeword source, using the ratio $\eta = H_M / L$, with

Footnote: This is because the joint and conditional probabilities of the Morse/blank digrams x/y satisfy p(y = blank | x) = 1 and p(y = blank, x) = p(x), with p(y = blank) = 1.

Footnote: More accurately, the unit of the mean codeword length is bit/codeword or trit/codeword, but, for simplicity and clarity, I shall use here the names bit/symbol or trit/symbol, it being understood that there is a one-to-one correspondence between codewords and symbols.