needing any codeword delimiters or blanks. The data shown at the bottom of Table 10.4 indicate that the total information required to encode GOODTOSEEYOU amounts to 32 + 8n bits, which, for n = 7, comes to 88 bits.
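As a quick numerical cross-check, the following Python sketch redoes this accounting under the stated assumptions: the 32-bit term is taken as the sum of the FGK codeword lengths listed in Table 10.4 (it is not recomputed here), and the 8n term accounts for the eight distinct symbols, each of which must be transmitted uncompressed, with n bits, upon its first occurrence.

# Accounting sketch for adaptive (FGK) coding of GOODTOSEEYOU.
message = "GOODTOSEEYOU"
fgk_codeword_bits = 32            # sum of FGK codeword lengths, taken from Table 10.4
n = 7                             # bits per uncompressed symbol (seven-bit ASCII)
new_symbols = len(set(message))   # distinct symbols, each sent uncompressed once
total_bits = fgk_codeword_bits + new_symbols * n
print(new_symbols, total_bits)    # 8 distinct symbols, 32 + 8 * 7 = 88 bits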
For comparison purposes, consider the case of static Huffman coding. In this case, the
encoder first reads the full message GOODTOSEEYOU and then computes the Huffman
coding tree, which yields the same codeword assignment as shown in Table 10.3, except
that the symbol ∅ is not used. The encoder must then define the coding tree as overhead
information. From the table data, and assuming n bits to define each of the eight symbols
(G, O, D, T, S, E, Y, U), the overhead size comes to 2 × (2 + n) + 3 × (3 + n) + 3 × (4 + n) = 25 + 8n. With n = 7, the overhead size is, therefore, 81 bits. On the other hand, the GOODTOSEEYOU message payload represents 2 × (2) + 3 × (3) + 3 × (4) = 25 bits. The total message length (overhead + payload) is, therefore, 81 + 25 = 106 bits. This result compares with the 88-bit full message length of the adaptive FGK coding, which is about 17% shorter. The FGK performance can be further improved by decreasing the size of
the overhead information, namely the definition of the N identified source symbols. Such
a definition requires N log₂ N bits. Using seven-bit ASCII, the source alphabet size is 2⁷ = 128, which covers more than the ensemble of computer-keyboard characters. If
the message to be encoded uses fewer than 128 symbols, the overhead can be reduced to fewer than seven bits per symbol. Regardless of the source size, it is possible to make a list of
all possible symbols and to attribute to each one a code number of variable length, for
instance defined by ⌈−log₂ p(xᵢ)⌉, where p(xᵢ) is a conservative estimate of the symbol probability, as measured over long messages. With this approach, the average overhead
size is minimal, the most frequent symbols having shorter code-number definitions,
and the reverse for the least frequent ones. Both encoder and decoder must share this
standard symbol codebook.
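The bit counts used in the above comparison are easily reproduced. The short Python sketch below is a minimal illustration, assuming the codeword-length distribution read from Table 10.3 (two 2-bit, three 3-bit, and three 4-bit codewords) and mirroring the payload accounting used in the text; it is not a full Huffman encoder.

# Accounting sketch for the static Huffman case, compared with the 88-bit adaptive result.
codeword_lengths = [2, 2, 3, 3, 3, 4, 4, 4]   # assumed from Table 10.3 (null symbol unused)
n = 7                                         # bits needed to identify each symbol

overhead = sum(length + n for length in codeword_lengths)   # coding-tree definition: 25 + 8n = 81 bits
payload = sum(codeword_lengths)                             # message payload as counted in the text: 25 bits
static_total = overhead + payload                           # 81 + 25 = 106 bits
adaptive_total = 88                                         # adaptive FGK total from the previous sketch
print(static_total, round(100 * (1 - adaptive_total / static_total)))   # 106 bits; FGK is about 17% shorter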
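The variable-length code-number idea can be illustrated just as briefly. In the sketch below, the probability values are purely illustrative placeholders (they are not measured statistics), and the code-number length is computed as ⌈−log₂ p(xᵢ)⌉, so that frequent symbols receive short definitions and rare ones long definitions.

from math import ceil, log2

# Illustrative (made-up) probability estimates for a few symbols.
p = {"e": 0.12, "t": 0.09, "z": 0.0007, "#": 0.0001}

# Code-number length in bits: ceil(-log2 p(x)); frequent symbols get shorter definitions.
for symbol, prob in p.items():
    print(symbol, ceil(-log2(prob)))
# e: 4 bits, t: 4 bits, z: 11 bits, #: 14 bits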
We conclude from this tedious but meaningful exercise that, when the overhead is taken into account, adaptive coding may perform significantly better than static coding. To this advantage must be added the fact that encoding
and decoding can be performed dynamically “on the fly,” unlike in the static case.
Furthermore, changes in the source’s characteristics (symbol alphabet and distribution)
do not affect the optimality of the code, which, as its name indicates, is adaptive. The
price to pay for these benefits is the extensive computation required (updating the coding tree for each received symbol). In contrast, static Huffman coding is advantageous when the message-source characteristics are fixed, or only slowly evolving in time. In this
case, significantly longer messages can be optimally encoded with the same coding
tree, without needing updates. Static-coding-tree updates can, however, be forwarded
periodically, representing a negligible loss of coding performance, due to the large
information-payload size transmitted in between.¹² As an example of an application, FGK is used for dynamic data compression and file archival in a UNIX-environment utility known as compact.
¹² It is an interesting class project to perform the comparison between static and adaptive Huffman coding, based on the same English-text message but considering extracts of different sizes. The goal is to find the break-even point (message size) where the performance of adaptive coding, taking into account the overhead of transmitting the 26-letter alphabet codewords, becomes superior to that of static coding.