Desurvire E. Classical and Quantum Information Theory: An Introduction for the Telecom Scientist



Figure 9.10 As in Fig. 9.9, with additional results from Shannon–Fano coding up to n = 90 (bold line), showing a global convergence trend towards Huffman-coding efficiency. [Plot not reproduced; axes: number of games, coding efficiency (%).]

In this example, we pushed the investigation relatively far, but this was for the sake of mathematical curiosity. In practice, there is little interest in, and no practical way of, implementing Huffman or Shannon–Fano coding when the source has a number of events significantly greater than $2^{16}$ = 65 536. Should we take this number as a reference, this means that compression codes may not be applied to binary sources with lengths greater than n = 16, for which the uncompressed representation is 16 bits (two bytes). In contrast, a block code such as that used in this roulette example can be reasonably and efficiently applied to sources of up to $2^{32}$ = 4 294 967 296 elements, which require the far shorter length of 5.77 bit/word on average to encode, as opposed to 32 bit/word! How this last solution can be effectively implemented is a question of memory space (i.e., $2^{32}$ addresses for table look-up) and of computer speed, both having economic impacts in practical applications. Block coding remains advantageous because of the possibility of splitting events into many categories and types, which separates the tasks of code assignment between a header and a trailer. Each of these code sub-blocks has fewer bits to handle, and this is what makes the approach more practical when dealing with sources with relatively large numbers of events.

9.4 Exercises

9.1 (M): Assign a Huffman code to the two-dice roll distribution described in Chapter 1 (as listed in Table 9.6 overleaf), with the results of the roll {2, 3, 4, ..., 12} being symbolized by the characters {A, B, C, ..., K}, and calculate the coding efficiency.

178 Optimal coding and compression

Table 9.6 Data for Exercise 9.1.

Symbol x    Probability p(x)
A = 2       0.028
B = 3       0.056
C = 4       0.083
D = 5       0.111
E = 6       0.139
F = 7       0.167
G = 8       0.139
H = 9       0.111
I = 10      0.083
J = 11      0.056
K = 12      0.028
Total       1.000

9.2 (T): Prove that Huffman coding for uniformly distributed sources of $N = 2^n$ symbols (n an integer) yields a mean codeword length of l(x) = n.

9.3 (T): Show that for dyadic sources, the Huffman code is 100% efﬁcient.

Clue: Prove this ﬁrst using two-element and three-element sources, then conclude

in the general case.

9.4 (M): Find a block code to describe the outcome of ﬁve successive coin tosses, and

determine the corresponding coding efﬁciency.

10 Integer, arithmetic, and adaptive coding

This second chapter concludes our exploration tour of coding and data compression. We shall first consider integer coding, which represents another family branch of optimal codes (next to Shannon–Fano and Huffman coding). Integer coding applies to the case where the source symbols are fully known, but the probability distribution is only partially known (thus, the previous optimal codes cannot be implemented). Three main integer codes, called Elias, Fibonacci, and Golomb–Rice, will then be described. Together with the previous chapter, this description will complete our inventory of static codes, namely codes that apply to cases where the source symbols are known, and the matter is to assign the optimal code type. In the most general case, the source symbols and their distribution are unknown, or the distribution may change according to the number of symbols being collected. Then, we must find new algorithms to assign optimal codes without such knowledge; this is referred to as dynamic coding. The three main algorithms for dynamic coding to be considered here are referred to as arithmetic coding, adaptive Huffman coding, and Lempel–Ziv coding.

10.1 Integer coding

The principle of integer coding is to assign an optimal (and predefined) codeword to a list of n known symbols, which we may call {1, 2, 3, ..., n}. In such a list, the symbols are ranked in order of decreasing frequency or probability, or mathematically speaking, in order of “nonincreasing” frequency or probability. This ranking assumes that at least the ranks of the most likely symbols are known beforehand, the remaining less-likely symbols being arranged in the list in any order. In this case, neither Shannon–Fano coding nor Huffman coding can be implemented, and we must then look for new types of “heuristic” code, which will exhibit minimal redundancy (the difference between the obtained mean codeword length and the source entropy). Such is the rationale for integer coding. The most frequently used algorithms are given by the Elias, Fibonacci, and Golomb–Rice codes, which I shall describe next.

Elias codes come in two different types, which are named Elias-gamma and Elias-delta. The correspondence between the first 32 integers, their uncompressed binary representation, and the Elias-gamma or delta codewords is shown in Table 10.1. An explanation of the codeword assignment follows.


Table 10.1 Various types of integer coding: (a) nonparameterized with Elias codes (gamma and delta) and (b) parameterized with Fibonacci code (m = 2) and Golomb code (simple with m = 8 and actual with m = 6).

 i  floor(log2 i)  Uncompressed  Elias gamma   Elias delta  Fibonacci  Golomb simple  Golomb actual
                                                            (m = 2)    (m = 8)        (m = 6)
 1  0              0000 0001     1             1            11         1_000          1_00
 2  1              0000 0010     0_10          0_100        011        1_001          1_01
 3  1              0000 0011     0_11          0_101        0011       1_010          1_100
 4  2              0000 0100     00_100        0_1100       1011       1_011          1_101
 5  2              0000 0101     00_101        0_1101       00011      1_100          1_110
 6  2              0000 0110     00_110        0_1110       10011      1_101          1_111
 7  2              0000 0111     00_111        0_1111       01011      1_110          01_00
 8  3              0000 1000     000_1000      00_100000    000011     1_111          01_01
 9  3              0000 1001     000_1001      00_100001    100011     01_000         01_100
10  3              0000 1010     000_1010      00_100010    010011     01_001         01_101
11  3              0000 1011     000_1011      00_100011    001011     01_010         01_110
12  3              0000 1100     000_1100      00_100100    101011     01_011         01_111
13  3              0000 1101     000_1101      00_100101    0000011    01_100         001_00
14  3              0000 1110     000_1110      00_100110    1000011    01_101         001_01
15  3              0000 1111     000_1111      00_100111    0100011    01_110         001_100
16  4              0001 0000     0000_10000    00_1010000   0010011    01_111         001_101
17  4              0001 0001     0000_10001    00_1010001   1010011    001_000        001_110
18  4              0001 0010     0000_10010    00_1010010   0001011    001_001        001_111
19  4              0001 0011     0000_10011    00_1010011   1001011    001_010        0001_00
20  4              0001 0100     0000_10100    00_1010100   0101011    001_011        0001_01
21  4              0001 0101     0000_10101    00_1010101   00000011   001_100        0001_100
22  4              0001 0110     0000_10110    00_1010110   10000011   001_101        0001_101
23  4              0001 0111     0000_10111    00_1010111   01000011   001_110        0001_110
24  4              0001 1000     0000_11000    00_1011000   00100011   001_111        0001_111
25  4              0001 1001     0000_11001    00_1011001   10100011   0001_000       00001_00
26  4              0001 1010     0000_11010    00_1011010   00010011   0001_001       00001_01
27  4              0001 1011     0000_11011    00_1011011   10010011   0001_010       00001_100
28  4              0001 1100     0000_11100    00_1011100   01010011   0001_011       00001_101
29  4              0001 1101     0000_11101    00_1011101   00001011   0001_100       00001_110
30  4              0001 1110     0000_11110    00_1011110   10001011   0001_101       00001_111
31  4              0001 1111     0000_11111    00_1011111   01001011   0001_110       000001_00
32  5              0010 0000     00000_100000  00_11000000  00101011   0001_111       000001_01

The Elias-gamma codeword of an integer i is given by its binary representation up to the 1 bit of highest weight, prefaced by a number of zeros equal to $\lfloor \log_2 i \rfloor$. The expression $\lfloor x \rfloor$ (floor(x)) means the largest integer smaller than or equal to x. Thus, we have $\lfloor \log_2 1 \rfloor = 0$, $\lfloor \log_2 2 \rfloor = 1$, $\lfloor \log_2 3 \rfloor = 1$, $\lfloor \log_2 4 \rfloor = 2$, and so on. According to the above rules, the Elias-gamma codewords for i = 4 and i = 13 are gamma(4) = 00_100 and gamma(13) = 000_1101, respectively (the underscore _ being introduced for clarity).
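As an illustration of the rule just described, the following minimal Python sketch builds the Elias-gamma codeword as a bit string and decodes it again. The function names elias_gamma and elias_gamma_decode are choices made for this example only and do not come from the text.

```python
def elias_gamma(i: int) -> str:
    """Elias-gamma codeword of a positive integer i, as a string of bits."""
    if i < 1:
        raise ValueError("Elias-gamma coding is defined for i >= 1")
    binary = bin(i)[2:]            # minimal binary representation (leading bit is 1)
    n_zeros = len(binary) - 1      # equals floor(log2(i))
    return "0" * n_zeros + binary


def elias_gamma_decode(bits: str) -> int:
    """Decode one Elias-gamma codeword (assumed illustrative helper)."""
    n_zeros = len(bits) - len(bits.lstrip("0"))   # count the leading zeros
    return int(bits[n_zeros:2 * n_zeros + 1], 2)  # next n_zeros + 1 bits hold i


for i in (1, 4, 13):
    print(i, elias_gamma(i))
```

Because the number of leading zeros announces how many bits follow, a decoder can tell where each codeword ends without any separator.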

The Elias-delta codeword is defined by gamma($\lfloor \log_2 i \rfloor$ + 1), followed by the minimal binary representation of i with the most significant 1 bit being removed. With i = 4, for instance, we have i = $100_2$ and gamma($\lfloor \log_2 4 \rfloor$ + 1) = gamma(3) = 011, which, with this rule, yields delta(4) = 011_00. With i = 14, we have i = $1110_2$ and gamma($\lfloor \log_2 14 \rfloor$ + 1) = gamma(4) = 00100, which, with this rule, gives delta(14) = 00100_110. Table 10.1 shows that for the small integers i = 2, 3 and i = 8 to i = 15, the Elias-delta codewords are longer than the gamma codewords, the lengths being equal for i = 1 and for i = 4 to i = 7. The lengths become equal again for i = 16 to i = 31. The situation is then reversed for i ≥ 32, where the delta codewords are the shorter ones. This shows that Elias-gamma coding is preferable for sources with i ≤ 31, while Elias-delta coding is preferable for larger sources. The important difference between the two codes is their asymptotic limit when the source entropy goes to infinity. It can be checked, using a short tabulating program, that the asymptotic limit of the coding efficiency, η = H/L, is equal to 50% for the Elias-gamma code, while it is equal to 100% for the Elias-delta code. Therefore, the Elias-delta code is asymptotically optimal.
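The delta rule, and the kind of short tabulating program alluded to above, can be sketched in Python as follows. This is only an illustration under a stated assumption: the source is taken to be uniformly distributed, with its n symbols coded as the integers 1 to n, so that H = log2 n. The helper names (elias_delta, uniform_efficiency) are hypothetical.

```python
from math import log2


def elias_gamma(i: int) -> str:
    """floor(log2 i) zeros followed by the binary representation of i."""
    binary = bin(i)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_delta(i: int) -> str:
    """gamma(floor(log2 i) + 1) followed by i without its leading 1 bit."""
    if i < 1:
        raise ValueError("Elias-delta coding is defined for i >= 1")
    binary = bin(i)[2:]                       # len(binary) = floor(log2 i) + 1
    return elias_gamma(len(binary)) + binary[1:]


def uniform_efficiency(encode, n: int) -> float:
    """eta = H/L for n equiprobable symbols coded as the integers 1..n."""
    mean_length = sum(len(encode(i)) for i in range(1, n + 1)) / n
    return log2(n) / mean_length


for n in (2**10, 2**16):
    print(n,
          round(uniform_efficiency(elias_gamma, n), 3),
          round(uniform_efficiency(elias_delta, n), 3))
```

Under this assumption the gamma efficiency drifts down towards 50% as n grows, while the delta efficiency climbs, although only slowly, which is why the limit is best checked by tabulation over increasing n.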

The fact that the two Elias codes are not optimal (except asymptotically for Elias-delta) does not preclude their use for data compression. For instance, taking the English-language source with the distribution listed in Table 8.3 and the corresponding entropy H = 4.184 bit/symbol, we find that both Elias-gamma and Elias-delta codings have a mean codeword length of L = 5.241 bit/word, corresponding to an efficiency of η = H/L = 79.83%. For comparison, Huffman coding yields L = 4.212 bit/word and η = 99.33%. The Elias codes, thus, make it possible to achieve a nonoptimal but acceptable coding performance on limited-size sources. The same conclusion applies to Elias-delta coding for large sources, with the advantage of being straightforward to implement, in contrast with Huffman coding. An Elias-delta variant, known as Elias-omega or recursive Elias coding, makes it possible to shorten the codeword lengths, but with limited benefits as the source size increases.

The Elias coding approach is referred to as nonparameterized. This means that the

symbol–codeword correspondence is fixed with the code choice (gamma, delta, recursive). In parameterized coding, an integer parameter m is introduced to create another

degree of freedom in the choice and optimization of codeword lengths. This is the case

of the Golomb codes and the Fibonacci codes, which I describe next.

The Fibonacci codes are based on the Fibonacci numbers of order m ≥ 2. Fibonacci numbers form a sequence of integers F(−m + 1), F(−m + 2), ..., F(0), F(1), ..., F(k), which are defined as follows:

(a) The number F(k) with k ≥ 1 is equal to the sum of the m preceding numbers;

(b) The numbers F(−m + 1) to F(0) are all equal to unity.
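Rules (a) and (b) translate directly into a few lines of Python. The sketch below is illustrative only; the function name fibonacci_numbers and the parameter k_max (the largest index to generate) are assumptions of this example.

```python
def fibonacci_numbers(m: int, k_max: int) -> list[int]:
    """Return [F(-m+1), ..., F(0), F(1), ..., F(k_max)] for order m >= 2."""
    if m < 2 or k_max < 0:
        raise ValueError("need m >= 2 and k_max >= 0")
    seq = [1] * m                    # rule (b): F(-m+1) to F(0) all equal 1
    for _ in range(k_max):
        seq.append(sum(seq[-m:]))    # rule (a): sum of the m preceding numbers
    return seq


print(fibonacci_numbers(2, 7))   # [1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fibonacci_numbers(3, 5))   # [1, 1, 1, 3, 5, 9, 17, 31]
```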

Taking, for instance, m = 2, we have F(−1) = F(0) = 1, thus, F(1) = F(−1) + F(0) = 2, F(2) = F(1) + F(0) = 3, and so on, which yields the Fibonacci-number sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, ... The construction of a Fibonacci code based on the m = 2 parameter for the integer set {1, 2, ..., 34} is illustrated by the example shown in Table 10.2. As the table indicates, the integer set {1, 2, ..., 34} is first listed

[Footnote: As intermediate values, we find for a 64-element, uniformly distributed source (H = $\log_2 64$ = 6), the coding efficiencies of η(gamma) = 64.8% and η(delta) = 68.8%.]

[Footnote: See, for instance: http://en.wikipedia.org/wiki/Elias_omega_coding.]


Table 10.2 Construction of a Fibonacci code of order m = 2 from the sequence of Fibonacci numbers shown at the bottom.

i                                 F(i)
1                             1   11
2                          1  0   011
3                       1  0  0   0011
4                       1  0  1   1011
5                    1  0  0  0   00011
6                    1  0  0  1   10011
7                    1  0  1  0   01011
8                 1  0  0  0  0   000011
9                 1  0  0  0  1   100011
10                1  0  0  1  0   010011
11                1  0  1  0  0   001011
12                1  0  1  0  1   101011
13             1  0  0  0  0  0   0000011
14             1  0  0  0  0  1   1000011
15             1  0  0  0  1  0   0100011
16             1  0  0  1  0  0   0010011
17             1  0  0  1  0  1   1010011
18             1  0  1  0  0  0   0001011
19             1  0  1  0  0  1   1001011
20             1  0  1  0  1  0   0101011
21          1  0  0  0  0  0  0   00000011
22          1  0  0  0  0  0  1   10000011
23          1  0  0  0  0  1  0   01000011
24          1  0  0  0  1  0  0   00100011
25          1  0  0  0  1  0  1   10100011
26          1  0  0  1  0  0  0   00010011
27          1  0  0  1  0  0  1   10010011
28          1  0  0  1  0  1  0   01010011
29          1  0  1  0  0  0  0   00001011
30          1  0  1  0  0  0  1   10001011
31          1  0  1  0  0  1  0   01001011
32          1  0  1  0  1  0  0   00101011
Fibonacci: 21 13  8  5  3  2  1

in increasing order. The sequence of Fibonacci numbers, {1, 2, 3, 5, 8, 13, 21}, starting with F(0) = F(1) = 1 up to F(7) = 21 (the index being shifted by one here with respect to the definition given earlier, where F(−1) = F(0) = 1), is written at the bottom, from right to left, defining seven columns. It is easily checked that all integer numbers are given by a sum of the Fibonacci numbers, for instance:

7 = 5 + 2 = 1 × F(4) + 0 × F(3) + 1 × F(2) + 0 × F(1),

12 = 8 + 3 + 1 = 1 × F(5) + 0 × F(4) + 1 × F(3) + 0 × F(2) + 1 × F(1),

which can be coded as 7 ≡ $1010_{\mathrm{Fibonacci}}$ and 12 ≡ $10101_{\mathrm{Fibonacci}}$, respectively. The second column in Table 10.2 shows the codewords obtained according to such a decomposition into Fibonacci numbers. The actual Fibonacci code is obtained by taking the mirror image of this initial codeword and appending a 1 postfix, as seen from the last column at right: thus f(i = 1) = 11, f(i = 2) = 011, f(i = 3) = 0011, f(i = 5) = 00011, and so on. The result of this operation is a prefix code, i.e., a code for which no codeword is the prefix of another codeword. This code example is also listed in Table 10.1, for comparison with the Elias codes. The comparison shows that the Fibonacci codewords are significantly shorter. Using the English-language source (Table 8.3), we find that our Fibonacci code has a mean codeword length of L = 4.928 bit/word, which corresponds to an efficiency of η = H/L = 4.184/4.928 = 84.90%, and represents an improvement on the previous Elias-gamma and delta codes (η = 79.83%).

[Footnote: It is left as an exercise to show that a Fibonacci code of order m = 3 requires level-three coding.]
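The two-step construction described above (decomposition into Fibonacci numbers, then mirror image plus a 1 postfix) can be sketched in Python for order m = 2. This is a minimal illustration; the function names are hypothetical, and the decomposition is obtained greedily, always taking the largest Fibonacci number that still fits, which is an implementation choice of this sketch.

```python
def fibonacci_decomposition(i: int) -> str:
    """Bits of i over the Fibonacci numbers ..., 8, 5, 3, 2, 1 (largest first)."""
    if i < 1:
        raise ValueError("i must be a positive integer")
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= i:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    remainder = i
    for f in reversed(fibs):
        if f <= remainder:           # greedy choice: largest number that fits
            bits.append("1")
            remainder -= f
        else:
            bits.append("0")
    return "".join(bits).lstrip("0")  # drop unused leading columns


def fibonacci_code(i: int) -> str:
    """Mirror image of the decomposition, followed by a 1 postfix."""
    return fibonacci_decomposition(i)[::-1] + "1"


for i in (1, 2, 3, 5, 7, 12):
    print(i, fibonacci_decomposition(i), fibonacci_code(i))
```

The greedy decomposition never selects two consecutive Fibonacci numbers, so the pattern 11 can only occur at the very end of a codeword; this is what marks the codeword boundary and makes the result a prefix code.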

It can be shown that Fibonacci codes are not asymptotically optimal, like Elias-gamma codes but unlike Elias-delta codes. Higher-order Fibonacci codes (m > 2) have better compression rates, provided that the source size is large and the probability distribution nearly uniform. Even if the Elias-delta codes are asymptotically optimal, Fibonacci codes of order two perform better with any source of size up to $10^6/2$ (precisely, n = 514 228). This fact illustrates that asymptotic code optimality is not the only criterion for selecting the most efficient code. Such a feature was also illustrated in the previous example. The comparative code performance also depends on the type of source distribution. Recalling that the Fibonacci code is parameterized by an integer m ≥ 2, one must find an optimal value of m for each source distribution under consideration. Such an optimization is advantageous, but case-specific. This represents both an advantage and a drawback, in comparison with nonparameterized codes (such as Elias codes), which are fixed once and for all.

The Golomb codes constitute a second important category of parameterized codes.

For a Golomb code with parameter m, the codeword G(i) of integer i is made of two

parts:

(a) A prefix, which is made of 1 preceded by q zero bits, with q being the quotient $q = \lfloor (i - 1)/m \rfloor$,

(b) A suffix, which is the binary representation in $\lceil \log_2 m \rceil$ bits of the remainder r = i − 1 − qm.

The first rule (a) can also be changed with the definition $q = \lfloor i/m \rfloor$ if, by convention,

the list of integers i is made to start from zero. The second rule (b) represents a simpliﬁed

version and is not the one actually used in Golomb codes. However, we shall use it as

a ﬁrst step for easily introducing the concept. To provide a practical example, assume

m = 4. We have, for the ﬁrst 12 integers:

i = 1,...,4 → q = 0 → preﬁx = 1,

i = 5,...,8 → q = 1 → preﬁx = 01,

i = 9,...,12 → q = 2 → preﬁx = 001, etc.,

[Footnote: D. A. Lelewer and D. S. Hirschberg, Data compression. Computing Surveys, 19 (1987), 261–97, see www.ics.uci.edu/~dan/pubs/DataCompression.html.]


and

i = 1,...,4 → r = 0, 1, 2, 3 → sufﬁx = 00, 01, 10, 11,

i = 5,...,8 → r = 0, 1, 2, 3 → sufﬁx = 00, 01, 10, 11,

i = 9,...,12 → r = 0, 1, 2, 3 → sufﬁx = 00, 01, 10, 11.

We thus observe that the prefix increases by one bit at each multiple of m = 4 and that the suffix has a constant length and changes with a periodicity of m = 4. From the above rules and with this example we obtain G(3) = 1_10, G(5) = 01_00, and G(12) = 001_11, for instance (the underscore _ being introduced for clarity). Note that the prefix rule of 1 preceded by q zero bits is only conventional. We can also define the prefix as 0 preceded by q one bits, which represents the complement of the previous prefix (e.g., G(3) = 0_10, G(5) = 10_00, and G(12) = 110_11). Another convention for the suffix is to take the smallest number of bits for the binary representation of r, which only changes 00 into 0. Thus, we have G(1) = 1_0 instead of 1_00, G(5) = 01_0 instead of 01_00, and so on. This convention reduces the code length by one bit each time i = km + 1 (k an integer). With their prefix increase by blocks of m and their suffix m periodicity, the Golomb codes are straightforward to generate. Table 10.1 shows the nonoptimized Golomb code for m = 8.
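The simplified Golomb construction (prefix of q zeros followed by a 1, then a fixed-width binary suffix for r) can be sketched in Python as follows. The function name golomb_simple is hypothetical, and the suffix width is taken here as the smallest number of bits able to hold every remainder 0, ..., m − 1, which is an assumption of this sketch; it gives two bits for m = 4 and three bits for m = 8, as in the examples above.

```python
def golomb_simple(i: int, m: int) -> str:
    """Simplified Golomb codeword of i >= 1 with parameter m >= 2."""
    if i < 1 or m < 2:
        raise ValueError("need i >= 1 and m >= 2")
    q, r = divmod(i - 1, m)               # quotient and remainder of (i - 1)/m
    width = (m - 1).bit_length()          # bits needed for remainders 0..m-1
    prefix = "0" * q + "1"                # 1 preceded by q zero bits
    suffix = format(r, f"0{width}b")      # fixed-width binary representation of r
    return prefix + suffix


for i in (3, 5, 12):
    print(i, golomb_simple(i, 4))         # 110, 0100, 00111
```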

Consider next the actual Golomb code, which uses a more complicated rule (b) for defining the suffix. This rule consists of coding the suffix with $c = \lfloor \log_2 m \rfloor$ bits for the first c values of r (with 0 as the leading bit), and with c + 1 bits for the other values (with 1 as the leading bit). Consider, for instance, m = 5. We have $c = \lfloor \log_2 5 \rfloor = 2$. Thus, we have, for the first 15 integers:

i = 1,...,5 → q = 0 → preﬁx = 1,

i = 6,...,10 → q = 1 → preﬁx = 01,

i = 11,...,15 → q = 2 → preﬁx = 001, etc.,

and

i = 1,...,5 → r = 0, 1, 2, 3, 4 → sufﬁx = 00, 01, 100, 101, 110,

i = 6,...,10 → r = 0, 1, 2, 3, 4 → sufﬁx = 00, 01, 100, 101, 110,

i = 11,...,15 → r = 0, 1, 2, 3, 4 → sufﬁx = 00, 01, 100, 101, 110,

showing that in each period, the first two suffixes are coded with two bits with a leading 0, and the other suffixes are coded with three bits with a leading 1. We note that the three-bit suffix codes do not correspond to a binary representation of r, unlike with our previous definition. Table 10.1 shows the actual Golomb code corresponding to the case m = 6. We observe that the second definition makes it possible to shorten the length of most codewords in the list. If we apply the Golomb code to the English-symbol source (Table 8.3), we obtain mean codeword lengths of

L = 4.465 bit/word (first, simple definition with m = 8),

L = 4.316 bit/word (second, actual definition with m = 6),

corresponding to efficiencies (η = H/L ≡ 4.184/L) of η = 93.69% and η = 96.94%, respectively. These two results represent a significant improvement on the previous Elias (gamma or delta) codes (η = 79.83%) and second-order Fibonacci codes (η = 84.90%). It is clear that the Golomb-code parameter m must be optimized according to the source size and distribution type. Golomb codes with low m have relatively small codewords for the first few integers (owing to the short suffix, of length about $\log_2 m$), but the length rapidly increases because of the fast prefix increment. On the contrary, Golomb codes with high m have relatively large codewords for the first few integers (owing to the long suffix), but the length increases slowly because of the slow prefix increment. Golomb codes with $m = 2^k$ (k an integer) have also been known as Rice codes. For this reason one generally refers to Golomb–Rice codes to designate them altogether (with $m = 2^k$ corresponding to Rice codes). For sources of specific distribution types, it is possible to determine the optimal exponent parameter k that minimizes the mean codeword length. In the general case, this parameter can also be determined through heuristic methods.

[Footnote: Coding a number n by n − 1 zeros followed by a one bit is referred to as unary coding. For instance, the numbers n = 2, n = 5, and n = 7 are represented in unary coding as 01, 00001, and 0000001, respectively. This definition should not be confused with the “unary number representation,” which uses only one symbol character (e.g., 1) and is defined according to the rule $0_{\mathrm{decimal}} \equiv 1_{\mathrm{unary}}$, $1_{\mathrm{decimal}} \equiv 11_{\mathrm{unary}}$, $2_{\mathrm{decimal}} \equiv 111_{\mathrm{unary}}$, etc.]
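The suffix rule of the actual Golomb code, as worded in this section (c bits with a leading 0 for the first c remainders, c + 1 bits with a leading 1 for the others), can be sketched in Python as follows. The function name golomb_actual is hypothetical, and the guard on m is an assumption of this sketch: taken literally, the rule can only represent all remainders when m − c does not exceed $2^c$, which covers the cases m = 5 and m = 6 used here. Other presentations of Golomb coding may define the suffix differently.

```python
def golomb_actual(i: int, m: int) -> str:
    """Golomb codeword of i >= 1 following the suffix rule quoted in the text."""
    if i < 1 or m < 2:
        raise ValueError("need i >= 1 and m >= 2")
    c = m.bit_length() - 1                     # c = floor(log2 m)
    if m - c > 2 ** c:
        raise ValueError("rule as worded cannot encode every remainder for this m")
    q, r = divmod(i - 1, m)
    prefix = "0" * q + "1"                     # 1 preceded by q zero bits
    if r < c:
        suffix = format(r, f"0{c}b")           # c bits; leading bit is 0 since r < c
    else:
        suffix = "1" + format(r - c, f"0{c}b")  # c + 1 bits with a leading 1
    return prefix + suffix


print([golomb_actual(i, 5) for i in range(1, 6)])
print([golomb_actual(i, 6) for i in (1, 3, 6, 7, 32)])
```

For m = 5 the suffixes come out as 00, 01, 100, 101, 110, and for m = 6 the codewords agree with the last column of Table 10.1 once the underscores are removed.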

Integer coding based on various Elias or Golomb–Rice derivatives finds many applications in the field of database management, ensuring rapid access to and optimal indexing

of library ﬁles. An illustrative application is the indexing of very large databases, such

as the inventories of nucleotide sequences in biology. Golomb–Rice codes are also used

in sound compression standards (see Appendix G).

10.2 Arithmetic coding

In static codes (Shannon–Fano, Huffman, block, and integer codings), it is implicitly

assumed that the source characteristics (events and probabilities) are known, with the

exception of integer coding, which only requires knowledge of the most likely events. In

any case, these codes are ﬁxed and optimized once and for all, so this approach is called

defined-word coding. In the general case, one may not have such prior knowledge. It is

also possible that the source properties change from time to time (nonstationary source),

in which case static codes lose their optimality or become inadequate. Nontypical English

texts, such as those analyzed in Chapter 9 (Table 9.2), provide an illustrative example

of deviation from the stationary source reference. As we saw in that chapter, we can use universal codebooks, which contain libraries of best codes for a variety of sources, such as text, programming-language codes, or data files, or a statistical mix of all possible combinations thereof. But the codebook approach is inadequate if the source under investigation escapes any of these known types. Static coding is, therefore, intrinsically limited, although it is most convenient because the symbol–codeword correspondence only requires a one-time calculation and optimization. While Huffman coding is ultimately optimal (overlooking overhead information), it is computationally intensive for large sources. Block codes have been shown in Chapter 9 to offer some simplification advantage by coding symbols into “super-symbol” groups, but with the drawback that the number of codewords rapidly becomes intractable with increasing group sizes, for basic considerations of memory space and read–write times.

In the general situation, where the source characteristics are unknown, one must implement what is equivalently referred to as dynamic or adaptive or stream coding, which evokes the time-changing character of the codeword assignment according to the evolving source characteristics. The basic philosophy of dynamic coding is to devise an optimal code “on the fly” for any sequence of incoming symbols, while minimizing the number of operations required at each intermediate step. One does not need to know the distribution of the single symbols forming the sequence, or the length of the sequence to encode, which represents a significant advantage over static coding. The symbol alphabet and the distribution may also radically change from one input sequence to the next, and the code is able to dynamically adapt to this. Arithmetic coding, which is described in this section, represents an intermediate case where the code is dynamically configured from the source, but the resulting codeword dictionary is then kept for extensive use, just as in a static code. Two other approaches, referred to as adaptive Huffman coding and Lempel–Ziv (LZ) coding, are truly dynamically adaptive coding algorithms. These are described in the following two sections.

In arithmetic coding, it is assumed that both encoding and decoding machines have identical programs, which make it possible to compute the source’s probability distribution and associated joint probabilities of all orders (see further). A particularity of arithmetic coding is that a dedicated “end-of-sequence” symbol is always required, as I shall illustrate.

[Footnote: A famous example of a very unusual English-text source is the 1939 novel Gadsby (E. V. Wright), which, in over 50 000 words, does not contain any character E whatsoever! Here is an extract from page 1:

If youth, throughout all history, had a champion to stand up for it; to show a doubting world that a child can think; and, possibly, do it practically; you wouldn’t constantly run across folks today who claim that “a child don’t know anything.” A child’s brain starts functioning at birth; and has, amongst its many infant convolutions, thousands of dormant atoms, into which God has put a mystic possibility for noticing an adult’s act, and figuring out its purport.

Also well known in this genre is the later 1969 French novel La Disparition (G. Perec), translated into English in 1995 as A Void, while fully respecting the author’s spirit and using no E. Other translations of Perec’s novel also exist in German and Danish, although the stunt was too hard in this last case for a full translation. To complete the story of such literary oddities, an early pioneer of the genre is reportedly H. Holland, who wrote a short 1928 novel called Eve’s Legend, which uses no vowels other than E. In the same style, the author Perec also published Les Revenentes (1972). Here is a sample of Eve’s Legend:

Men were never perfect, yet the three brethren Veres were ever esteemed, respected, revered, even when the rest, whether the select few, whether the mere herd, were left neglected . . .

See:
www.webrary.org/Maillist/msg/2001/2/Re.missingletterquotEquot.html,
www.ling.ed.ac.uk/linguist//issues/11/11-1701.html#?CFID=18397914&CFTOKEN=46891046.]