needing any codeword delimiters or blanks. The data shown at the bottom of Table 10.4 indicate that the total information required to encode GOODTOSEEYOU amounts to 32 + 8n bits, which, for n = 7, comes to 88 bits.
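As a quick numerical cross-check, the following Python sketch redoes this accounting under the stated assumptions: the 32-bit term is taken as the sum of the FGK codeword lengths listed in Table 10.4 (it is not recomputed here), and the 8n term accounts for the eight distinct symbols, each of which must be transmitted uncompressed, with n bits, upon its first occurrence.

# Accounting sketch for adaptive (FGK) coding of GOODTOSEEYOU.
message = "GOODTOSEEYOU"
fgk_codeword_bits = 32            # sum of FGK codeword lengths, taken from Table 10.4
n = 7                             # bits per uncompressed symbol (seven-bit ASCII)
new_symbols = len(set(message))   # distinct symbols, each sent uncompressed once
total_bits = fgk_codeword_bits + new_symbols * n
print(new_symbols, total_bits)    # 8 distinct symbols, 32 + 8 * 7 = 88 bits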
For comparison purposes, consider the case of static Huffman coding. In this case, the
encoder first reads the full message GOODTOSEEYOU and then computes the Huffman
coding tree, which yields the same codeword assignment as shown in Table 10.3, except
that the symbol ∅ is not used. The encoder must then define the coding tree as overhead
information. From the table data, and assuming n bits to define each of the eight symbols
(G, O, D, T, S, E, Y, U), the overhead size comes to 2 × (2 + n) + 3 × (3 + n) + 3 × (4 + n) = 25 + 8n. With n = 7, the overhead size is, therefore, 81 bits. On the other hand, the GOODTOSEEYOU message payload represents 2 × (2) + 3 × (3) + 3 × (4) = 25 bits. The total message length (overhead + payload) is, therefore, 81 + 25 = 106 bits. This result compares with the 88-bit full message length of the adaptive FGK coding, which is about 17% shorter. The FGK performance can be further improved by decreasing the size of
the overhead information, namely the definition of the N identified source symbols. Such
a definition requires N log₂ N bits. Using seven-bit ASCII, the source alphabet size is 2⁷ = 128, which covers more than the ensemble of computer-keyboard characters. If
the message to be encoded uses fewer than 128 symbols, the overhead can be reduced to fewer than seven bits per symbol. Regardless of the source size, it is possible to make a list of
all possible symbols and to attribute to each one a code number of variable length, for
instance defined by ⌈−log₂ p(xᵢ)⌉, where p(xᵢ) is a conservative estimate of the symbol probability, as measured over long messages. With this approach, the average overhead
size is minimal, the most frequent symbols having shorter code-number definitions,
and the reverse for the least frequent ones. Both encoder and decoder must share this
standard symbol codebook.
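The bit counts used in the above comparison are easily reproduced. The short Python sketch below is a minimal illustration, assuming the codeword-length distribution read from Table 10.3 (two 2-bit, three 3-bit, and three 4-bit codewords) and mirroring the payload accounting used in the text; it is not a full Huffman encoder.

# Accounting sketch for the static Huffman case, compared with the 88-bit adaptive result.
codeword_lengths = [2, 2, 3, 3, 3, 4, 4, 4]   # assumed from Table 10.3 (null symbol unused)
n = 7                                         # bits needed to identify each symbol

overhead = sum(length + n for length in codeword_lengths)   # coding-tree definition: 25 + 8n = 81 bits
payload = sum(codeword_lengths)                             # message payload as counted in the text: 25 bits
static_total = overhead + payload                           # 81 + 25 = 106 bits
adaptive_total = 88                                         # adaptive FGK total from the previous sketch
print(static_total, round(100 * (1 - adaptive_total / static_total)))   # 106 bits; FGK is about 17% shorter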
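The variable-length code-number idea can be illustrated just as briefly. In the sketch below, the probability values are purely illustrative placeholders (they are not measured statistics), and the code-number length is computed as ⌈−log₂ p(xᵢ)⌉, so that frequent symbols receive short definitions and rare ones long definitions.

from math import ceil, log2

# Illustrative (made-up) probability estimates for a few symbols.
p = {"e": 0.12, "t": 0.09, "z": 0.0007, "#": 0.0001}

# Code-number length in bits: ceil(-log2 p(x)); frequent symbols get shorter definitions.
for symbol, prob in p.items():
    print(symbol, ceil(-log2(prob)))
# e: 4 bits, t: 4 bits, z: 11 bits, #: 14 bits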
We conclude from this tedious but meaningful exercise that, when the overhead is taken into account, adaptive coding may perform significantly better than static coding. To this advantage must be added the fact that encoding
and decoding can be performed dynamically “on the fly,” unlike in the static case.
Furthermore, changes in the source’s characteristics (symbol alphabet and distribution)
do not affect the optimality of the code, which, as its name indicates, is adaptive. The
price to pay for these benefits is the extensive computation required (updating the coding tree for each received symbol). In contrast, static Huffman coding is advantageous when the message-source characteristics are fixed, or only slowly evolving in time. In this
case, significantly longer messages can be optimally encoded with the same coding
tree, without needing updates. Static-coding-tree updates can, however, be forwarded
periodically, representing a negligible loss of coding performance, due to the large
information-payload size transmitted in between.¹² As an example of an application, FGK is used for dynamic data compression and file archival in a UNIX-environment utility known as compact.
¹² It is an interesting class project to perform the comparison between static and adaptive Huffman coding, based on the same English-text message but considering extracts of different sizes. The goal is to find the break-even point (message size) where the performance of adaptive coding, taking into account the overhead of transmitting the 26-letter alphabet codewords, becomes superior to that of static coding.