Measurement, Representation and Analysis of Temporal Signals 387
text is not a random sequence of letters, but a sequence of words which are sorted
and stored in a dictionary, each word indexed by the order of its first appearance
in the dictionary. The compressed file thus consists of the dictionary together with
the text encoded as the index numbers of the words used. While the words of a text
are easy to identify, the numerical structures (sequences of similar bytes) of an
image file must be sought with a suitable algorithm. It is also possible to build a
partial dictionary by leaving unencoded the isolated values that are not recognized
as part of a repeated structure. This method is applicable to all types of files.
– the Huffman method, which is entirely statistical, is based on the fact that in a
language, not all letters are used with the same frequency. In French, for
example, the probability of encountering the vowel “a” is 17.3%, whereas that of
encountering the consonant “w” is 0.05%. Letters, however, are ordinarily encoded
on 8 bits (ASCII characters). In general, a byte file contains variable numbers of
occurrences of the different possible byte values. The Huffman method consists of
encoding the bytes of a source file with codewords of variable binary length, such
that the most frequent bytes receive very short codewords, while rare bytes are
represented by codewords longer than the average. The few bits lost on rare bytes
are quickly recovered on the more frequent ones (“a” is 346 times more frequent
than “w”). As the codewords now have variable length, a criterion is needed to
distinguish successive encoded elements; the Huffman code achieves this by being a
prefix code, in which no codeword is the prefix of another. The compressed file
finally comprises the code table used and the encoded message. Its construction
requires a suitable algorithm; reading the data, in other words reconstructing the
initial file, is performed by a decoding algorithm (decompression).
The Huffman method is applicable to all kinds of files (text, image, music, etc.),
since a table of byte frequencies can be established when the file is read. Despite
its age (it dates from 1952), the method remains competitive, as subsequent research
has improved its compression capacity.
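The construction just described can be sketched compactly in Python. This is a minimal illustration of the 1952 algorithm, with helper names chosen here; a real compressor would also pack the bits and store the code table alongside the encoded message, as noted above.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a {symbol: bitstring} table from symbol frequencies."""
    # Heap entries are (frequency, tie_breaker, tree); a tree is either
    # a symbol or a (left, right) pair. The tie_breaker keeps the heap
    # comparisons well defined.
    heap = [(f, i, s) for i, (s, f) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"    # single-symbol edge case
    walk(heap[0][2], "")
    return code

def huffman_decode(bits, code):
    """Decode by scanning bits; works because the code is prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

data = "abracadabra"
code = huffman_code(data)
encoded = "".join(code[ch] for ch in data)
# 'a' is the most frequent symbol, so its codeword is the shortest.
assert all(len(code['a']) <= len(code[c]) for c in code)
assert huffman_decode(encoded, code) == data
```

The decoder never needs a separator between codewords: the prefix property guarantees that each accumulated bit sequence matches at most one codeword.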
All of the above data compression methods are lossless, since the initial file can
be completely reconstructed. They do not exploit any underlying “physical” property
of the file structure; the algorithms detect repeated structures in a purely logical
manner. A suitable data compression code reduces the file volume, and its efficiency
depends on the degree of repetition of the file entities (bits, bytes, structures,
etc.).
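The link between repetition and efficiency can be checked directly with `zlib` from the Python standard library (a lossless DEFLATE compressor combining the dictionary and Huffman ideas above): highly repetitive data shrinks dramatically, while near-random data barely compresses at all.

```python
import random
import zlib

repetitive = b"abcd" * 2500                              # 10 000 bytes, period 4
random.seed(0)                                            # deterministic "noise"
noisy = bytes(random.randrange(256) for _ in range(10_000))

c_rep = zlib.compress(repetitive)
c_noisy = zlib.compress(noisy)

assert len(c_rep) < len(repetitive) // 100    # repetitive: large reduction
assert len(c_noisy) > len(noisy) // 2         # random bytes: almost no gain
# Both decompress back to the originals exactly (lossless).
assert zlib.decompress(c_rep) == repetitive
assert zlib.decompress(c_noisy) == noisy
```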
7.3.7.3. Analytical methods
Let us note first of all that the representation of a signal by an analytical formula
can be considered as signal encoding, its decoding being performed by numerical
calculation with formulae used for analytical representation. However, in most