But the essence of this ability to correct errors is that we make use of
redundancy in the information. One or two key words in a garbled message enable
us to guess the rest because the rest was hardly necessary in the first place. A
Spaniard can read an Italian newspaper without knowing Italian, because he
recognises the stems of a few basic nouns and verbs. Most of the rest of the
written text is superfluous to him. Redundancy, often huge redundancy, exists in
almost all information. Consider, for example, ordinary text. There is
redundancy
• In the orthography. For example, in English the U following a Q is quite
unnecessary. More drastically, it is possible to leave out most vowels from
written English and to reduce all double letters to single ones and still leave
the text intelligible.
• In the syntax. For example, definite and indefinite articles in English can
often be omitted harmlessly; in languages in which the verbs are inflected the
personal pronoun is usually superfluous.
• In the semantics. "At this moment in time" can be cut down to "At this
moment" or "At this time" or indeed discarded in favour of "now."
(We do not consider complete redundancy in which the message is, for example, a
repetition of what we know already or perhaps just irrelevant.)
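
The orthographic case can be tried out directly. The following is a minimal sketch in Python, not a method taken from the text: the function name squeeze and the rule of keeping each word's first letter are assumptions made for illustration (the text says only that "most" vowels can go).

```python
import re

VOWELS = "aeiouAEIOU"

def squeeze(text):
    """Drop non-initial vowels and collapse doubled letters in each word."""
    def squeeze_word(match):
        word = match.group(0)
        # Assumption: keep the first letter so that short words survive.
        head, tail = word[0], word[1:]
        tail = "".join(ch for ch in tail if ch not in VOWELS)
        # Reduce any run of the same letter to a single letter.
        return re.sub(r"(.)\1+", r"\1", head + tail)
    # Only runs of letters are rewritten; spaces and punctuation are kept.
    return re.sub(r"[A-Za-z]+", squeeze_word, text)

sample = "It is possible to leave out most vowels from written English"
short = squeeze(sample)
print(short)
print(len(short), "of", len(sample), "characters kept")
```

Whether the shortened line is still intelligible is for the reader to judge; the sketch shows only how much of the spelling can be discarded by two mechanical rules.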
But some information is less redundant and more compact than other information.
It is usually impossible to remove a single instruction or parameter from a
computer program in machine language without destroying the program's meaning,
at least to the computer that interprets it, if not to a skilled and critical
programmer reading it.
From certain viewpoints redundancy is a weakness. It is a waste of space. In
particular, transmitting redundant information wastes capacity on communication
channels, which are often the bottlenecks in computer systems. Many compression
techniques exist for removing redundancy from text and other data streams; in
the case of voice and video these techniques are complex and clever and compress
raw digital signals by factors of 10, or 100, or even more.
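
The link between redundancy and compressibility can be seen with any general-purpose compressor. The sketch below uses zlib from the Python standard library as a stand-in; it is lossless compression of text, not one of the voice or video techniques mentioned above. The sample text is deliberately repetitive and the helper name ratio is hypothetical, so the figures it prints illustrate the principle only.

```python
import os
import zlib

# Highly redundant input: one English sentence repeated twenty times.
sentence = "Redundancy, often huge redundancy, exists in almost all information. "
text = (sentence * 20).encode("ascii")

# Input with essentially no redundancy: random bytes of the same length.
noise = os.urandom(len(text))

def ratio(data):
    """Original size divided by compressed size (higher means more redundancy removed)."""
    return len(data) / len(zlib.compress(data, 9))

print("repeated text:", round(ratio(text), 1))
print("random bytes: ", round(ratio(noise), 2))
```

The repeated text shrinks to a small fraction of its size, while the random bytes do not shrink at all: there is no redundancy in them for the compressor to remove.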
A second way of looking at redundancy reveals another associated weakness.
Essentially redundancy means that out of all possible strings of characters
(e.g., in a text) only a small proportion are valid. Thus, a Q in English not
followed by a U is invalid, and a message containing such a Q is not possible in
the normally written language. This aspect is explored marvellously in Jorge
Luis Borges' "La Biblioteca" ("The Library"), which contains all combinations of
letters and texts possible, and hence contains (in all languages) all knowledge,
including, for example, the future history of the world. The problem is to find
the text you are looking for. If you do find it, is it true? Another text, if
you could find it, would tell you if it is or not (correctly or not?). Perhaps
the searching procedure could be simplified by looking up the catalogue, which
must be in the library, if you could find it. This fantasy conceals an important
truth, namely, that in searching for