13.8 Compression of Hyperspectral Data 383
with such reductions in the spectra significant information loss (allowing the spectra
to be used over a large number of applications) could be expected.
More sophisticated codes minimise information loss while compressing the data.
The principal components transformation is an example. The higher order components,
which have low variance, can be discarded without significant information loss, giving
a reduction in storage requirement in proportion to the number of components
discarded. The original spectral or image data can then be reconstructed approximately
from the reduced representation (using an inverse principal components transform),
although with some loss of information. That loss is sometimes referred to as distortion,
since the reconstructed data will differ from the original by an amount that depends
on how much detail has been lost.
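The discard-and-reconstruct idea can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the six three-band pixel vectors are invented,
only the first principal component is kept, and that component is found by simple power
iteration rather than a full eigenanalysis. The names mean_vector, covariance_matrix and
leading_eigenvector are hypothetical helpers, not part of any library.

```python
import math

def mean_vector(pixels):
    """Band-by-band mean of a list of pixel vectors."""
    n = len(pixels)
    return [sum(p[b] for p in pixels) / n for b in range(len(pixels[0]))]

def covariance_matrix(pixels, mean):
    """Sample covariance matrix of the pixel vectors."""
    n, d = len(pixels), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for p in pixels:
        c = [p[b] - mean[b] for b in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov

def leading_eigenvector(cov, iterations=100):
    """Power iteration: tends to the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Invented, highly correlated three-band pixel vectors (illustration only).
pixels = [[30, 42, 81], [34, 47, 90], [28, 40, 77],
          [40, 55, 104], [36, 50, 95], [32, 45, 86]]

mean = mean_vector(pixels)
axis = leading_eigenvector(covariance_matrix(pixels, mean))
bands = len(mean)

# Compression: keep one score per pixel instead of three brightness values.
scores = [sum((p[b] - mean[b]) * axis[b] for b in range(bands)) for p in pixels]

# Reconstruction: inverse transform using only the retained component.
reconstructed = [[mean[b] + s * axis[b] for b in range(bands)] for s in scores]

# Distortion: root mean square difference between original and reconstruction.
squared = [(p[b] - r[b]) ** 2
           for p, r in zip(pixels, reconstructed) for b in range(bands)]
rms_error = math.sqrt(sum(squared) / len(squared))
print("values stored per pixel:", 1, "instead of", bands)
print("rms distortion:", rms_error)
```

Because the invented bands are almost perfectly correlated, the two discarded components
carry very little variance and the distortion is small. With less correlated data more
components would have to be retained for the same distortion.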
An alternative transformation widely used in the television and video industry is
the Discrete Cosine Transform, or DCT (Rao and Yip, 1990). The DCT is similar in principle
to the Discrete Fourier Transform of Sect. 7.7, but uses real cosine expansion functions
instead of the complex exponentials seen in (7.16).
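A minimal sketch of a cosine transform used for compression follows. The text does not
say which variant of the DCT is meant, so the orthonormal DCT-II (with the DCT-III as its
inverse) is assumed here. The eight-sample "spectrum" is invented, and the choice to keep
three coefficients is arbitrary.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a real sequence (real cosine basis functions)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out

# Invented, smoothly varying spectrum (illustration only).
spectrum = [12.0, 14.0, 17.0, 21.0, 24.0, 26.0, 27.0, 27.5]

coefficients = dct(spectrum)

# Lossy compression: keep the first few coefficients and set the rest to zero.
kept = 3
truncated = coefficients[:kept] + [0.0] * (len(coefficients) - kept)
approximation = idct(truncated)

distortion = max(abs(a - b) for a, b in zip(spectrum, approximation))
print("coefficients kept:", kept, "of", len(coefficients))
print("largest reconstruction error:", distortion)
```

Unlike the Fourier coefficients of (7.16), every coefficient here is a real number, so
nothing is spent on storing imaginary parts.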
If the user can tolerate substantial amounts of distortion then significant compression
of remote sensing imagery is possible; reductions in volume by factors as high as 100
have been reported, but one is then led to question the integrity of the compressed
data. Generally, those compression schemes that allow the original image to
be reconstructed without error (so-called lossless compression algorithms) will give
compression ratios of only about 2 to 3.
A compression scheme well matched to the needs of remote sensing is vector
quantisation, which is based upon the use of a so-called code book. The code book
contains a number of representative pixel vectors (for example class means) that
could be obtained from training data, or could even be prototypical reference
spectra. Each code book vector is given a label (such as a number or a class
symbol).
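As a small sketch of how such a code book might be assembled, the Python below takes
labelled training pixels and stores the mean vector of each class under its label. The
class names and brightness values are invented for illustration, and build_code_book is a
hypothetical helper rather than a library function.

```python
def build_code_book(training):
    """Map each class label to the mean of its training pixel vectors."""
    code_book = {}
    for label, vectors in training.items():
        n = len(vectors)
        bands = len(vectors[0])
        code_book[label] = [sum(v[b] for v in vectors) / n for b in range(bands)]
    return code_book

# Invented three-band training pixels for two classes (illustration only).
training = {
    "water": [[20, 12, 6], [22, 14, 8], [21, 13, 7]],
    "vegetation": [[30, 25, 90], [34, 27, 96], [32, 26, 93]],
}

code_book = build_code_book(training)
for label, vector in code_book.items():
    print(label, vector)
```

Class means are only one choice. Cluster centres or library reference spectra could be
stored in the same structure.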
Now imagine an image has to be transmitted over a telecommunications channel.
If the spectrum of a pixel exactly matches one of the stored spectra then only the label
need be transmitted. The receiver also has a copy of the code book and can retrieve the
spectrum in question by looking up the label. If the spectrum does not match a
code book entry exactly then transmitting the label of the nearest match will incur an
error. Whether that error is acceptable, or whether a correction needs to be transmitted
with the label of closest match, will depend on the application. The efficacy of the
scheme depends upon how well the code book represents the range of pixel vectors
in the image. A good code book will give rise to small differences (errors) between
code book entries and pixel vectors to be transmitted. Such small differences can be
encoded using a small number of bits (substantially smaller than the number of bits
in the original pixel vector), so that good data compression is achieved.
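The transmit-and-recover procedure can be sketched as follows. This is an illustration
under several assumptions: the code book and pixel values are invented, the closest match
is taken to be the entry at smallest Euclidean distance, and each difference is sent with
a fixed number of bits just large enough for the biggest difference in the batch. A real
system could use other distance measures or variable-length codes.

```python
import math

def nearest_label(pixel, code_book):
    """Label of the code book vector closest to the pixel (Euclidean distance)."""
    def squared_distance(label):
        return sum((p - c) ** 2 for p, c in zip(pixel, code_book[label]))
    return min(code_book, key=squared_distance)

def encode(pixel, code_book):
    """Return the label of the closest entry and the band-by-band difference."""
    label = nearest_label(pixel, code_book)
    difference = [p - c for p, c in zip(pixel, code_book[label])]
    return label, difference

def decode(label, difference, code_book):
    """Receiver side: look up the label and add the correction."""
    return [c + d for c, d in zip(code_book[label], difference)]

# Invented code book (shared by sender and receiver) and pixels to be sent.
code_book = {0: [21, 13, 7], 1: [32, 26, 93]}
pixels = [[20, 12, 6], [22, 14, 8], [33, 27, 95], [30, 25, 91], [21, 13, 7]]

encoded = [encode(p, code_book) for p in pixels]
decoded = [decode(label, diff, code_book) for label, diff in encoded]

# Bit budget: a fixed-length label plus a signed difference for each band.
bands = len(pixels[0])
label_bits = max(1, math.ceil(math.log2(len(code_book))))
largest = max(abs(d) for _, diff in encoded for d in diff)
difference_bits = math.ceil(math.log2(2 * largest + 1))
raw_bits = len(pixels) * bands * 8
coded_bits = len(pixels) * (label_bits + bands * difference_bits)
print("bits without compression:", raw_bits)
print("bits with labels and differences:", coded_bits)
```

Sending the differences along with the labels makes the scheme lossless in this sketch.
Sending the labels alone would need fewer bits still, at the cost of the error described
above.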
A simple illustration is given in Table 13.2 in which 10 SPOT multispectral vectors
are to be sent over a channel. Ordinarily, with each band represented by 8 bits, the
ten pixels require 10 × 3 × 8 = 240 bits to be transmitted. However, recognising
there are two clusters in the data and using the cluster means as code book vectors,
it is possible to represent each of the pixels to be transmitted by their difference