are treated separately in Chap. 13. Nevertheless, it is possible to condition
hyperspectral data (for example, through feature selection) so that the material
outlined here is still relevant.
Signatures generated from the training data will be of a different form depending
on the classifier type to be used. For parallelepiped classification the class signatures
will be the upper and lower bounds of brightness in each spectral band. For minimum
distance classification the signatures will be the mean vectors of the training data for
each class, while for maximum likelihood classification both class mean vectors and
covariance matrices constitute the signatures. For neural network and support vector
machine classifiers the collection of weights defines the boundaries between classes.
While the weights do not represent class signatures as such, they are inherent properties
of the classifier, learnt from training data, that allow classes to be discriminated.
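As an illustration of the signature types just described, the following is a minimal sketch that estimates them for a single class from its training pixels. The function name class_signatures, the dictionary keys and the sample brightness values are hypothetical, chosen only for this example; the covariance uses the usual unbiased (n - 1) estimate, which is an assumption since the normalisation is not specified above.

```python
def class_signatures(pixels):
    """Estimate signatures for ONE class from its training pixels.

    pixels: list of brightness vectors, one per training pixel,
            each with one value per spectral band (hypothetical layout).
    Needs at least two pixels so the covariance can be estimated.
    """
    n = len(pixels)
    n_bands = len(pixels[0])
    # Regroup the data band by band.
    bands = [[p[b] for p in pixels] for b in range(n_bands)]

    # Parallelepiped signature: lower and upper brightness bound per band.
    lower = [min(band) for band in bands]
    upper = [max(band) for band in bands]

    # Minimum distance signature: the class mean vector.
    mean_vec = [sum(band) / n for band in bands]

    # Maximum likelihood signature: mean vector plus covariance matrix
    # (unbiased estimate, dividing by n - 1; an assumption of this sketch).
    cov = [
        [
            sum((p[i] - mean_vec[i]) * (p[j] - mean_vec[j]) for p in pixels) / (n - 1)
            for j in range(n_bands)
        ]
        for i in range(n_bands)
    ]
    return {"lower": lower, "upper": upper, "mean": mean_vec, "covariance": cov}


# Three made-up training pixels in two spectral bands.
signatures = class_signatures([[10, 20], [12, 24], [14, 22]])
print(signatures)
```

In practice one such set of signatures would be estimated for every class, and only the parts required by the chosen classifier would be retained.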
Because labelled training data are available beforehand, from which the signatures
are estimated, the analyst is, in a sense, teaching the classification algorithm to
recognise the spectral characteristics of each class. This leads to the term
supervised, which qualifies how the algorithm learns about the data with which it
has to work.
As a proportion of the full image to be analysed, the amount of training data would
typically range from less than 1% to 5% of the pixels. The learning phase, in which
the analyst plays an important part through the a priori labelling of pixels, is
therefore performed on a very small part of the image. Once trained, the classifier
is asked to attach labels to all the image pixels by using the class estimates
provided to it.
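The two phases just described, training on a small labelled sample and then labelling every pixel, can be sketched with a minimum distance classifier, the simplest of the classifier types mentioned above. The function names train_means and classify, the class names and the brightness values are all hypothetical. The tiny image here is for illustration only; in a real scene the training pixels would be a small fraction of the image.

```python
import math


def train_means(training):
    """Learning phase: estimate one mean vector per class.

    training: dict mapping a class label to its list of training pixel
              vectors (hypothetical layout for this sketch).
    """
    means = {}
    for label, pixels in training.items():
        n = len(pixels)
        n_bands = len(pixels[0])
        means[label] = [sum(p[b] for p in pixels) / n for b in range(n_bands)]
    return means


def classify(image, means):
    """Labelling phase: give every pixel the label of the nearest class mean.

    image: list of rows, each row a list of pixel vectors.
    Distance is Euclidean distance in spectral space.
    """
    return [
        [min(means, key=lambda label: math.dist(pixel, means[label])) for pixel in row]
        for row in image
    ]


# Made-up training pixels for two classes in two spectral bands.
training = {
    "water": [[10, 5], [12, 7]],
    "vegetation": [[40, 80], [44, 84]],
}
means = train_means(training)

# A 2 x 2 image; every pixel is labelled, not only the training pixels.
image = [[[9, 6], [41, 79]], [[15, 10], [38, 70]]]
print(classify(image, means))
```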
The steps in this fundamental outline are now examined in more detail, noting
the practical issues that should be considered to achieve reliable results.
11.2.2 Determination of Training Data
The major step in straightforward supervised classification is the prior identification
of training pixels. This may involve the expensive enterprise of field visits, or may
require use of reference data such as topographic maps and air photographs. In the
latter case, a skilled photointerpreter may be required to determine the training data.
Once training fields are suitably chosen they have to be related to the pixel addresses
in the satellite imagery. Sometimes training data can be chosen by photointerpretation
from image products formed from the multispectral data to be classified. Generally,
however, this is restricted to major cover types and, again, can require a great deal of
photointerpretive skill if more than a simple segmentation of the image is required.
Some image processing systems have digitizing tables that allow map data –
such as polygons of training pixels, i.e. training fields – to be taken from maps and
superimposed over the image data. While this requires a registration of the map
and image, using the procedures of Sect. 2.4, it represents an unbiased method for
choosing the training data. It is important, however, as with all training procedures
based upon field or reference data, that the training data be recorded at about the
same time as the multispectral data to be classified. Otherwise, errors resulting from
temporal variations may arise.
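Relating a training field to pixel addresses can be sketched as follows, assuming the map polygon has already been registered to the image and is expressed in image coordinates, with x along the columns and y along the rows. The function names and the example polygon are hypothetical, and the rule of including a pixel when its centre lies inside the polygon is a simplifying assumption of this sketch rather than a procedure given in the text.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting test: is the point (x, y) inside the polygon?

    polygon: list of (x, y) vertices in order, in image coordinates.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


def training_pixel_addresses(polygon, n_rows, n_cols):
    """Return (row, col) addresses of pixels whose centres fall in the field."""
    addresses = []
    for row in range(n_rows):
        for col in range(n_cols):
            # The centre of pixel (row, col) is at x = col + 0.5, y = row + 0.5.
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                addresses.append((row, col))
    return addresses


# A made-up rectangular training field on a 5 x 6 pixel image.
field = [(1, 1), (4, 1), (4, 3), (1, 3)]
print(training_pixel_addresses(field, n_rows=5, n_cols=6))
```

The brightness vectors at the returned addresses would then form the training data for the class that the field represents.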