11.5 Assessment of Classification Accuracy
of labelling small classes will therefore be prejudiced. To avoid this it is necessary to
ensure small classes are represented adequately. An approach that is widely adopted
is stratified random sampling in which the user first of all decides upon a set of strata
into which the image is divided. Random sampling is then carried out within each
stratum. The strata could be any convenient area segmentation of the thematic map,
such as grid cells. However, the most appropriate stratification to use is the actual
thematic classes themselves. Consequently, the user should choose a random sample
within each thematic class to assess the classification accuracy of that class.
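As a minimal sketch of random sampling stratified by class, the following Python fragment groups the pixels of a thematic map by their class label and then draws a random sample within each class. The function name `stratified_sample`, the parameter `n_per_class` and the small `toy_map` are hypothetical and chosen purely for illustration; a real thematic map would be far larger and would normally be held in an array library.

```python
import random
from collections import defaultdict


def stratified_sample(class_map, n_per_class, seed=0):
    """Draw up to n_per_class random pixel positions from each thematic class.

    class_map is a list of rows, each row a list of class labels.
    Returns a dict mapping each label to a list of (row, col) positions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group pixel positions by thematic class: these are the strata.
    pixels_by_class = defaultdict(list)
    for r, row in enumerate(class_map):
        for c, label in enumerate(row):
            pixels_by_class[label].append((r, c))

    # Sample without replacement inside each stratum; a class with fewer
    # than n_per_class pixels contributes all of its pixels.
    return {
        label: rng.sample(pixels, min(n_per_class, len(pixels)))
        for label, pixels in pixels_by_class.items()
    }


# Hypothetical 3 x 4 thematic map with one large class and two small ones.
toy_map = [
    [1, 1, 1, 2],
    [1, 1, 3, 2],
    [1, 1, 1, 1],
]
samples = stratified_sample(toy_map, n_per_class=2)
```

Because the sampling is carried out class by class, the small classes 2 and 3 are represented even though class 1 dominates the map.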
If one adopts random sampling, stratified by class, the question that must then be
answered is how many test pixels should be chosen within each class to ensure that
the results entered into the confusion matrix of Table 11.1 are an accurate reflection
of the performance of the classifier, and that the percentage correct classification so-
derived is a reliable estimate of the real accuracy of the thematic map. To illustrate
this point, a sample of one pixel from a particular class will suggest an accuracy
of 0% or 100% depending on its match to ground truth. A sample of 100 pixels
will clearly give a more realistic estimate. A number of authors have addressed this
problem, using binomial statistics, in the following manner.
Let each pixel from a particular category in a thematic map be represented by a
binary random variable that takes on the value 1 if the pixel is correctly classified and 0
otherwise. Suppose the true map accuracy for that class is θ (which is what we wish
to estimate by sampling). Then the probability of x pixels being correct in a random
sample of n pixels from that class is given by the binomial probability
$$p(x; n, \theta) = {}^{n}C_{x}\, \theta^{x} (1 - \theta)^{n - x}, \qquad x = 0, 1, \ldots, n. \tag{11.1}$$
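The binomial probability in (11.1) can be evaluated directly, as in the short Python sketch below. The function name `binomial_pmf` and the example values n = 10 and θ = 0.9 are illustrative assumptions only.

```python
from math import comb


def binomial_pmf(x, n, theta):
    """Probability of exactly x correct pixels in a sample of n, as in (11.1)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


# Illustrative values: a sample of 10 pixels from a class whose true accuracy is 0.9.
n, theta = 10, 0.9
probabilities = [binomial_pmf(x, n, theta) for x in range(n + 1)]

# The probabilities over x = 0, 1, ..., n sum to one (up to rounding).
total = sum(probabilities)

# Probability that every sampled pixel is correct (the case x = n).
p_all_correct = binomial_pmf(n, n, theta)
```

Setting x = n makes the combinatorial factor equal to one and the factor in (1 − θ) equal to one, so `p_all_correct` reduces to θ raised to the power n.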
Van Genderen et al. (1978) determine the minimum sample size by noting that if
the sample is too small there is a finite chance that those pixels selected could all be
labelled correctly (as for example in the extreme situation of one pixel considered
above). If this occurs then a reliable estimate of the map accuracy clearly has not been
obtained. Such a situation is described by x = n in (11.1), giving as the probability
for all n samples being correct
$$p(n; n, \theta) = \theta^{n}.$$
Van Genderen et al. have evaluated this expression for a range of θ and n and have
noted that p(n; n, θ) is unacceptably high if it is greater than 0.05 – i.e. if there is more
than a 5% chance of selecting a perfect sample from a population in
which the accuracy is actually described by θ. A selection of their results is given
in Table 11.2. In practice, these figures should be exceeded to ensure representative
outcomes are obtained. Van Genderen et al. consider an extension of the results in
Table 11.2 to the case of encountering set levels of error in the sampling, from which
further recommendations are made concerning desirable sample sizes.
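As a rough illustration of the 0.05 criterion, the following Python sketch finds the smallest sample size n for which the probability of an all-correct sample, θ raised to the power n, is no greater than 0.05. The function name `min_sample_size` and the parameter `alpha` are hypothetical names introduced here. The values it returns follow directly from the formula and need not coincide exactly with the entries of Table 11.2, which may reflect a different rounding convention.

```python
def min_sample_size(theta, alpha=0.05):
    """Smallest n such that theta**n <= alpha.

    theta is the assumed true class accuracy (strictly between 0 and 1);
    alpha is the largest acceptable probability of an all-correct sample.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    n = 1
    # Increase n until the chance of a perfect sample is acceptably small.
    while theta**n > alpha:
        n += 1
    return n


# Illustrative accuracies; each value is computed from theta**n <= 0.05.
sizes = {theta: min_sample_size(theta) for theta in (0.85, 0.90, 0.95)}
# sizes == {0.85: 19, 0.90: 29, 0.95: 59}
```

The required sample size grows quickly as the true accuracy approaches 100%, since a perfect sample then becomes increasingly likely for any fixed n.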
Rosenfield et al. (1982) have also determined guidelines for selecting minimum
sample sizes. Their approach is based upon determining the number of samples
required to ensure that the sample mean – i.e. the number of correct classifications
divided by the total number of samples per category – is within 10% of the population