which may be somewhat overlapped (some ink pixels may be brighter than some
paper pixels) but which can be best separated by setting a single threshold value.
The definition of best is a function of what statistical test is applied. For example,
the Trussell method uses the statistician’s t-test to compare the two populations
divided by every possible threshold setting to calculate the t-statistic. This is a
function of the mean values, standard deviations, and number of pixels in each
segment of the histogram. When the t-statistic is maximum the probability that the
two populations are different is greatest, so that is the threshold value used.
The Trussell method works quite well for most text-reading applications, and in
fact it is generally applied to many situations where it is known beforehand (or assumed)
that there are just two populations of pixels present. That is often the case for stained
tissue, for example (stained vs. not stained). It may also apply to porous material (solid
or void), and a variety of meat and vegetable products. It is surprising that it works so
well because one of the underlying assumptions in the t-test is that the populations of
pixels have brightness values that have a normal (or Gaussian) distribution, so that the
mean and standard deviation fully characterize the data. Few real images (even ones of
print on paper) present histograms consisting of Gaussian peaks. Note in the figure that
the histogram has one more-or-less symmetrical peak for the white paper pixels, whereas
the darker ink pixels do not produce a peak at all, but rather a broad sloping shelf with
no obvious place to position a threshold.
There are a variety of nonparametric statistical tests that can be applied to the
two-population model that do not make an assumption of normality, and most of
them have been used to program other threshold-selection algorithms. The Shannon
method, for example, calculates the entropy for the two populations that are treated
as fuzzy sets to determine the threshold setting that minimizes the uncertainty of
the setting. This method selects values that are typically slightly different from the
Trussell method, but also generally satisfactory. All of the examples of thresholding
based on the histogram that follow in this text use either the Trussell or Shannon
method. Many of the other methods (whose mathematical algorithms are summarized
(c) (d)
FIGURE 3.54 (continued)
2241_C03.fm Page 191 Thursday, April 28, 2005 10:28 AM
Copyright © 2005 CRC Press LLC