where $p_1$ and $p_2$ are the probabilities and for which we can obtain values of $\chi^2$
from the published tables.
2.6.6 Central limit theorem
In the foregoing discussion of confidence limits (Section 2.6.3) we have
restricted the formulae to those for the normal distribution, the properties of
which are so well established. It lends weight to our argument for transforming
variables to normal if that is possible. However, even if a variable is not
normally distributed it is often still possible to use the tabulated values and
formulae when working with grouped data. As it happens, the distributions of
sample means tend to be more nearly normal than those of the original
populations. Further, the bigger is a sample the closer is the distribution
of the sample mean to normality. This is the central limit theorem. It
means that we can use a large body of theory when studying samples from
the real world.
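The tendency of sample means towards normality can be illustrated by simulation. What follows is a minimal sketch under stated assumptions: the population is taken to be exponential, chosen only because it is strongly skewed, and the function names, sample sizes, number of replicates and seed are all hypothetical. It draws repeated samples of size n, computes their means, and reports the skewness of those means, which should shrink towards zero (the value for a normal distribution) as n increases.

```python
import random
import statistics


def skewness(values):
    """Skewness (third standardized moment); zero for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def sample_means(n, replicates, rng):
    """Means of `replicates` random samples, each of size n.

    Assumption: the population is exponential with unit rate, used here
    only as an example of a markedly non-normal distribution.
    """
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]


rng = random.Random(42)  # fixed seed so that the run is repeatable
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (1, 4, 16, 64)}

for n, g in skew_by_n.items():
    print(f"n = {n:3d}   skewness of sample means = {g:.2f}")
```

Skewness is only one symptom of departure from normality; a histogram or a normal probability plot of the simulated means would show the same trend more fully.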
We might, of course, have to work with raw data that cannot readily be
transformed to normal, and in these circumstances we should see whether the
data follow some other known distribution. If they do then the same line of
reasoning can be used to arrive at confidence limits for the parameters.
2.6.7 Increasing precision and efficiency
The confidence limits on means computed from simple random samples can be
alarmingly wide, and the sizes of sample needed to obtain satisfactory precision
can also be alarmingly large. One reason is that a simple random design is an
inefficient way of sampling space. Its cover is uneven; there are usually parts
of the region that are sparsely sampled while elsewhere there are clusters of
sampling points. If a variable z is spatially auto correlated, which is likely at some
scale, then clustered points duplicate information. Large gaps between sampling
points mean that information that could have been obtained is lacking.
Consequently, more points are needed to achieve a given precision, as measured
by $s^2(\bar{z})$, than if the points are spread more evenly. There are several better
designs for areas, and we consider the two most common ones, stratified random
and systematic.
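The uneven cover of a simple random design is easy to see by counting the sampling points that fall in each cell of a regular grid laid over the region. The sketch below is illustrative only: it assumes a square region, and the side length, sample size, grid and seed, like the function names, are hypothetical choices. Under even cover every cell would hold the same number of points; with simple random sampling some cells are typically empty or nearly so while others hold clusters.

```python
import random


def simple_random_sample(side, n, seed=None):
    """n points placed independently and uniformly in a square of the given side."""
    rng = random.Random(seed)
    return [(rng.random() * side, rng.random() * side) for _ in range(n)]


def counts_per_cell(points, side, cells_per_side):
    """Number of points in each cell of a cells_per_side x cells_per_side grid."""
    width = side / cells_per_side
    counts = [[0] * cells_per_side for _ in range(cells_per_side)]
    for x, y in points:
        # min() guards against a point landing exactly on the far boundary
        i = min(int(x // width), cells_per_side - 1)
        j = min(int(y // width), cells_per_side - 1)
        counts[i][j] += 1
    return counts


# Assumed example: 50 points in a 100 x 100 region, viewed on a 5 x 5 grid,
# so that perfectly even cover would put 2 points in every cell.
srs_points = simple_random_sample(side=100.0, n=50, seed=1)
cell_counts = counts_per_cell(srs_points, side=100.0, cells_per_side=5)

for row in cell_counts:
    print(row)
```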
Stratified sampling
In stratified designs the region of interest, R, is divided into small subdivisions
(strata). These are typically small squares, but they may be other shapes, of
equal area. At least two sampling points are chosen randomly within each
stratum. For this scheme the largest possible gap is then less than four strata.
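A stratified random design of this kind can be generated with a few lines of code. The following is a minimal sketch, assuming a square region divided into equal square strata; the function name, its parameters and the values in the example are hypothetical. Each stratum receives the same number of points, placed independently and uniformly at random within it.

```python
import random


def stratified_random_sample(side, cells_per_side, points_per_stratum=2, seed=None):
    """Sampling points for a square region divided into equal square strata.

    Assumptions: the region is a square of the given side, and the strata
    form a cells_per_side x cells_per_side grid. Returns a list of (x, y).
    """
    rng = random.Random(seed)
    width = side / cells_per_side  # side length of one stratum
    points = []
    for i in range(cells_per_side):
        for j in range(cells_per_side):
            for _ in range(points_per_stratum):
                # uniform random position within stratum (i, j)
                x = (i + rng.random()) * width
                y = (j + rng.random()) * width
                points.append((x, y))
    return points


# Assumed example: a 100 x 100 region, 25 strata, two points in each.
points = stratified_random_sample(side=100.0, cells_per_side=5, seed=1)
print(len(points), "sampling points")
```

Two points per stratum is the minimum stated above; a larger value of `points_per_stratum` can be passed where more are wanted.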