128 CHAPTER 9
Thinking about the confidence and precision we need in making specific estimates
is one sound way to approach the always vexing question of how large a
sample is needed. Following this approach, of course, requires deciding specifically
what we want to find out, how precise our results need to be, and how confident
we want to be of our conclusions. These parameters are not absolutes. They vary
from one situation to the next. What is sufficient precision in one context may be
hopelessly imprecise in another. And what is sufficient confidence for some purposes
may be altogether inadequate for others. If we cannot state our aims clearly
enough to at least approximate how large a sample may be needed to achieve them,
however, it is probably premature to be selecting a sample. We should go back and
think harder about exactly what we are trying to find out.
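As a rough illustration of how stated aims translate into a sample size, the sketch below turns a desired error range and confidence level into an approximate number of elements. It rests on assumptions that are not part of the discussion above: the function name `required_sample_size` and its parameters are hypothetical, the standard deviation is a guess we must supply (from a pilot sample, for instance), and the multiplier comes from the normal distribution rather than Student's t, so the result will be somewhat too small when the answer is itself a small sample.

```python
import math
from statistics import NormalDist


def required_sample_size(sd_estimate, error_range, confidence=0.95):
    """Approximate n needed so the mean's error range is +/- error_range.

    Assumption: uses the normal multiplier z in n = (z * s / ER)^2,
    which understates n slightly when n turns out to be small.
    """
    if sd_estimate <= 0 or error_range <= 0:
        raise ValueError("sd_estimate and error_range must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-tailed multiplier: e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Round up, since a fraction of an element cannot be sampled.
    return math.ceil((z * sd_estimate / error_range) ** 2)


# Made-up figures: guessed standard deviation of 10 units,
# desired error range of +/- 2 units.
n_95 = required_sample_size(10, 2, confidence=0.95)
n_99 = required_sample_size(10, 2, confidence=0.99)
print(n_95, n_99)
```

Asking for more confidence, or for a narrower error range, raises the required sample size quickly: halving the error range roughly quadruples it.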
ASSUMPTIONS AND ROBUST METHODS
The use of most of the tools discussed in this and subsequent chapters requires
making some assumptions. These will be discussed at the close of each chapter.
Most of the techniques are already fairly robust. That is, they can be applied to
samples that only approximately meet the assumptions. And there are things we can
do even with samples that violate the assumptions drastically.
Once we have decided that we are willing to treat a batch of numbers as a random
sample from a larger population we wish to know about, the only assumption we
must make in order to estimate the population mean and attach error ranges to it
in the manner described here is that the special batch has an approximately
normal distribution. The central limit theorem tells us that this will be very nearly
the case for large samples (as a rule of thumb, larger than 30 or 40 elements). When working with
a smaller sample, it is wise to look at the stem-and-leaf plot to check for a roughly
symmetrical and single-peaked shape. If a small sample has a single-peaked and
roughly symmetrical shape, then we can reasonably expect its special batch to have an
approximately normal shape. If a small sample has a badly skewed shape we might try to correct this with
transformations, but this is not very useful for estimating means because we would
wind up estimating something like the mean of the logarithm of the measurement
in the population, and such a quantity is not very easy to relate to what we want to
know.
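The difficulty with transformations can be seen in a small sketch. The numbers are made up for illustration, and the function name `back_transformed_log_mean` is hypothetical. Taking the mean of the logarithms and converting it back to the original units gives the geometric mean, which for a skewed batch can fall far from the ordinary mean we set out to estimate.

```python
import math
from statistics import mean


def back_transformed_log_mean(values):
    """Mean of the base-10 logarithms, converted back to original units.

    This is the geometric mean, not the ordinary (arithmetic) mean.
    """
    logs = [math.log10(v) for v in values]
    return 10 ** mean(logs)


# Made-up, strongly upward-skewed batch of measurements.
sample = [1, 10, 100]

arithmetic = mean(sample)                        # (1 + 10 + 100) / 3 = 37
back_transformed = back_transformed_log_mean(sample)  # 10 ** 1 = 10

print(arithmetic, back_transformed)
```

Here the back-transformed value is 10 while the ordinary mean is 37, so an error range attached to the mean of the logarithms says little directly about the population mean in the original units.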
Looking at a stem-and-leaf plot should always be the initial step anyway, even
with a large sample. This is because the sample might have outliers or a badly
skewed shape that would make the mean and standard deviation meaningless as
numerical indexes of level and spread, as discussed in Chapters 2 and 3. If a sample
has outliers or a badly skewed shape, then the population the sample was selected
from probably does too. In such a case, the mean will likely not be a good index
of center for the population, and is thus not what we want to estimate. If the prob-
lem is outliers, the trimmed mean is a better index of the center. If the problem is
skewness, then the median is a better index of the center. In such cases, it makes
sense to estimate, not the regular mean of the population, but the trimmed mean or