The effect of framing significance tests in this way is to provide a clear “yes” or
“no” answer to the question of whether the observation tested really characterizes
the populations involved rather than just being the result of sampling vagaries. The
problem is that statistics never really give us a “yes” or “no” answer to this
question. Significance tests may tell us that the probability that the observation is
just the result of sampling vagaries is very high, moderate, or very low. But as
long as we are making inferences from samples we are never absolutely certain
about the populations the samples represent. Significance is simply not a condition
that either exists or does not exist. Statistical results are either more significant or
less significant. We have either greater or lesser confidence in our conclusions about
populations, but we never have absolute certainty. To force ourselves either to reject
or to accept a null hypothesis is to oversimplify a more complicated situation into a
“yes” or “no” answer. (Actually, many statistics books make the labored distinction
that one does not accept the null hypothesis but rather “fails to reject” it. In practice,
analysts often treat a null hypothesis they have been unable to reject as a proven
truth – more on this subject later.)
This practice of forcing statistical results like “maybe” and “probably” to become
“no,” and “highly likely” to become “yes,” has its clearest justification in areas like
quality control, where an unequivocal “yes” or “no” decision must be made on the
basis of significance tests. If a complex machine turns out some product, a quality
control engineer may test a sample of the output to determine whether the machine
needs to be adjusted. On the basis of the sample results, the engineer must decide
either to let the machine run (and risk turning out many defective products if he or
she is wrong) or stop the machine for adjustment (and risk wasting much time and
money if he or she is wrong). In such a case, statistical results like “the machine
is probably turning out defective products” must be converted into a “yes” or “no”
answer to the question of stopping the machine. Fortunately, research archaeologists
are rarely in such a position. We can usually (and more informatively) say things like
“possibly,” “probably,” “very likely,” and “with great probability.”
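To make the quality-control decision concrete, here is a minimal sketch in Python. The tolerable defect rate, sample size, and defect counts are all hypothetical, and scipy’s binomtest is just one standard way to obtain the significance probability; the point is only how a graded result gets collapsed into the engineer’s forced “yes” or “no.”

```python
# A minimal sketch of the quality-control decision described above.
# The tolerable defect rate, sample size, and defect counts are all
# hypothetical; scipy's binomtest supplies the significance probability.
from scipy.stats import binomtest

TOLERABLE_RATE = 0.02  # defect rate the machine is supposed to hold
ALPHA = 0.05           # conventional 5% rejection level

def decide(defects_found: int, sample_size: int) -> str:
    """Collapse a graded statistical result into a forced yes/no decision."""
    # Probability of seeing this many defects or more in the sample
    # if the machine really is running at the tolerable rate.
    p = binomtest(defects_found, sample_size, TOLERABLE_RATE,
                  alternative="greater").pvalue
    action = "STOP the machine" if p < ALPHA else "let the machine run"
    return f"p = {p:.3f}: {action}"

print(decide(defects_found=5, sample_size=100))  # p just above 5%: run
print(decide(defects_found=6, sample_size=100))  # p below 5%: stop
```

A single additional defective item in the same sample flips the decision, which is exactly the kind of knife-edge behavior the research archaeologist can usually avoid.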
Finally, following the traditional 5% significance rule for rejecting the null
hypothesis leaves us failing to reject the null hypothesis when the probability that
our results are just the vagaries of sampling is only 6%. If, in the house floor example,
the t value had been lower, and the associated probability had been 6%, it would
have been quite reasonable for us to say, “We have fairly high confidence that mean
house floor area was greater in the Classic period than in the Formative.” If we had
approached the problem as one of attempting to reject a null hypothesis, however,
with a 5% rejection level, we would have been forced to say instead, “We have failed
to reject the hypothesis that house floor areas in the Formative and Classic are the
same.” As a consequence we would probably have proceeded as if there were no dif-
ference in house floor area between the two periods when our own statistical results
had just told us that there was a 94% probability that there was such a difference.
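To illustrate the contrast, here is a sketch of such a comparison in Python. The chapter’s actual floor-area measurements are not reproduced here, so the figures below are invented, but the two printed statements show the rigid framing and the graded framing side by side.

```python
# A sketch of the house floor comparison with invented floor areas
# (in square meters); the chapter's real measurements are not used here.
import numpy as np
from scipy.stats import ttest_ind

formative = np.array([21.3, 24.1, 19.8, 22.5, 20.7, 23.0, 18.9, 21.8])
classic   = np.array([22.5, 24.8, 21.0, 23.7, 23.2, 20.3, 25.6, 21.9])

# One-tailed test: is mean Classic floor area greater than Formative?
t, p = ttest_ind(classic, formative, alternative="greater")

# The rigid framing forces a yes/no at the 5% level...
print("reject the null hypothesis" if p < 0.05 else
      "fail to reject the null hypothesis")

# ...while the graded statement keeps the information:
print(f"p = {p:.3f}; roughly {100 * (1 - p):.0f}% confidence that mean "
      "Classic floor area is greater than mean Formative floor area")
```

With these made-up figures the probability should come out somewhat above the 5% cutoff, so the rigid rule says “fail to reject” even though the confidence level remains above 90% – much the awkwardness just described.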
In some disciplines, almost but not quite rejecting the null hypothesis at the
sacred 5% level is dealt with by simply returning to the lab or wherever and studying
a larger sample. Other things being equal, larger samples produce higher confidence
levels, and higher confidence levels equate to lower significance probabilities.
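The arithmetic behind that remedy is easy to sketch. In the Python fragment below the means and standard deviations are hypothetical and simply held fixed while the sample size grows; only the significance probability changes.

```python
# A sketch of the sample-size effect with hypothetical summary statistics:
# the observed difference and the standard deviations stay fixed while
# the sample size grows, and the significance probability shrinks.
from scipy.stats import ttest_ind_from_stats

for n in (10, 20, 40, 80):
    t, p = ttest_ind_from_stats(mean1=24.0, std1=3.0, nobs1=n,
                                mean2=22.0, std2=3.0, nobs2=n)
    print(f"n = {n:2d} per sample: p = {p:.3f}, "
          f"confidence = {100 * (1 - p):.1f}%")
```

The same two-square-meter difference that falls well short of the 5% standard with ten cases per sample passes it comfortably once the samples are a few times larger.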