176 CHAPTER 13
DIFFERENCES BETWEEN POPULATIONS VERSUS
RELATIONSHIPS BETWEEN VARIABLES
Analysis of variance can also be thought about from a rather different perspective.
Instead of focusing on the differences between several populations in mean values
of some measurement, we could focus on the analysis of variance as an investigation
of the relationship between two variables. In the example above, the two variables
would be projectile point weight and period. In an analysis of variance, conceived
in this way, there are always two variables: one of them is a measurement, and the
other is a set of categories. It is the categorical variable that provides the basis for the
division of the overall sample into subsamples, one corresponding to each category.
The categorical variable is always considered the independent variable because
we simply take the division of the sample into subsamples based on these cate-
gories as a given. The measurement is called the dependent variable because we
speak as if it were determined, at least in part, by the categories. In the example
of Archaic projectile points from the Cottonwood River valley we found that Late
Archaic projectile points weighed less, on average, than Early Archaic ones. Thus
it seems reasonable to say that projectile point weight depends on period to some
extent. It is simpler in statistics to speak of the relationship in these terms, although
this implies nothing about the direction of causality in the real world. Indeed, it
makes little real sense even to talk about period as an independent variable that
“causes” projectile points to be heavier or lighter. This is simply a convention of
statistical language, having little to do with real notions of causality.
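The division of a sample into subsamples by a categorical variable can be
sketched in a few lines of Python. This is only an illustration: the weights
below are invented for the purpose and are not the Cottonwood River valley
data, and the function name subsample_means is likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases: (period, weight in grams). Invented values,
# not the measurements discussed in the text.
points = [
    ("Early", 60.0), ("Early", 64.0), ("Early", 62.0),
    ("Late", 50.0), ("Late", 54.0), ("Late", 52.0),
]


def subsample_means(cases):
    """Group the measurement by category and return each subsample's mean."""
    groups = defaultdict(list)
    for category, measurement in cases:
        # The categorical (independent) variable decides the subsample.
        groups[category].append(measurement)
    # The measurement (dependent) variable is summarized within each subsample.
    return {category: mean(values) for category, values in groups.items()}


means = subsample_means(points)
print(means)
```

Here period plays the role of the independent variable only in the sense that
it defines the groups; the code, like the statistical language, says nothing
about causality.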
It is often useful to think of variable relationships in predictive terms. If the two
variables – projectile point weight and period – are related to each other, then know-
ing the value of one for a particular case would help us to predict the value of the
other. If, before looking at a particular projectile point, we wished to predict its
weight, the best guess we could make would be the mean of the overall sample.
That guess would most often be closest to the real weight of the projectile point in
question. Given what we found out in the analysis of variance, however, we know
that it would help us make better predictions if we knew to what part of the Archaic
the projectile point pertained. If we knew that the point was Late Archaic, the best
prediction would be the mean of the Late Archaic subsample. This prediction would
more often be closer to the real weight than the prediction based on the overall sam-
ple mean. It is in this sense that we can say that knowing the period helps us to
predict the projectile point weight. (We could, of course, reverse direction and pre-
dict period from weight. It is a little more complicated to phrase, and so we don’t
usually find it convenient to speak that way, but the relationship is symmetrical in
that sense.)
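The predictive view can be made concrete with a small sketch. Assuming a set of
invented weights (not the data from the example), the code below predicts each
point's weight first from the overall sample mean and then from the mean of its
own period, and compares the average size of the prediction errors. The helper
name mean_abs_error is hypothetical, and average absolute error is just one
reasonable way to measure how close a prediction comes.

```python
from statistics import mean

# Hypothetical cases: (period, weight in grams). Invented values,
# not the measurements discussed in the text.
points = [
    ("Early", 60.0), ("Early", 64.0), ("Early", 62.0),
    ("Late", 50.0), ("Late", 54.0), ("Late", 52.0),
]


def mean_abs_error(actual, predicted):
    """Average absolute difference between real and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


weights = [w for _, w in points]

# Prediction 1: ignore period and always guess the overall sample mean.
overall_mean = mean(weights)
error_overall = mean_abs_error(weights, [overall_mean] * len(weights))

# Prediction 2: guess the mean of the subsample for the point's period.
period_means = {
    period: mean(w for p, w in points if p == period)
    for period in {p for p, _ in points}
}
error_by_period = mean_abs_error(weights, [period_means[p] for p, _ in points])

print(f"overall mean guess:   average error {error_overall:.2f} g")
print(f"period mean guess:    average error {error_by_period:.2f} g")
```

With these invented values, the guesses based on period means miss by less on
average than the guesses based on the overall mean. If the two subsample means
were the same, knowing the period would not improve the prediction at all.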
If there were no relationship between projectile point weight and period, then
knowing one would not help us predict the other at all. Looked at from this view-
point, the significance question then becomes, “How likely is it that the relationship
between projectile point weight and period that we observe in this sample is simply
a consequence of sampling vagaries?” Yet another way to put it would be, “How