X      Y      X      Y
5.9    50     12.1   31
6.1    32     14.1   29
7.0    41     15.0   23
8.2    42
9.8 SOME PRECAUTIONS
Regression and correlation analysis are powerful statistical tools when properly
employed. Their inappropriate use, however, can lead only to meaningless results. To
aid in the proper use of these techniques, we make the following suggestions:
1. The assumptions underlying regression and correlation analysis should be reviewed
carefully before the data are collected. Although it is rare to find that assumptions
are met to perfection, practitioners should have some idea about the magnitude of
the gap that exists between the data to be analyzed and the assumptions of the proposed
model, so that they may decide whether they should choose another model;
proceed with the analysis, but use caution in the interpretation of the results; or
use the chosen model with confidence.
2. In simple linear regression and correlation analysis, the two variables of interest are
measured on the same entity, called the unit of association. If we are interested in
the relationship between height and weight, for example, these two measurements
are taken on the same individual. It usually does not make sense to speak of the
correlation, say, between the heights of one group of individuals and the weights of
another group.
3. No matter how strong the indication of a relationship between two variables is, it
should not be interpreted as one of cause and effect. If, for example, a significant
sample correlation coefficient between two variables X and Y is observed, it can
mean one of several things:
a. X causes Y.
b. Y causes X.
c. Some third factor, either directly or indirectly, causes both X and Y.
d. An unlikely event has occurred and a large sample correlation coefficient has
been generated by chance from a population in which X and Y are, in fact,
not correlated.
e. The correlation is purely nonsensical, a situation that may arise when measurements
of X and Y are not taken on a common unit of association.
4. The sample regression equation should not be used to predict or estimate outside
the range of values of the independent variable represented in the sample. As illustrated
in Section 9.5, this practice, called extrapolation, is risky. The true relationship
between two variables, although linear over an interval of the independent
variable, sometimes may be described at best as a curve outside this interval. If