which two variables are linearly related, and can be any value from –1 to +1.
Usually the population statistic ρ is not known, so it is estimated by the sample
statistic r.
An r of +1, which shows a perfect positive linear correlation, will only be
obtained when the values of both variables increase together and lie along a
straight line (Figure 15.1(a)). Similarly, an r of –1, which shows a perfect
negative linear correlation, will only be obtained when the value of one
variable decreases as the other increases and the points also lie along a
straight line (Figure 15.1(b)). In contrast, an r of zero shows the lack of a
linear relationship between two variables, and Figure 15.1(c) gives one example
where the points lie along a straight line parallel to the X axis. When the
points are more scattered but both variables tend to increase together, the
values of r will be between zero and +1 (Figure 15.1(d)), while if one variable
tends to decrease as the other increases, the value of r will be between zero
and –1 (Figure 15.1(e)). If there is no relationship and considerable scatter
(Figure 15.1(f)), the value of r will be close to zero. Finally, it is important to
remember that linear correlation will only detect a linear relationship
between variables – even though the two variables shown in Figure 15.1(g)
are obviously related, the value of r will be close to zero.
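
The cases described above can be checked numerically. The following is a minimal sketch in Python, assuming the usual product-moment formula for r (the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations). The function name pearson_r and all of the data values are hypothetical and chosen only for illustration; they are not taken from Figure 15.1.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired values (illustrative sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its own sample mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sum_xx = sum(a * a for a in dx)              # sum of squares for X
    sum_yy = sum(b * b for b in dy)              # sum of squares for Y
    if sum_xx == 0 or sum_yy == 0:
        # With no variation in a variable, this ratio is 0/0 and cannot be computed.
        raise ValueError("r cannot be computed when a variable does not vary")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data sets, invented for this example.
x = [1, 2, 3, 4, 5, 6, 7]

r_pos = pearson_r(x, [2 * v + 1 for v in x])       # points on a rising straight line
r_neg = pearson_r(x, [10 - 3 * v for v in x])      # points on a falling straight line
r_scatter = pearson_r(x, [2, 1, 4, 3, 6, 5, 8])    # scattered, but increasing together
r_curve = pearson_r([-3, -2, -1, 0, 1, 2, 3],
                    [9, 4, 1, 0, 1, 4, 9])         # Y = X squared: related, but not linearly

for label, r in [("rising line", r_pos), ("falling line", r_neg),
                 ("scattered increase", r_scatter), ("symmetric curve", r_curve)]:
    print(f"{label}: r = {r:.3f}")
```

The last data set is the instructive one: Y is completely determined by X, yet the positive and negative cross-products cancel, so r gives no sign of the relationship. In Python 3.10 and later, the standard library function statistics.correlation computes the same coefficient.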
15.4 Calculation of the Pearson r statistic
A statistic for correlation needs to reliably describe the strength of a linear
relationship for any bivariate data set, even when the two variables have

Table 15.1 A contrast between the uses of correlation and regression.

Correlation                            Regression

Exploratory – are two variables        Definitive – what is the functional
significantly related?                 relationship between variable Y and
                                       variable X and is it significant?
                                       Predictive – what is the value of Y
                                       given a particular value of X?

Neither Y nor X has to be dependent    Variable Y is dependent upon X. It must
upon the other variable. Neither       be plausible that Y is determined by X,
variable has to be determined by       but Y does not necessarily have to be
the other.                             caused by X.
