COVARIANCE, VARIANCE, AND CORRELATION
The table reproduces his data set, which relates to the period 1953/1954 to 1963/1964
(annual exponential growth rates). Plot a scatter diagram and calculate the sample correlation
coefficient for e and p. [If you are not able to use a spreadsheet application for this purpose, you
are strongly advised to use equations (1.9) and (1.17) for the sample covariance and variance and
to keep a copy of your calculation, as this will save you time with another exercise in Chapter 2.]
Comment on your findings.
1.6
Suppose that the observations on two variables X and Y lie on a straight line
$$Y = b_1 + b_2 X$$
Demonstrate that $\mathrm{Cov}(X, Y) = b_2 \mathrm{Var}(X)$ and that $\mathrm{Var}(Y) = b_2^2 \mathrm{Var}(X)$, and hence that the sample
correlation coefficient is equal to 1 if the slope of the line is positive, –1 if it is negative.
1.7*
Suppose that a variable Y is defined by the exact linear relationship
$$Y = b_1 + b_2 X$$
and suppose that a sample of observations has been obtained for X, Y, and a third variable, Z.
Show that the sample correlation coefficient for Y and Z must be the same as that for X and Z, if
$b_2$ is positive.
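The claims in Exercises 1.6 and 1.7 can be checked numerically before attempting the algebra. The following is a minimal sketch, not a proof: the data for X and Z, the intercept `b1`, and the two slopes are hypothetical values chosen for illustration, and the sample covariance is assumed to use the divisor n (the correlation coefficient does not depend on that choice).

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    # Sample covariance; the divisor n is an assumption made for this sketch.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def var(x):
    # The variance is the covariance of a variable with itself.
    return cov(x, x)


def corr(x, y):
    return cov(x, y) / math.sqrt(var(x) * var(y))


# Hypothetical observations, used only for illustration.
X = [1.0, 2.0, 4.0, 7.0, 11.0]
Z = [3.0, 1.0, 6.0, 5.0, 9.0]
b1 = 2.0


def line(b2):
    # Y lies exactly on the straight line Y = b1 + b2 * X.
    return [b1 + b2 * x for x in X]


# Exercise 1.6: compare Cov(X, Y) with b2 * Var(X), Var(Y) with b2^2 * Var(X),
# and inspect the correlation for a positive and a negative slope.
for b2 in (0.5, -3.0):
    Y = line(b2)
    print("b2 =", b2)
    print("  Cov(X, Y) =", cov(X, Y), " b2 * Var(X) =", b2 * var(X))
    print("  Var(Y) =", var(Y), " b2^2 * Var(X) =", b2 ** 2 * var(X))
    print("  r_XY =", corr(X, Y))

# Exercise 1.7: with a positive slope, r_YZ should match r_XZ.
Y = line(0.5)
print("r_YZ =", corr(Y, Z), " r_XZ =", corr(X, Z))
```

Changing the hypothetical values of `b1` and `b2` leaves the pattern intact, which is what the exercises ask you to demonstrate in general.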
1.9 Why Covariance is Not a Good Measure of Association
The correlation coefficient is a much better measure of association than the covariance, the main
reason being that the covariance depends on the units in which the variables X and Y happen to be
measured, whereas the correlation coefficient does not. This will be demonstrated for the sample
concepts; the proof for the population concepts will be left as an exercise.
Returning to the schooling and earnings example, we will investigate what happens when hourly
earnings are measured in cents rather than dollars. The covariance will be affected, but the correlation
coefficient will not.
We will denote the revised earnings data by Y'. The data for S and Y' are shown in Table 1.4. Of
course the data for Y' are just the data for Y in Table 1.2, multiplied by 100. As a consequence, the
average value of Y' in the sample is 100 times as large as the average value of Y. When we come to
calculate the earnings deviations $(Y' - \bar{Y}')$, these are 100 times those in Table 1.2 because $(Y' - \bar{Y}') = (100Y - 100\bar{Y}) = 100(Y - \bar{Y})$. Hence the products $(S - \bar{S})(Y' - \bar{Y}')$ are 100 times those in Table
1.2 and the sample covariance, 1529.4, is 100 times that obtained when hourly earnings were
measured in dollars. However, the correlation coefficient is unaffected. The correlation coefficient
for S and Y' is
$$r_{SY'} = \frac{\mathrm{Cov}(S, Y')}{\sqrt{\mathrm{Var}(S)\,\mathrm{Var}(Y')}} = \frac{1529.4}{\sqrt{10.888 \times 771080}} = 0.55 \tag{1.28}$$
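The same effect can be reproduced on any data set. The sketch below uses a small hypothetical sample of schooling and hourly earnings (not the data in Tables 1.2 and 1.4) and assumes the divisor n in the sample covariance. It converts earnings from dollars to cents and compares the covariance, the variance, and the correlation coefficient before and after.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    # Sample covariance; the divisor n is an assumption made for this sketch.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def var(x):
    return cov(x, x)


def corr(x, y):
    return cov(x, y) / math.sqrt(var(x) * var(y))


# Hypothetical sample: years of schooling and hourly earnings in dollars.
S = [8, 10, 12, 12, 14, 16, 16, 18]
Y = [6.50, 9.00, 11.25, 14.00, 12.75, 18.50, 21.00, 24.25]

# The same earnings measured in cents.
Y_cents = [100 * y for y in Y]

print("Cov(S, Y)  =", cov(S, Y))
print("Cov(S, Y') =", cov(S, Y_cents))   # scaled by the change of units
print("Var(Y)     =", var(Y))
print("Var(Y')    =", var(Y_cents))      # scaled by the square of the factor
print("r_SY       =", corr(S, Y))
print("r_SY'      =", corr(S, Y_cents))  # unchanged by the change of units
```

The factor of 100 in the covariance is cancelled by the factor of 100 that appears when the square root of Var(Y') is taken in the denominator, which is why the two correlation coefficients agree.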