The one-tailed p-value P(T ≤−7.39) can be approximated by P(Z ≤−7.39),
where Z has an N(0, 1) distribution. From Table B.1 we see that this probabil-
ity is smaller than P(Z ≤−3.49) = 0.0002. This is smaller than α/2=0.0005,
so we reject the null hypothesis at level 0.001. In fact the p-value is much
smaller: a statistical software package gives $P(Z \le -7.39) = 7.5 \cdot 10^{-14}$.
The data provide overwhelming evidence against $H_0: \mu = 240$, so that we
conclude that the expected length of an eruption is different from 4 minutes.
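The normal approximation above can be checked numerically. The following is a minimal sketch in Python, assuming only the standard library; the helper name `standard_normal_cdf` is ours and not part of the text. It evaluates the left tail probability of a standard normal distribution at the observed value t = −7.39 and compares it with α/2.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Return P(Z <= z) for Z ~ N(0, 1), using the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


t_observed = -7.39  # value of the test statistic reported in the text
level = 0.001       # significance level used in the text

# Normal approximation to the left tail probability P(T <= -7.39).
left_tail = standard_normal_cdf(t_observed)

# Two-sided test: compare the tail probability with alpha/2 = 0.0005.
reject_two_sided = left_tail < level / 2

print(f"P(Z <= {t_observed}) is approximately {left_tail:.2e}")
print(f"reject H0 at level {level}: {reject_two_sided}")
```

Using `math.erfc` rather than `1 - erf` keeps the computation accurate this far out in the tail, where the probability is many orders of magnitude below the rounding error of a subtraction from 1.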
Quick exercise 27.3 Compute the critical region K for the test, using the
normal approximation, and check that t = −7.39 falls in K.
In fact, if we were to test $H_0: \mu = 240$ against $H_1: \mu < 240$, the
p-value corresponding to t = −7.39 would be the left tail probability
P(T ≤ −7.39). This probability is very small, so we would also reject the
null hypothesis in favor of this alternative and conclude that the expected
length of an eruption is smaller than 4 minutes.
27.3 The t-test in a regression setting
Is calcium in your drinking water good for your health? In England and Wales,
an investigation of environmental causes of disease was conducted. The annual
mortality rate (percentage of deaths) and the calcium concentration in the
drinking water supply were recorded for 61 large towns. The data in Table 27.3
represent the annual mortality rate averaged over the years 1958–1964, and
the calcium concentration in parts per million. In Figure 27.3 the 61 paired
measurements are displayed in a scatterplot. The scatterplot shows a slight
downward trend, which suggests that higher concentrations of calcium lead
to lower mortality rates. The question is whether this is really the case or if
the slight downward trend should be attributed to chance.
To investigate this question we model the mortality data by means of a simple
linear regression model with normally distributed errors, with the mortality
rate as the dependent variable y and the calcium concentration as the inde-
pendent variable x:
$$Y_i = \alpha + \beta x_i + U_i \quad \text{for } i = 1, 2, \ldots, 61,$$
where $U_1, U_2, \ldots, U_{61}$ is a random sample from an $N(0, \sigma^2)$
distribution. The parameter $\beta$ represents the change of the mortality rate
if we increase the calcium concentration by one unit. We test the null
hypothesis $H_0: \beta = 0$ (calcium has no effect on the mortality rate)
against $H_1: \beta < 0$ (higher concentration of calcium reduces the
mortality rate).
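To make the role of β in this model concrete, the following Python sketch simulates responses from a simple linear regression model with normal errors and recovers the slope by least squares. Everything numerical here is an assumption for illustration: the x values, the parameters `alpha`, `beta`, and `sigma`, and the seed are hypothetical and are not the data of Table 27.3. The function names are ours as well.

```python
import random


def simulate_responses(xs, alpha, beta, sigma, rng):
    """Draw Y_i = alpha + beta * x_i + U_i with independent U_i ~ N(0, sigma^2)."""
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


def least_squares_slope(xs, ys):
    """Return the least-squares estimate of the slope beta."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical values chosen only for illustration (not the Table 27.3 data).
rng = random.Random(0)
xs = [2.0 * i for i in range(1, 62)]  # 61 made-up values of the independent variable
alpha, beta, sigma = 1.5, -0.003, 0.2

ys = simulate_responses(xs, alpha, beta, sigma, rng)

print("true slope:     ", beta)
print("estimated slope:", least_squares_slope(xs, ys))
```

Because of the random errors, the estimated slope differs from the true one from sample to sample. This is why a downward trend in a scatterplot has to be weighed against what chance alone could produce when β = 0.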
This example illustrates the general situation, where the dataset
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$