and

$$y_t = y_{t-1} + e_t, \quad t = 1, 2, \ldots, \tag{18.28}$$
where $\{a_t\}$ and $\{e_t\}$ are independent, identically distributed innovations, with mean zero and variances $\sigma_a^2$ and $\sigma_e^2$, respectively. For concreteness, take the initial values to be $x_0 = y_0 = 0$. Assume further that $\{a_t\}$ and $\{e_t\}$ are independent processes. This implies that $\{x_t\}$ and $\{y_t\}$ are also independent. But what if we run the simple regression

$$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t \tag{18.29}$$

and obtain the usual t statistic for $\hat{\beta}_1$ and the usual R-squared? Because $y_t$ and $x_t$ are independent, we would hope that $\operatorname{plim} \hat{\beta}_1 = 0$. Even more importantly, if we test $H_0\colon \beta_1 = 0$ against $H_1\colon \beta_1 \neq 0$ at the 5% level, we hope that the t statistic for $\hat{\beta}_1$ is insignificant 95% of the time. Through a simulation, Granger and Newbold (1974) showed that this is not the case: even though $y_t$ and $x_t$ are independent, the regression of $y_t$ on $x_t$ yields a statistically significant t statistic a large percentage of the time, much larger than the nominal significance level. Granger and Newbold called this the spurious regression problem: there is no sense in which $y$ and $x$ are related, but an OLS regression using the usual t statistics will often indicate a relationship.
Recent simulation results are given by Davidson and MacKinnon (1993, Table 19.1), where $a_t$ and $e_t$ are generated as independent, identically distributed normal random variables, and 10,000 different samples are generated. For a sample size of $n = 50$ at the 5% significance level, the standard t statistic for $H_0\colon \beta_1 = 0$ against the two-sided alternative rejects $H_0$ about 66.2% of the time under $H_0$, rather than 5% of the time. As the sample size increases, things get worse: with $n = 250$, the null is rejected 84.7% of the time!
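
A rough way to check this pattern is a small Monte Carlo experiment. The sketch below is not Davidson and MacKinnon's code. It assumes standard normal innovations and uses the asymptotic critical value 1.96 instead of the exact t critical value. The function name `spurious_rejection_rate`, the replication count, and the seed are illustrative choices.

```python
import numpy as np


def spurious_rejection_rate(n, reps=2000, seed=0):
    """Share of simulated samples in which |t| for the slope exceeds 1.96."""
    rng = np.random.default_rng(seed)
    # Independent standard normal innovations a_t and e_t (an assumption).
    a = rng.standard_normal((reps, n))
    e = rng.standard_normal((reps, n))
    # Random walks with x_0 = y_0 = 0: cumulative sums of the innovations.
    x = np.cumsum(a, axis=1)
    y = np.cumsum(e, axis=1)
    # OLS of y_t on an intercept and x_t, one regression per row.
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = (xd ** 2).sum(axis=1)
    b1 = (xd * yd).sum(axis=1) / sxx
    # Usual OLS standard error of the slope, with n - 2 degrees of freedom.
    resid = yd - b1[:, None] * xd
    sigma2 = (resid ** 2).sum(axis=1) / (n - 2)
    t_stat = b1 / np.sqrt(sigma2 / sxx)
    return float(np.mean(np.abs(t_stat) > 1.96))


if __name__ == "__main__":
    for n in (50, 250):
        print(n, spurious_rejection_rate(n))
```

With these settings the rejection frequencies should land far above 5% and rise with n, in line with the figures quoted above. The exact values depend on the seed and the number of replications.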
Here is one way to see what is happening when we regress the level of $y$ on the level of $x$. Write the model underlying (18.29) as

$$y_t = \beta_0 + \beta_1 x_t + u_t. \tag{18.30}$$

For the t statistic of $\hat{\beta}_1$ to have an approximate standard normal distribution in large samples, at a minimum, $\{u_t\}$ should be a mean zero, serially uncorrelated process. But under $H_0\colon \beta_1 = 0$, $y_t = \beta_0 + u_t$, and because $\{y_t\}$ is a random walk starting at $y_0 = 0$, equation (18.30) holds under $H_0$ only if $\beta_0 = 0$ and, more importantly, if $u_t = y_t = \sum_{j=1}^{t} e_j$. In other words, $\{u_t\}$ is a random walk under $H_0$. This clearly violates even the asymptotic version of the Gauss-Markov assumptions from Chapter 11.
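
To see how badly serial uncorrelatedness fails, the second moments of $u_t$ can be worked out directly. The short derivation below uses only what the setup already states, namely that the $e_j$ are i.i.d. with mean zero and variance $\sigma_e^2$. The lag $h \geq 0$ is a symbol introduced here for illustration.

```latex
\begin{align*}
\mathrm{Var}(u_t) &= \sum_{j=1}^{t} \mathrm{Var}(e_j) = \sigma_e^2 t, \\
\mathrm{Cov}(u_t, u_{t+h}) &= \mathrm{Cov}\Bigl(u_t,\; u_t + \sum_{j=t+1}^{t+h} e_j\Bigr)
  = \mathrm{Var}(u_t) = \sigma_e^2 t, \\
\mathrm{Corr}(u_t, u_{t+h}) &= \frac{\sigma_e^2 t}{\sqrt{\sigma_e^2 t \cdot \sigma_e^2 (t+h)}}
  = \sqrt{\frac{t}{t+h}}.
\end{align*}
```

The variance of the error grows linearly with $t$, and for any fixed $h$ the correlation tends to one as $t$ grows, so the errors are neither homoskedastic nor close to serially uncorrelated.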
Including a time trend does not really change the conclusion. If $y_t$ or $x_t$ is a random walk with drift and a time trend is not included, the spurious regression problem is even
Chapter 18 Advanced Time Series Topics
QUESTION 18.2
Under the preceding setup, where $\{x_t\}$ and $\{y_t\}$ are generated by (18.27) and (18.28) and $\{e_t\}$ and $\{a_t\}$ are i.i.d. sequences, what is the plim of the slope coefficient, say $\hat{\gamma}_1$, from the regression of $\Delta y_t$ on $\Delta x_t$? Describe the behavior of the t statistic of $\hat{\gamma}_1$.