5. When the specification appears satisfactory, congratulate yourself on having completed the task
and quit.
The danger with this strategy is that the final version of the model may appear satisfactory only
because you have skilfully massaged its specification to fit your particular data set, not because
it really corresponds to the true model. The econometric literature is full of two types of indirect
evidence that this happens frequently, particularly with models employing time series data, and
particularly with those modeling macroeconomic relationships. It often happens that researchers
investigating the same phenomenon with access to the same sources of data construct internally
consistent but mutually incompatible models, and it often happens that models that survive sample
period diagnostic checks exhibit miserable predictive performance. The literature on the modeling of
aggregate investment behavior is especially notorious in both respects. Further evidence, if any were
needed, has been provided by experiments showing that it is not hard to set up nonsense models that
survive the conventional checks (Peach and Webb, 1983). As a consequence, there is growing
recognition of the fact that the tests eliminate only those models with the grossest misspecifications,
and the survival of a model is no guarantee of its validity.
This is true even of the tests of predictive performance described in the previous chapter, where
the models are subjected to an evaluation of their ability to fit fresh data. There are two problems with
these tests. First, their power may be rather low. It is quite possible that a misspecified model will fit
the prediction period observations well enough for the null hypothesis of model stability not to be
rejected, especially if the prediction period is short. Lengthening the prediction period by shortening
the sample period might help, but again there is a problem, particularly if the sample is not large. By
shortening the sample period, you will increase the population variances of the estimates of the
coefficients, so it will be more difficult to determine whether the prediction period relationship is
significantly different from the sample period relationship.
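To make the first problem concrete, here is a minimal sketch of one common form of such a test, often called the Chow test of predictive failure, written in Python. The data are simulated and the function names are purely illustrative; it should be read as a sketch of the mechanics under those assumptions, not as a reproduction of the previous chapter's procedure.

```python
import numpy as np

# Illustrative Chow test of predictive failure (one common form of a
# prediction-period stability test; the data below are artificial).
# F = [(RSS_pooled - RSS_sample) / n2] / [RSS_sample / (n1 - k)],
# where n1 = sample-period observations, n2 = prediction-period
# observations and k = number of estimated parameters.

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_predictive_failure(y, X, n1):
    """F statistic and degrees of freedom when the first n1 observations
    form the sample period and the remainder the prediction period."""
    n, k = X.shape
    n2 = n - n1
    rss_pooled = rss(y, X)
    rss_sample = rss(y[:n1], X[:n1])
    F = ((rss_pooled - rss_sample) / n2) / (rss_sample / (n1 - k))
    return F, n2, n1 - k

# Simulated example: 40 sample-period and 8 prediction-period observations.
rng = np.random.default_rng(0)
n1, n2 = 40, 8
x = rng.normal(size=n1 + n2)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n1 + n2)
X = np.column_stack([np.ones_like(x), x])

F, df1, df2 = chow_predictive_failure(y, X, n1)
print(f"F({df1}, {df2}) = {F:.2f}")  # compare with the critical value of F
```

With only a handful of prediction-period observations the critical value of F(n2, n1 - k) is large, so even a clearly misspecified model may survive the test; enlarging the prediction period at the expense of the sample period runs into the variance problem just described.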
The other problem with tests of predictive stability is the question of what the investigator does if
the test is failed. Understandably, it is unusual for an investigator to quit at that point, acknowledging
defeat. The natural course of action is to continue tinkering with the model until this test too is passed,
but of course the test then has no more integrity than the sample period diagnostic checks.
This unsatisfactory state of affairs has generated interest in two interrelated topics: the possibility
of eliminating some of the competing models by confronting them with each other, and the possibility
of establishing a more systematic research strategy that might eliminate bad model building in the first
place.
Comparison of Alternative Models
The comparison of alternative models can involve much technical complexity, and the present
discussion will be limited to a very brief and partial outline of some of the issues involved. We will
begin by making a distinction between nested and nonnested models. A model is said to be nested
inside another if it can be obtained from it by imposing a number of restrictions. Two models are said
to be nonnested if neither can be represented as a restricted version of the other. The restrictions may
relate to any aspect of the specification of the model, but the present discussion will be limited to
restrictions on the parameters of the explanatory variables in a single equation model. It will be
illustrated with reference to the demand function for housing services, with the logarithm of
expenditure written Y and the logarithms of the income and relative price variables written X_2 and X_3.
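To make the distinction concrete, the sketch below uses Python with statsmodels and a simulated data set whose variables merely mirror the Y, X_2 and X_3 notation above; the numbers are artificial. A model with the price variable omitted is nested inside the two-variable model, because it can be obtained by imposing the single restriction that the coefficient of X_3 is zero, whereas the two single-variable models are nonnested.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Artificial stand-in for the housing-services example: Y is the log of
# expenditure, X2 the log of income, X3 the log of relative price.
# (The data are simulated purely to illustrate the nesting idea.)
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({"X2": rng.normal(size=n), "X3": rng.normal(size=n)})
Y = 0.5 + 1.1 * data["X2"] - 0.8 * data["X3"] + rng.normal(scale=0.2, size=n)

# Unrestricted model: Y = b1 + b2*X2 + b3*X3 + u
unrestricted = sm.OLS(Y, sm.add_constant(data)).fit()

# The model with X3 omitted is nested inside the unrestricted one: it is
# obtained by imposing the parameter restriction b3 = 0, which can be
# tested directly with a standard F test.
print(unrestricted.f_test("X3 = 0"))

# By contrast, the two one-variable models
#   Y = b1 + b2*X2 + u   and   Y = b1 + b3*X3 + u
# are nonnested: neither can be written as a restricted version of the
# other, although each is nested inside the unrestricted model.
income_only = sm.OLS(Y, sm.add_constant(data[["X2"]])).fit()
price_only = sm.OLS(Y, sm.add_constant(data[["X3"]])).fit()
print(income_only.rsquared, price_only.rsquared)
```

The point of the sketch is that a nested comparison reduces to testing parameter restrictions with familiar procedures, whereas a nonnested comparison cannot be handled in this way.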