Note that the estimate is not exactly the same as the estimate obtained in equation (5.9), which
was –10.99. In principle the two sets of results should be identical, because both are minimizing the
sum of the squares of the residuals. The discrepancy is caused by the fact that we have cheated
slightly in the nonlinear case. We have assumed that β₁ is equal to its true value, 12, instead of
estimating it. If we had really failed to spot the transformation that allows us to use linear regression
analysis, we would have had to use a nonlinear technique, hunting for the best values of b₁ and b₂
simultaneously, and the final values of b₁ and b₂ would have been 12.48 and –10.99, respectively, as in
equation (5.9).
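To make the trial-and-error idea concrete, here is a minimal sketch in Python, using purely illustrative synthetic data rather than the observations behind equation (5.9). It minimizes the residual sum of squares for the hyperbolic model Y = b₁ + b₂/X by a simple grid search, first with b₁ held at 12 and then over both parameters jointly.

```python
# Illustrative sketch of trial-and-error nonlinear least squares for
# Y = b1 + b2/X. The data-generating process below is assumed for
# illustration; it is not the data set used in the text.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=30)
Y = 12 - 11 / X + rng.normal(0, 0.3, size=30)   # assumed DGP

def rss(b1, b2):
    """Residual sum of squares for given parameter values."""
    return np.sum((Y - (b1 + b2 / X)) ** 2)

# Step 1: hold b1 at its "true" value, 12, and search over b2 alone.
b2_grid = np.linspace(-15, -5, 1001)
b2_fixed = b2_grid[np.argmin([rss(12.0, b2) for b2 in b2_grid])]

# Step 2: search over both parameters simultaneously.
b1_grid = np.linspace(10, 14, 401)
surface = np.array([[rss(b1, b2) for b2 in b2_grid] for b1 in b1_grid])
i, j = np.unravel_index(np.argmin(surface), surface.shape)

print(f"b1 fixed at 12: b2 = {b2_fixed:.2f}")
print(f"joint search:   b1 = {b1_grid[i]:.2f}, b2 = {b2_grid[j]:.2f}")
```

With b₁ pinned at 12, the best b₂ will in general differ slightly from the value found by the joint search, mirroring the discrepancy discussed above.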
In practice, the algorithms used for minimizing the residual sum of squares in a nonlinear model
are mathematically far more sophisticated than the simple trial-and-error method described above.
Nevertheless, until fairly recently a major problem with the fitting of nonlinear regressions was that it
was very slow compared with linear regression, especially when there were several parameters to be
estimated, and the high computing cost discouraged the use of nonlinear regression. This has changed
as the speed and power of computers have increased. As a consequence, more interest is being taken in
the technique, and some regression applications now incorporate user-friendly nonlinear regression
features.
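As a sketch of how such a routine is called in practice, SciPy's curve_fit (a standard nonlinear least squares solver) fits the same hyperbolic form, again on illustrative synthetic data, with no manual grid search.

```python
# Sketch: nonlinear least squares with a modern iterative algorithm.
# Data are synthetic and illustrative, as in the previous sketch.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=30)
Y = 12 - 11 / X + rng.normal(0, 0.3, size=30)   # assumed DGP

def hyperbola(x, b1, b2):
    return b1 + b2 / x

# curve_fit iteratively adjusts (b1, b2) to minimize the residual sum
# of squares, starting from the initial guess p0.
(b1_hat, b2_hat), cov = curve_fit(hyperbola, X, Y, p0=(1.0, 1.0))
print(f"b1 = {b1_hat:.2f}, b2 = {b2_hat:.2f}")
```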
5.5 Choice of Function: Box-Cox Tests
The possibility of fitting nonlinear models, either by means of a linearizing transformation or by the
use of a nonlinear regression algorithm, greatly increases the flexibility of regression analysis, but it
also makes your task as a researcher more complex. You have to ask yourself whether you should
start off with a linear relationship or a nonlinear one, and if the latter, what kind.
A graphical inspection, using the technique described in Section 4.2 in the case of multiple
regression analysis, might help you decide. In the illustration in Section 5.1, it was obvious that the
relationship was nonlinear, and it should not have taken much effort to discover that an equation of
the form (5.2) would give a good fit. Usually, however, the issue is not so clear-cut. It often happens
that several different nonlinear forms might approximately fit the observations if they lie on a curve.
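A minimal sketch of such a graphical check, on illustrative data: plot Y against X and against 1/X, and see which panel looks roughly linear.

```python
# Sketch of a graphical inspection to choose a functional form.
# If Y plotted against 1/X looks linear, a hyperbolic model is suggested.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=30)
Y = 12 - 11 / X + rng.normal(0, 0.3, size=30)   # assumed DGP

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(X, Y)
ax1.set_xlabel("X"); ax1.set_ylabel("Y")        # curved pattern
ax2.scatter(1 / X, Y)
ax2.set_xlabel("1/X"); ax2.set_ylabel("Y")      # roughly linear pattern
plt.tight_layout()
plt.show()
```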
When considering alternative models with the same specification of the dependent variable, the
selection procedure is straightforward. The most sensible thing to do is to run regressions based on
alternative plausible functions and choose the function that explains the greatest proportion of the
variance of the dependent variable. If two or more functions are more or less equally good, you
should present the results of each. Looking again at the illustration in Section 5.1, you can see that the
linear function explained 69 percent of the variance of Y, whereas the hyperbolic function (5.2)
explained 97 percent. In this instance we have no hesitation in choosing the latter.
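A sketch of this selection procedure on illustrative data: fit the linear and the hyperbolic specifications to the same dependent variable and compare their R². The helper function below is a hypothetical convenience, not something from the text.

```python
# Sketch: compare R-squared across alternative functional forms that
# share the same dependent variable. Data are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=30)
Y = 12 - 11 / X + rng.normal(0, 0.3, size=30)   # assumed DGP

def r_squared(y, regressor):
    """R-squared from an OLS fit of y on a constant and one regressor."""
    Z = np.column_stack([np.ones_like(regressor), regressor])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

print(f"linear     Y = b1 + b2*X : R2 = {r_squared(Y, X):.3f}")
print(f"hyperbolic Y = b1 + b2/X : R2 = {r_squared(Y, 1 / X):.3f}")
```

Whichever specification yields the higher R² explains a larger proportion of the variance of Y, and (other things being equal) would be preferred.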
However, when alternative models employ different functional forms for the dependent variable,
the problem of model selection becomes more complicated because you cannot make direct
comparisons of R² or the sum of the squares of the residuals. In particular – and this is the most
common example of the problem – you cannot compare these statistics for linear and logarithmic
dependent variable specifications.
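Before turning to the numerical example, the following sketch illustrates the point with synthetic earnings data (the data-generating process is assumed for illustration): the RSS of the semi-logarithmic regression comes out far smaller than that of the linear regression simply because log earnings are measured on a much smaller scale.

```python
# Sketch: RSS is not comparable across linear and logarithmic dependent
# variable specifications. The earnings process below is assumed.
import numpy as np

rng = np.random.default_rng(0)
S = rng.integers(8, 21, size=200)                  # highest grade completed
EARNINGS = np.exp(1.5 + 0.1 * S + rng.normal(0, 0.5, 200))
LGEARN = np.log(EARNINGS)

def fit_rss(y):
    """RSS from an OLS regression of y on a constant and S."""
    Z = np.column_stack([np.ones_like(S, dtype=float), S])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid

print(f"RSS, EARNINGS on S: {fit_rss(EARNINGS):,.0f}")
print(f"RSS, LGEARN on S:   {fit_rss(LGEARN):,.0f}")   # much smaller scale
```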
For example, in Section 2.6, the linear regression of earnings on highest grade completed has an R²
of 0.104 and an RSS of 34,420. For the semi-logarithmic version in Section 5.2,
the corresponding figures are 0.141 and 132. RSS is much smaller for the logarithmic version, but this
means nothing at all. The values of LGEARN are much smaller than those of EARNINGS, so it is