which under the null hypothesis that the constraints are valid is chi-squared with
degrees of freedom equal to the number of constraints imposed (e.g., the number of
parameters set to zero).
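As a minimal numerical illustration of this distributional result (my sketch, not from the text; the log-likelihood values and constraint count are made-up), the following Python code refers a likelihood-ratio statistic to a chi-squared distribution whose degrees of freedom equal the number of constraints imposed.

```python
# Minimal sketch (hypothetical numbers): likelihood-ratio test of q constraints.
from scipy.stats import chi2

loglik_constrained = -412.7    # maximized log-likelihood of the restricted model (hypothetical)
loglik_unconstrained = -405.2  # maximized log-likelihood of the full model (hypothetical)
q = 3                          # number of constraints (e.g., coefficients set to zero)

lr_stat = -2 * (loglik_constrained - loglik_unconstrained)  # -2 ln(L_restricted / L_full)
p_value = chi2.sf(lr_stat, df=q)                            # upper-tail chi-squared probability

print(f"LR statistic = {lr_stat:.2f}, df = {q}, p = {p_value:.4f}")
```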
As the coefficient estimates are normally distributed for large n, we use a z test to test $H_0\colon \beta_k = 0$. The test is of the form
$$
z = \frac{b_k}{\hat{\sigma}_{b_k}},
$$
where $\hat{\sigma}_{b_k}$ is the estimated standard error of $b_k$. The square of z is what is actually reported in some software (e.g., SAS), and $z^2$ is referred to as the Wald chi-squared since it has a chi-squared distribution with 1 degree of freedom under $H_0$. This test is asymptotically equivalent to the nested $\chi^2$ that would be found from comparing models with
and without the predictor in question. However, the reader should be cautioned that
Wald’s test can behave in an aberrant manner when an effect is too large. In particular,
the Wald statistic shrinks toward zero as the absolute value of the parameter estimate
increases without bound (Hauck and Donner, 1977). Therefore, when in doubt, the
nested $\chi^2$
is to be preferred over the Wald test for testing individual coefficients.
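The relationship between the two tests can be seen in a short sketch. The following Python code (my illustration, not from the text; it relies on the third-party statsmodels and scipy packages and on simulated data with made-up coefficients) fits a logit with and without one predictor and compares the Wald chi-squared, $z^2$, with the nested $\chi^2$ for that predictor. With a large sample and a moderate effect, the two statistics should be close.

```python
# Minimal sketch (simulated data): Wald z test for one logit coefficient vs. the
# nested chi-squared (likelihood-ratio) test for the same predictor.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 0.8 * x1 + 0.4 * x2)))   # true logit model (made-up coefficients)
y = rng.binomial(1, p)

X_full = sm.add_constant(np.column_stack([x1, x2]))
X_restricted = sm.add_constant(x1)                     # drops x2, the predictor under test

full = sm.Logit(y, X_full).fit(disp=0)
restricted = sm.Logit(y, X_restricted).fit(disp=0)

# Wald test for the coefficient on x2: z = b_k / se(b_k); z^2 is the Wald chi-squared (1 df).
z = full.params[2] / full.bse[2]
wald_chi2 = z ** 2

# Nested chi-squared: -2(lnL_restricted - lnL_full), also 1 df here.
lr_chi2 = -2 * (restricted.llf - full.llf)

print(f"Wald chi-squared   = {wald_chi2:.2f}, p = {chi2.sf(wald_chi2, 1):.4f}")
print(f"Nested chi-squared = {lr_chi2:.2f}, p = {chi2.sf(lr_chi2, 1):.4f}")
```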
Confidence intervals for logit or probit coefficients are also based on the asymptotic normality of the coefficient estimates. Thus, a 95% confidence interval for $\beta_k$ in either type of model takes the form $b_k \pm 1.96\,\hat{\sigma}_{b_k}$. This formula applies generically to
any coefficient estimates that are based on maximum likelihood estimation and is
relevant to all the models discussed from this point on in the book. I therefore omit
coverage of confidence intervals in subsequent chapters.
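To make the formula concrete, here is a brief sketch (hypothetical estimates, not from the text) that turns a coefficient estimate and its estimated standard error into a 95% confidence interval.

```python
# Minimal sketch (hypothetical numbers): 95% confidence interval for a logit/probit coefficient.
from scipy.stats import norm

b_k = 0.62      # coefficient estimate (hypothetical)
se_b_k = 0.21   # estimated standard error (hypothetical)

z_crit = norm.ppf(0.975)   # approximately 1.96 for a 95% interval
lower, upper = b_k - z_crit * se_b_k, b_k + z_crit * se_b_k
print(f"95% CI for beta_k: ({lower:.3f}, {upper:.3f})")
```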
More about the Likelihood. As the likelihood function is liable to be relatively unfamiliar to many readers, it is worth discussing in a bit more detail. It turns out that this function taps the indeterminacy in Y under a given model, much like the total and residual sums of squares do in linear regression. By indeterminacy, I mean the uncertainty of prediction of Y under a particular model. For example, in OLS, if the “model” for Y is a constant, µ, estimated in the sample by $\bar{y}$, the indeterminacy in Y with respect to this model is TSS, the sum of squares around $\bar{y}$. This, of course, is the naive model, which posits that Y is unrelated to the explanatory variables. TSS measures the total amount of indeterminacy in Y that is potentially “explainable” by the regression. On the other hand, SSE, which equals the sum of squares around $\hat{y}$, is the indeterminacy in Y with respect to the hypothesized model. If the model accounts perfectly for Y, then $Y = \hat{y}$ for all cases, and SSE is zero.
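As a small numerical illustration of these two quantities (my example, not from the text; the data are simulated), the sketch below computes TSS as the sum of squares around $\bar{y}$ and SSE as the sum of squares around the OLS fitted values $\hat{y}$.

```python
# Minimal sketch (simulated data): TSS and SSE as measures of indeterminacy in Y.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=100)   # made-up linear model

# Naive model: predict Y by its sample mean.
tss = np.sum((y - y.mean()) ** 2)

# Hypothesized model: OLS regression of Y on x.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
sse = np.sum((y - y_hat) ** 2)

print(f"TSS = {tss:.1f}, SSE = {sse:.1f}")   # SSE <= TSS; SSE = 0 only for a perfect fit
```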
In linear regression, we rely on the squared deviation of Y from its predicted value under a given model to tap uncertainty. The counterpart in MLE is the likelihood of Y under a particular model. The greater the likelihood of the data, given the parameters, the more confident we are that the process that generated Y has been identified correctly. Under the naive model, the process that generated Y is captured by p, and $-2\ln L_0$ reflects the total uncertainty in Y that remains to be explained. The indeterminacy under the hypothesized model is $-2\ln L_1$. What happens if Y is predicted