magnitudes of estimates as well as in their standard errors, or in the extreme case,
counterintuitive signs of coefficients (Schaefer, 1986). Collinearity diagnostics are
not necessarily available in logit or probit software (e.g., none are currently provided
in SAS’s procedure LOGISTIC). However, because collinearity is strictly a problem
of the explanatory variables, it can be diagnosed with linear regression software. In
SAS, I use the collinearity diagnostics in the OLS regression procedure
(PROC REG) to evaluate linear dependencies among the predictors. The best single
indicator of collinearity problems is the VIF for each coefficient (as discussed in
Chapter 6). As mentioned previously, VIFs greater than about 10 signify problems
with collinearity.
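The same VIF screen can be reproduced outside of SAS. The sketch below (the data and variable names are fabricated for illustration) regresses each predictor on the remaining ones and reports VIF_j = 1/(1 − R²_j), flagging any value above the rule-of-thumb cutoff of 10:

```python
# Sketch: computing VIFs for a predictor matrix with plain NumPy.
# The simulated predictors are illustrative, not from the text.
import numpy as np

def vif(X):
    """Return the VIF for each column of the n x k predictor matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining columns (plus an intercept).
    """
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X))  # first two VIFs far exceed 10; the third is near 1
```

Because the response variable plays no role in the computation, this check applies identically whether the final model is OLS, logit, or probit.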
Other problems are more specific to maximum likelihood estimation. The first
pertains to zero cell counts. If the cross-tabulation of the response variable with a given
categorical predictor results in one or more zero cells, it will not be possible to esti-
mate effects associated with those cells in a logistic regression model. In an earlier
article (DeMaris, 1995) I presented an example using the 1993 General Social
Survey in which the dependent variable is happiness, coded 1 for those reporting
being “not too happy,” and 0 otherwise. Among categorical predictors, I employ
marital status, represented by four dummy variables (widowed, divorced, separated,
never married) with married as the reference group, and race, represented by two
dummies (black, other race), with white as the reference group. Among other mod-
els, I try to estimate one with the interaction of marital status and race. The prob-
lem is that among those in the “other race” category who are separated, all
respondents report being “not too happy,” leaving a zero cell in the remaining cate-
gory of the response. I was alerted that there was a problem by the unreasonably
large coefficient for the “other race × separated” term in the model and by its
associated standard error, which was about 20 times larger than any other. Running the
three-way cross-tabulation of the response variable by both marital status and race
revealed the zero cell. An easy solution, in this case, was to collapse the categories
of race into “white” versus “nonwhite” and then to reestimate the interaction. If
collapsing categories of a categorical predictor is not possible, the predictor can instead
be treated as continuous, provided that it is at least ordinal in scale (Hosmer and Lemeshow, 2000).
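Running the cross-tabulation before fitting is easy to automate. In the sketch below (the data are fabricated, and race is already collapsed to “white” versus “other” purely to keep the example short), every combination of predictor categories and response values is checked for a zero count:

```python
# Sketch: screening for zero cells before estimating an interaction.
# The mini-sample below is fabricated for illustration.
from collections import Counter
from itertools import product

# (marital status, race, not_too_happy) triples
data = [
    ("married",   "white", 0), ("married",   "white", 1),
    ("married",   "other", 0), ("married",   "other", 1),
    ("separated", "white", 0), ("separated", "white", 1),
    ("separated", "other", 1),   # no separated/"other" respondent coded 0
]

marital = sorted({m for m, _, _ in data})
race = sorted({r for _, r, _ in data})
counts = Counter(data)

# any (marital, race, response) cell with a zero count makes the
# corresponding interaction effect inestimable
zero_cells = [cell for cell in product(marital, race, (0, 1))
              if counts[cell] == 0]
print(zero_cells)  # -> [('separated', 'other', 0)]
```

Any cell the screen turns up signals that the corresponding interaction term cannot be estimated and that categories may need to be collapsed first.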
A much rarer problem occurs when one or more predictors perfectly
discriminate between the categories of the response. (More precisely, it arises when
some linear combination of the predictors, which might be just one predictor,
discriminates the response perfectly.) Suppose, as a simple example, that all couples with incomes
under $10,000 per year report violence and all couples with incomes of $10,000 or
more per year report being nonviolent. In this case, income completely separates the
outcome groups. Correspondingly, the problem is referred to as complete separation.
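The income example can be mimicked numerically. In the sketch below (the data, measured in $10,000s, and the step size are fabricated), a logistic model is fit by simple gradient ascent rather than by any packaged routine; because income separates the groups completely, the slope never settles down but drifts ever farther from zero as iterations accumulate:

```python
# Sketch: complete separation in action. All low-income couples report
# violence and all high-income couples do not, so the logistic
# log-likelihood has no finite maximum. Data and learning rate fabricated.
import numpy as np

income = np.array([0.4, 0.6, 0.8, 0.9, 1.1, 1.3, 1.5, 2.0])  # $10,000s
violent = (income < 1.0).astype(float)  # perfect separation at $10,000

X = np.column_stack([np.ones_like(income), income])

def fit(iters, lr=0.5):
    """Gradient ascent on the logistic log-likelihood."""
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (violent - p) / len(violent)
    return beta

for iters in (500, 5000, 50000):
    b0, b1 = fit(iters)
    print(f"{iters:6d} iterations: intercept={b0:8.2f}, slope={b1:8.2f}")
# the slope keeps growing in magnitude instead of converging
```

A packaged routine that stops after a fixed number of Newton steps would report the same symptom differently: an enormous coefficient paired with an even more enormous standard error, exactly the warning signs described above.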
When this occurs, the maximum likelihood estimates do not exist (Albert and
Anderson, 1984; Santner and Duffy, 1986). Finite maximum likelihood estimates
exist only when there is some overlap in the distributions of explanatory variables
for groups defined by the response variable. If the overlap is only marginal—say, at
a single or at a few tied values—a problem of quasicomplete separation develops. In
either case, the analyst is again made aware that something is amiss by unreasonably
268 REGRESSION WITH A BINARY RESPONSE