248 REGRESSION WITH A BINARY RESPONSE
LINEAR PROBABILITY MODEL
Suppose that we have a binary response, $Y_i$, coded 1 if the $i$th case is in the category
of interest, and 0 otherwise. (The coding of a binary response is actually arbitrary,
but dummy coding is especially convenient, as will become apparent.) Recall that
the linear regression model for the conditional mean of $Y$, given $x$, is
$$E(Y_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_K X_{iK}.$$
However, the mean of a dummy-coded variable is the proportion of cases in the
category of interest or, equivalently, the probability of being in the category of interest,
denoted $\pi$. Letting $\pi_i$ be the probability of being in the category of interest given the $i$th
covariate pattern, the linear regression model for the conditional mean of a binary
response is
$$\pi_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_K X_{iK}. \tag{7.1}$$
Because the probability is being modeled as a linear function of the parameters, this
is referred to as the linear probability model (LPM) (Aldrich and Nelson, 1984;
Long, 1997). The regression coefficients are interpreted in terms of the probability
of being in the category of interest on $Y$. Hence, $\beta_1$ represents the change in the
probability for each unit increase in $X_1$, net of the other covariates, and so on.
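To make this interpretation concrete, the following is a minimal sketch in Python. It makes two assumptions that do not come from the text: the model is estimated by ordinary least squares, and the data are a small made-up sample with one dummy covariate (the names `y`, `x`, `b0`, and `b1` are purely illustrative). With a single dummy covariate, the fitted intercept should equal the proportion of 1s in the group coded 0, and the slope should equal the difference in proportions between the two groups, which is the sense in which a coefficient is a change in probability.

```python
import numpy as np

# Hypothetical toy data (NOT the NSFH violence data): y is a 0/1 response,
# x is a single 0/1 covariate.
y = np.array([1, 0, 0, 0, 1, 1, 0, 1, 0, 0], dtype=float)
x = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0], dtype=float)

# The mean of a dummy-coded variable is the proportion of cases coded 1.
overall_proportion = y.mean()

# Fit the linear probability model by ordinary least squares (an assumption
# of this sketch): the design matrix has an intercept column and x.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = beta

# Sample proportions of y = 1 within each group of x.
p0 = y[x == 0].mean()
p1 = y[x == 1].mean()

print("overall proportion:", overall_proportion)
print("intercept b0:", b0, "| proportion when x = 0:", p0)
print("slope b1:", b1, "| difference in proportions:", p1 - p0)
```

With continuous covariates the fitted values no longer reproduce group proportions exactly, but the slope is read the same way: as the change in the modeled probability per unit increase in the covariate.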
Example
Employing questions from the NSFH on partners’ violence toward each other, I
categorized 4095 married and cohabiting couples surveyed between 1987 and
1994 according to whether or not either partner had been violent toward the other
during that period. These data are in the violence dataset. A total of 555 couples,
or 13.55%, had experienced intimate violence. Of interest here is the extent to
which couple violence is a function of several couple characteristics, including
whether they were cohabiting, as opposed to legally married (cohabiting); the
duration of the relationship, in years, as of the initial survey (relationship dura-
tion); whether either partner in the couple was a minority (minority couple); the
female’s age at the start of the union (female’s age at union); the degree to which
the male was socially or emotionally isolated from his or his partner’s immediate
kin (male’s isolation); the degree of economic disadvantage exhibited by the cou-
ple’s neighborhood of residence at the time of the initial survey (economic disad-
vantage); and whether either partner had a problem with alcohol or drugs
(alcohol/drug problem). The continuous variables relationship duration, female’s
age at union, male’s isolation, and economic disadvantage are all centered.
Although the couple’s violence profile is, in actuality, a three-category variable,
here I simply distinguish the violent from the nonviolent. In Chapter 8 I consider
the three-level response in greater detail. To keep things relatively simple, issues
of sample selectivity and other critical explanatory variables are omitted in what