Second, w and x can be the same set of regressors, but it is better if we have at
least one factor in w that is not in x. Otherwise, the coefficients in equation (9.31)
are identified only because $\hat{\lambda}(\alpha_u)$ is a nonlinear transformation of the
model’s regressors. Even so, having w and x be the same regressors causes substantial
collinearity problems, since x and $\hat{\lambda}(\alpha_u)$ will then tend to be highly
correlated. Ideally, we want to choose the unique element of w to be a factor that is
unrelated to the outcome.
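The collinearity point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the hazard term is taken to have the form φ(w′γ)/Φ(w′γ), the selection error is standard normal, and the coefficient values, the extra regressor, and the helper names (`hazard`, `corr_x_hazard`) are made up for illustration rather than taken from the text.

```python
import math
import random


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hazard(index):
    """Assumed form of the hazard term: phi(w'gamma) / Phi(w'gamma)."""
    return phi(index) / Phi(index)


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def corr_x_hazard(gamma_extra, n=20000, seed=0):
    """Correlation between x and the hazard term among selected cases.

    gamma_extra is the (hypothetical) selection coefficient on a variable
    that appears in w but not in x; 0.0 means w and x coincide.
    """
    rng = random.Random(seed)
    xs, hazards = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        extra = rng.gauss(0.0, 1.0)            # regressor unique to w
        u = rng.gauss(0.0, 1.0)                # selection-equation error
        index = 1.0 * x + gamma_extra * extra  # w'gamma, made-up coefficients
        if index + u > 0:                      # case is selected into the sample
            xs.append(x)
            hazards.append(hazard(index))
    return pearson(xs, hazards)


r_same = corr_x_hazard(gamma_extra=0.0)  # w and x are the same regressor
r_excl = corr_x_hazard(gamma_extra=1.0)  # w contains one extra factor
print(f"corr(x, hazard) when w == x:            {r_same:.3f}")
print(f"corr(x, hazard) with extra factor in w: {r_excl:.3f}")
```

In this setup the correlation between x and the hazard term should be noticeably weaker in absolute value once w contains a factor that x lacks, which is the sense in which an exclusion restriction eases the collinearity.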
Third, how is $\lambda(\alpha_u)$ to be interpreted? This term can be considered the hazard of
exclusion (Berk, 1983). The higher its value, the more a given case possesses
characteristics associated with exclusion from the sample. Why? Once again, regard
Figure 9.3. Remember that as $z = w_i'\gamma$ becomes increasingly positive, equation
(9.32) tells us that $\Phi(w_i'\gamma)$, the probability of being selected into the sample, also
increases, and $\phi(w_i'\gamma)$ decreases to zero, which means that $\lambda(\alpha_u)$ shrinks toward
zero. As $w_i'\gamma$ becomes increasingly negative, $\Phi(w_i'\gamma)$, the probability of being
selected, shrinks toward zero, while $\phi(w_i'\gamma)$ also decreases to zero but at a slower
rate, which implies that $\lambda(\alpha_u)$ becomes large. Therefore, larger values of $\lambda(\alpha_u)$
reflect a lower probability of inclusion into the sample.
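The limiting behavior described above can be checked numerically. The sketch below assumes, consistent with the argument in the paragraph, that the hazard term equals φ(z)/Φ(z) with z = w′γ. The grid of z values and the function names are illustrative only.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF: the probability of selection at index z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hazard(z):
    """Assumed hazard of exclusion: phi(z) / Phi(z)."""
    return phi(z) / Phi(z)


# Illustrative grid of values for the selection index z = w'gamma.
z_grid = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
hazards = [hazard(z) for z in z_grid]

print(f"{'z':>5} {'Phi(z)':>10} {'phi(z)':>10} {'hazard':>10}")
for z, h in zip(z_grid, hazards):
    print(f"{z:5.1f} {Phi(z):10.5f} {phi(z):10.5f} {h:10.5f}")
```

Reading down the table, the selection probability rises with z while the hazard falls toward zero. Reading up, both φ and Φ shrink, but Φ shrinks faster, so their ratio grows.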
Fourth, θ, the “effect” of the hazard of exclusion, can be misleading, since it is
opposite to intuition. For one thing, it should probably be thought of only as an
association parameter, since the hazard of exclusion does not actually “cause” Y*.
Additionally, the sign of this effect is the same as the sign of ρ, since $\sigma_\varepsilon$ is
always positive. If ρ is positive, for instance, whatever unobserved factors raise the
probability of selection also elevate the outcome. A positive “effect” of the hazard of
exclusion in this case indicates that the tendency to be included, not excluded,
is associated with a higher mean outcome. This is a subtle point that can easily
cause confusion. As an example, Berk’s (1983) exposition of sample selection
effects considered potential bias in the regression model for satisfaction with jury
duty brought about by using only the sample that responded to a mail survey. The
effect of $\hat{\lambda}(\alpha_u)$ in the model for satisfaction with jury duty was seen to be
negative, implying a negative value for ρ. The temptation is to conclude that exclusion
was associated with less satisfaction, or that the dissatisfied were less likely to
respond. Yet a negative ρ means that, net of observed covariates, the tendency to
respond was associated with less satisfaction; in other words, the dissatisfied were
more likely to respond.
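A short simulation may help fix the direction of this association. It is a sketch under assumptions not spelled out in this passage: the errors ε and u are taken to be bivariate normal with correlation ρ, u has unit variance, θ equals ρσ_ε (which is what the sign argument above suggests), and the hazard term is φ(w′γ)/Φ(w′γ). All numeric values are hypothetical.

```python
import math
import random


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


rho = -0.5        # hypothetical correlation between eps and u
sigma_eps = 2.0   # hypothetical standard deviation of eps
index = 0.3       # hypothetical value of w'gamma, held fixed for all cases
n = 50000

rng = random.Random(1)
selected_eps = []
for _ in range(n):
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    # Build eps so that corr(eps, u) = rho and sd(eps) = sigma_eps.
    eps = sigma_eps * (rho * u + math.sqrt(1.0 - rho ** 2) * v)
    if index + u > 0:              # selected (e.g., responded to the survey)
        selected_eps.append(eps)

mean_eps_selected = sum(selected_eps) / len(selected_eps)
theta = rho * sigma_eps                       # assumed "effect" of the hazard
predicted = theta * phi(index) / Phi(index)   # theta times the hazard term

print(f"mean of eps among selected cases: {mean_eps_selected:.3f}")
print(f"theta * hazard:                   {predicted:.3f}")
```

With a negative ρ, the selected cases carry a negative average error, so those who are included tend to have lower outcomes than their covariates alone would predict. That matches the reading that the dissatisfied were more likely to respond.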
Fifth, an examination of the nature of the bias associated with omitting $\lambda(\alpha_u)$
reveals the conditions under which sample selectivity does, and does not, create
problems. For simplicity of exposition, let’s assume that there is only one regressor,
although the principles generalize to any number of regressors. The substantive
equation is
$$Y^* = \beta_0 + \beta_1 X + \varepsilon,$$
where ε is normal with zero mean and variance $\sigma_\varepsilon^2$. The selection equation is
$$Z^* = \gamma X + u,$$