this factorization underlies the missing data taxonomy described in Section 5.4.
The typical strategy to model the two components is to specify a parametric
model for each and to assume that $\psi$ and $\theta$ are distinct. In the following, we
will assume $\psi$ and $\theta$ are distinct and a priori independent unless stated
otherwise. Note that this does not correspond to the $(\omega_E, \omega_O)$ partition in (8.1).
An early example of selection modeling for multivariate data can be found
in Heckman (1979), where the joint distribution of a bivariate response $Y$
with missing $Y_2$ was specified using a multivariate normal distribution
(implicitly, a probit model for the binary indicator of missingness that is linear in
$(Y_1, Y_2)^T$). Diggle and Kenward (1994) extended the Heckman model to the
case of dropout in longitudinal studies, using a logistic model for the hazard
of dropout (a sketch of the general form appears below). Many subsequent
articles adapted and extended their approach within a likelihood framework
(see, e.g., Fitzmaurice, Molenberghs, and Lipsitz, 1995; Baker, 1995;
Molenberghs, Kenward, and Lesaffre, 1997; Liu, Waternaux, and Petkova, 1999;
Albert, 2000; Heagerty and Kurland, 2004). A framework for semiparametric
inference can be found in Robins et al. (1995) and Scharfstein et al. (1999).
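As a point of reference, a common parameterization of such a logistic dropout-hazard
model (a sketch of the general form only, not necessarily the exact specification in
Diggle and Kenward, 1994) conditions the discrete-time hazard of dropout at occasion
$j$ on the current and previous responses,
\[
\operatorname{logit}\{P(D = j \mid D \ge j, Y_1, \ldots, Y_j)\} = \psi_0 + \psi_1 Y_{j-1} + \psi_2 Y_j ,
\]
where $D$ denotes the dropout occasion. The dependence on $Y_j$, which is unobserved
for subjects who drop out at occasion $j$, is what makes the mechanism nonignorable.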
We begin our review of parametric selection models by demonstrating via
several examples the difficulty in finding sensitivity analysis parameteriza-
tions.
8.3.2 Absence of sensitivity parameters in the missing data mechanism
In many parametric selection models, all the parameters are identified.
Identification is driven by parametric assumptions on both the full-data response
model and the missing data mechanism (MDM).
To illustrate using a simple example, consider a cross-sectional setting
where $Y$ may or may not be missing. Suppose the histogram of the observed
$y$'s looks like that given in Figure 8.1 and that we specify the following model
for the missing data mechanism:
\[
\operatorname{logit}\{P(R = 1 \mid y)\} = \psi_0 + \psi_1 y,
\]
where $R = 1$ corresponds to observing $Y$. With no further assumptions about
the distribution of $Y$, we cannot identify $\psi_1$ because when $R = 0$, we do
not observe $Y$. It can also be shown (Scharfstein et al., 2003) that $\psi_1$ is a
sensitivity parameter.
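To make this concrete, the following simulation sketch (with hypothetical parameter
values such as $\psi_0 = 1$ and $\psi_1 = -1$; not taken from the text) constructs two
full-data scenarios whose observed data have the same distribution: a normal full-data
law with $\psi_1 = -1$, and a skewed full-data law with $\psi_1 = 0$. Because only $R$,
and $Y$ when $R = 1$, are ever seen, the observed data cannot distinguish the two,
even though the full-data means differ.

```python
# A minimal sketch, assuming hypothetical values psi0 = 1, psi1 = -1 and Y ~ N(0, 1).
# Scenario A: normal full-data Y with psi1 = -1 (selection depends on Y itself).
# Scenario B: skewed full-data Y with psi1 = 0 (MCAR).
# The two scenarios produce observed data with the same distribution.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- Scenario A: Y ~ N(0, 1), logit P(R = 1 | y) = psi0 + psi1 * y ---
psi0, psi1 = 1.0, -1.0
y_a = rng.normal(0.0, 1.0, size=n)
r_a = rng.binomial(1, expit(psi0 + psi1 * y_a))
y_obs_a = y_a[r_a == 1]              # observed responses under A
p_obs_a = r_a.mean()                 # marginal response rate under A

# --- Scenario B: full-data Y drawn from A's observed distribution, psi1 = 0 ---
y_b = rng.choice(y_obs_a, size=n, replace=True)   # skewed full-data law
r_b = rng.binomial(1, p_obs_a, size=n)            # MCAR: R independent of Y
y_obs_b = y_b[r_b == 1]

# The observed data look the same under both scenarios ...
print(f"response rate:  A = {p_obs_a:.3f}, B = {r_b.mean():.3f}")
print(f"observed mean:  A = {y_obs_a.mean():.3f}, B = {y_obs_b.mean():.3f}")
# ... but the full-data means (the target of inference) differ:
print(f"full-data mean: A = {y_a.mean():.3f}, B = {y_b.mean():.3f}")
```

Scenario B's full-data distribution is, by construction, Scenario A's observed-data
distribution, which is precisely the sense in which a histogram like Figure 8.1 is
compatible with more than one value of $\psi_1$.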
However, suppose we further assume that the full-data response model is
a normal distribution, $N(\mu, \sigma^2)$. Looking at the histogram of the observed
$y$'s in Figure 8.1, it is clear that for this histogram to be consistent with
normally distributed full-data $y$'s, we need to fill in the right tail; this implies
$\psi_1 < 0$. On the other hand, if we assumed a parametric model for the full-data
response model that was consistent with the histogram of the observed $y$'s
(e.g., a skew-normal), it would suggest that $\psi_1 = 0$. Thus, inference about