ASSUMPTIONS ABOUT MISSING DATA MECHANISM
covariate effects or other parameters; however, when data are incomplete,
auxiliary variables can be helpful to the extent that they explain variation in
the joint distribution of Y and R. Missingness mechanisms can be defined
conditionally on auxiliary variables as well.
Definition 5.11. Auxiliary variable MAR (A-MAR).
Missing responses are A-MAR if MAR holds conditionally on auxiliary
variables V; i.e., if, for all $y_{\mathrm{obs}}$, x, v, and ψ,
\[
p(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, x, v, \psi) = p(r \mid y_{\mathrm{obs}}, x, v, \psi).
\]
□
Clearly, A-MAR does not imply MAR. Consequently, when primary interest
is in the full-data response model p(y | x, θ), which does not involve v,
it is necessary to specify a full-data model that includes v and then
integrate v out to obtain the unconditional model. Specifically, the
full-data model of (Y, R | X) is
\[
p(y, r \mid x, \omega) = \int p(y, r, v \mid x, \omega)\, dv. \tag{5.2}
\]
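To see why A-MAR does not imply MAR, it may help to average the A-MAR mechanism over v. The following is only a sketch: it assumes V has a conditional density p(v | y_obs, y_mis, x), a quantity introduced here for illustration, and it suppresses parameters.

```latex
\begin{align*}
p(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, x)
  &= \int p(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, x, v)\,
          p(v \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, x)\, dv \\
  &= \int p(r \mid y_{\mathrm{obs}}, x, v)\,
          p(v \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, x)\, dv
  \qquad \text{(A-MAR)}.
\end{align*}
```

The first factor in the integrand is free of y_mis under A-MAR, but the weight p(v | y_obs, y_mis, x) generally is not. The marginal mechanism can therefore depend on y_mis unless, for example, V is conditionally independent of Y_mis given (Y_obs, X), or the mechanism does not depend on v at all.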
Returning to the longitudinal HIV example, suppose Y is longitudinal CD4
count and V is longitudinal viral load, and that there is appreciable
missingness in CD4. The MAR condition requires the analyst to assume that,
conditional on observed CD4 history, missingness is unrelated to the CD4
count that would have been measured; this may be unrealistic. Further
suppose that the investigator can confidently specify a joint model for CD4
count and viral load (e.g., based on knowledge of disease progression
dynamics), and that viral load is thought to explain sufficient variability
in CD4 count that one could use it to predict missing responses. In other
words, the analyst is willing to assume A-MAR, or MAR conditional on viral
load.
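The reasoning in this example can be illustrated with a small simulation. The sketch below is a deliberately simplified, cross-sectional stand-in for the longitudinal setting: a single standardized response y (playing the role of CD4), a single fully observed auxiliary v (playing the role of viral load), and a logistic missingness mechanism that depends on v only. The function names, the correlation 0.7, the logistic slope 2.0, and the sample size are all hypothetical choices made for illustration, not values from the HIV data.

```python
import numpy as np


def simulate_amar(n=200_000, rho=0.7, seed=0):
    """Draw (y, v, r) where missingness in y depends on v only (A-MAR)."""
    rng = np.random.default_rng(seed)
    # Hypothetical joint model for (Y, V): standard bivariate normal
    # with correlation rho, so the true mean of Y is 0.
    v = rng.standard_normal(n)
    y = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Missingness mechanism: P(R = 1 | y, v) = expit(2 v), free of y given v.
    p_obs = 1.0 / (1.0 + np.exp(-2.0 * v))
    r = rng.random(n) < p_obs  # True means y is observed
    return y, v, r


def estimate_means(y, v, r):
    """Return the complete-case mean of y and a v-adjusted mean of y."""
    # Complete-case estimate: ignores v entirely.
    cc_mean = y[r].mean()
    # Fit E[Y | V] on observed cases, then average predictions over all v.
    # This is valid here because R depends on V only.
    slope, intercept = np.polyfit(v[r], y[r], deg=1)
    adj_mean = (intercept + slope * v).mean()
    return cc_mean, adj_mean


y, v, r = simulate_amar()
cc_mean, adj_mean = estimate_means(y, v, r)

# Without conditioning on v, the observation rate varies with y itself,
# which is what a violation of MAR looks like in this one-response setting.
rate_hi = r[y > 0].mean()
rate_lo = r[y <= 0].mean()

print(f"complete-case mean: {cc_mean:.3f}")
print(f"v-adjusted mean:    {adj_mean:.3f}")
print(f"P(observed | y > 0) = {rate_hi:.3f}, P(observed | y <= 0) = {rate_lo:.3f}")
```

In this setup the complete-case mean should be visibly biased away from the true value of 0, while the v-adjusted mean should land close to it. The adjustment works only because the model for y given v is correctly specified, which mirrors the requirement that the analyst be able to specify the joint model for CD4 count and viral load with confidence.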
This approach requires specifying the joint distribution of (Y, R, V | X)
under the integral sign in (5.2). One possibility is to write
\[
p(y, r, v \mid x, \omega) = p(y, v \mid x, \omega)\, p(r \mid y, v, x, \omega),
\]
where the first factor is the joint model for CD4 and viral load and the
second is the missing data mechanism. As we will see in Chapter 7,
incorporating auxiliary covariates can sometimes be simpler than it would
appear. Under A-MAR (and assuming ignorability; see Section 5.7), the
missing data mechanism can be left unspecified, although the joint model
for Y and V does have to be specified. Section 6.6 describes some specific
models that can be
used for this purpose; in Section 7.5, we illustrate by analyzing data from the
CTQ II trial, where the responses are smoking cessation outcomes measured
weekly for 8 weeks, and the auxiliary process is individual weight, recorded
concurrently.
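The claim that the missing data mechanism can be left unspecified under A-MAR follows from a short factorization of the observed-data likelihood. The sketch below rests on assumptions made here for illustration: V is fully observed, ω = (θ, ψ) with θ indexing the joint model for (Y, V) and ψ indexing the mechanism, and θ and ψ are distinct. The formal ignorability conditions are those deferred to Section 5.7.

```latex
\begin{align*}
L(\theta, \psi \mid y_{\mathrm{obs}}, v, r, x)
  &\propto \int p(y_{\mathrm{obs}}, y_{\mathrm{mis}}, v \mid x, \theta)\,
           p(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, v, x, \psi)\,
           dy_{\mathrm{mis}} \\
  &= p(r \mid y_{\mathrm{obs}}, v, x, \psi)
     \int p(y_{\mathrm{obs}}, y_{\mathrm{mis}}, v \mid x, \theta)\,
     dy_{\mathrm{mis}}
     \qquad \text{(A-MAR)} \\
  &= p(r \mid y_{\mathrm{obs}}, v, x, \psi)\;
     p(y_{\mathrm{obs}}, v \mid x, \theta).
\end{align*}
```

Under these assumptions the likelihood separates into a factor involving only ψ and a factor involving only θ, so inference about θ can be based on p(y_obs, v | x, θ) alone. The joint model for Y and V still enters through that second factor, which is why it must be specified.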