
the posterior for the mean parameters in Bayesian inference replaces $\ell(\beta, \alpha)$ in (6.8) with $\ell(\beta, \alpha) + \log p(\beta, \alpha)$. $\Box$
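A one-line check of why this substitution is innocuous for orthogonality (a sketch, assuming the factored prior $p(\beta, \alpha) = p(\beta)\,p(\alpha)$ used in Example 6.2 below): adding the log prior leaves the cross block of second derivatives unchanged,
\[
-\frac{\partial^2}{\partial\beta\,\partial\alpha^{T}}\bigl\{\ell(\beta,\alpha) + \log p(\beta) + \log p(\alpha)\bigr\}
= -\frac{\partial^2 \ell(\beta,\alpha)}{\partial\beta\,\partial\alpha^{T}},
\]
since $\log p(\beta) + \log p(\alpha)$ contains no term involving both $\beta$ and $\alpha$.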
We illustrate using a multivariate normal model.
Example 6.2. Information matrix based on observed data log likelihood under ignorability with a multivariate normal model.
Assume $Y_i$ follows a multivariate normal distribution with mean $X_i\beta$ and covariance matrix $\Sigma(\alpha)$. Let $\theta = (\beta, \alpha)$ and assume that $p(\beta, \alpha) = p(\beta)\,p(\alpha)$ (a common assumption). For the case of complete data, it is easy to show that the off-diagonal block of the information matrix, $I_{\beta,\alpha}$, is equal to zero for all values of $\alpha$, thereby satisfying condition (6.8). For Bayesian inference, the posterior for $\beta$ will be consistent even under mis-specification of $\Sigma(\alpha)$.
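A sketch of the complete-data calculation behind this claim (using only the model stated above): the score for $\beta$ is linear in the residual $Y_i - X_i\beta$, so its derivative with respect to any component $\alpha_k$ of $\alpha$ has expectation zero at the true $\beta$,
\[
\frac{\partial \ell}{\partial \beta} = \sum_i X_i^{T}\,\Sigma(\alpha)^{-1}(Y_i - X_i\beta),
\qquad
-\,E\!\left[\frac{\partial^2 \ell}{\partial \beta\,\partial \alpha_k}\right]
= -\sum_i X_i^{T}\,\frac{\partial \Sigma(\alpha)^{-1}}{\partial \alpha_k}\,E(Y_i - X_i\beta) = 0.
\]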
However, under ignorability, the submatrix of the information matrix, now based on the observed data log likelihood $\ell_{\text{obs}}$ (or the observed data posterior) and given by
\[
I^{\text{obs}}_{\beta,\alpha}(\beta, \alpha) = -\,E\!\left[\frac{\partial^2 \ell_{\text{obs}}(\beta, \alpha)}{\partial\beta\,\partial\alpha^{T}}\right],
\]
is no longer equal to zero, even at the true value of $\Sigma(\alpha)$ (Little and Rubin, 2002). Hence the weaker parameter orthogonality condition given in Definition 6.2 does not even hold. As a result, in order for the posterior distribution of the mean parameters to be consistent, the dependence structure must be correctly specified.
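A quick numerical illustration of this failure of orthogonality (a minimal sketch; the bivariate normal values, the logistic MAR mechanism, and the helper loglik_obs are assumptions made for the demonstration, not part of the example): with $y_2$ missing at random given $y_1$, a finite-difference estimate of the per-unit cross term $E[\partial^2 \ell_{\text{obs}}/\partial\mu_2\,\partial\sigma_{12}]$ stays bounded away from zero.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])

# simulate bivariate normal data; y2 is MAR given the always-observed y1
n = 100_000
y = rng.multivariate_normal(mu, Sigma, size=n)
r = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-2.0 * y[:, 0]))

def loglik_obs(mu2, s12):
    """Observed-data log likelihood as a function of (mu2, sigma12)."""
    m = np.array([mu[0], mu2])
    S = np.array([[Sigma[0, 0], s12], [s12, Sigma[1, 1]]])
    ll = multivariate_normal(m, S).logpdf(y[r]).sum()               # complete cases
    ll += norm(mu[0], np.sqrt(Sigma[0, 0])).logpdf(y[~r, 0]).sum()  # y2 missing
    return ll

# central finite difference for the mixed partial at the true parameters
h, m2, s12 = 1e-3, mu[1], Sigma[0, 1]
cross = (loglik_obs(m2 + h, s12 + h) - loglik_obs(m2 + h, s12 - h)
         - loglik_obs(m2 - h, s12 + h) + loglik_obs(m2 - h, s12 - h)) / (4 * h**2)
print(cross / n)  # per-unit cross term: clearly nonzero under MAR
```

Replacing the logistic mechanism with one that ignores $y_1$ (i.e., MCAR) drives the printed value toward zero, which is one way to see that it is the selection on $y_1$ under MAR that breaks the orthogonality.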
This lack of orthogonality can be seen in the setting of a bivariate normal linear regression, by making a simple analogy to univariate simple linear regression. This will also provide some additional intuition into how inferences change under missingness.
Suppose $E(Y) = \mu$, $R_i = 1$ for $i = 1, \ldots, n_1$ ($y_{i2}$ observed), and $R_i = 0$ for $i = n_1 + 1, \ldots, n$ ($y_{i2}$ missing). As in Chapter 5, we factor the joint distribution $p(y_1, y_2)$ as $p(y_1)\,p(y_2 \mid y_1)$. For complete data, the conditional distribution of $Y_2$ given $Y_1$ (ignoring priors for the time being), viewed as a function of $\mu_2$ and $\phi_{21} = \sigma_{12}/\sigma_{11}$, is proportional to
\[
\exp\!\left[-\sum_{i=1}^{n} \{y_{i2} - \mu_2 - \phi_{21}(y_{i1} - \mu_1)\}^2 / 2\sigma_{2|1}\right], \qquad (6.9)
\]
where $\sigma_{2|1} = \sigma_{22} - \sigma_{21}\sigma_{11}^{-1}\sigma_{12}$. Note that $\phi_{21}$ and $\mu_2$ do not appear in $p(y_1)$.
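As a concrete reading of (6.9) (a small sketch; the simulated values and variable names are illustrative assumptions): with complete data, maximizing (6.9) over $(\mu_2, \phi_{21})$ is ordinary least squares of $y_2$ on the centered covariate $y_1 - \mu_1$, and the empirical quantity that drives the cross information, $\sum_i (y_{i1} - \mu_1)$, is centered at zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu1, mu2, phi21, s2_1 = 50_000, 1.0, 2.0, 0.6, 0.8

y1 = mu1 + rng.normal(size=n)                                        # p(y1)
y2 = mu2 + phi21 * (y1 - mu1) + np.sqrt(s2_1) * rng.normal(size=n)   # p(y2 | y1)

# OLS on the centered covariate recovers the intercept mu2 and slope phi21
X = np.column_stack([np.ones(n), y1 - mu1])
coef, *_ = np.linalg.lstsq(X, y2, rcond=None)
print(coef)               # approx. [2.0, 0.6]

# the (mu2, phi21) element of the information is proportional to
# sum(y1 - mu1), whose expectation is zero: the parameters are orthogonal
print((y1 - mu1).mean())  # approx. 0
```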
The orthogonality of $\mu_2$ and $\phi_{21}$ is apparent by recognizing (6.9) as having the same form as the likelihood for a simple linear regression with a centered covariate $y_{i1} - \mu_1$, intercept $\mu_2$, and slope $\phi_{21}$. It can be shown from this form that the element of the (expected) information matrix corresponding to $\mu_2$ and $\phi_{21}$ is zero for all values of $\phi_{21}$: the mixed second derivative of the log likelihood is $-\sum_{i}(y_{i1} - \mu_1)/\sigma_{2|1}$, which has expectation zero since $E(Y_1) = \mu_1$. However, with missing data (under