Daniels M.J., Hogan J.W. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis


20 REGRESSION MODELS

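the linear predictor

$$\eta_i = \eta(x_i, \beta) = x_i^T\beta,$$

where $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of regression coefficients. Define $\mu_i = \mu(x_i, \beta) = E(Y_i \mid x_i, \beta)$. A smooth, monotone function $g$ links the mean $\mu_i$ to the linear predictor $\eta_i$ via

$$g(\mu_i) = \eta_i = x_i^T\beta. \tag{2.1}$$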

In many exponential family distributions, it is possible to identify a link function $g$ such that $X^T Y$ is the sufficient statistic for $\beta$ (here, $X$ is the $n \times p$ design matrix and $Y = (Y_1, \ldots, Y_n)^T$ is the $n \times 1$ vector of responses).

In this case, the canonical parameter is $\theta = \eta$. Examples are well known and widespread: for the Poisson distribution, the canonical parameter is $\log(\mu)$; for the binomial distribution, it is the log odds (logit), $\log\{\mu/(1 - \mu)\}$.

Although canonical links are sometimes convenient, their use is not necessary to form a GLM. In general, only the specification of a mean and variance function, conditionally on covariates, is required. The mean follows (2.1), and the variance is given by

$$v(\mu_i, \phi) = \phi\, h(\mu_i),$$

where $h(\cdot)$ is some function of the mean and $\phi > 0$ is a scale factor. Certain choices of $g$ and $h$ will yield likelihood score equations for common parametric regression models based on exponential family distributions. For example, setting $g(\mu) = \log\{\mu/(1 - \mu)\}$, $h(\mu) = \mu(1 - \mu)$ and $\phi = 1$ yields logistic regression under a Bernoulli distribution. Similarly, Poisson regression can be specified by setting $g(\mu) = \log(\mu)$, $h(\mu) = \mu$, and $\phi = 1$.
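To make the mean-variance formulation concrete, here is a minimal sketch, assuming NumPy, that solves the estimating equations implied by a choice of $g$ and $h$ using iteratively reweighted least squares (Fisher scoring). The function `fit_glm` and its arguments are hypothetical names for illustration, not a library API, and the simulated data and coefficient values are made up. Because $\phi$ multiplies every variance, it cancels from the equations for $\beta$ and is not needed to compute $\hat\beta$.

```python
import numpy as np


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-eta))


def fit_glm(X, y, linkinv, dmu_deta, h, n_iter=50, tol=1e-10):
    """Solve the GLM estimating equations by IRLS (Fisher scoring).

    linkinv  : inverse link, eta -> mu = g^{-1}(eta)
    dmu_deta : derivative d mu / d eta
    h        : variance function, var(Y) = phi * h(mu)
    phi cancels from the equations for beta, so it is not an argument.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = linkinv(eta)
        d = dmu_deta(eta)
        w = d**2 / h(mu)              # working weights
        z = eta + (y - mu) / d        # working response
        XtW = X.T * w                 # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Logistic regression: g = logit, h(mu) = mu(1 - mu), phi = 1
logit_spec = dict(linkinv=expit,
                  dmu_deta=lambda eta: expit(eta) * (1.0 - expit(eta)),
                  h=lambda mu: mu * (1.0 - mu))
# Poisson regression: g = log, h(mu) = mu, phi = 1
poisson_spec = dict(linkinv=np.exp, dmu_deta=np.exp, h=lambda mu: mu)

# Simulated (made-up) data with an intercept and one covariate
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_bin = rng.binomial(1, expit(X @ np.array([-0.5, 1.0])))
y_cnt = rng.poisson(np.exp(X @ np.array([0.3, 0.5])))

beta_logit = fit_glm(X, y_bin, **logit_spec)
beta_pois = fit_glm(X, y_cnt, **poisson_spec)
```

With the canonical links used here, $d\mu/d\eta = h(\mu)$, so the converged estimates satisfy $X^T(y - \hat\mu) = 0$; for a non-canonical link the same loop still applies, but the score no longer reduces to that form.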

2.4 Conditionally specified models

This section focuses on models that specify the mean of $Y$ given $X$ conditionally on random effects; these models also are known by a variety of names, including ‘mixed effects models’, ‘multilevel models’, ‘random effects models’, and ‘random coefficient models’. Throughout the text, we use the terms ‘random effects’ and ‘multilevel’ models. Readers are referred to Breslow and Clayton (1993) and Daniels and Gatsonis (1999) for a more complete accounting and list of references. This class of models also includes regression models with factor-analytic and latent class structures (see Bartholomew and Knott, 1999 for a full account) and Markov models (where the mean is specified conditional on a subset of past responses).

Conditionally specified models with multiple levels, using random effects or latent variables, provide a highly flexible class of models for handling longitudinal data. The models can be applied to either balanced or unbalanced response profiles and can be used to capture key features of both between- and within-subject variation using relatively few parameters.

CONDITIONALLY SPECIFIED MODELS 21

The most common random effects models for longitudinal data specify the joint distribution $p(y, b \mid x, \theta)$ as

$$p(y \mid b, x, \theta_1)\, p(b \mid x, \theta_2).$$

The parameter $\theta_1$ captures the conditional effect of $X$ on $Y$ and the parameter $\theta_2$ captures features of the distribution of random effects $b$. The distribution unconditional on random effects is obtained by integrating $b$ out of the joint distribution

$$p(y \mid x, \theta_1, \theta_2) = \int p(y \mid x, b, \theta_1)\, p(b \mid x, \theta_2)\, db; \tag{2.2}$$

notice the unconditional distribution is indexed by the full set of parameters $\theta = (\theta_1, \theta_2)$.
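Integral (2.2) rarely has a closed form outside the normal case. As an illustration only, the sketch below, assuming NumPy, approximates it by Gauss-Hermite quadrature (`numpy.polynomial.hermite.hermgauss`) for a hypothetical random-intercept logistic model with $\theta_1 = \beta$ and $b_i \sim N(0, \nu^2)$, so that $\theta_2 = \nu^2$. Quadrature is one common choice among several (Laplace approximation and Monte Carlo integration are others), and `marginal_lik` is a hypothetical helper name.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite import hermgauss


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def marginal_lik(y, X, beta, nu, n_nodes=30):
    """Approximate (2.2) for one subject under a random-intercept logistic model:
    p(y | x) = int prod_j Ber(y_j; expit(x_j' beta + b)) N(b; 0, nu^2) db.
    """
    nodes, weights = hermgauss(n_nodes)      # rule for weight exp(-t^2)
    b = np.sqrt(2.0) * nu * nodes            # change of variables b = sqrt(2) nu t
    eta = X @ beta                           # linear predictor without b
    p = expit(eta[:, None] + b[None, :])     # J x n_nodes conditional means
    cond_lik = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    return np.sum(weights * cond_lik) / np.sqrt(np.pi)


# Hypothetical design: four equally spaced visits, intercept and time
t = np.arange(4.0)
X = np.column_stack([np.ones(4), t])
beta = np.array([-1.0, 0.4])

total = sum(marginal_lik(np.array(pattern), X, beta, nu=1.5)
            for pattern in itertools.product([0, 1], repeat=4))
```

Summing the marginal likelihood over all $2^4$ response patterns returns 1 up to rounding, a quick check that the random effect has been integrated out correctly.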

2.4.1 Random effects models based on GLMs

By including random effects, generalized linear models can be used to model longitudinal and clustered data. For common distributions such as Bernoulli and Poisson, the GLM with random effects can be written in terms of the conditional mean and variance. The model for the conditional mean $\mu_{ij}^b = E(Y_{ij} \mid x_{ij}, w_{ij}, b_i)$ takes the form

$$g\{E(Y_{ij} \mid x_{ij}, w_{ij}, b_i)\} = g(\mu_{ij}^b) = x_{ij}^T\beta + w_{ij}^T b_i,$$

where $g(\cdot)$ is a link function and $w_{ij}$ is a design matrix for the subject-specific random effects. This representation of the conditional mean motivates the term ‘mixed-effects model’ because the coefficients quantify both population-level ($\beta$) and individual-level ($b_i$) effects.

The conditional variance is given by

$$v_{ij}^b = \mathrm{var}(Y_{ij} \mid x_{ij}, w_{ij}, b_i) = \phi\, h(\mu_{ij}^b)$$

for $\phi > 0$ and a suitably chosen $h(\cdot)$. Finally, within-subject correlation is specified through a covariance function

$$C_{ijk}(\gamma) = \mathrm{cov}(Y_{ij}, Y_{ik} \mid x_{ij}, x_{ik}, b_i, \gamma).$$

In many cases it is assumed that $C_{ijk} = 0$ for $j \neq k$; i.e., that the random effects capture the relevant within-subject correlation (after averaging over their distribution), but this assumption may not always be appropriate for longitudinal responses.

At the second level, the random effects $b_i$ follow some distribution such as multivariate normal. The model for the marginal joint distribution of $(Y_{i1}, \ldots, Y_{iJ} \mid X_i)$ is obtained by integrating over $b_i$ as in (2.2).

The relationship between marginal and conditional models is important to understand, particularly as it relates to interpreting covariate effects. In what follows we give several examples to illustrate.


2.4.2 Random effects models for continuous response

A natural choice for modeling continuous or measured responses is the normal distribution. In random effects models, allowing both within- and between-subject variation to follow a normal distribution, or more generally a Gaussian process, affords considerable modeling flexibility while retaining interpretability.

Example 2.1. Normal random effects model for continuous responses.

A common model for continuous longitudinal responses is the normal random effects model. This model illustrates well the concept of a conditionally specified joint distribution because the variance-covariance structure in $p(y \mid x, \theta)$ is a by-product of the assumed random effects distribution.

Like many random effects models, it is easiest to describe in two stages. At the first stage, the vector of responses $Y_i$, measured at times $\{t_{i1}, \ldots, t_{i,J_i}\}$, is normally distributed conditionally on a $q \times 1$ vector of random effects $b_i$,

$$Y_i \mid x_i, b_i \sim N(\mu_i^b, \Sigma_i^b),$$

where the superscript $b$ denotes that the mean and covariance are conditional on $b_i$. To incorporate covariate effects, let

$$\mu_i^b = x_i\beta + w_i b_i,$$

where $w_i$ is the design matrix for random effects. The variance matrix $\Sigma_i^b(\phi)$ captures within-subject variation and is parameterized by the $r \times 1$ vector $\phi$ of nonredundant parameters. Hence $\theta_1 = (\beta, \phi)$.

When $w_i \subseteq x_i$, as is usually the case, the $b_i$ can be thought of as error terms for one or more of the regression coefficients, which gives rise to the term ‘random coefficient model’. For example, if $x_i = w_i$, we obtain

$$\mu_i^b = x_i(\beta + b_i), \tag{2.3}$$

where the random effects $b_i$ can be interpreted as individual-specific deviations from $\beta$.

The within-subject variance $\Sigma_i^b(\phi)$ usually has a simplified structure, parameterized through a covariance function $C_{ijk}(\phi)$. For example, an exponential structure takes the form

$$C_{ijk}(\phi) = \sigma^2 \rho^{|t_{ij} - t_{ik}|},$$

where $\phi = (\sigma^2, \rho)$ and $0 \le \rho \le 1$.
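The exponential structure can be evaluated for any set of measurement times. A minimal sketch, assuming NumPy; `exp_cov` is a hypothetical helper, and the times and parameter values are arbitrary.

```python
import numpy as np


def exp_cov(times, sigma2, rho):
    """Within-subject covariance C_ijk = sigma^2 * rho^{|t_ij - t_ik|}."""
    t = np.asarray(times, dtype=float)
    lag = np.abs(t[:, None] - t[None, :])    # matrix of |t_ij - t_ik|
    return sigma2 * rho ** lag


# Irregular visit times for one hypothetical subject
Sigma = exp_cov([0.0, 1.0, 2.0, 4.0], sigma2=2.0, rho=0.5)
```

For distinct times and $0 < \rho < 1$ the resulting matrix is positive definite; correlation decays geometrically in the time lag, so the visits at times 0 and 4 are much less correlated than adjacent visits.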

At the second level, the $q$-dimensional vector of random effects is assigned a distribution that can depend on covariates. The (multivariate) normal is a common choice,

$$b_i \mid x_i \sim N(0, \Omega),$$

where $\Omega = \Omega(\eta)$ is a $q \times q$ covariance matrix indexed by $\eta$ (hence $\theta_2 = \eta$). It also is possible to allow $\eta$ to depend on individual-level covariates through appropriate specifications (Daniels and Zhao, 2003).

Upon integrating over $b_i$, the marginal distribution of $Y_i$ follows the multivariate normal distribution

$$Y_i \mid x_i, w_i \sim N(x_i\beta,\; w_i \Omega w_i^T + \Sigma_i^b). \tag{2.4}$$

The marginal variance $\mathrm{var}(Y_i \mid x_i)$ depends on parameters from both $p(y \mid x, b)$ and $p(b \mid x)$. Moreover, we see by comparing (2.3) and (2.4) that $\beta$ can be interpreted both as a marginal and a conditional effect of $X$ on $Y$. We discuss this further in Section 2.7. A version of this model is used to analyze the schizophrenia trial in Sections 4.3 and 7.3. □
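The marginal covariance in (2.4) is a simple matrix computation. The sketch below, assuming NumPy, uses a hypothetical random intercept and slope design ($w_{ij}^T = (1, t_{ij})$) with $\Sigma_i^b = \sigma^2 I$; the values of $\Omega$ and $\sigma^2$ are illustrative only, and `marginal_cov` is a hypothetical helper.

```python
import numpy as np


def marginal_cov(W, Omega, Sigma_b):
    """Marginal covariance of Y_i in (2.4): W Omega W^T + Sigma_i^b."""
    return W @ Omega @ W.T + Sigma_b


t = np.array([0.0, 1.0, 2.0, 3.0])
W = np.column_stack([np.ones_like(t), t])    # random intercept and slope
Omega = np.array([[1.0, 0.2],
                  [0.2, 0.25]])
sigma2 = 0.5
V = marginal_cov(W, Omega, sigma2 * np.eye(len(t)))
# var(Y_ij) = Omega11 + 2 t Omega12 + t^2 Omega22 + sigma2
```

With a random slope, the marginal variance changes with time even though the within-subject variance is constant, which is one way the random effects distribution shapes $\mathrm{var}(Y_i \mid x_i)$. With a random intercept only, $V$ reduces to compound symmetry.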

2.4.3 Random effects models for discrete responses

Random effects specifications can be very useful for modeling longitudinal discrete responses, where the joint distribution rarely takes an obvious form and principles from generalized linear models are not as easily applied. In the case of longitudinal binary data, for example, it is straightforward to show that the joint distribution of a $J$-dimensional response variable can be represented by a multinomial distribution with $2^J$ categories. When $J$ is appreciably large, however, parameter constraints must be imposed to make modeling practical. See Laird (2004), Chapter 7, for a more detailed discussion.
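The parameter bookkeeping behind the multinomial representation is easy to check: the $2^J$ cell probabilities have $2^J - 1$ free values, $J$ of which can be spent on the marginal means, leaving $2^J - J - 1$ for the association structure. A small sketch, assuming NumPy and the standard library; `parameter_counts` and `marginal_means` are hypothetical helpers.

```python
import itertools

import numpy as np


def parameter_counts(J):
    """(free cell probabilities, marginal means, association parameters)
    for the saturated multinomial model of J binary responses."""
    free = 2**J - 1        # 2^J cell probabilities constrained to sum to one
    means = J              # one marginal mean P(Y_j = 1) per occasion
    return free, means, free - means


def marginal_means(cell_probs, J):
    """P(Y_j = 1) for each j, from probabilities on the 2^J response patterns
    listed in itertools.product([0, 1], repeat=J) order."""
    patterns = np.array(list(itertools.product([0, 1], repeat=J)))
    return patterns.T @ np.asarray(cell_probs, dtype=float)


for J in (2, 5, 10):
    print(J, parameter_counts(J))
# 2 (3, 2, 1)
# 5 (31, 5, 26)
# 10 (1023, 10, 1013)
```

The association count grows exponentially in $J$, which is why constraints become necessary for even moderately long sequences.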

Compared to direct specification of the joint distribution, random effects models offer the advantage of being parsimonious, providing a natural decomposition of multiple sources of variation, and applying equally well to balanced and unbalanced response profiles. The regression parameters represent covariate effects in the conditional rather than marginal joint distribution of $Y$, however, and because the link functions are nonlinear transformations of the mean (e.g., log, logit), conditional and marginal effects do not generally coincide. Therefore care must be taken when interpreting regression effects. The logistic regression with normal random effects illustrates several of these points rather well.

Example 2.2. Logistic regression with random effects.

As in Example 2.1, a logistic random effects model is specified in terms of the joint distribution

$$p(y, b \mid x, \theta) = p(y \mid x, b, \theta_1)\, p(b \mid x, \theta_2),$$

where $\theta = (\theta_1, \theta_2)$. Conditionally on $b_i$, the distribution of each component of $Y_i$ follows the Bernoulli model

$$Y_{ij} \mid x_{ij}, b_i \sim \mathrm{Ber}(\mu_{ij}^b),$$

where

$$g(\mu_{ij}^b) = x_{ij}^T\beta + w_{ij}^T b_i \tag{2.5}$$

(hence $\theta_1 = \beta$). The random effects distribution follows

$$b_i \mid x_i \sim N(0, \Omega),$$

so $\theta_2$ corresponds to the nonredundant components of $\Omega$.

The parameter $\beta$ characterizes the conditional, or subject-specific, effect of $X$ on $Y$. By contrast, the marginal (or population-averaged) distribution $p(y \mid x, \theta)$ must be obtained by integrating over $b$. When $w_{ij}$ is a subset of $x_{ij}$, the marginal mean $\mu_{ij}(\beta, \Omega) = E(Y_{ij} \mid x_{ij}, \beta, \Omega)$ is

$$\mu_{ij}(\beta, \Omega) = \int \mu_{ij}^b(\beta)\, p(b \mid \Omega)\, db = \int \frac{\exp(x_{ij}^T\beta + w_{ij}^T b)}{1 + \exp(x_{ij}^T\beta + w_{ij}^T b)}\, p(b \mid \Omega)\, db,$$

where $p(b \mid \Omega)$ is the multivariate normal density with mean 0 and variance $\Omega$. The marginal effect of $X$ differs from the conditional effect in that it is a function of both $\beta$ and $\Omega$, and on the logit scale, it is no longer linear.
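The marginal mean integral can be evaluated numerically to see the nonlinearity directly. A sketch assuming NumPy, with a scalar random intercept ($w_{ij} = 1$, $\Omega = \nu^2$) and illustrative values of $\beta_0$, $\beta_1$, $\nu$; the helper names are hypothetical, and Gauss-Hermite quadrature is our choice of integration method.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def logit(p):
    return np.log(p / (1.0 - p))


def marginal_mean(eta, nu, n_nodes=40):
    """mu_ij = E_b{expit(x_ij' beta + b)} with b ~ N(0, nu^2),
    computed by Gauss-Hermite quadrature for each linear predictor value."""
    nodes, weights = hermgauss(n_nodes)
    b = np.sqrt(2.0) * nu * nodes
    eta = np.atleast_1d(np.asarray(eta, dtype=float))
    return (expit(eta[:, None] + b[None, :]) @ weights) / np.sqrt(np.pi)


beta0, beta1, nu = -1.0, 1.0, 2.0
x = np.linspace(-3, 3, 7)
mu_cond = expit(beta0 + beta1 * x)             # conditional mean at b = 0
mu_marg = marginal_mean(beta0 + beta1 * x, nu)
# Finite-difference slopes of logit(mu_marg); for a linear logit these would all equal beta1
slopes = np.diff(logit(mu_marg)) / np.diff(x)
```

The slopes are all smaller than $\beta_1$ and vary with $x$. The first fact follows from Jensen's inequality: with $p = \mathrm{expit}(x^T\beta + b)$, $E\{p(1-p)\} \le E(p)\{1 - E(p)\}$, so the local slope of $\mathrm{logit}(\mu_{ij})$ never exceeds $\beta_1$.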

Zeger and Liang (1992) show that in some cases, the marginal effect in the logit-normal model is approximately linear on the logit scale, and differs from the conditional effect by a scale factor that depends on $\Omega$. To illustrate, consider the simple case of a logistic regression with a single covariate $x_{ij}$ and random intercept; i.e.,

$$\mathrm{logit}(\mu_{ij}^b) = \beta_0 + b_i + \beta_1 x_{ij},$$

where $b_i \sim N(0, \nu^2)$. Here, $\beta_1$ is the conditional effect of $x$. It can be shown that the marginal effect of $x$, denoted by $\beta_1^*$, can be approximately represented by the logistic model

$$\mathrm{logit}(\mu_{ij}) = \beta_0^* + \beta_1^* x_{ij},$$

where $(\beta_0^*, \beta_1^*) = (c^2\nu^2 + 1)^{-1/2}(\beta_0, \beta_1)$ and $c^2 \approx 0.346$. Hence the marginal or population-averaged effect of $x$ is attenuated relative to the conditional or subject-specific effect, with degree of attenuation governed by the magnitude of the random effects variance $\nu^2$. Interpreting the marginal and conditional effects is considered further in Section 2.7.
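A commonly used value for the constant in this approximation is $c = 16\sqrt{3}/(15\pi) \approx 0.588$, for which $c^2 \approx 0.346$. The sketch below, assuming NumPy, compares the approximate marginal coefficients with a least-squares line through the exact marginal logits, computed by Gauss-Hermite quadrature over a grid of $x$ values; parameter values and helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

C = 16 * np.sqrt(3) / (15 * np.pi)   # c^2 is approximately 0.346


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def attenuated(beta, nu):
    """Approximate marginal coefficients (c^2 nu^2 + 1)^{-1/2} * beta."""
    return np.asarray(beta) / np.sqrt(C**2 * nu**2 + 1.0)


def exact_marginal_logit(beta0, beta1, nu, x, n_nodes=60):
    """logit of the marginal mean, integrating b ~ N(0, nu^2) numerically."""
    nodes, weights = hermgauss(n_nodes)
    b = np.sqrt(2.0) * nu * nodes
    mu = expit(beta0 + beta1 * x[:, None] + b[None, :]) @ weights / np.sqrt(np.pi)
    return np.log(mu / (1.0 - mu))


beta, nu = np.array([-0.5, 1.0]), 1.5
x = np.linspace(-2, 2, 41)
# Least-squares line through the exact marginal logits: [slope, intercept]
slope, intercept = np.polyfit(x, exact_marginal_logit(*beta, nu, x), 1)
approx = attenuated(beta, nu)
```

The approximation tracks the fitted marginal slope closely but not exactly; the exact marginal logit is only approximately linear in $x$, so any single attenuation factor is a summary.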

In Section 4.4 we use this model to characterize the effect of a behavioral intervention on weekly smoking cessation status using longitudinal binary data from the Commit to Quit I study. □

Examples 2.1 and 2.2 assume the random effects $b_i$ follow a normal distribution; this is not necessary and in many cases it may be inappropriate or incorrect. Zhang and Davidian (2001) describe models where the random effects distribution belongs to a flexible class of densities that includes the normal as a special case. Verbeke and Lesaffre (1996) describe random effects distributions that follow discrete mixtures of normal distributions. For simple models, it is sometimes possible to use exploratory analysis in order

DIRECTLY SPECIFIED (MARGINAL) MODELS 25

to ascertain whether a normal or other symmetric distribution is suitable for describing the random effects. In other cases, more formal methods of model choice may be needed.

2.5 Directly specified (marginal) models

This section reviews the family of models in which the joint distribution of $Y$ given $X$ is directly specified by a model $p(y \mid x, \theta)$. Usually the most challenging aspect of model specification is finding a suitable parameterization for the correlation and/or covariance, particularly when observations are unbalanced in time or when the number of observations per subject is large relative to sample size. In these cases, sensible decisions about dimension reduction must be made.

For continuous data that can be characterized using a normal distribution or Gaussian process, model specification (though not necessarily selection) can be reasonably straightforward, owing to the natural separation of mean and variance parameters in the normal distribution. The analyst can focus efforts separately on models for mean and covariance structure.

Other types of data pose more significant challenges to the process of direct specification due to a lack of obvious choices for joint distribution models. Unlike the normal distribution, which generalizes naturally to the multivariate and even the stochastic process setting, common distributions like binomial and Poisson do not have obvious multivariate analogues. One problem is that the mean and covariance models share the same parameters, even for simple specifications. Another potential problem is that, unlike with the normal model, higher-order associations do not necessarily follow from pairwise associations, and need to be specified or explicitly constrained (Fitzmaurice and Laird, 1993). The joint distribution of $J$ binary responses, for example, has $2^J - J - 1$ parameters governing the association structure. With count data, appropriate specification of even a simple correlation structure is not immediately obvious.

This section describes various approaches to direct model specification, illustrated with examples from the normal and binomial distributions. The first examples use the normal distribution. For longitudinal binary data, we describe an extension of the log-linear model that allows transparent interpretation of both the mean and serial correlation. Another useful approach to modeling association in binary data is the multivariate probit model, which exploits properties of the normal distribution by assuming the binary random variables are manifestations of an underlying normally distributed latent process.


2.5.1 Multivariate normal and Gaussian process models

The multivariate normal distribution provides a highly flexible starting point for modeling continuous response data, both temporally aligned and misaligned. It also is useful for handling situations where the number of observation times is large relative to the number of units being followed. The most straightforward situation is where data are temporally aligned and $n \gg J$, allowing both the mean and variance to be unstructured. When responses are temporally misaligned, or when $J$ is large relative to $n$, structure must be imposed.

A key characteristic of the normal distribution, which allows flexible modeling across a wide variety of settings, is that the mean and variance have separate parameters. The next two examples illustrate a variety of model specifications using the normal distribution.

Example 2.3. Multivariate normal regression for temporally aligned observations.

Assume that observations on the primary response variable are taken at a fixed schedule of times $t_1, \ldots, t_J$. For a response vector $Y_i = (Y_{i1}, \ldots, Y_{iJ})^T$ with associated $J \times p$ covariate matrix $X_i$, the multivariate normal regression is written as

$$Y_i \mid x_i \sim N(\mu_i, \Sigma_i),$$

where $\mu_i$ is $J \times 1$ and $\Sigma_i$ is $J \times J$. The mean $\mu_i = E(Y_i \mid X_i = x_i)$ follows a regression model

$$\mu_i = x_i\beta,$$

where $x_i$ is the observed $J \times p$ covariate matrix and $\beta$ is a $p \times 1$ vector of regression coefficients.

The covariance matrix is parameterized with a vector of non-redundant parameters $\phi$. To emphasize that the covariance matrix may depend on $x_i$ through $\phi$, we sometimes write

$$\Sigma_i(\phi) = \Sigma(x_i, \phi)$$

(Daniels and Pourahmadi, 2002). If $\Sigma_i$ is assumed constant across individuals, it has $J(J + 1)/2$ unique parameters, but structure can be imposed to reduce this number (Jennrich and Schluchter, 1986). As an alternative to leaving $\Sigma_i$ fully parameterized, common structures for longitudinal data include banded or Toeplitz (with a common parameter along each off-diagonal) and autoregressive correlations of pre-specified order (Núñez Antón and Zimmermann, 2000; Pourahmadi, 2000; Pourahmadi and Daniels, 2002).
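For $J$ fixed occasions, these structures differ sharply in parameter count: $J(J+1)/2$ unstructured, $J$ for a Toeplitz matrix (fewer if banded), and two ($\sigma^2$, $\rho$) for a first-order autoregressive correlation with constant variance. A sketch assuming NumPy; the helper names are hypothetical, and an arbitrary set of Toeplitz values need not give a positive definite matrix, so any fitted values should be checked.

```python
import numpy as np


def n_params_unstructured(J):
    """Unique parameters in an unrestricted J x J covariance matrix."""
    return J * (J + 1) // 2


def toeplitz_cov(lag_values):
    """Toeplitz covariance: lag_values[k] is the common covariance at lag k
    (k = 0 is the variance). Setting lag_values[k] = 0 for k > m gives a band of width m."""
    d = np.asarray(lag_values, dtype=float)
    J = len(d)
    lag = np.abs(np.subtract.outer(np.arange(J), np.arange(J)))
    return d[lag]


def ar1_cov(J, sigma2, rho):
    """First-order autoregressive correlation with common variance sigma2."""
    lag = np.abs(np.subtract.outer(np.arange(J), np.arange(J)))
    return sigma2 * rho ** lag


print(n_params_unstructured(6))   # 21, versus 6 (Toeplitz) and 2 (AR(1))
```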

The matrix $X_i$ can include information about measurement time, baseline covariates, and the like. If we set $x_{ij}^T = (1, t_j)$ and $\beta = (\beta_0, \beta_1)^T$, then $\beta_1$ corresponds to the average slope over time, where the average is taken over the population from which the sample of individuals is drawn. When $J$ is small enough, $x_{ij}$ can include a vector of time indicators, allowing the mean to be unstructured in time. In Sections 4.2 and 7.2, this model is used to analyze data from the Growth Hormone Study. □

In the previous example, it is sometimes possible to allow both the mean and variance to remain unstructured in time when there are relatively few time points and covariate levels. When time points are temporally misaligned, or when the number of observation times is large relative to the sample size, information at the unique measurement times will be sparse and additional structure needs to be imposed. Our focus in the next example is on covariance parameterization in terms of a covariance function. Further details can be found in Diggle et al. (2002), Chapter 4.

Example 2.4. Multivariate normal regression model for temporally misaligned observations.

When observations are temporally misaligned, the main difference in model specification lies in the covariance parameterization. As with Example 2.3, a normal distribution may be assumed, but with covariance $\Sigma_i$ whose dimension and structure depend on the number and timing of observations for individual $i$. The joint distribution follows

$$Y_i \mid x_i \sim N(\mu_i, \Sigma_i).$$

Also as in Example 2.3, $\mu_i = x_i\beta$. The covariance $\Sigma_i(\phi)$ has dimension $J_i \times J_i$, and structure is imposed by specifying a model $C(t_{ik}, t_{il}; \phi)$ for the elements $C_{ikl} = \mathrm{cov}(Y_{ik}, Y_{il})$. For example, the model

$$C(t_{ik}, t_{il}; \phi_1, \phi_2) = \phi_1 \exp(-\phi_2 |t_{ik} - t_{il}|) \tag{2.6}$$

requires only $\phi_1$ and $\phi_2$ to fully parameterize $\Sigma_i(\phi)$ (with the constraints $\phi_1 > 0$ and $\phi_2 \ge 0$). This covariance model implies

(i) $\mathrm{var}(Y_{ik} \mid x_i, \phi) = \phi_1$ (setting $t_{ik} = t_{il}$);

(ii) for any two observations $Y_{ik}$ and $Y_{il}$ separated by lag $|t_{ik} - t_{il}|$,

$$\mathrm{corr}(Y_{ik}, Y_{il} \mid x_i, \phi) = \exp(-\phi_2 |t_{ik} - t_{il}|).$$

The correlation function above is stationary because it depends on $(t_{ik}, t_{il})$ only through the lag $|t_{ik} - t_{il}|$.

As with Example 2.3, the variance and correlation parameters may depend on covariates through an appropriately specified model. If we modify (2.6) such that

$$C_{ikl} = \phi_{1i} \exp(-\phi_2 |t_{ik} - t_{il}|)$$

(i.e., $\phi_1$ now depends on $i$), then we may model $\phi_{1i}$ as a function of covariates via

$$\log(\phi_{1i}) = x_i^T\alpha.$$

In Section 7.6, a semiparametric version of this model will be used to characterize the response of longitudinal CD4 counts to initiation of highly active antiretroviral therapy (HAART) in the HER Study. □
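Because (2.6) is a function of the observation times, each subject's $\Sigma_i$ can be built at whatever times that subject was seen. The sketch below, assuming NumPy, does this for two hypothetical subjects with different numbers and spacing of visits, and assumes it is the variance parameter $\phi_1$ that is made subject-specific through $\log(\phi_{1i}) = x_i^T\alpha$. The covariate values, $\alpha$, and $\phi_2$ are made up for illustration, and `cov_26` is a hypothetical helper.

```python
import numpy as np


def cov_26(times, phi1, phi2):
    """Covariance (2.6): C(t_ik, t_il) = phi1 * exp(-phi2 * |t_ik - t_il|)."""
    t = np.asarray(times, dtype=float)
    return phi1 * np.exp(-phi2 * np.abs(t[:, None] - t[None, :]))


# Two hypothetical subjects observed at different, irregular times
subject_times = {
    "A": [0.0, 0.4, 1.7],
    "B": [0.2, 1.0, 1.1, 2.5, 3.0],
}
# Assumed subject-level covariates (intercept, group) and alpha for log(phi_1i)
x_subj = {"A": np.array([1.0, 0.0]), "B": np.array([1.0, 1.0])}
alpha = np.array([0.0, 0.5])
phi2 = 0.8

# One J_i x J_i covariance matrix per subject, with variance exp(x_i' alpha)
Sigmas = {i: cov_26(t, np.exp(x_subj[i] @ alpha), phi2)
          for i, t in subject_times.items()}
```

Subject B's visits at times 1.0 and 1.1 are highly correlated because their lag is short, and the correlation at any lag is the same for both subjects, as stationarity requires; only the variance differs between them.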

2.5.2 Directly specified models for discrete longitudinal responses

Although the multivariate normal distribution is a natural choice for characterizing the joint distribution of continuous longitudinal responses, approaches to dealing with discrete observations such as binary, categorical, or count data are less obvious. This section describes two models for longitudinal binary data, both of which can be generalized to handle ordinal or multinomial responses.

A challenge in formulating models for the joint distribution of discrete responses is having a sensible parameterization of the correlation while maintaining an interpretable regression structure for the mean. This difficulty arises because the correlation is a function of the mean. One approach is to fully parameterize higher-order interactions and then constrain some of them to be zero (Fitzmaurice and Laird, 1993; Fitzmaurice et al., 1994, 1996). Another is to formulate models that deal explicitly with serial correlation. Our examples here include the marginalized transition model (MTM) (Heagerty, 2002), which is a constrained version of a log-linear model, and a multivariate probit model, which uses a latent multivariate normal structure to induce correlation (Chib and Greenberg, 1998). The probit model is used widely in econometric modeling but less so for biostatistical applications. Model coefficients lack the simplicity of odds ratio interpretation, but the assumption of an underlying latent normal distribution provides computational tractability and flexible modeling of serial covariance structures.

Marginalized transition models

We illustrate the formulation of MTMs using an example with first-order dependence.

Example 2.5. Marginalized transition model for temporally aligned binary data.

Let $Y_i = (Y_{i1}, \ldots, Y_{iJ})^T$ denote a vector of binary responses, and let $X_i$ denote the design matrix. The marginal mean given covariates $x_{ij}$ is

$$\mu_{ij} = E(Y_{ij} \mid x_{ij}) = P(Y_{ij} = 1 \mid x_{ij}).$$

To describe the model, some additional notation is needed. For any time-dependent variable $Z$, let $\bar{Z}_{ij} = \{Z_{i1}, \ldots, Z_{ij}\}$ denote its history up to and including time $j$. The MTM likelihood is a transition model where the distribution of $Y_{ij}$ given $(\bar{Y}_{i,j-1}, x_{ij})$ follows a Bernoulli distribution, but is constrained to allow $g(\mu_{ij})$ to be linear in covariates.∗ Hence the marginal mean retains its form as a regression, but the distributional assumptions are made in the serial correlation model.

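We use the logistic regression formulation for illustration. The underlying joint distribution of responses is factored as

$$p(y_{i1}, \ldots, y_{iJ} \mid x_i) = p(y_{i1} \mid x_{i1})\, p(y_{i2} \mid \bar{y}_{i1}, x_{i2}) \cdots p(y_{iJ} \mid \bar{y}_{i,J-1}, x_{iJ}).$$

Each component of the joint distribution is assumed to follow a Bernoulli distribution

$$Y_{ij} \mid \bar{y}_{i,j-1}, x_{ij} \sim \mathrm{Ber}(\phi_{ij}),$$

where $\phi_{ij} = E(Y_{ij} \mid \bar{y}_{i,j-1}, x_{ij})$. For the first-order dependence model, denoted MTM(1), we have

$$p(y_{ij} \mid \bar{y}_{i,j-1}, x_{ij}) = p(y_{ij} \mid y_{i,j-1}, x_{ij}).$$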

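The model is specified in terms of two simultaneous equations. The first allows the logit of the marginal mean $\mu_{ij}$ to depend linearly on covariates $x_{ij}$; the second characterizes the dependence structure described by the conditional mean $\phi_{ij}$:

$$\begin{aligned}
\mathrm{logit}(\mu_{ij}) &= x_{ij}^T\beta, \\
\mathrm{logit}(\phi_{ij}) &= \Delta_{ij} + \gamma_j\, y_{i,j-1},
\end{aligned} \tag{2.7}$$

where $\Delta_{ij} = \Delta(x_{ij})$ is determined by $\beta$ and $\gamma_j$. If the serial correlation $\gamma_j$ depends on individual-level covariates $w_{ij} \subseteq x_{ij}$, then we replace $\gamma_j$ by $\gamma_{ij}$, where $\gamma_{ij} = w_{ij}^T\alpha$.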

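These constraints imply that for a given value of $x_{ij}$, $\Delta_{ij}$ is a deterministic function of $\gamma_j$ and $\beta$ (Heagerty, 2002). To see this clearly, first write

$$\phi_{ij} = \phi(\Delta_{ij}, \gamma_j, y) = E(Y_{ij} \mid Y_{i,j-1} = y)$$

to emphasize its dependence on $Y_{i,j-1}$. Then

$$\begin{aligned}
\mu_{ij}(\beta) &= P(Y_{ij} = 1 \mid x_{ij}) \\
&= P(Y_{ij} = 1 \mid Y_{i,j-1} = 0, x_{ij}, \gamma_j, \Delta_{ij})\, P(Y_{i,j-1} = 0 \mid x_{i,j-1}, \beta) \\
&\quad + P(Y_{ij} = 1 \mid Y_{i,j-1} = 1, x_{ij}, \gamma_j, \Delta_{ij})\, P(Y_{i,j-1} = 1 \mid x_{i,j-1}, \beta) \\
&= \phi(\Delta_{ij}, \gamma_j, 0)\{1 - \mu_{i,j-1}(\beta)\} + \phi(\Delta_{ij}, \gamma_j, 1)\, \mu_{i,j-1}(\beta).
\end{aligned} \tag{2.8}$$

Given $\beta$ and $\gamma_j$, the right-hand side of (2.8) is strictly increasing in $\Delta_{ij}$, so (2.8) determines $\Delta_{ij}$ uniquely.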

In Section 4.4 we use this model to analyze data from the CTQ I study (Section 1.4) and compare the results to those obtained with the logistic-normal random effects model described in Example 2.2. In Section 7.4, we also use this model to analyze data from the same study under MAR. □

∗ In many cases, we implicitly assume $x_{ij}$ includes relevant covariate history up to and including time $j$, obviating the need for overbar notation.