Wilkinson D.J. Stochastic Modelling for Systems Biology


BAYESIAN INFERENCE AND MCMC

9.3.1 Symmetric chains (Metropolis method)

The simplest case is the Metropolis sampler, which is based on the use of a symmetric proposal with $q(\theta, \phi) = q(\phi, \theta)$, $\forall\, \theta, \phi$. We see then that the acceptance probability simplifies to

$$\alpha(\theta, \phi) = \min\left\{1, \frac{\pi(\phi)}{\pi(\theta)}\right\}$$

and hence does not involve the proposal density at all. Consequently, proposed moves which will take the chain to a region of higher density are always accepted, while moves which take the chain to a region of lower density are accepted with probability equal to the ratio of the two densities - moves which will take the chain to a region of very low density will be accepted with very low probability. Note that any proposal of the form $q(\theta, \phi) = f(|\theta - \phi|)$ is symmetric, where $f(\cdot)$ is an arbitrary density. In this case, the proposal will represent a symmetric displacement from the current value. This also motivates random walk chains.
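As a reminder of where the simplification comes from, the step can be written out in one line. This is a sketch which assumes that the general Metropolis-Hastings acceptance probability takes its usual form, with the proposal ratio $q(\phi, \theta)/q(\theta, \phi)$ multiplying the ratio of target densities:

```latex
\alpha(\theta, \phi)
  = \min\left\{1, \frac{\pi(\phi)\, q(\phi, \theta)}{\pi(\theta)\, q(\theta, \phi)}\right\}
  = \min\left\{1, \frac{\pi(\phi)}{\pi(\theta)}\right\}
  \quad \text{when } q(\theta, \phi) = q(\phi, \theta).
```

For a proposal of the form $f(|\theta - \phi|)$ the symmetry is immediate, since $|\theta - \phi| = |\phi - \theta|$.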

9.3.2 Random walk chains

In this case, the proposed value $\phi$ at stage $j$ is $\phi = \theta^{(j-1)} + w_j$, where the $w_j$ are iid random variables (completely independent of the state of the chain). Suppose that the $w_j$ have density $f(\cdot)$, which is easy to simulate from. We can then simulate an innovation, $w_j$, and set the candidate point to $\phi = \theta^{(j-1)} + w_j$. The transition kernel is then $q(\theta, \phi) = f(\phi - \theta)$, and this can be used to compute the acceptance probability. Of course, if $f(\cdot)$ is symmetric about zero, then we have a symmetric chain, and the acceptance probability does not depend on $f(\cdot)$ at all.

So, suppose that it is decided to use a symmetric random walk chain with proposed mean zero innovations. There is still the question of how they should be distributed, and what variance they should have. A simple, easy to simulate from distribution is always a good idea, such as uniform or normal (normal is generally better, but is a bit more expensive to simulate). The choice of variance will affect the acceptance probability, and hence the overall proportion of accepted moves. If the variance of the innovation is too low, then most proposed values will be accepted, but the chain will move very slowly around the space - the chain is said to be too "cold." On the other hand, if the variance of the innovation is too large, very few proposed values will be accepted, but when they are, they will often correspond to quite large moves - the chain is said to be too "hot." Experience suggests that an overall acceptance rate of around 30% is desirable, and so it is possible to "tune" the variance of the innovation distribution to get an acceptance rate of around this level. This should be done using a few trial short runs, and then a single fixed value should be adopted for the main monitoring run.†
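The effect of the innovation variance on the acceptance rate can be seen in a small experiment. The following is a minimal Python sketch, not the R function referred to in the text: the function name `metropolis_rw`, the standard normal target, and the three trial standard deviations are all illustrative assumptions.

```python
import math
import random


def metropolis_rw(log_target, theta0, sd, n_iter, rng):
    """Random walk Metropolis sampler with N(0, sd^2) innovations.

    log_target is the log of the target density, up to an additive constant.
    Returns the sampled chain and the overall proportion of accepted moves.
    """
    theta = theta0
    log_p = log_target(theta)
    chain = []
    accepted = 0
    for _ in range(n_iter):
        phi = theta + rng.gauss(0.0, sd)  # candidate = current value + innovation
        log_p_phi = log_target(phi)
        # Symmetric proposal, so accept with probability min(1, pi(phi) / pi(theta)).
        if rng.random() < math.exp(min(0.0, log_p_phi - log_p)):
            theta, log_p = phi, log_p_phi
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter


def log_std_normal(x):
    """Log density of N(0, 1), up to an additive constant (illustrative target)."""
    return -0.5 * x * x


if __name__ == "__main__":
    # Short trial runs with a small, a moderate, and a large innovation sd.
    for sd in (0.1, 2.5, 50.0):
        _, rate = metropolis_rw(log_std_normal, 0.0, sd, 5000, random.Random(1))
        print(f"innovation sd = {sd:5.1f}: acceptance rate = {rate:.2f}")
```

For this target one would expect the smallest standard deviation to give an acceptance rate close to one (a "cold" chain) and the largest to give a rate close to zero (a "hot" chain), with the intermediate value lying somewhere in between.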

An R function implementing a simple Metropolis random walk sampler is given in Figure 9.5.

† Although it sounds appealing to adaptively change the tuning parameter during the main monitoring run, this usually affects the stationary distribution of the chain, and hence should be avoided (unless you really know what you are doing).


straightforward to calculate. Unfortunately, this method tends to result in a badly mixing chain if the problem is high dimensional and the data are not in strong accordance with the prior.

9.4 Hybrid MCMC schemes

We have seen how we can use the Gibbs sampler to sample from multivariate distributions provided that we can simulate from the full conditionals. We have also seen how we can use Metropolis-Hastings methods to sample from awkward distributions (perhaps full conditionals). If we wish, we can combine these in order to form hybrid Markov chains whose stationary distribution is a distribution of interest.

Componentwise transition: Given a multivariate distribution with full conditionals that are awkward to sample from directly, we can define a Metropolis-Hastings scheme for each full conditional and apply them to each component in turn for each iteration. This is like the Gibbs sampler, but each component update is a Metropolis-Hastings update, rather than a direct simulation from the full conditional. This is in fact the original form of the Metropolis algorithm.

Metropolis within Gibbs: Given a multivariate distribution with full conditionals, some of which may be simulated from directly, and others which have Metropolis-Hastings updating schemes, the Metropolis within Gibbs algorithm goes through each in turn, and simulates directly from the full conditional, or carries out a Metropolis-Hastings update as necessary.

Blocking: The components of a Gibbs sampler, and those of Metropolis-Hastings chains, can be vectors (or matrices) as well as scalars. For many high-dimensional problems, it can be helpful to group related parameters into blocks and use multivariate simulation techniques to update those together if possible. This can greatly improve the mixing of the chain, at the expense of increasing the computational cost of each iteration.
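A Metropolis within Gibbs scheme of the kind described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the target is taken to be a standard bivariate normal with correlation `RHO`, the first component is simulated directly from its full conditional, and the second is (artificially) treated as awkward and updated with a random walk Metropolis step. The names `log_joint` and `metropolis_within_gibbs` are hypothetical.

```python
import math
import random

RHO = 0.5  # correlation of the illustrative bivariate normal target (an assumption)


def log_joint(x, y, rho=RHO):
    """Log density of a standard bivariate normal with correlation rho, up to a constant."""
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def metropolis_within_gibbs(n_iter, sd, rng, rho=RHO):
    """Alternate a direct Gibbs update for x with a Metropolis update for y."""
    x, y = 0.0, 0.0
    xs, ys = [], []
    for _ in range(n_iter):
        # Component 1: direct simulation from the full conditional
        # x | y ~ N(rho * y, 1 - rho^2).
        x = rng.gauss(rho * y, math.sqrt(1.0 - rho * rho))
        # Component 2: random walk Metropolis update targeting the full
        # conditional of y given x. As a function of y, the joint density is
        # proportional to this conditional, so the joint can be used in the ratio.
        y_prop = y + rng.gauss(0.0, sd)
        log_ratio = log_joint(x, y_prop, rho) - log_joint(x, y, rho)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            y = y_prop
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    xs, ys = metropolis_within_gibbs(20000, 1.5, random.Random(0))
    print("mean of x:", sum(xs) / len(xs))
    print("mean of y:", sum(ys) / len(ys))
```

Replacing the direct simulation of the first component with a second Metropolis step would turn this into the componentwise transition scheme.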

Some of the methods discussed in this section will be illustrated in practice in Chapter 10.

9.5 Exercises

1. Modify the simple Metropolis code given in Figure 9.5 in order to compute the overall acceptance rate of the chain. Write another R function which uses this modified function in order to automatically find a tuning parameter giving an overall acceptance rate of around 30%.

2. Rewrite the Gibbs sampling code from Figure 9.2 in a faster language such as C, Fortran, or Java. Real MCMC algorithms run too slowly in R, so it is necessary to build up an MCMC code base in a more efficient language.

3. Install the R-CODA package for MCMC output analysis and diagnostics and learn how it works. Try it out on the examples you have been studying.

4. Download some automatic MCMC software (linked from this book's website).

Learn how these packages work and try them out on some simple models.


INFERENCE FOR STOCHASTIC KINETIC MODELS

literature in this area accessible. Appropriate further reading will be highlighted where relevant.

10.2 Inference given complete data

It turns out to be helpful to consider first the problem of inference given perfect data on the state of the model over a finite time interval $[0, T]$. That is, we will assume that the entire sample path of each species in the model is known over the time period $[0, T]$. This is equivalent to assuming that we have been given discrete-event output from a Gillespie simulator, and are then required to figure out the rate constants that were used on the basis of the output. Although it is completely unrealistic to assume that experimental data of this quality will be available in practice, understanding this problem is central to understanding the more general inference problem. In any case, it is clear that if we cannot solve even this problem, then inference from data sources of lower quality will be beyond our reach.
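To make the idea of complete data concrete, the following Python sketch generates discrete-event output of this kind using Gillespie's direct method. The immigration-death model, the function name `gillespie_complete_data`, and the rate constants used are illustrative assumptions rather than anything taken from the text.

```python
import random


def gillespie_complete_data(x0, c, T, rng):
    """Simulate an immigration-death process on [0, T], recording every event.

    Reactions (an illustrative choice):
      R1: 0 -> X with hazard c[0]
      R2: X -> 0 with hazard c[1] * x
    Returns (times, types, states), where states[0] is the initial state and
    states[i] is the state immediately after the i-th event.
    """
    t, x = 0.0, x0
    times, types, states = [], [], [x0]
    while True:
        h = [c[0], c[1] * x]  # reaction hazards h_1 and h_2 at the current state
        h0 = h[0] + h[1]      # combined hazard
        if h0 == 0.0:
            break             # no reaction can occur, so the path is constant from here
        t += rng.expovariate(h0)  # time to the next event is Exp(h0)
        if t >= T:
            break
        # The event type is chosen with probability h_nu / h0.
        nu = 1 if rng.random() * h0 < h[0] else 2
        x += 1 if nu == 1 else -1
        times.append(t)
        types.append(nu)
        states.append(x)
    return times, types, states


if __name__ == "__main__":
    times, types, states = gillespie_complete_data(5, (2.0, 0.5), 10.0, random.Random(42))
    print("number of events:", len(times))
    print("first few (time, type) pairs:", list(zip(times, types))[:3])
```

The inference problem is then to recover the two rate constants from `times`, `types` and `states` alone.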

It will be helpful to assume the model notation from Chapter 6, with species $X_1, \ldots, X_u$, reactions $R_1, \ldots, R_v$, rate constants $c = (c_1, \ldots, c_v)'$, reaction hazards $h_1(x, c_1), \ldots, h_v(x, c_v)$, and combined hazard

$$h_0(x, c) = \sum_{i=1}^{v} h_i(x, c_i).$$

It is now necessary to explicitly consider the state of the system at a given time, and this will be denoted $x(t) = (x_1(t), \ldots, x_u(t))'$. Our observed sample path will be written $\mathbf{x} = \{x(t) : t \in [0, T]\}$.

As we have complete information on the sample path, we also know the time and type of each reaction event (in fact, this is what we really mean by complete data). It is helpful to use the notation $r_j$ for the number of reaction events of type $R_j$ that occurred in the sample path $\mathbf{x}$, $j = 1, \ldots, v$, and to define $n = \sum_{j=1}^{v} r_j$ to be the total number of reaction events occurring in the interval $[0, T]$. We will now consider the time and type of each reaction event, $(t_i, \nu_i)$, $i = 1, \ldots, n$, where the $t_i$ are assumed to be in increasing order and $\nu_i \in \{1, \ldots, v\}$. It is notationally convenient to make the additional definitions $t_0 = 0$ and $t_{n+1} = T$.

In order to carry out model-based inference for the process, we need the likelihood function. A formal approach to the development of a rigorous theory of likelihood for continuous sample paths is beyond the scope of a text such as this, but it is straightforward to compute the likelihood in an informal way by considering the terms in the likelihood that arise from constructing the sample path according to Gillespie's direct method. Here, the term in the likelihood corresponding to the $i$th event is just the joint density of the time and type of that event. That is,

$$h_0(x(t_{i-1}), c) \exp\{-h_0(x(t_{i-1}), c)[t_i - t_{i-1}]\} \times \frac{h_{\nu_i}(x(t_{i-1}), c_{\nu_i})}{h_0(x(t_{i-1}), c)} = \exp\{-h_0(x(t_{i-1}), c)[t_i - t_{i-1}]\}\, h_{\nu_i}(x(t_{i-1}), c_{\nu_i}).$$
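The cancellation of the combined hazard in this term is easy to check numerically. The short Python sketch below evaluates both forms of the contribution for a single event; the function name `event_term` and the hazard values in the usage example are hypothetical.

```python
import math


def event_term(h, nu, dt):
    """Likelihood contribution of one event, computed in two equivalent forms.

    h  : reaction hazards h_1, ..., h_v evaluated at the state x(t_{i-1})
    nu : type of the event (1-based index into h)
    dt : waiting time t_i - t_{i-1}
    """
    h0 = sum(h)  # combined hazard
    # Form 1: density of an Exp(h0) waiting time, times the probability h_nu / h0
    # that the event is of type nu.
    joint = (h0 * math.exp(-h0 * dt)) * (h[nu - 1] / h0)
    # Form 2: the simplified expression after cancelling h0.
    simplified = math.exp(-h0 * dt) * h[nu - 1]
    return joint, simplified


if __name__ == "__main__":
    joint, simplified = event_term([2.0, 0.5], nu=2, dt=0.3)
    print(joint, simplified)  # the two values agree up to rounding error
```

In practice one would usually work with the logarithm of the simplified form, so that the contributions from successive events can be summed rather than multiplied.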


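Substituting into (10.2) and simplifying then gives

$$
\begin{aligned}
L(c; \mathbf{x}) &= \left\{\prod_{i=1}^{n} c_{\nu_i}\, g_{\nu_i}(x(t_{i-1}))\right\} \exp\left\{-\int_0^T \sum_{j=1}^{v} c_j g_j(x(t))\, dt\right\} \\
&\propto \left\{\prod_{j=1}^{v} c_j^{r_j}\right\} \exp\left\{-\sum_{j=1}^{v} \int_0^T c_j g_j(x(t))\, dt\right\} \\
&= \prod_{j=1}^{v} c_j^{r_j} \exp\left\{-c_j \int_0^T g_j(x(t))\, dt\right\} \\
&= \prod_{j=1}^{v} L_j(c_j; \mathbf{x}),
\end{aligned}
$$

where the component likelihoods are defined by

$$L_j(c_j; \mathbf{x}) = c_j^{r_j} \exp\left\{-c_j \int_0^T g_j(x(t))\, dt\right\}, \quad j = 1, \ldots, v. \tag{10.3}$$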

This factorisation of the complete-data likelihood has numerous important consequences for inference. It means that in the complete data scenario, information regarding each rate constant is independent of the information regarding the other rate constants. That is, inference may be carried out for each rate constant separately. For example, in a maximum likelihood framework (where parameters are chosen to make the likelihood as large as possible), the likelihood can be maximised for each parameter separately. So, by partially differentiating (10.3) with respect to $c_j$ and equating to zero, we obtain the maximum likelihood estimate

$$\hat{c}_j = \frac{r_j}{\int_0^T g_j(x(t))\, dt}, \quad j = 1, \ldots, v. \tag{10.4}$$
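Because the sample path is piecewise constant, the integral in (10.4) reduces to a finite sum over the intervals between events, and the estimate is simple to compute. The following Python sketch assumes hazards of the form $c_j g_j(x)$ and a hand-built sample path for an immigration-death model with $g_1(x) = 1$ and $g_2(x) = x$; the function name `complete_data_mle` and the path itself are hypothetical.

```python
def complete_data_mle(times, types, states, T, g):
    """Maximum likelihood estimates c_j = r_j / integral_0^T g_j(x(t)) dt.

    times  : event times t_1 < ... < t_n in (0, T)
    types  : event types nu_1, ..., nu_n, each in 1..v
    states : states[i] is the state on [t_i, t_{i+1}), with t_0 = 0 and t_{n+1} = T
    g      : list of functions g_j, the hazard being assumed to equal c_j * g_j(x)
    Assumes every integral is strictly positive.
    """
    v = len(g)
    # r_j: number of events of each type.
    r = [0] * v
    for nu in types:
        r[nu - 1] += 1
    # The path is piecewise constant, so each integral is a sum of g_j(x) * dt.
    grid = [0.0] + list(times) + [T]
    integrals = [0.0] * v
    for i, x in enumerate(states):
        dt = grid[i + 1] - grid[i]
        for j in range(v):
            integrals[j] += g[j](x) * dt
    return [r[j] / integrals[j] for j in range(v)]


if __name__ == "__main__":
    # Hand-built path: x = 2 on [0, 1), an immigration at t = 1 gives x = 3 on
    # [1, 3), and a death at t = 3 gives x = 2 on [3, 4].
    g = [lambda x: 1.0, lambda x: float(x)]
    print(complete_data_mle([1.0, 3.0], [1, 2], [2, 3, 2], 4.0, g))  # [0.25, 0.1]
```

Here $r_1 = r_2 = 1$, the integral of $g_1$ is $4$ and the integral of $g_2$ is $2 \times 1 + 3 \times 2 + 2 \times 1 = 10$, which gives the two estimates shown.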

In the context of Bayesian inference, the factorisation means that if independent prior distributions are adopted for the rate constants, then this independence will be retained a posteriori. It is also clear from the form of (10.3) that the complete-data likelihood is conjugate to an independent gamma prior for the rate constants. Thus, adopting priors for the rate constants of the form

$$\pi(c) = \prod_{j=1}^{v} \pi(c_j),$$