10.4 Diffusion approximations for inference
The discussion in the previous section demonstrates that it is possible to construct exact MCMC algorithms for inference in discrete stochastic kinetic models based on discrete time observations (and it is possible to extend the techniques to more realistic data scenarios than those directly considered). The discussion gives great insight into the nature of the inferential problem and its conceptual solution. However, there is a slight problem with the techniques discussed there in the context of the relatively large and complex models of genuine interest to systems biologists. It should be clear that each iteration of the MCMC algorithm described in the previous section is more computationally demanding than simulating the process exactly using Gillespie's direct method (for the sake of argument, let us say that it is one order of magnitude more demanding). For satisfactory inference, a large number of MCMC iterations will be required. For models of the complexity discussed in the previous section, it is not uncommon for $10^7$--$10^8$ iterations to be required for satisfactory convergence to the true posterior distribution. Using such methods for inference therefore has a computational cost of $10^8$--$10^9$ times that required to simulate the process. As if this were not bad enough, it turns out that MCMC algorithms are particularly difficult to parallelise effectively (Wilkinson 2005). One possible approach to improving the situation is to approximate the algorithm with a much faster one that is less accurate, as discussed in Boys et al. (2004). Unfortunately even that approach does not scale up well to genuinely interesting problems, so a different approach is required.
A similar problem was considered in Chapter 8, from the viewpoint of simulation rather than inference. We saw there how it was possible to approximate the true Markov jump process by the chemical Langevin equation (CLE), which is the diffusion process that behaves most like the true jump process. It was seen there how simulation of the CLE can be many orders of magnitude faster than an exact algorithm. This suggests the possibility of using the CLE as an approximate model for inferential purposes. It turns out that the CLE provides an excellent model for inference, even in situations where it does not perform particularly well as a simulation model. This observation at first seems a little counter-intuitive, but the reason is that in the context of inference, one is conditioning on data from the true model, and this helps to calibrate the approximate model and stop MCMC algorithms from wandering off into parts of the space that are plausible in the context of the approximate model, but not in the context of the true model.
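To make the connection to simulation concrete, the short sketch below shows one way the CLE might be integrated numerically using an Euler--Maruyama scheme. It is an illustrative sketch rather than code from this book: it assumes the CLE written with one Brownian increment per reaction channel, $dX_t = S\,h(X_t,c)\,dt + S\,\operatorname{diag}\{\sqrt{h(X_t,c)}\}\,dW_t$, and the Lotka-Volterra stoichiometry, hazard function, and rate constants are chosen purely for illustration.

```python
import numpy as np

def cle_euler(x0, S, hazards, c, dt, n_steps, rng=None):
    """Euler-Maruyama integration of a CLE of the (assumed) form
    dX_t = S h(X_t, c) dt + S diag(sqrt(h(X_t, c))) dW_t,
    with one Brownian increment per reaction channel."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for i in range(n_steps):
        h = hazards(x, c)                       # reaction hazards at current state
        drift = S @ h * dt
        noise = S @ (np.sqrt(h * dt) * rng.standard_normal(h.size))
        x = np.maximum(x + drift + noise, 0.0)  # crude fix: keep state non-negative
        path[i + 1] = x
    return path

# Illustrative Lotka-Volterra system: prey birth, predation, predator death
S = np.array([[1, -1,  0],
              [0,  1, -1]])

def lv_hazards(x, c):
    return np.array([c[0] * x[0], c[1] * x[0] * x[1], c[2] * x[1]])

path = cle_euler(x0=[50, 100], S=S, hazards=lv_hazards,
                 c=[1.0, 0.005, 0.6], dt=0.01, n_steps=3000)
```

Writing the noise term with one increment per reaction channel avoids forming a matrix square root of the diffusion matrix at each step, at the cost of simulating as many Brownian increments as there are reactions rather than species.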
What is required is a method for inference for general non-linear multivariate diffusion processes observed partially, discretely, and with error. Unfortunately this too turns out to be a highly non-trivial problem, and is still the subject of a great deal of ongoing research. Such inference problems arise often in financial mathematics and econometrics, and so much of the literature relating to this problem can be found in that area; see Durham & Gallant (2002) for an overview.
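As a purely illustrative sketch of this observation regime (none of the quantities below come from the text), the following simulates a fine-grained latent path of a simple two-dimensional diffusion and then retains only sparse, noisy measurements of its first component; the drift, diffusion coefficient, observation spacing, and measurement error standard deviation are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fine-grained latent path of an illustrative two-dimensional diffusion,
# integrated by Euler-Maruyama (arbitrary drift and diffusion terms)
dt, n = 0.01, 2000
x = np.empty((n + 1, 2))
x[0] = [50.0, 100.0]
for i in range(n):
    drift = np.array([0.5 * (60.0 - x[i, 0]),     # illustrative mean reversion
                      0.3 * (x[i, 0] - x[i, 1])])
    diff = np.sqrt(np.maximum(x[i], 0.0) * dt) * rng.standard_normal(2)
    x[i + 1] = x[i] + drift * dt + diff

# Partial, discrete, noisy data: only the first component is observed,
# every 50th time point, corrupted by Gaussian measurement error
obs_idx = np.arange(0, n + 1, 50)
y = x[obs_idx, 0] + rng.normal(scale=2.0, size=obs_idx.size)
```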
The problem with diffusion processes is that any finite sample path contains an infinite amount of information, and so the concept of a complete-data likelihood does not exist in general. We will illustrate the problem in the context of high-resolution time-course data on the CLE. Starting with the CLE in the form of (8.3), define