Barnes D.J., Chu D. Introduction to Modeling for Biosciences


Chapter 7

Simulating Biochemical Systems

This chapter describes in detail how modeling techniques can be used to simulate biochemical systems. In particular, we look at Gillespie’s stochastic simulation algorithm (SSA) [21] and some of the variations spawned from this to effect efficient modeling of systems of coupled chemical reactions. Gillespie’s work offers a bridge between the continuous models of differential equations (Chap. 4) and the individual representations of agent-based models (Chap. 3). Distinctive of the SSA approach is that it treats individual molecular species as individual entities, yet processes all agents of the same species en masse. Nevertheless, the number of molecules within each species plays a significant role within the evolution of a model. Exploration of the SSA and related approaches includes detailed Java code to illustrate the main elements of their implementation. In addition, we introduce the widely available modeling environments Dizzy [32] and SGNsim [36], which provide convenient, packaged implementations of these algorithms.

7.1 The Gillespie Algorithms

In the differential equation approach, a model of a chemical reaction system is built

as a set of coupled differential equations. The variables of these equations are molec-

ular concentrations. For other than relatively simple sets of equations, analytical so-

lutions are likely to be unavailable and the equations have to be solved numerically.

A notable feature of ODE models is that they imply a system of reactions that is

both continuous and deterministic, neither of which accurately matches the realities

of the physical world. For instance, we know that the inherent stochasticity of nature

can produce important distinctive non-deterministic effects, and at the beginning of

Chap. 6 we discussed some of the errors that creep in when we ignore the fact of

integral particle numbers at low concentrations.

In contrast to ODE models, agent-based models consider each individual

molecule along with the discrete interactions occurring between them. Random

number generators support the stochastic aspect but individual-based approaches

D.J. Barnes, D. Chu, Introduction to Modeling for Biosciences,

DOI 10.1007/978-1-84996-326-8_7, © Springer-Verlag London Limited 2010


can be computationally very expensive. Given that the states of individual molecules

are effectively indistinguishable from each other at the level of the model, represent-

ing each molecule distinctly is questionably detailed.

Gillespie [20] formulated an approach that was designed for systems with rela-

tively low numbers of the individual molecules of interest within a perfectly mixed

chemical volume, treating different molecular species en masse rather than individ-

ually. His formulation was derived from an analysis of the “collision probability per

time unit” of two reactive molecules within a perfectly mixed volume in thermal

equilibrium. Collisions between molecules of interest lead to reactions which then

discretely alter the numbers of reactant and product molecular species.

Rather than having a deterministic reaction rate constant ($k_i$) for a reaction, Gillespie associated each with a stochastic reaction constant, $c_i$, representing the probability that a particular pair of reactant molecules would react in the next infinitesimal time interval. By combining $c_i$ with the number of possible combinations of pairs of those reactant molecules ($h_i$) at a particular time, the probability of that reaction taking place within the next time interval can be calculated. Given a set

of reactions, the task of simulation becomes to repeatedly identify which reaction

is most likely to occur next and when. The stochastic nature of a reaction system

means that these two questions should be answered probabilistically. As numbers

of reactant molecules change through reactions occurring, so do the probabilities of

other reactions. This probabilistic element means that the results from using Gille-

spie’s approach naturally have the desirable deviations from the smooth results of

deterministic approaches, which are unrealistic when relatively small numbers of

molecules of the species of interest are involved. Figure 7.1 outlines the basic sim-

ulation steps of one of Gillespie’s formulations of this approach.
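As a rough illustration of how a stochastic reaction constant c and a combination count h multiply to give a reaction's propensity, the sketch below covers three common reaction shapes. The class and method names are hypothetical and are not taken from the book's code. The combination counts assume the usual convention that h is the number of distinct ways of choosing the reactant molecules.

```java
/** Hypothetical helper illustrating propensity = c * h for simple reaction shapes. */
final class PropensitySketch {

    /** X -> ... : h is simply the number of X molecules. */
    static double unimolecular(double c, long x) {
        return c * x;
    }

    /** X + Y -> ... : h is the number of distinct X-Y pairs, x * y. */
    static double bimolecular(double c, long x, long y) {
        return c * x * y;
    }

    /** 2X -> ... : h is the number of unordered pairs of X, x * (x - 1) / 2. */
    static double dimerization(double c, long x) {
        return c * x * (x - 1) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(unimolecular(0.5, 1000));
        System.out.println(bimolecular(0.01, 20, 30));
        System.out.println(dimerization(0.1, 5));
    }
}
```

Note that the dimerization count falls to zero when only one molecule remains, which is one of the integral-number effects that a concentration-based rate law does not capture.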

The probability of any particular reaction occurring next is dependent on the

stochastic reaction constant values of all the reactions in the system, as well as on

the numbers of all of the different reactant molecules. This can be seen from the

probability density function for a reaction i, which is:

$$P(\tau, i) = a_i \exp(-a_0 \tau) \qquad (7.1)$$

Multiplied by δτ, this gives the probability at time t that i will be the next reaction to occur, and that it will occur between time (t + τ) and time (t + τ + δτ), where δτ is an infinitesimal time interval, $a_i = c_i h_i$, $a_0 = \sum_{i=1}^{M} a_i$, and M is the number of reactions. The value $a_i$ is usually referred to as the propensity of reaction i. If μ is the next reaction to occur then no reaction takes place between the current time t and time t + τ.

Using this formulation, Gillespie described two methods for providing the answers to the two questions posed above: identifying the next reaction, μ, and when it would occur, t + τ. These are known as the direct method and the first reaction method.

7.1.1 Gillespie’s Direct Method

In the direct method, the probability density function, along with a pair of random

numbers from the unit interval, are used to generate the answers to the two questions


1. Initialization:
• Set the simulation time to zero, and initialize the random number generator.
• Set up data for the reactions to be modeled: reaction constants and numbers of each type of molecular species.
• Calculate the initial propensity values, $a_i$, for each reaction, and $a_0$ as the sum of these.
2. Iteration:
• Generate random numbers $r_1$ and $r_2$ from a uniform distribution.
• Identify the next reaction, μ, and the time to its occurrence, τ, using $r_1$ and $r_2$.
• Increase the current time by τ.
• Adjust the reactant and product levels according to reaction μ.
• Output the time and molecular levels, if required.
• Recalculate the $a_i$ and $a_0$ values for the next iteration.

Fig. 7.1 Outline of Gillespie’s direct method

at the heart of the simulation’s iteration. Given two random numbers $r_1$ and $r_2$, we calculate τ, the time to the next reaction, as $\tau = (1/a_0)\ln(1/r_1)$, and choose μ such that

$$\sum_{i=1}^{\mu-1} a_i < r_2\, a_0 \le \sum_{i=1}^{\mu} a_i \qquad (7.2)$$

This approach is known as the direct method because of the way it identifies both τ and μ directly (Fig. 7.1).
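The expression for τ can be understood as inverse-transform sampling. The short derivation below is a sketch rather than Gillespie's own presentation. It writes $a_i$ for the propensity of reaction i, $a_0$ for the sum of all M propensities, and $r_1$ for a uniform random number on the unit interval, and it assumes that summing the density (7.1) over all reactions gives the density of the waiting time alone.

```latex
\begin{align*}
P(\tau) &= \sum_{i=1}^{M} a_i \exp(-a_0 \tau) = a_0 \exp(-a_0 \tau), \\
F(\tau) &= \int_0^{\tau} a_0 \exp(-a_0 s)\, ds = 1 - \exp(-a_0 \tau), \\
r_1 = 1 - F(\tau) \;&\Longrightarrow\; \tau = \frac{1}{a_0} \ln\frac{1}{r_1}, \\
P(i \mid \tau) &= \frac{a_i \exp(-a_0 \tau)}{a_0 \exp(-a_0 \tau)} = \frac{a_i}{a_0}.
\end{align*}
```

Setting $r_1 = 1 - F(\tau)$ rather than $r_1 = F(\tau)$ is harmless because one minus a uniform random number is itself uniform. The last line shows that the choice of reaction is independent of τ and proportional to propensity, which is what the cumulative-sum selection of μ samples.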

In Gillespie’s original formulation, all of the $a_i$ values are recalculated after each reaction. However, note that a reaction’s propensity value, $a_i$, will only change if

the number of molecules of one of its reactants is altered by the selected reac-

tion. Therefore, given suitable data structures relating reaction dependencies to each

other, some optimization of the calculations on each iteration will be possible. We

will explore this aspect in more detail in Sect. 7.2, when we consider Gibson and

Bruck’s variations on Gillespie’s SSA. However, it is worth noting that much of the

literature that uses Gillespie’s SSA approach for its models tends to omit these im-

provements for the sake of simplicity. In Sect. 7.3, we will also consider how the

search corresponding to (7.2), to identify μ, can be made efﬁciently.
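To make the idea of "data structures relating reaction dependencies" concrete, the following sketch precomputes, for each reaction, which propensities would need recalculating once it has fired. It is an illustrative assumption, not the data structure used by Gibson and Bruck or by the book's implementation. The class name, method name and the representation of reactions as sets of species names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: which propensities must be refreshed after a reaction fires. */
final class DependencySketch {

    /**
     * reactants.get(j): species whose counts enter the propensity of reaction j.
     * changed.get(i):   species whose counts are altered when reaction i fires.
     * Returns, for each reaction i, the indices of the reactions to recalculate.
     */
    static List<Set<Integer>> build(List<Set<String>> reactants,
                                    List<Set<String>> changed) {
        List<Set<Integer>> deps = new ArrayList<>();
        for (int i = 0; i < changed.size(); i++) {
            Set<Integer> affected = new HashSet<>();
            for (int j = 0; j < reactants.size(); j++) {
                for (String species : changed.get(i)) {
                    if (reactants.get(j).contains(species)) {
                        affected.add(j);
                        break;
                    }
                }
            }
            deps.add(affected);
        }
        return deps;
    }

    public static void main(String[] args) {
        // Reactions: 0: X -> Y, 1: Y -> Z, 2: W -> V
        List<Set<String>> reactants =
            List.of(Set.of("X"), Set.of("Y"), Set.of("W"));
        List<Set<String>> changed =
            List.of(Set.of("X", "Y"), Set.of("Y", "Z"), Set.of("W", "V"));
        System.out.println(build(reactants, changed));
    }
}
```

In this example, firing reaction 0 requires updating reactions 0 and 1 only, so the propensity of reaction 2 can be left untouched.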

7.1.2 Gillespie’s First Reaction Method

The first reaction method is entirely equivalent to the direct method but works slightly differently. In this method, the τ value for every reaction is calculated and

the one with the smallest value is identiﬁed as the next reaction. It is this ap-

proach that also forms the basis for Gibson and Bruck’s improvements to Gille-

spie’s SSA, which we shall look at in Sect. 7.2. Figure 7.2 shows an outline of


1. Initialization:
• Set the simulation time to zero, and initialize the random number generator.
• Set up data for the reactions to be modeled: reaction constants and numbers of each type of molecular species.
2. Iteration:
• Calculate the propensity value $a_i$ for each reaction.
• Calculate $\tau_i$ (the putative time delta) for each reaction from an exponential distribution with parameter $a_i$.
• Choose the reaction, μ, with smallest $\tau_i$.
• Increase the current time by $\tau_\mu$.
• Adjust the reactant and product levels according to reaction μ.
• Output the time and molecular levels, if required.

Fig. 7.2 Outline of Gillespie’s first reaction method

the first reaction method. An obvious difference between the direct and first reaction methods is that the former generates two random numbers per iteration whereas the latter generates one for each reaction, that is, M per iteration. Where random number generation is a relatively expensive operation, this difference may be significant. However, note that

both methods take runtime that is proportional to the number of reactions, M.

We say, therefore, that these algorithms are of order M, written as O(M). In other words, if we were to double the number of reactions in the model then we would expect something like a doubling of the runtime for the simulation component of the model. That the algorithms are O(M) can be observed, for instance, in the way that the selected reaction is identified using $r_2$ in the direct method and the identification of the smallest $\tau_i$ in the first reaction method. Both

involve a linear search, whose length increases in direct proportion to the num-

ber of reactions. Where the number of reactions is small, this will not be partic-

ularly signiﬁcant, but it does become an issue with large numbers. Figure 7.3 il-

lustrates the scaling effect via our implementation, using sets of randomly gener-

ated reactions. This is why others have sought to improve the scalability of the

SSA.
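A single iteration of the first reaction method might be sketched as follows. This is a minimal illustration under stated assumptions, not the book's implementation. The propensities are passed in as a plain array, and the class and method names are hypothetical. The linear scan over all reactions is the O(M) cost discussed above.

```java
import java.util.Random;

/** Hypothetical sketch of one iteration of the first reaction method. */
final class FirstReactionSketch {

    /**
     * Draws a putative time for every reaction with a positive propensity and
     * returns {index of the earliest reaction, its time delta}.
     * The index is -1 if no reaction can fire.
     */
    static double[] step(double[] a, Random rng) {
        int mu = -1;
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            if (a[i] <= 0.0) {
                continue; // This reaction cannot occur in the current state.
            }
            // 1 - nextDouble() lies in (0, 1], so the logarithm is finite.
            double tau = -Math.log(1.0 - rng.nextDouble()) / a[i];
            if (tau < best) {
                best = tau;
                mu = i;
            }
        }
        return new double[] { mu, best };
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] a = { 1.0, 3.0 };
        int trials = 100_000;
        int second = 0;
        double totalTau = 0.0;
        for (int n = 0; n < trials; n++) {
            double[] result = step(a, rng);
            if ((int) result[0] == 1) {
                second++;
            }
            totalTau += result[1];
        }
        // Expect roughly a[1] / a0 = 0.75 and 1 / a0 = 0.25.
        System.out.println((double) second / trials + " " + totalTau / trials);
    }
}
```

The two printed averages should be close to the values the direct method would give for the same propensities, which is one informal way of checking the equivalence of the two methods.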

7.1.3 Java Implementation of the Direct Method

In order to illustrate some of the practical issues associated with the Gillespie and

related algorithms, we will present some Java implementations of them. Even if the

reader has no intention of ever implementing these algorithms from scratch, this

material will still provide insights into their different characteristics and properties,

which will contribute to an overall better basis for making decisions over how to

choose between them for a particular simulation task.


Fig. 7.3 Mean runtime in milliseconds for 1,000,000 iteration steps of an implementation of Gille-

spie’s direct method. Reaction sets were randomly generated

Consider a chemical volume containing two distinct molecular species of inter-

est, X and Y . In our models, we use two instances of a Molecule class to represent

these species, with each instance storing a count of the number of molecules of that

species present in the volume at a particular time. These counts will be adjusted

repeatedly as reactions that affect them take place.

In our implementation, a chemical reaction of the form

X →Y (7.3)

is represented as an instance of the Reaction class. Instances of this class store

separate lists of Reactant and Product objects (Code 7.1). Reactant and

Product objects are wrappers for Molecule objects. As part of their wrapper

function they include a multiplicity attribute, indicating how many molecules of the wrapped species are involved as a reactant or product in a particular reaction.

Multiplicity cannot be represented at the level of a Molecule because a species may

well occur with different multiplicities in different reactions within a single reaction

set. So, the following reaction

X +2Y →X +Z (7.4)

would be represented as a Reaction object with a list containing two Reactant objects and another list containing two Product objects. The Reactant object wrapping the Y Molecule would have a field recording a multiplicity of two. Note that the


import java.util.List;

/**
 * Model a single reaction in terms of
 * molecular reactants and products.
 */
public class Reaction
{
    private List<Reactant> reactants;
    private List<Product> products;
    ...
    /**
     * React by adjusting quantities of reactants and products.
     */
    public void react()
    {
        // Remove the consumed molecules of each reactant species.
        for(Reactant r : reactants) {
            r.consume();
        }
        // Add the newly created molecules of each product species.
        for(Product p : products) {
            p.produce();
        }
    }
}

Code 7.1 The react method of the Reaction class

same Molecule may appear as both a Reactant and a Product in a single reaction,

for instance when it acts catalytically. Reaction’s react method adjusts molecular

counts by consuming reactants and producing products.
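The wrapper arrangement described above might look something like the sketch below. The book does not show its Molecule, Reactant and Product classes at this point, so the names carry a Sketch suffix to mark them as hypothetical, and the fields and constructors are assumptions. The example encodes reaction (7.4), X + 2Y → X + Z.

```java
/** Hypothetical species record: a name and a current molecule count. */
final class MoleculeSketch {
    final String name;
    long count;

    MoleculeSketch(String name, long count) {
        this.name = name;
        this.count = count;
    }
}

/** Hypothetical reactant wrapper: a species plus its multiplicity in one reaction. */
final class ReactantSketch {
    private final MoleculeSketch molecule;
    private final int multiplicity;

    ReactantSketch(MoleculeSketch molecule, int multiplicity) {
        this.molecule = molecule;
        this.multiplicity = multiplicity;
    }

    /** True if enough molecules are present for the reaction to occur once. */
    boolean available() {
        return molecule.count >= multiplicity;
    }

    void consume() {
        molecule.count -= multiplicity;
    }
}

/** Hypothetical product wrapper: a species plus its multiplicity in one reaction. */
final class ProductSketch {
    private final MoleculeSketch molecule;
    private final int multiplicity;

    ProductSketch(MoleculeSketch molecule, int multiplicity) {
        this.molecule = molecule;
        this.multiplicity = multiplicity;
    }

    void produce() {
        molecule.count += multiplicity;
    }
}

final class WrapperDemo {
    public static void main(String[] args) {
        MoleculeSketch x = new MoleculeSketch("X", 10);
        MoleculeSketch y = new MoleculeSketch("Y", 10);
        MoleculeSketch z = new MoleculeSketch("Z", 0);
        // X + 2Y -> X + Z: X appears on both sides, Y has multiplicity two.
        ReactantSketch[] reactants = { new ReactantSketch(x, 1), new ReactantSketch(y, 2) };
        ProductSketch[] products = { new ProductSketch(x, 1), new ProductSketch(z, 1) };
        for (ReactantSketch r : reactants) r.consume();
        for (ProductSketch p : products) p.produce();
        System.out.println(x.count + " " + y.count + " " + z.count); // prints "10 8 1"
    }
}
```

Because the multiplicity lives in the wrapper rather than in the species, the same Y object could appear with a different multiplicity in another reaction of the same set.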

Code 7.2 shows how we have mapped the outline of Fig. 7.1 to the code of a

run method. The initModel method creates the model’s Molecule, Reactant,

Product and Reaction objects and passes them to a ReactionSet object, which

stores all reactions in a list. Code 7.3 shows an unoptimized ReactionSet with

its findReaction method, which implements the search for μ from (7.2).

As well as outlining the theoretical basis for his approach, Gillespie illustrated its efficacy with a number of interesting reaction sets, and his original papers are well worth reading as superbly elucidated combinations of theory and practice in this area. We have based a couple of example reaction sets on his, in order to illustrate our Java implementation.

7.1.4 A Single Reaction

Consider a reaction set containing the single reaction

$$X \xrightarrow{k} Y \qquad (7.5)$$


/**
 * Run the model until the stop time,
 * or no more reactions are possible.
 */
public void run()
{
    int step = 0;
    ReactionSet system = new ReactionSet();
    // Initialize the particular model.
    initModel(system);
    // Obtain the initial cumulative propensity.
    double aZero = system.calculateAZero();
    time = 0;
    while(time < stopTime && aZero != 0.0) {
        // Calculate the time to the next reaction.
        // 1.0 - Math.random() lies in (0, 1], which avoids a division by zero.
        double deltaT = (1.0 / aZero) *
                        Math.log(1.0 / (1.0 - Math.random()));
        time += deltaT;
        // Select the next reaction in proportion to its propensity.
        Reaction r = system.findReaction(Math.random() * aZero);
        r.react();
        step++;
        // Report the state every samplingRate steps.
        if(step % samplingRate == 0) {
            system.showStatus();
        }
        // Recalculate the cumulative propensity for the next iteration.
        aZero = system.calculateAZero();
    }
}

Code 7.2 Main iteration of Gillespie’s direct method

which represents the isomerization of molecule X to the form Y with a rate of k. In the stochastic model we replace the reaction rate k with the stochastic reaction constant c. Code 7.4 shows the way in which the model is set up in terms of Molecule and Reaction objects, reactants and products. We assume that, at the start of the simulation, there are 1000 molecules of X and the stochastic constant is 0.5. Figure 7.4 contains a plot of the number of molecules of X against time from a single run of this model. While the curve roughly follows the expected shape of $X_0 \exp(-ct)$, it is worth noting that it is exactly this sort of variation from the continuous function, due to integral particle numbers and stochasticity, that we are looking for from the SSA.
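For readers who want to reproduce this comparison without the full class structure, the following self-contained sketch simulates the single isomerization reaction and prints the surviving number of X molecules next to the deterministic value $X_0 \exp(-ct)$. It is a simplified stand-in for the book's Code 7.2 and 7.4, not a copy of them. The class name, method name and the fixed random seed are assumptions. With only one reaction there is nothing to select, so only the waiting time needs sampling.

```java
import java.util.Random;

/** Hypothetical stand-alone simulation of the isomerization X -> Y. */
final class IsomerizationSketch {

    /** Returns the number of X molecules remaining at time tEnd. */
    static long simulate(long x0, double c, double tEnd, Random rng) {
        long x = x0;
        double t = 0.0;
        while (x > 0) {
            // With a single reaction, the total propensity is c * x.
            double aZero = c * x;
            // Exponentially distributed waiting time to the next reaction.
            t += -Math.log(1.0 - rng.nextDouble()) / aZero;
            if (t > tEnd) {
                break; // The next reaction would fall after the stop time.
            }
            x--; // One X molecule becomes a Y molecule.
        }
        return x;
    }

    public static void main(String[] args) {
        long x0 = 1000;
        double c = 0.5;
        double tEnd = 2.0;
        long remaining = simulate(x0, c, tEnd, new Random(1));
        double expected = x0 * Math.exp(-c * tEnd);
        System.out.println("simulated: " + remaining + ", deterministic: " + expected);
    }
}
```

Repeating the run with different seeds gives a spread of values around the deterministic figure, which is the stochastic variation visible in Fig. 7.4.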

7.1.5 Multiple Reactions

It is obviously of greater interest to investigate reaction sets with more than a sin-

gle reaction. Consider the following set of three arbitrary reactions involving ﬁve


import java.util.List;

/**
 * Model a set of reactions.
 */
public class ReactionSet
{
    private List<Reaction> reactions;
    ...
    /**
     * Find the required reaction via the given propensity.
     */
    public Reaction findReaction(double aValue)
    {
        // Accumulate propensities until the running sum reaches aValue.
        double sum = 0;
        for(Reaction r : reactions) {
            sum += r.getA();
            if(sum >= aValue) {
                return r;
            }
        }
        // Only reached if aValue exceeds the total propensity.
        throw new RuntimeException("Internal error.");
    }
}

Code 7.3 Reaction selection in an unoptimized implementation of Gillespie’s direct method

/**
 * Set up the model with a single isomerization reaction.
 */
private void initModel(ReactionSet system)
{
    // Initial molecule counts for the two species.
    Molecule x = new Molecule("X", 1000);
    Molecule y = new Molecule("Y", 0);
    // Specify the reaction constant.
    Reaction r = new Reaction(0.5);
    r.addReactant(x);
    r.addProduct(y);
    system.addReactionToSet(r);
}

Code 7.4 Setting up the model for the reaction X → Y

molecules:

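$$
\begin{aligned}
A + \bar{B} &\xrightarrow{c_1} C \\
A + D &\xrightarrow{c_2} E \\
C + E &\xrightarrow{c_3} 2A + D
\end{aligned}
\qquad (7.6)
$$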


Fig. 7.4 Isomerization of X → Y with reaction constant c = 0.5 and $X_0$ = 1000. One point plotted every 5 iterations

Following Gillespie, we use the over bar to indicate a boundary species, $\bar{B}$, meaning that the size of the species remains constant; the use of boundary species could be relevant when, for example, the level of a molecule is constantly replenished.

Figure 7.5 shows the variations in molecular numbers of A, C, D and E over a single

run of about 2 million iteration steps. Under the conditions of this particular run,

the system is reasonably stable after an initial settling down period. In general, the

steady-state ranges of A, C, D and E will vary depending on the reaction constants

used, and some combinations will lead to the exhaustion of some species.
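One simple way to model a boundary species in code is to let it ignore any adjustment to its count. The sketch below illustrates that idea for a reaction that consumes one molecule of A and one of a boundary species B. It is an assumption about how this could be done, not a description of how the book's Molecule class handles it, and the class name is hypothetical.

```java
/** Hypothetical species count that can be pinned as a boundary species. */
final class BoundarySpeciesSketch {
    private final boolean boundary;
    private long count;

    BoundarySpeciesSketch(long count, boolean boundary) {
        this.count = count;
        this.boundary = boundary;
    }

    /** Applies a change in count; a boundary species keeps its level. */
    void adjust(long delta) {
        if (!boundary) {
            count += delta;
        }
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) {
        BoundarySpeciesSketch a = new BoundarySpeciesSketch(1000, false);
        BoundarySpeciesSketch b = new BoundarySpeciesSketch(1000, true);
        // One firing of a reaction that consumes one A and one B.
        a.adjust(-1);
        b.adjust(-1);
        System.out.println(a.getCount() + " " + b.getCount()); // prints "999 1000"
    }
}
```

The boundary species still contributes its constant count to the propensity of any reaction in which it is a reactant.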

7.1.6 The Lotka-Volterra Equation

An interesting example of dynamic multi-species interaction to study via stochastic

modeling is that described by the Lotka-Volterra equation, whose general form is

given in (7.7), where the $b_{i,j}$ are real numbers (positive or negative).

$$\dot{y}_i = y_i \left( r_i + \sum_{j=1}^{n} b_{i,j}\, y_j \right), \quad i = 1, \ldots, n \qquad (7.7)$$

It describes the interaction between n species whose population numbers are described by the $y_i$ values. Species i grows at rate $r_i$ and interactions between pairs


Fig. 7.5 A single run of the reaction set in (7.6) with reaction constants $c_1 = 0.12$, $c_2 = 0.08$, $c_3 = 0.5$ and initial levels of 1000 molecules for all species. One point plotted every 10,000 iterations

of species are determined by the values of $b_{i,j}$. The simplest case of two species is often characterized as a model of predator and prey interaction (7.8).

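$$
\begin{aligned}
\dot{y}_1 &= y_1 (r_1 + b_{1,2}\, y_2), \quad r_1 > 0,\; b_{1,2} < 0 \\
\dot{y}_2 &= y_2 (r_2 + b_{2,1}\, y_1), \quad r_2 < 0,\; b_{2,1} > 0
\end{aligned}
\qquad (7.8)
$$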

Species $y_1$ is the prey. The value of $r_1$ represents population growth in the presence of sufficient food supply and $b_{1,2}$ represents predation proportional to predator numbers. Conversely, species $y_2$ is the predator, with $r_2$ standing for death rate and $b_{2,1}$ for population growth proportional to prey numbers.
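The deterministic predator and prey model can be written down directly as a function returning the two rates of change. The sketch below assumes the two-species form dy1/dt = y1 (r1 + b12 y2) and dy2/dt = y2 (r2 + b21 y1), with y1 the prey and y2 the predator. The class name, method name and parameter values are hypothetical and chosen only for illustration.

```java
/** Hypothetical right-hand side of the two-species Lotka-Volterra system. */
final class LotkaVolterraSketch {

    /** Returns {dy1/dt, dy2/dt} for prey y1 and predator y2. */
    static double[] derivatives(double y1, double y2,
                                double r1, double b12,
                                double r2, double b21) {
        double dPrey = y1 * (r1 + b12 * y2);     // growth minus predation
        double dPredator = y2 * (r2 + b21 * y1); // death plus growth from prey
        return new double[] { dPrey, dPredator };
    }

    public static void main(String[] args) {
        // Illustrative values: r1 > 0, b12 < 0, r2 < 0, b21 > 0.
        double r1 = 1.0, b12 = -0.1, r2 = -0.5, b21 = 0.02;
        // Both rates vanish at y1 = -r2 / b21 and y2 = -r1 / b12.
        double[] atRest = derivatives(-r2 / b21, -r1 / b12, r1, b12, r2, b21);
        System.out.println(atRest[0] + " " + atRest[1]);
        // With few predators the prey population grows.
        double[] fewPredators = derivatives(25.0, 1.0, r1, b12, r2, b21);
        System.out.println(fewPredators[0] + " " + fewPredators[1]);
    }
}
```

The deterministic system has a fixed point at which both populations stay constant; a stochastic simulation of the same interactions has no such guarantee, since random fluctuations can carry either species to extinction.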

SSA is an ideal technique for modeling these interactions and there are various

ways in which the two-species version could be encoded, for instance:

$X + Y_1 \rightarrow 2Y_1$    Prey reproduction from undiminishing food supplies
$Y_1 + Y_2 \rightarrow 2Y_2$    Prey death and predator reproduction through predation
$Y_2 \rightarrow Z$    Predator death

where species X represents a food supply that remains undiminished and Z rep-

resents a death state for predators. Note that in this formulation death of a single

prey leads to birth of a single predator. In fact, it isn’t necessary to represent X