Barnes D.J., Chu D. Introduction to Modeling for Biosciences


6.2 Partition Functions 233

free energies as well (writing $G_s$ for the binding free energy of the specific site and $G_n$ for that of a non-specific site), we obtain:

$$Z = Z_1 + Z_0 = \binom{N-1}{P-1}\exp\bigl(-G_s-(P-1)G_n\bigr) + \binom{N-1}{P}\exp\bigl(-PG_n\bigr) \tag{6.32}$$

Here $Z_1$ is the weighted number of micro-states compatible with the specific binding site being bound. There is exactly one way in which a single site can be occupied, hence no extra combinatorial factor is needed for it. To calculate the number of configurations to distribute the remaining $P-1$ TFs over the remaining $N-1$ binding sites we again use the binomial coefficient, and the exponential terms supply the relevant weights. The second part of the partition function, $Z_0$, expresses all the possible ways to distribute $P$ TFs over $N-1$ non-specific sites. The probability of the specific binding site being occupied is then $Z_1/Z$.
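As a rough numerical illustration of the single-site partition function (6.32), the following Python sketch evaluates the probability that the specific site is occupied. The function name, the argument names `G_s` and `G_n` (our labels for the specific and non-specific binding free energies, in the same units as the exponents of the formula) and all numerical values are assumptions chosen purely for illustration.

```python
from math import comb, exp

def prob_specific_site_bound(N, P, G_s, G_n):
    """Probability that the single specific site is occupied, following (6.32)."""
    # Specific site occupied: the other P-1 TFs sit on the N-1 non-specific sites.
    Z_1 = comb(N - 1, P - 1) * exp(-G_s - (P - 1) * G_n)
    # Specific site empty: all P TFs sit on the N-1 non-specific sites.
    Z_0 = comb(N - 1, P) * exp(-P * G_n)
    return Z_1 / (Z_1 + Z_0)

# Hypothetical example: 1000 sites, 10 TFs, specific binding favoured by 5 units.
p_bound = prob_specific_site_bound(N=1000, P=10, G_s=-5.0, G_n=0.0)
print(p_bound)
```

When both free energies are equal the result reduces to P/N, which is a convenient sanity check for the combinatorial factors.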

6.11 Where in the derivation did we use the assumption that the individual TFs are

indistinguishable?

If there is more than one speciﬁc site then the partition function needs to be

extended. The idea is essentially the same as in the case of a single site, but we

need to sum over all possible occupation states of the speciﬁc sites, that is we must

consider the case of no speciﬁc site being occupied, exactly 1 site being occupied,

exactly 2 sites, and so on. Continuing with the restricted case that we have only 2 specific sites, we obtain for our partition function (with $G_s$ and $G_n$ again the specific and non-specific binding free energies):

$$Z = Z_2 + Z_1 + Z_0 = \binom{2}{2}\binom{N-2}{P-2}\exp\bigl(-2G_s-(P-2)G_n\bigr) + \binom{2}{1}\binom{N-2}{P-1}\exp\bigl(-G_s-(P-1)G_n\bigr) + \binom{2}{0}\binom{N-2}{P}\exp\bigl(-PG_n\bigr) \tag{6.33}$$

The case here is analogous to (6.32). The term $Z_2$ counts all the cases where exactly two specific sites are occupied. Altogether there is only $\binom{2}{2} = 1$ way to occupy two specific sites with two TFs, but there are $\binom{N-2}{P-2}$ ways to distribute the remaining $P-2$ TFs over the non-specific sites. The case for $Z_1$ and $Z_0$ can be argued similarly.

6.12 Using the partition function in (6.33), calculate the probability that both specific binding sites are occupied.

234 6 Other Stochastic Methods and Prism

6.13 (advanced) Write down the version for the partition function (6.33) for the case

when we distinguish between every TF, and show that the probabilities for various

scenarios (e.g., “exactly one speciﬁc binding site occupied”) are unaffected.

The special case of 2 specific sites in (6.33) already suggests how the formula can be extended to an arbitrary number $N_s$ of specific sites, when there are overall $N$ binding sites. (Of course we assume that $N_s < P < N$.) The pattern is the same. We need to sum over all possibilities to occupy the $N_s$ sites. There are always $\binom{N_s}{k}$ possibilities to occupy exactly $k$ of the specific sites. To obtain the total number of configurations compatible with $k$ specific binding sites being occupied, we need to multiply the binomial coefficient with the number of ways to distribute $P-k$ TFs over $N-N_s$ non-specific binding sites. Taking into account the weights as well (with $G_s$ and $G_n$ the specific and non-specific binding free energies), we obtain for the partition function:

$$Z = \sum_{k=0}^{N_s} \binom{N_s}{k}\binom{N-N_s}{P-k}\exp\bigl(-kG_s-(P-k)G_n\bigr) \tag{6.34}$$

As usual, the probability of, for example, exactly one specific binding site being occupied is given by

$$P(\text{exactly 1}) = \frac{1}{Z}\binom{N_s}{1}\binom{N-N_s}{P-1}\exp\bigl(-G_s-(P-1)G_n\bigr)$$
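The general partition function (6.34) can be evaluated term by term. The sketch below returns the probability of exactly k specific sites being occupied for every k. The names `N_s`, `G_s` and `G_n` (number of specific sites, specific and non-specific binding free energies) and the example values are our own illustrative assumptions.

```python
from math import comb, exp

def occupancy_distribution(N, N_s, P, G_s, G_n):
    """Return [P(exactly k specific sites occupied) for k = 0..N_s], per (6.34)."""
    # One weight per k: ways to pick k specific sites, ways to place the other
    # P-k TFs on the N-N_s non-specific sites, times the exponential weight.
    weights = [
        comb(N_s, k) * comb(N - N_s, P - k) * exp(-k * G_s - (P - k) * G_n)
        for k in range(N_s + 1)
    ]
    Z = sum(weights)  # the partition function is the sum of all weights
    return [w / Z for w in weights]

# Hypothetical example: 500 sites, 3 of them specific, 20 TFs.
dist = occupancy_distribution(N=500, N_s=3, P=20, G_s=-4.0, G_n=0.0)
p_exactly_one = dist[1]
print(dist)
```

Because every probability is a single term of the sum divided by Z, the returned list always sums to 1.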

The next question to consider is how different types of TFs can be included. Different types of TFs will bind non-specifically to other types' specific sites. To simplify the problem, we will assume two different species of TF and only be interested in specific binding of one type. By this we mean that only type-1 TFs have specific binding sites, whereas type-2 TFs bind non-specifically to all sites. To formulate the new partition function, we need to extend (6.34) to take into account the additional possibilities of binding. Essentially, the approach is the same again. We first distribute $k$ TFs of type 1 to their specific sites. Then we distribute the remaining type-1 TFs over the non-specific sites, just as in (6.34). Finally, we distribute the type-2 TFs over the remaining sites. Since we assume they bind to all sites with the same preference (or rather binding free energy) we do not need to distinguish between specific and non-specific sites there. To get to our partition function we assume that there are $P_1$ TFs of type 1 and $P_2$ of type 2. The latter bind to all sites with a binding free energy of $G_2$, whereas the former bind to non-specific sites with $G_n$ and to their $N_s$ specific sites with $G_s$:

$$Z = \sum_{k=0}^{N_s} \binom{N_s}{k}\binom{N-N_s}{P_1-k}\binom{N-P_1}{P_2}\exp\bigl(-kG_s-(P_1-k)G_n-P_2G_2\bigr) \tag{6.35}$$
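A small Python sketch can be used to evaluate the two-species partition function (6.35) and to explore numerically how the occupancy of the specific sites behaves for different parameter choices. The argument names (`P1`, `P2`, `G_s`, `G_n`, `G_2`) are our own labels for the quantities in the formula, and the example values are hypothetical.

```python
from math import comb, exp

def specific_occupancy_two_types(N, N_s, P1, P2, G_s, G_n, G_2):
    """Return [P(k type-1 TFs on specific sites) for k = 0..N_s], per (6.35)."""
    weights = [
        comb(N_s, k)               # choose which k specific sites are occupied
        * comb(N - N_s, P1 - k)    # remaining type-1 TFs on non-specific sites
        * comb(N - P1, P2)         # type-2 TFs on the sites that are still free
        * exp(-k * G_s - (P1 - k) * G_n - P2 * G_2)
        for k in range(N_s + 1)
    ]
    Z = sum(weights)
    return [w / Z for w in weights]

# Hypothetical example: compare two different numbers of type-2 TFs.
few = specific_occupancy_two_types(200, 3, 10, 5, -4.0, 0.0, -1.0)
many = specific_occupancy_two_types(200, 3, 10, 50, -4.0, 0.0, -1.0)
print(few)
print(many)
```

Printing both lists side by side gives a quick numerical starting point for the exercises that follow; it does not replace the analytical argument they ask for.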

6.14 Show that the probability of k type-1 TFs binding to speciﬁc binding sites

does not depend on the number of type-2 TFs.

6.15 (advanced) Formulate conditions that will change the result in the previous

exercise, i.e., ﬁnd conditions when the probability of binding to speciﬁc binding

sites does depend on other types of TFs.


6.2.3 Codon Bias in Proteins

This statistical reasoning can also be used to analyze the codon bias in amino acids. The problem there is as follows. Every protein consists of a number of amino acids, each of which is, in turn, encoded by a sequence of three bases of the genetic code. For example, Met (or Methionine) is encoded by the triplet ATG. Unlike Met, most amino acids are encoded by more than one triplet, because the genetic code is degenerate. As it turns out, however, the different codons are not equivalent. There is a statistically significant bias in the usage frequency of individual codons for a specific amino acid, which varies from species to species. The underlying reason for this bias seems to be that the tRNAs specific to a particular triplet of the genetic code are not equally abundant either. Instead, some of the tRNAs are more frequent than others. The more frequent a tRNA, the faster the relevant amino acid can be incorporated into the growing protein by the ribosome. Therefore, the usage of different codons has implications for the time it takes to translate a protein.

It turns out that not all amino acids are encoded by the fastest codons. As a result, the majority of proteins are expressed with medium speed, while there are a few highly optimized ones which use mostly very fast/abundant codons. Similarly, very few proteins are coded for by predominantly rare codons. There are many biological aspects to this, but some initial understanding of the system can be reached by considering simple models based on partition functions.

To start, let us assume that every protein is under some selection pressure to be expressed rapidly. This selection pressure will vary from protein to protein. Generally, of course, the faster a protein is expressed, the better. However, since proteins are expressed simultaneously there is a competition for tRNA between them. Increasing the speed of one must decrease the speed of the other. Selection pressure itself is not directly measurable, but within the framework of the partition function we can model it as a preference, as in the case of the king and queen above.

Let us consider a protein of some length and consider a single amino acid within the protein. Assume that the protein has $N$ copies of this amino acid. Further assume that this amino acid has $n_c$ different codons. Let us (arbitrarily) designate the first codon as the fastest, i.e., the codon that has the highest number of tRNAs, and assign to it some preference value $a$. To simplify matters, let us now assume that the other codons are much rarer than the first one and that they have, collectively, a preference value of $b$. The idea underlying the model is as follows: Over evolutionary time scales, random mutations will lead to a random walk between the individual codons. However, over time, those codons that are faster will be preferred (by how much is expressed by $a$), which can be interpreted as an evolutionary selection pressure. Given this model we can then ask about the probabilities of observing various possible configurations.

At this level, the model reduces to understanding the probability of various macro-states. Here the macro-states are defined by the number of amino acids, $k$, that are encoded by the most frequent codon. We formulate the partition function by considering the number of configurations that are compatible with exactly $k$ amino acids being encoded by the most frequent codon. Formally, this is the same problem as distributing $k$ TFs over $N$ binding sites and is thus given by $\binom{N}{k}$. For the remaining amino acids, we can then choose, at random, one of the remaining $n_c-1$ codons (with $n_c$ the number of different codons for this amino acid). Altogether, we thus obtain the partition function:

$$Z = \sum_{k=0}^{N} \binom{N}{k}\,(n_c-1)^{N-k}\exp\bigl(-ka-(N-k)b\bigr) \tag{6.36}$$

From this we can obtain the distribution of the codons for this particular protein as above.
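To see what the distribution implied by (6.36) looks like, one can tabulate the probability of each macro-state k. The sketch below does this in Python; the name `n_c` for the number of codons and the example values of `N`, `n_c`, `a` and `b` are hypothetical choices made only for illustration.

```python
from math import comb, exp

def codon_usage_distribution(N, n_c, a, b):
    """Return [P(k copies use the fastest codon) for k = 0..N], per (6.36)."""
    weights = [
        comb(N, k)                  # which k copies use the fastest codon
        * (n_c - 1) ** (N - k)      # free choice among the other codons
        * exp(-k * a - (N - k) * b)
        for k in range(N + 1)
    ]
    Z = sum(weights)
    return [w / Z for w in weights]

# Hypothetical example: 20 copies of an amino acid with 4 codons,
# the fastest codon being preferred by one unit.
dist = codon_usage_distribution(N=20, n_c=4, a=-1.0, b=0.0)
most_likely_k = max(range(len(dist)), key=lambda k: dist[k])
print(most_likely_k, dist[most_likely_k])
```

Varying `a` relative to `b` shifts the peak of the distribution, which is one way to read the preference values as a selection pressure.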

6.16 Write down the formula for the probability that all amino acids are encoded by the optimal codon. Find some plausible example values for $n_c$, $k$, $N$ and compute the results numerically.

6.3 Markov Chains

In Sect. 6.2 we asked about the probability of ﬁnding the king, the queen, and their

courtiers in various rooms of their palace. Using methods from statistical physics,

we can derive the probability of seeing n people in a room, if we were to take a

snapshot of this room at one instant in time. What this approach does not tell us is for how long people remain in the rooms, nor does it provide us with any hints as to which room they are likely to go to if they leave a specific room. What is more, the partition function method only tells us about the long-term probabilities. When modeling stochastic processes we often have (or assume) complete information about the state of the system at some time t, which constrains the possible states the system can take in the near future of t.

To make this more concrete, let us concentrate on the king himself in our court

example. We could imagine that, if the king dwells in one of the dining rooms of

the palace, then he is likely to remain there for about an hour or so during his lavish

meals. After eating he often takes a nap, so it is likely that he will move from the

dining room straight into his bedroom. Sometimes, though, he decides to tend to

matters of state and will meet the prime minister in the appropriate room instead

of taking a nap. Similarly, after getting up in the morning, he enjoys exercising by

viewing the paintings of the queen’s ancestors around the building before taking

breakfast.

If we were to observe the behavior of the king over many days, then we would

be able to measure these regularities and tabulate the probabilities of him going to

room j given that he is currently in room i. Now, to some extent, these probabilities

are constrained by the ﬂoor plan of the palace. For example, there is no way to go to

the palace kitchen from the bedroom of the king without passing through a few other

rooms first; equally the washing-room has only one door, directly into the bedroom of the king. Consequently, the probability of passing from the washing-room into the kitchen in one step must be zero.

Altogether, the probabilities of passing from one room into another can be formalized in a transition matrix, $W$.

$$W = \begin{pmatrix}
0.5 & 0.0 & 0.0 & 0.4 & 0.1 \\
0.1 & 0.1 & 0.1 & 0.4 & 0.3 \\
0.2 & 0.0 & 0.1 & 0.0 & 0.7 \\
0.0 & 0.3 & 0.5 & 0.1 & 0.1 \\
0.0 & 0.0 & 0.0 & 0.1 & 0.9
\end{pmatrix} \tag{6.37}$$

Here we understand the elements of the matrix as follows: The element $w_{ij}$ of the matrix $W$ gives us the probability that the king, given that he is currently in room $i$, will next go to room $j$. For example, assuming the king is in room 1 then the probability that he next goes to room 4 is 0.4. Similarly, the probability that he remains in room 1 is 0.5. Consistent with the meaning of the matrix, we must require that all rows sum to 1.
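The transition matrix (6.37) is easy to hold in code as a list of rows. The following Python sketch stores it and checks the requirement that every row is a probability distribution. The helper name `is_row_stochastic` and the tolerance are our own choices; note that Python indices start at 0, so the entry for rooms i and j is `W[i-1][j-1]`.

```python
# Transition matrix (6.37): row i holds the probabilities of leaving room i+1.
W = [
    [0.5, 0.0, 0.0, 0.4, 0.1],
    [0.1, 0.1, 0.1, 0.4, 0.3],
    [0.2, 0.0, 0.1, 0.0, 0.7],
    [0.0, 0.3, 0.5, 0.1, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.9],
]

def is_row_stochastic(matrix, tol=1e-12):
    """True if all entries lie in [0, 1] and every row sums to 1 (within tol)."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )

p_room1_to_room4 = W[0][3]   # probability of going from room 1 to room 4
print(is_row_stochastic(W), p_room1_to_room4)
```

A small tolerance is used in the row-sum check because decimal fractions such as 0.1 are not represented exactly in floating point.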

Implicit in the transition matrix of (6.37) is the assumption that time is discrete.

By this we mean that decisions as to whether the “state” of the system changes or

not (that is whether the king remains in the room or goes to a different one) are

taken in ﬁxed time intervals. Time can then be counted by integer numbers, i.e.,

t =0, 1, 2,... This is, of course, not a realistic assumption. In reality, the dwelling

times of the king in rooms will be drawn from some continuous distribution, not

from a discrete one. For example, he may stay in a room for 12.0985 time units. Our

model does not allow this, at least not for the moment. However, we can approximate

continuous dwelling times to an arbitrary degree of accuracy by making the discrete

steps small. For the moment we will not worry about that, but simply accept that this

is a discrete time Markov chain model; we will consider continuous time Markov

chain models later.

6.17 Draw a ﬂoor plan consistent with the ridiculously small palace implied by

(6.37).

6.18 Justify in detail why the rows of a transition matrix of a discrete time Markov chain must sum to 1.

It is crucial to note that the transition matrix W does not give us the probability

of ﬁnding the king in a particular room, but just the probabilities that the king moves

to a room j given that he is in room i. It is possible to obtain the actual probabilities

of dwelling in a room from W, but this requires some work. The good thing about

the transition matrix W is that, unlike the partition function ansatz, it can provide

us with a much more detailed picture of the stochastic properties of the system we

model.

To simplify the language let us now talk more abstractly about systems and their

states, and the probabilities of state transitions, rather than about the probability of

the king going from one room to the next. We identify the “state” of our stochastic

system with the room in which the king is. So, if we say the system is in state 1,

then this means the king is in room 1. To describe the system, we need N labels,

corresponding to the N different rooms in which the king may be. We could now extend this and include the queen into our model. In this case we would have to talk about the king being in this room and the queen being in that room and the probabilities of them walking from one room to the next. Clearly, this extended system has more states, and requires more labels. A state would no longer be identified with the room in which the king is dwelling, but with the pair of rooms. If there are $N$ rooms, then the state space would have $N^2$ states, not $N$ as in the case of the king alone. The corresponding transition matrix would then be an $N^2 \times N^2$ matrix formulating all the possible transition probabilities of the system. This shows that the size of Markov chain models can grow very quickly as entities are added. Clearly, with every additional person that is included, the state space grows, and dramatically so.
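The growth of the state space can be made tangible with a small Python sketch. It builds a joint transition matrix for king and queen under the strong (and purely illustrative) assumption that both move independently of each other and follow the same matrix (6.37); in that special case the joint matrix is the Kronecker product of the two individual matrices. The function name `kron` and the independence assumption are ours, not part of the model in the text.

```python
W = [
    [0.5, 0.0, 0.0, 0.4, 0.1],
    [0.1, 0.1, 0.1, 0.4, 0.3],
    [0.2, 0.0, 0.1, 0.0, 0.7],
    [0.0, 0.3, 0.5, 0.1, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.9],
]

def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_x for y in row_y]
        for row_x in X
        for row_y in Y
    ]

# ASSUMPTION: king and queen move independently, both according to W.
# The joint state (king in room i, queen in room k) gets index i * 5 + k
# (0-based), so the joint chain has 5 * 5 = 25 states.
joint = kron(W, W)
print(len(joint), len(joint[0]))
```

Adding a third independent person would require `kron(joint, W)` with 125 states, which illustrates how quickly such models grow.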

Transition matrices such as these implicitly contain the assumption that the transition from one state to the next does not depend on how the system got into a state. In other words, the history of the system is irrelevant for the future. All that counts is the current state and the transition probabilities. In the context of our example, this means that the probability of the king going from the dining room to the bedroom does not depend on whether he has been in the ballroom or in the changing room before entering the dining room. This assumption of independence of a stochastic process from the past is the defining property of Markov processes. Luckily for modelers, the Markov property significantly simplifies the mathematics and is also, in most cases, a reasonable assumption to make when modeling natural systems.

Let us now see what we can do with the transition matrix and what information

we can extract from it. For this purpose, we return again to the simple example of

the king walking through his palace. First we consider the transient probabilities of

the system given by the transition matrix (6.37), that is the probabilities not too long

after a time when the state of the system was known. The simplest case is to assume

that, at a time t =0, the system is in a speciﬁc state, say that the king is in room 1.

Where will he be next? In fact, we do not even need to calculate this, because a simple look at $W$ reveals that he will be in rooms 4 and 5 with the respective probabilities 0.4 and 0.1, and in room 1 with probability 0.5. We can also calculate this, if we wish. To do so, we must introduce the state vector. As the name suggests, this is a vector that tells us in which state the system is at a given time. The entries of the vector should sum to one, so that they can be interpreted as giving the probability of being in a particular state. For example, the state that the king is in room 1 can be represented by

$$S_0 = (1 \;\; 0 \;\; 0 \;\; 0 \;\; 0)$$

Similarly, the state vector

$$(0 \;\; 0.5 \;\; 0 \;\; 0.5 \;\; 0)$$

represents that the king is in room 2 or room 4, each with probability 0.5.



To obtain the state vector $S_1$ at the next time step, $t = 1$, we simply compute the product of $S_0$ and $W$.

$$S_1 = S_0 W = (1 \;\; 0 \;\; 0 \;\; 0 \;\; 0)
\begin{pmatrix}
0.5 & 0.0 & 0.0 & 0.4 & 0.1 \\
0.1 & 0.1 & 0.1 & 0.4 & 0.3 \\
0.2 & 0.0 & 0.1 & 0.0 & 0.7 \\
0.0 & 0.3 & 0.5 & 0.1 & 0.1 \\
0.0 & 0.0 & 0.0 & 0.1 & 0.9
\end{pmatrix}
= (0.5 \;\; 0 \;\; 0 \;\; 0.4 \;\; 0.1)$$

This yields the expected result and confirms what we already knew, namely that, at time $t = 1$, the king is in rooms 4 and 5 with the respective probabilities of 0.4 and 0.1. This result also gives us the state vector for $t = 1$, which we call $S_1$. We can use this to continue computing the state vector at time $t = 2, 3, \ldots$

$$S_2 = S_1 W = (0.25 \;\; 0.12 \;\; 0.2 \;\; 0.25 \;\; 0.18)$$



This tells us that, at time t =2, the king might be anywhere in the palace, although

it is most likely that he is in room 1 or room 4, each of which has a probability

of 0.25. We can now continue this process indefinitely, and compute state vectors at later times, such as

$$(0.04 \;\; 0.04 \;\; 0.07 \;\; 0.11 \;\; 0.72)$$

and

$$(0.03 \;\; 0.03 \;\; 0.06 \;\; 0.11 \;\; 0.74)$$
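The repeated multiplication of a state vector by the transition matrix is straightforward to automate. The Python sketch below propagates the initial state "king in room 1" through the matrix (6.37); the function names `step` and `trajectory` and the number of steps are our own illustrative choices.

```python
W = [
    [0.5, 0.0, 0.0, 0.4, 0.1],
    [0.1, 0.1, 0.1, 0.4, 0.3],
    [0.2, 0.0, 0.1, 0.0, 0.7],
    [0.0, 0.3, 0.5, 0.1, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.9],
]

def step(state, W):
    """One time step: return the row vector state * W."""
    n = len(W)
    return [sum(state[i] * W[i][j] for i in range(n)) for j in range(n)]

def trajectory(initial, W, steps):
    """Return the list of state vectors for t = 0, 1, ..., steps."""
    states = [list(initial)]
    for _ in range(steps):
        states.append(step(states[-1], W))
    return states

# King starts in room 1 at t = 0.
states = trajectory([1.0, 0.0, 0.0, 0.0, 0.0], W, steps=30)
for t in (1, 2, 10, 30):
    print(t, [round(p, 3) for p in states[t]])
```

Printing a few time points side by side makes it easy to see whether the entries are still changing or have settled.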



Looking at these sequences of numbers, the reader might now wonder whether these state vectors will eventually stabilize. Comparing the successive state vectors suggests that the probability of being in state 1 decreases over time. On the other hand, room 5 emerges as a preference for the king. While it can be very interesting to numerically calculate how the probabilities change over time, one of the things we may be interested in is the long-term, or steady state, probabilities of the system,

if they exist at all. The steady state of the Markov chain model would describe

the probabilities of ﬁnding the system in a speciﬁc state long after the transient

effects of initial conditions have been “forgotten” by the system. This steady state

would correspond to the quantities we have calculated in Sect. 6.2 using the partition

function approach.

(Here we truncate the values after two decimal places, which is why the state vectors do not sum to exactly 1.)

The good news is that, at least for the Markov chain model defined by the transition matrix (6.37), there exists a unique steady state probability vector defining the long-term distribution of states. Intuitively, it is very easy to understand how this long-term behavior can be obtained. The idea of steady state behavior is that it does not change when acted upon by the transition matrix. This is very much like the steady state in the case of differential equations (see Chap. 4), where the steady state was defined by the state of the system when the time derivatives vanish. In the case of Markov chains, the change operation is the application of the transition matrix to the state vector. So, if the system is in a state $S^*$ and if we apply the transition matrix to it, i.e., if we calculate $S^* W$, then the probabilities should not have changed. This is perhaps best expressed formally. The steady state probabilities of the system are defined by the vector $S^*$ that fulfills the steady state equation:

$$S^* W = S^* \tag{6.38}$$

An equivalent form of this equation is:

$$S^* (W - I) = 0 \tag{6.39}$$

Here we use $I$ as a shorthand for the identity matrix. Equation (6.39) is essentially an eigenvalue problem; together with the requirement that the entries of $S^*$ sum to 1, it determines the steady state. We can either solve it by hand or use a computer algebra system to obtain the solution.

$$S^* = (0.03 \;\; 0.03 \;\; 0.06 \;\; 0.11 \;\; 0.74)$$

This tells us that, based on the transition matrix defined above, the king is most likely to be in room 5, where he spends around 74% of his time.
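One simple numerical route to the steady state, used here instead of solving the eigenvalue problem directly, is to apply the transition matrix repeatedly until the state vector stops changing (often called power iteration). The sketch below does this for the matrix (6.37); the function name, tolerance and iteration limit are our own assumptions.

```python
W = [
    [0.5, 0.0, 0.0, 0.4, 0.1],
    [0.1, 0.1, 0.1, 0.4, 0.3],
    [0.2, 0.0, 0.1, 0.0, 0.7],
    [0.0, 0.3, 0.5, 0.1, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.9],
]

def steady_state(W, tol=1e-12, max_steps=10_000):
    """Iterate S -> S W from a uniform start until successive vectors agree."""
    n = len(W)
    state = [1.0 / n] * n
    for _ in range(max_steps):
        new_state = [sum(state[i] * W[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    raise RuntimeError("steady state iteration did not converge")

S_star = steady_state(W)
print([round(p, 4) for p in S_star])
```

This approach relies on the chain having a unique steady state that is approached from any starting vector, as is the case for (6.37); for chains where that fails, solving (6.39) directly is the safer choice.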

6.3.1 Absorbing Markov Chains

Let us now change our matrix $W$ to make it a so-called absorbing Markov chain.

$$W = \begin{pmatrix}
0.5 & 0.0 & 0.0 & 0.4 & 0.1 \\
0.1 & 0.1 & 0.1 & 0.4 & 0.3 \\
0.2 & 0.0 & 0.1 & 0.0 & 0.7 \\
0.0 & 0.3 & 0.5 & 0.1 & 0.1 \\
0.0 & 0.0 & 0.0 & 0.0 & 1.0
\end{pmatrix} \tag{6.40}$$

The relevant change with respect to our original transition matrix has taken place in the last row, where we have altered the transition probabilities such that room 5 has only a single transition, namely to itself. With this modification our example of the king in the palace becomes a bit unrealistic, in that now room 5 becomes a sink; once the king enters it, he will remain there forever. Correspondingly, the steady state is always given by the vector $S^* = (0, 0, 0, 0, 1)$.

Absorbing Markov chains can (and often do) involve more than a single absorbing state. We could have constructed our system such that there is no way from rooms 4 and 5 to any of the other rooms, but free movement is possible between them and into them; or we could have stipulated that rooms 3 and 1 are connected to each other but that there is no escape from them to any of the other rooms. All the techniques we are going to present over the following pages are perfectly applicable to such more general cases.

6.19 Calculate the steady state vector $S^*$ for the Markov chain in (6.40).

Our absorbing transition matrix is of the general form

$$W = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix} \tag{6.41}$$

Here the lower row corresponds to all the absorbing states. In the present case, we have only one such absorbing state, hence in our case the lower row corresponds to $(0, C) = (0, 0, 0, 0, 1)$. The sub-matrix $A$ corresponds to the following:

$$A = \begin{pmatrix}
0.5 & 0.0 & 0.0 & 0.4 \\
0.1 & 0.1 & 0.1 & 0.4 \\
0.2 & 0.0 & 0.1 & 0.0 \\
0.0 & 0.3 & 0.5 & 0.1
\end{pmatrix}$$

Note that transition matrices of absorbing Markov chains can always be coerced into the shape of (6.41), where the sub-matrix $C$ summarizes the absorbing states, simply by renumbering states. As it turns out, the matrix $A$ is of fundamental significance. In absorbing Markov chains it can be used to compute the average number of times a specific state is visited before the absorbing state is entered. To be more precise, let us define the so-called fundamental matrix $Q$:

$$Q = (I - A)^{-1} \tag{6.42}$$

Here $I$ is the identity matrix. The entry $q_{ij}$ of this matrix is the average number of times the system will be in state $j$, given that it started in state $i$, before the absorbing state is entered. To make this concrete, we can compute this matrix for our modified transition matrix $W$ above.

$$Q = \begin{pmatrix}
2.37 & 0.41 & 0.73 & 1.23 \\
0.53 & 1.39 & 0.63 & 0.85 \\
0.52 & 0.09 & 1.27 & 0.27 \\
0.47 & 0.51 & 0.91 & 1.54
\end{pmatrix}$$

This means that the king starting in room 1 will, on average, be expected to be in room 2 ≈ 0.41 times before entering room 5. So, if we repeated many times a thought experiment where we let the king start from room 1, we would record on average about 0.41 visits to room 2 per run on his way to the absorbing state. Because a single run can pass through room 2 more than once, the fraction of runs in which he visits room 2 at all is somewhat smaller than 0.41. This procedure generalizes to all absorbing Markov chains.
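The fundamental matrix can be computed without an explicit matrix inversion by using the standard identity $(I - A)^{-1} = I + A + A^2 + \dots$, which also mirrors its interpretation: the term $A^n$ contributes the probability of being in a given transient state after exactly n steps. The Python sketch below sums this series for the sub-matrix A of the absorbing chain; the function names, tolerance and term limit are our own assumptions.

```python
A = [
    [0.5, 0.0, 0.0, 0.4],
    [0.1, 0.1, 0.1, 0.4],
    [0.2, 0.0, 0.1, 0.0],
    [0.0, 0.3, 0.5, 0.1],
]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]

def fundamental_matrix(A, tol=1e-14, max_terms=10_000):
    """Sum the series I + A + A^2 + ... until the newest term is negligible."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    Q = [row[:] for row in term]
    for _ in range(max_terms):
        term = mat_mul(term, A)
        Q = [[Q[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        if max(max(row) for row in term) < tol:  # entries are non-negative
            return Q
    raise RuntimeError("series did not converge")

Q = fundamental_matrix(A)
for row in Q:
    print([round(q, 3) for q in row])
```

The series converges here because the king eventually leaves the transient rooms with certainty; for large chains a linear solver is usually the more efficient choice.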

Using this fundamental matrix $Q$ we can calculate another quantity that is often of interest, namely the mean time before the absorbing state is entered. In discrete time Markov chains the mean time is the mean number of steps. The fundamental matrix already contains the answer. Say we start in room 1, then the mean time to enter room 5 is the sum of the mean numbers of time steps spent in rooms 1, 2, 3, and 4, that is, the sum of the first row of $Q$. We can obtain this for every starting room by multiplying the fundamental matrix with a vector $v$ that contains only 1's.

$$T = Qv$$

Using the numbers of our specific example with the king, we obtain the following mean times before the king enters room 5.

$$T = \begin{pmatrix} 4.76 \\ 3.41 \\ 2.16 \\ 3.45 \end{pmatrix}$$

This tells us that, if we start from room 1 then, on average, it takes about 4.76 time steps before we reach the absorbing state.
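Since $T = Qv$ with $Q = (I - A)^{-1}$, the mean absorption times can equally be obtained by solving the linear system $(I - A)\,T = v$. The Python sketch below does so with a small Gauss-Jordan elimination; the helper name `solve` and the use of this particular elimination scheme are our own choices, not something prescribed by the text.

```python
A = [
    [0.5, 0.0, 0.0, 0.4],
    [0.1, 0.1, 0.1, 0.4],
    [0.2, 0.0, 0.1, 0.0],
    [0.0, 0.3, 0.5, 0.1],
]

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

n = len(A)
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
T = solve(I_minus_A, [1.0] * n)   # mean number of steps until room 5 is entered
print([round(t, 3) for t in T])
```

Entry `T[i]` is the mean number of steps when starting in room i+1, so the values can be compared directly with the vector given above (which is truncated to two decimals).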

6.20 Assume the system is in state (0.5, 0.5, 0, 0, 0). What is the mean time to reach

the absorbing state?

6.3.2 Continuous Time Markov Chains

So far we have assumed that our Markov chains are updated in discrete ticks of time.

At each time step the system makes at most one transition. The problem of such

discrete Markov chains is that they do not represent time very well. In real systems

events play out in continuous time. An event may happen at t =0.032 and another

0.327 time units later. Continuous time cannot be modeled exactly by discrete time

Markov chains. Discrete time models can approximate continuous time dynamics

to an arbitrary degree of precision by decreasing the transition probabilities, that is

by making every single time step correspond to smaller and smaller time quanta.

One can compare this to a series of pictures taken at, say, 1 second intervals. From

the series of pictures one can obtain a sense of the dynamics of the scenery, although

one would miss some faster movement. If one reduces the interval to half a second

then more detail will be visible. As one increases the frequency with which pictures

are taken, at some point the series of pictures will be a very good approximation

of continuous motion. In fact, any movie is just a series of still pictures taken at

discrete time points.

The same procedure applies with discrete Markov chains. In order to arrive at

continuous time Markov chains, we start with a discrete chain and make a time-step

correspond to ever smaller units of real time. If we continue this process indeﬁnitely,

then we will eventually arrive at a continuous model.
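The limiting procedure can be illustrated numerically for a single state. Suppose, as an assumption for this sketch, that the system leaves its current state with probability `rate * dt` in each time step of length `dt`. The probability of still being in that state after a fixed real time t is then `(1 - rate * dt) ** (t / dt)`, which approaches `exp(-rate * t)` as the step shrinks. The names `rate`, `dt` and `survival_discrete` and the numerical values are hypothetical.

```python
from math import exp

def survival_discrete(rate, t, dt):
    """Probability of not having left the state by time t, using steps of size dt."""
    steps = round(t / dt)              # number of discrete time steps in [0, t]
    return (1.0 - rate * dt) ** steps  # stay put in every single step

rate, t = 0.5, 2.0                     # hypothetical leaving rate and time horizon
for dt in (1.0, 0.1, 0.01, 0.001):
    print(dt, survival_discrete(rate, t, dt))
print("continuous limit", exp(-rate * t))
```

The per-step leaving probability shrinks together with the step size, which is the sense in which refining the time grid goes hand in hand with decreasing the transition probabilities.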

Unlike discrete time Markov chains, the continuous version is not deﬁned by a

matrix of transition probabilities, but by transition rates; that is, transitions per time