2.2 Stochastic Models of Flow Raised by Sequence Mutations
1. The Bernoulli process $\tilde{\zeta}$ applies to discrete stochastic sequences: each component $\zeta_i \in \{0, 1\}$ indicates whether a mutation happened at position $i$ ($\zeta_i = 1$) or not ($\zeta_i = 0$). It is therefore easy to interpret intuitively.
2. The crucial disadvantage of the Bernoulli process is that the probability distribution of its counting sequence $\eta_n = \sum_{i=1}^{n} \zeta_i$ is difficult to compute directly when $n$ is large. The binomial distribution $b(n, k; \epsilon)$ involves the coefficient $C_n^k = \frac{n!}{k!(n-k)!}$ and the power $\epsilon^k$, which cannot be evaluated exactly for large numbers. As a result, Poisson flow is typically used to approximate Bernoulli processes.
3. When Poisson flow is used to approximate counting sequences, the position variable becomes continuous. A region of suitable length $n_0$ is chosen as the unit, and the position $n$ is replaced by $t = n/n_0$. Thus, the discrete position region $\{1, 2, 3, \cdots\}$ becomes the continuous region $(0, \infty)$. If $n_0 \epsilon = \lambda$, then $n \epsilon = \lambda t$, and the binomial distribution is approximated by the probability of the Poisson flow:
$$b(n, k; \epsilon) = C_n^k \epsilon^k (1 - \epsilon)^{n-k} \sim p_{\lambda t}(k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t}.$$
This is the probability that mutations occurred $k$ times in the integer region $[1, n]$, or in the continuous interval $(0, t)$.
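The quality of the Poisson approximation described in the list above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the values chosen for the unit length `n0`, the mutation strength `eps`, and the rescaled position `t` are illustrative assumptions. The binomial probability is evaluated in log space with `math.lgamma`, which sidesteps the overflow in $n!$ at the cost of floating-point (not exact) arithmetic.

```python
import math

def binomial_pmf(n, k, eps):
    """b(n, k; eps), evaluated in log space to avoid overflow in n!."""
    if k < 0 or k > n:
        return 0.0
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_p = log_coeff + k * math.log(eps) + (n - k) * math.log1p(-eps)
    return math.exp(log_p)

def poisson_pmf(k, mean):
    """p_mean(k) = mean^k / k! * exp(-mean), also evaluated in log space."""
    return math.exp(k * math.log(mean) - math.lgamma(k + 1) - mean)

# Hypothetical values (assumptions for illustration only).
n0 = 1000            # unit length
eps = 0.002          # mutation strength per position
lam = n0 * eps       # lambda = n0 * eps
t = 5.0              # rescaled position t = n / n0
n = int(n0 * t)      # discrete position, so that n * eps = lambda * t

# Largest pointwise gap between the two distributions over k = 0..40.
max_abs_diff = max(
    abs(binomial_pmf(n, k, eps) - poisson_pmf(k, lam * t))
    for k in range(41)
)
print(f"lambda*t = {lam * t}, max |binomial - Poisson| = {max_abs_diff:.2e}")
```

With a small strength and a large $n$, the two distributions should agree closely; increasing `eps` while holding `lam * t` fixed makes the gap grow, which is one way to see where the approximation stops being useful.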
2.2.3 Mutated Flows Resulting from the Four Mutation Types
The four types of mutation in biological sequences are defined in Sect. 1.2.1.
Consequently, there are four types of mutated flows corresponding to the four
mutation types. All four types of mutated flows can be represented by Bernoulli
processes and Poisson processes.
Representation Using Bernoulli Processes
If we use a Bernoulli process
$$\tilde{\zeta}_\tau = (\zeta_{\tau,1}, \zeta_{\tau,2}, \zeta_{\tau,3}, \cdots), \quad \tau = 1, 2, 3, 4, \tag{2.39}$$
to represent the mutated flow, then $\tilde{\zeta}_\tau$ is also a Bernoulli process for each $\tau = 1, 2, 3, 4$. Here $\zeta_{\tau,j}$ denotes the random variable that represents whether or not mutation type $\tau$ happened at position $j$. Let $\epsilon_\tau$ denote the strength of mutation type $\tau$; then
$$P_r\{\zeta_{\tau,j} = 1\} = \epsilon_\tau, \quad P_r\{\zeta_{\tau,j} = 0\} = 1 - \epsilon_\tau. \tag{2.40}$$
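As an illustration of (2.39) and (2.40), the four mutated flows can be simulated as 0/1 sequences. This is a minimal sketch under stated assumptions: the strength values in `strengths` are hypothetical, the helper `simulate_flow` is introduced here for illustration, and the four flows are drawn independently of one another, which the text does not specify.

```python
import random

# Hypothetical strengths for mutation types tau = 1..4 (illustrative only).
strengths = {1: 0.20, 2: 0.02, 3: 0.01, 4: 0.01}

def simulate_flow(eps, length, rng):
    """Return a 0/1 list of the given length with P(entry = 1) = eps, independently."""
    return [1 if rng.random() < eps else 0 for _ in range(length)]

rng = random.Random(0)   # fixed seed so the run is repeatable
length = 100_000         # number of positions j = 1..length

# One simulated flow per mutation type (assumed independent across types).
flows = {tau: simulate_flow(eps, length, rng) for tau, eps in strengths.items()}

# The empirical frequency of ones should be close to the strength of each type.
for tau, flow in flows.items():
    freq = sum(flow) / length
    print(f"tau={tau}: strength={strengths[tau]:.3f}, empirical frequency={freq:.4f}")
```

Each entry of `flows[tau]` plays the role of one component of the sequence for type `tau`, with value 1 marking a mutation at that position.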
Based on many calculations, we know that $\epsilon_2, \epsilon_3, \epsilon_4 \ll 1$ and $\epsilon_1 < 1/2$ within the homologous sequence family. Following from $\tilde{\zeta}_\tau$, $\tau = 1, 2, 3, 4$, we may write the corresponding renewal process as follows:
$$v^*_{\tau,n} = \sum_{j=1}^{n} \zeta_{\tau,j}, \quad \tau = 1, 2, 3, 4, \tag{2.41}$$