64 2 Stochastic Models of Mutations and Structural Analysis
be the stochastic model of $E^*$. If $T$ is a fixed sample

$$T = \{\tilde z_\tau, \tilde{\ }_\tau, v_\tau,\ \tau = 1, 2, 3, 4,\ \tau \neq 1, 3\}, \tag{2.110}$$

then $E = \{\tilde\xi, T\}$ is a hybrid model of the four types of mutations. We refer to $E = \{\tilde\xi, T\}$ as the semistochastic model of mutated sequences. We will discuss the structure and analysis of hybrid mutations $T$ in the following chapters.
2.6 Exercises
Exercise 6. Explain the differences between the following terms: i.i.d. sequences, Bernoulli process, Poisson process, geometric distribution sequence, additive sequence, Markov process, and renewal process. Explain the advantages and disadvantages of using each of the above to describe biological sequences.
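As a small illustration of how some of these objects are related, the following Python sketch (not taken from the text) simulates a Bernoulli process with an assumed event probability p = 0.1 and records the distances between successive events. These distances are i.i.d. geometric on {1, 2, ...} with mean 1/p, so the event positions form a discrete renewal process.

```python
import random

def bernoulli_process(n, p, rng):
    """Return n i.i.d. Bernoulli(p) indicators (1 = event at that site)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def gaps_between_events(x):
    """Distances between successive event sites (1-based positions).

    For a Bernoulli process these gaps are i.i.d. geometric on {1, 2, ...},
    so the event positions form a discrete renewal process.
    """
    gaps, last = [], 0
    for i, xi in enumerate(x, start=1):
        if xi:
            gaps.append(i - last)
            last = i
    return gaps

rng = random.Random(1)
p = 0.1  # assumed event probability per site
x = bernoulli_process(100_000, p, rng)
g = gaps_between_events(x)
mean_gap = sum(g) / len(g)
print(f"events: {sum(x)}, mean gap: {mean_gap:.2f} (geometric mean 1/p = {1/p:.0f})")
```

The sample mean of the gaps should be close to 1/p = 10; repeating the run with other values of p shows the same relationship.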
Exercise 7. Try to extend the Bernoulli process and the Poisson process to the case of nonhomogeneous sequences, and use them to describe the model of type-III mutated sequences.
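One possible starting point, sketched in Python under assumptions that are not from the text: in a nonhomogeneous Bernoulli process the event probability $p_i$ depends on the site $i$. The rate profile `probs` below is purely hypothetical. The expected number of events is $\lambda = \sum_i p_i$, and when all $p_i$ are small the event count is approximately Poisson with mean $\lambda$, which is one route from the Bernoulli description to the Poisson one.

```python
import math
import random

def nonhomogeneous_bernoulli(probs, rng):
    """Independent indicators X_i ~ Bernoulli(probs[i]); the rate varies with i."""
    return [1 if p_i < 1 and rng.random() < p_i else int(p_i >= 1) for p_i in probs]

# Hypothetical site-dependent rate profile: higher in the middle of the sequence.
n = 1000
probs = [0.001 + 0.019 * math.sin(math.pi * i / n) ** 2 for i in range(n)]
lam = sum(probs)  # E[number of events] = sum of the p_i

rng = random.Random(7)
counts = [sum(nonhomogeneous_bernoulli(probs, rng)) for _ in range(1000)]
mean_count = sum(counts) / len(counts)
print(f"expected {lam:.2f}, simulated mean {mean_count:.2f}")
```

Replacing the simulated counts with draws from a Poisson distribution of mean `lam` gives the Poisson counterpart of the same nonhomogeneous model.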
Exercise 8. List some of the important laws of large numbers and central limit theorems in probability theory, and give examples showing how they apply to biological sequences.
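A minimal Python sketch of one such example, assuming an i.i.d. sequence over the nucleotide letters a, c, g, t with each letter having probability 1/4 (the alphabet and parameters are assumptions for illustration). The relative frequency of "a" approaches 1/4 as the length grows (law of large numbers), and the standardized count $Z = (N_a - n/4)/\sqrt{n \cdot \tfrac14 \cdot \tfrac34}$ is approximately standard normal (central limit theorem).

```python
import math
import random

rng = random.Random(42)
alphabet = "acgt"  # assumed nucleotide alphabet, each letter with probability 1/4

# Law of large numbers: the relative frequency of "a" tends to 1/4 as n grows.
freqs = {}
for n in (100, 10_000, 1_000_000):
    seq = rng.choices(alphabet, k=n)
    freqs[n] = seq.count("a") / n
    print(n, round(freqs[n], 4))

# Central limit theorem: for N_a ~ Binomial(n, 1/4),
# Z = (N_a - n/4) / sqrt(n * 1/4 * 3/4) is approximately standard normal.
n, reps = 1000, 2000
zs = []
for _ in range(reps):
    n_a = rng.choices(alphabet, k=n).count("a")
    zs.append((n_a - n / 4) / math.sqrt(n * 0.25 * 0.75))
within = sum(abs(z) <= 1.96 for z in zs) / reps
print(f"P(|Z| <= 1.96) estimate: {within:.3f} (standard normal: 0.950)")
```

The same two checks can be run on a real genomic sequence; systematic departures from the i.i.d. predictions are one sign that a Markov or other dependent model is needed.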
Exercise 9. Prove the following propositions:
1. Properties 1, 2, and 3 of the renewal process given in Sects. 2.2.1–2.2.3.
2. Theorems 5 and 7.
3. Formulas (2.82), (2.86), (2.100), (2.101), and (2.102).
Exercise 10. For the stochastic sequences in model (2.103), perform a simulation according to the following cases:
1. The range of the sequence lengths is 1 kbp–1 Mbp (i.e., $n = 1 \times 10^3, 1 \times 10^4, 1 \times 10^5, 5 \times 10^5, 1 \times 10^6$, etc.); the range of ${}_1$ is 0.01–0.4 (i.e., ${}_1 = 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4$, etc.); the range of ${}_2, {}_3, {}_4$ is 0.01–0.1 (i.e., $0.01, 0.02, \ldots, 0.1$, etc.); and the range of $p_1$–$p_4$ is 0.1–0.5 (i.e., 0.1, 0.2, 0.3, 0.4, 0.5, etc.).
2. For the sequence $\tilde\xi$ in (2.103), construct the i.i.d. sequence on $V_4$ that follows the uniform distribution.
3. Create the stochastic sequences of model (2.103) according to the parameters given in case 1.
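A minimal Python sketch of case 2, together with the parameter values of case 1 written out as lists. It assumes $V_4$ is represented by the letters a, c, g, t; the names `RATE_1`, `RATES_2_TO_4`, and `P_VALUES` are hypothetical stand-ins for the parameters of model (2.103), whose mutation step is defined outside this section and is therefore not simulated here.

```python
import random

# Assumption: V_4 is represented by the symbols a, c, g, t.
V4 = ("a", "c", "g", "t")

# Parameter values from case 1 (names are hypothetical stand-ins).
LENGTHS = [1_000, 10_000, 100_000, 500_000, 1_000_000]
RATE_1 = [0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4]
RATES_2_TO_4 = [round(0.01 * k, 2) for k in range(1, 11)]  # 0.01, 0.02, ..., 0.1
P_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5]

def uniform_iid_sequence(n, rng):
    """i.i.d. sequence of length n on V_4, each symbol with probability 1/4."""
    return "".join(rng.choices(V4, k=n))

rng = random.Random(2008)  # fixed seed so runs are reproducible
base_sequences = {n: uniform_iid_sequence(n, rng) for n in LENGTHS}
for n, s in base_sequences.items():
    # Empirical symbol frequencies; each should be close to 0.25.
    print(n, {v: round(s.count(v) / n, 3) for v in V4})
```

Fixing the seed makes each base sequence reproducible, so that the mutated sequences produced in case 3 can be regenerated exactly when the alignments of Exercise 11 are timed.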
Exercise 11. Based on the simulation results from Exercise 10, align the mutated sequences using the dynamic programming-based algorithm, and compare the CPU time required for the parameters listed as follows: