272 9 Protein Secondary Structure Prediction
Their protein primary–secondary structures are denoted respectively by
\[
\begin{aligned}
\Omega_1 &= \{(A_1, B_1), (A_2, B_2), \cdots, (A_{m_1}, B_{m_1})\}, \\
\Omega_2 &= \{(C_1, D_1), (C_2, D_2), \cdots, (C_{m_2}, D_{m_2})\},
\end{aligned}
\tag{9.11}
\]
where $A_s$, $C_t$ are the primary structure sequences of the two proteins $s$ and $t$ in databases $\Omega_1$ and $\Omega_2$, respectively, and $B_s$, $D_t$ are the secondary structure sequences of the same two proteins $s$ and $t$ in databases $\Omega_1$ and $\Omega_2$. We then denote
\[
Z_s = (z_{s,1}, z_{s,2}, \cdots, z_{s,n_s}), \quad Z = A, B, C, D, \quad z = a, b, c, d,
\]
for their sequence expression. Using the PDB-Select database, we take $m_1 = 2765$, $m_2 = 500$.
Table of Conditional Probability Distribution
From the training set $\Omega_1$, we calculate its conditional probability distribution table, the types of which are
\[
\begin{cases}
\text{Model I:} & p[i|(s,t,r)],\ p[j|(s,t,r)],\ p[k|(s,t,r)], \\
\text{Model II:} & p[i|(s,t,r,j)],\ p[j|(s,t,r,i)],\ p[j|(s,t,r,k)],\ p[k|(s,t,r,j)], \\
\text{Model III:} & p[i|(s,t,r,j,k)],\ p[j|(s,t,r,i,k)],\ p[k|(s,t,r,i,j)],
\end{cases}
\tag{9.12}
\]
where the tables of Model I give the conditional probability distributions of a secondary structure state given a primary structure triple, while the tables of Models II and III give the conditional probability distributions of a secondary structure state given the primary structure triple together with one or two of the other secondary structure states. The sizes of Models I, II, and III are 8000 × 3, 24,000 × 4, and 72,000 × 3 matrices, respectively.
When $\Omega_1$ is given, the joint probability distribution $p(s, t, r; i, j, k)$ is determined, and all these conditional probability distributions can be determined by the joint probability distribution $p(s, t, r; i, j, k)$.
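As an illustration of how such tables might be built, the following minimal Python sketch counts every window $(s, t, r; i, j, k)$ in a training set and normalizes the counts into a Model I table. It rests on several assumptions that the text does not state explicitly: that $i, j, k$ are the secondary structure states of the residues $s, t, r$, that the states are coded 1, 2, 3, and that probabilities are estimated by relative frequencies. The function names and the toy data are hypothetical and are not taken from PDB-Select.

```python
from collections import Counter

STATES = (1, 2, 3)  # assumed coding of the three secondary structure states


def joint_counts(pairs):
    """Count each window (s, t, r, i, j, k) over aligned sequence pairs."""
    counts = Counter()
    for primary, secondary in pairs:
        if len(primary) != len(secondary):
            raise ValueError("primary and secondary sequences must align")
        for u in range(len(primary) - 2):
            residues = tuple(primary[u:u + 3])   # (s, t, r)
            states = tuple(secondary[u:u + 3])   # (i, j, k), by assumption
            counts[residues + states] += 1
    return counts


def model_one_table(counts, position):
    """Relative-frequency estimate of p[state at `position` | (s, t, r)].

    position = 0, 1, 2 selects i, j, k respectively.
    """
    by_state = Counter()   # counts of ((s, t, r), state)
    totals = Counter()     # counts of (s, t, r)
    for key, c in counts.items():
        residues, states = key[:3], key[3:]
        by_state[(residues, states[position])] += c
        totals[residues] += c
    return {
        residues: {x: by_state[(residues, x)] / total for x in STATES}
        for residues, total in totals.items()
    }


# Hypothetical toy training set: (primary sequence, secondary states).
toy_pairs = [
    ("ACDA", (1, 1, 2, 2)),
    ("ACDC", (1, 3, 2, 1)),
]
counts = joint_counts(toy_pairs)
table_i = model_one_table(counts, position=0)  # p[i | (s, t, r)]
table_j = model_one_table(counts, position=1)  # p[j | (s, t, r)]
```

Only triples that actually occur in the training data receive a row here; a full table over all $20^3 = 8000$ triples would need some convention for unseen triples, which this sketch leaves open. The Model II and III tables could be obtained the same way by grouping on the triple together with the conditioning states.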
Maximum Likelihood Estimate Prediction
1. Maximum likelihood estimate (MLE) prediction uses the tables of Model I. For instance, in $p[i|(s, t, r)]$, for every fixed $(s, t, r) \in V_{20}^{(3)}$, we calculate the maximum of $p[i|(s, t, r)]$ over $i = 1, 2, 3$ and denote the maximizing state by $i(s, t, r)$. Then
\[
p[i(s, t, r)|(s, t, r)] = \max\{p[1|(s, t, r)],\ p[2|(s, t, r)],\ p[3|(s, t, r)]\}. \tag{9.13}
\]
If the primary structure of the protein is $A = (e_1, e_2, \cdots, e_n)$, then its predicted secondary structure is
\[
(\hat{f}_2, \hat{f}_3, \cdots, \hat{f}_{n-2}) = (i(e_1, e_2, e_3),\ i(e_2, e_3, e_4),\ \cdots,\ i(e_{n-3}, e_{n-2}, e_{n-1}),\ i(e_{n-2}, e_{n-1}, e_n)), \tag{9.14}
\]