9 Protein Secondary Structure Prediction
1. From the training set $\Omega_1$, we can find the joint frequency and the
joint frequency distribution table $p(s, t, r; i, j, k)$ of (9.4). The
corresponding conditional probability distributions, that is, the tables of
Models I, II, and III in (9.12), are then obtained.
2. If the above informational and statistical threshold series prediction is
used with $\theta_1 = \theta_2 = \theta_3 = 0.70$, the correct rate can be
4–5% higher than that obtained using MLE prediction. If the values of
$\theta_1$, $\theta_2$, and $\theta_3$ are tuned further, the correct rate may
be increased still further. However, the best prediction results have not yet
been obtained. An overall introduction to the other algorithms in protein
secondary structure software packages may be found in [79].
3. Secondary structure prediction is a complicated problem in the area of
informational statistics. In the algorithms above, the result depends not only
on the choice of the parameters $\theta_1$, $\theta_2$, and $\theta_3$, but
also on the division of the database $\Omega$ into $\Omega_1$ and $\Omega_2$.
Some sources in the literature set $\Omega_1$ and $\Omega_2$ to be the same as
$\Omega$, which greatly increases the nominal prediction accuracy. From a
statistical point of view, however, this is unreasonable: the test proteins
have already been seen during training, so it is meaningless to extend such
results to new data.
4. Some secondary structure prediction methods add other protein information
besides that contained in the PDB-Select database (such as information on
the biological classification) in order to improve prediction accuracy. For
example, jackknife testing and multiple sequence alignment methods
are used for this reason.
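Step 1 above (counting joint frequencies on the training set and turning them
into a conditional distribution) can be illustrated with a minimal sketch. It
is a simplification and not the table of (9.4): instead of the full
$p(s, t, r; i, j, k)$ it counts single (residue, state) pairs only. The
function names `joint_frequencies` and `conditional_distribution`, the state
letters H, E, C, and the toy sequences are all assumptions made for
illustration.

```python
from collections import Counter, defaultdict


def joint_frequencies(training_set):
    """Count (residue, state) pairs over all proteins in the training set.

    training_set: iterable of (primary, secondary) strings of equal length.
    Simplified stand-in for the joint frequency table; single positions only.
    """
    counts = Counter()
    for primary, secondary in training_set:
        if len(primary) != len(secondary):
            raise ValueError("primary and secondary structures must align")
        counts.update(zip(primary, secondary))
    return counts


def conditional_distribution(counts):
    """Convert joint counts n(e, f) into p(f | e) = n(e, f) / sum_f' n(e, f')."""
    totals = Counter()
    for (residue, _state), n in counts.items():
        totals[residue] += n
    cond = defaultdict(dict)
    for (residue, state), n in counts.items():
        cond[residue][state] = n / totals[residue]
    return dict(cond)


# Toy training set (invented): H = helix, E = strand, C = coil.
toy_training = [("AAGV", "HHCE"), ("AGVA", "HCEC")]
counts = joint_frequencies(toy_training)
cond = conditional_distribution(counts)
```

Here `cond["A"]` holds the estimated distribution of states for residue A in
the toy data. A table conditioned on residue triples, as in the text, would
use the same counting pattern with longer keys.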
The Jackknife Test
The jackknife test uses a statistical testing method where:
1. $\Omega = \{1, 2, \cdots, m\}$ is the PDB-Select database, in which protein
$i$ is identified with $A_i = (E_i, F_i)$, where

$$E_i = (e_{i,1}, e_{i,2}, \cdots, e_{i,n_i}), \quad F_i = (f_{i,1}, f_{i,2}, \cdots, f_{i,n_i}) \tag{9.19}$$

are the primary and secondary structure of protein $i$, respectively.
2. $\Omega_1$ and $\Omega_2$ are two sets of proteins, where $\Omega_2 = \{i\}$
and $\Omega_1 = \Omega - \Omega_2$. $\Omega_1$ is the training set, and
$\Omega_2$ is the testing set.
3. We consider the set $\Omega_1$, and give a two-dimensional sequence

$$\Omega_1 = ((e_{1,1}, f_{1,1}), (e_{1,2}, f_{1,2}), \cdots, (e_{1,n_0}, f_{1,n_0})), \tag{9.20}$$

where $n_0 = \|\Omega_1\|$.
4. Using the calculations on $\Omega_1$, the primary structure of the protein
in $\Omega_2$, and the predicted secondary structure of $\Omega_2$, the
prediction result of ISIA is

$$\hat{F}_i = (\hat{f}_{i,1}, \hat{f}_{i,2}, \cdots, \hat{f}_{i,n_i}). \tag{9.21}$$
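The jackknife steps above amount to a leave-one-out loop over the database.
The following sketch shows only that loop. The predictor inside it is a
placeholder (the most frequent state per residue) and is not ISIA; the names
`train_majority_model`, `predict`, and `jackknife`, and the toy database, are
assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def train_majority_model(training_set):
    """Placeholder predictor (NOT ISIA): most frequent state for each residue."""
    counts = defaultdict(Counter)
    overall = Counter()
    for primary, secondary in training_set:
        for residue, state in zip(primary, secondary):
            counts[residue][state] += 1
            overall[state] += 1
    # Fallback state for residues never seen in training ("C" is an assumption).
    fallback = overall.most_common(1)[0][0] if overall else "C"
    table = {res: c.most_common(1)[0][0] for res, c in counts.items()}
    return table, fallback


def predict(model, primary):
    """Predict one state per residue of the held-out primary structure."""
    table, fallback = model
    return "".join(table.get(residue, fallback) for residue in primary)


def jackknife(database):
    """Leave-one-out test: Omega_2 = {i}, Omega_1 = Omega - Omega_2, for each i."""
    correct = total = 0
    predictions = []
    for i, (primary, secondary) in enumerate(database):
        training = database[:i] + database[i + 1:]   # Omega_1
        model = train_majority_model(training)
        predicted = predict(model, primary)          # plays the role of F-hat_i
        predictions.append(predicted)
        correct += sum(p == f for p, f in zip(predicted, secondary))
        total += len(secondary)
    accuracy = correct / total if total else 0.0
    return predictions, accuracy


# Toy database (invented): each entry is (E_i, F_i).
toy_db = [("AAGV", "HHCE"), ("AGVA", "HCEC"), ("GAVV", "CHEE")]
predictions, accuracy = jackknife(toy_db)
```

Because protein $i$ is excluded from $\Omega_1$ before the model is estimated,
the reported accuracy is measured on data the model has not seen, in contrast
to the case where $\Omega_1 = \Omega_2 = \Omega$.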