b. Reverse Turns Are Characterized by a Minimum
in Hydrophobicity Along a Polypeptide Chain
The positions of reverse turns can also be predicted by
the Chou–Fasman method. However, since a reverse turn
usually consists of four consecutive residues, each with a
different conformation (Section 8-1D), the algorithm for predicting
them is necessarily more cumbersome than those for
sheets and helices.
Rose has proposed a simpler empirical method for pre-
dicting the positions of reverse turns. Reverse turns nearly
always occur on the surface of a protein and, in part, define
that surface. Since the core of a protein consists of hy-
drophobic groups and its surface is relatively hydrophilic,
reverse turns occur at positions along a polypeptide chain
where the hydropathy (Table 8-6) is a minimum. Using
these criteria for partitioning a polypeptide chain, we can
deduce the positions of most reverse turns by inspection
(Fig. 9-27). Since this method often predicts reverse turns
to occur in helical regions (helices are all turns), it should
be applied only to regions that are not predicted to be
helical.
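The hydropathy-minimum idea can be sketched in a few lines of Python. This is only an illustration, not Rose's published procedure: it assumes the Kyte-Doolittle hydropathy scale as a stand-in for the values of Table 8-6, and the window size, the function names, and the `helical` mask are hypothetical choices made for this example.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in for Table 8-6).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}

def smoothed_hydropathy(seq, window=5):
    """Average hydropathy over a window centered on each residue.

    The window is truncated at the chain ends.
    """
    half = window // 2
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        values = [KD[res] for res in seq[lo:hi]]
        profile.append(sum(values) / len(values))
    return profile

def candidate_turns(seq, window=5, helical=None):
    """Return 0-based positions of local minima in the smoothed profile.

    Positions listed in `helical` (residues already predicted to be
    helical) are skipped, as the text recommends.
    """
    helical = helical or set()
    profile = smoothed_hydropathy(seq, window)
    minima = []
    for i in range(1, len(profile) - 1):
        is_minimum = profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]
        if is_minimum and i not in helical:
            minima.append(i)
    return minima
```

Calling `candidate_turns(sequence, helical=predicted_helix_positions)` returns the centers of the hydrophilic dips. A real application would still need a threshold on how deep a minimum must be before it is taken to mark a turn.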
c. Physical Basis of α Helix Propensity
Why do amino acid residues have such different
propensities for forming α helices? This question has been
answered, in part, by Matthews through the structural and
thermodynamic analysis of T4 lysozyme (Section 9-1Bd) in
which Ser 44, a solvent-exposed residue in the middle of a
12-residue (3.3-turn) α helix, was mutagenically replaced,
in turn, by all 19 other amino acids. The X-ray structures of
13 of these variant proteins revealed that, with the exception
of Pro, the substitutions caused no significant distortion
to the α helix backbone and, hence, that differences in
α helix propensities are unlikely to arise from strain. However,
for 17 of the amino acids (all but Pro, Gly, and Ala),
the stability of the α helix increases with the amount of side
chain hydrophobic surface that is buried (brought out of
contact with the solvent) when residue 44 is transferred
from a fully extended state to an α helix. The low α helix
propensity of Pro is due to the strain generated by its presence
in an α helix, and that of Gly arises from the entropic
cost associated with restricting this most conformationally
flexible of residues to an α helical conformation (compare
Figs. 8-7 and 8-9) and its lack of hydrophobic stabilization.
The high α helix propensity of Ala, however, is caused by
its lack of a γ substituent (possessed by all residues but Gly
and Ala) and hence the absence of the entropic cost associated
with conformationally restricting such a group within
an α helix together with its small amount of hydrophobic
stabilization.
d. Computer-Based Secondary Structure
Prediction Algorithms
A number of sophisticated computer-based secondary
structure prediction algorithms have been developed. Most
of them, like the Chou–Fasman method, employ sets of pa-
rameters whose values are determined by the analysis of
(learning from) a set of nonhomologous proteins with
known structures, in some cases coupled with energy minimization
techniques. These algorithms are typically ~60%
accurate in predicting which of three conformational
states, helix, sheet, or coil, a given residue in a protein
adopts. However, a significant increase in accuracy has
been gained (to over 80%) by employing evolutionary in-
formation through the use of multiple sequence align-
ments. This is because knowledge of the distribution of
residue identities at and around each position in a series of
homologous and presumably structurally similar proteins
provides a better indication of the protein’s structural ten-
dencies than does a single sequence.
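To make the notion of a "distribution of residue identities at each position" concrete, the following minimal sketch tabulates per-column residue frequencies from a multiple sequence alignment. It is not taken from any particular prediction program: the function name `column_profile`, the use of `-` as the gap character, and the choice to ignore gaps are assumptions made for illustration.

```python
from collections import Counter

def column_profile(alignment):
    """Per-column residue frequencies for a list of aligned sequences.

    All sequences must have the same length; gap characters ('-') are
    ignored when computing the frequencies for a column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values())
        if total:
            profile.append({res: n / total for res, n in counts.items()})
        else:
            profile.append({})  # column consisting only of gaps
    return profile
```

A predictor that takes such a profile as input sees, for each position, which residues evolution has tolerated there, instead of the single residue present in one sequence.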
Several secondary structure prediction algorithms are
freely available over the Web. Among them is Jpred3
(http://www.compbio.dundee.ac.uk/www-jpred/), which clas-
sifies residue conformations as being either helical (H),
extended/β sheet (E), or coil (-) with 81.5% reliability. It
requires as input either the sequence of a single polypep-
tide or a multiple sequence alignment. However, if Jpred3
is supplied with only a single sequence, it will first use PSI-
BLAST (Section 7-4Bi) to construct a multiple sequence
alignment.
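Reliability figures of this kind are per-residue, three-state accuracies: the fraction of residues whose predicted state matches the observed one. The sketch below shows that calculation for two strings written in the H/E/- alphabet described above. The function name `three_state_accuracy` is hypothetical, and the code is not part of Jpred3 or of its benchmarking procedure.

```python
def three_state_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one.

    Both arguments are strings over the alphabet H (helix),
    E (extended/sheet), and - (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    if not observed:
        raise ValueError("strings must not be empty")
    allowed = set("HE-")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be one of H, E, or -")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

Here `observed` would come from an experimentally determined structure and `predicted` from the algorithm being assessed.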
Although we have seen that secondary structure is
mainly dictated by local sequences, we have also seen that
tertiary structure can influence secondary structure (Section
9-1Be). The inability of sophisticated secondary structure
prediction schemes to surpass ~80% reliability is
therefore partially explained by their failure to take tertiary
interactions into account.
B. Tertiary Structure Prediction
The sequence databases (Section 7-4A) contain the sequences
of ~7 million polypeptides, and the rapid rate at which entire
genomes are being sequenced (Section 7-2C) promises that
many more such sequences will soon be known. Yet, only a
small fraction of the ~70,000 protein structures in the PDB
(Section 8-3B) are unique because many of them are of the
same protein binding different small molecules, mutant forms
of the same protein, or closely related proteins. Moreover,
around 40% of the open reading frames (ORFs; nucleic acid
sequences that appear to encode proteins) in known genome
sequences specify proteins whose function is unknown. Con-
sequently, formulating a method to reliably predict the native
structure of a polypeptide from only its sequence is a major
goal of biochemistry. In the following paragraphs we discuss
the progress that has been made in achieving this goal.
There are currently several major approaches to tertiary
structure prediction. The simplest and most reliable ap-
proach, comparative or homology modeling, aligns the se-
quence of interest with the sequences of one or more ho-
mologous proteins of known structure, compensating for
amino acid substitutions as well as insertions and deletions
(indels) through modeling and energy minimization calcu-
lations. For proteins with as little as 30% sequence identity,
this method can yield a root-mean-square deviation (rmsd)
between the predicted and observed positions of corresponding
Cα atoms of the “unknown” protein (once its
structure has been determined) of as little as ~2.0 Å. However,
the accuracy of this method decreases precipitously
304 Chapter 9. Protein Folding, Dynamics, and Structural Evolution