92 3. Protein Structure Introduction
Silk is another example of a fibrous protein with a repetitive sequence. The
product of many insects and spiders, silk is the polypeptide β-keratin composed
largely of glycine, alanine, and serine residues, with smaller amounts of other
amino acids such as glutamine, tyrosine, leucine, valine, and proline. The softness,
flexibility, and high tensile strength of silk stem from its unique arrangement of
loose hydrogen bonding networks in the form of β-sheets connected by β-turns, a
mixture of both highly ordered and less densely packed regions. Figure 3.9 shows
a model of the repetitive β-sheet network of silk (without connecting regions).
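As a rough illustration of what a composition dominated by glycine, alanine, and
serine looks like in practice, the following Python sketch tallies residue
fractions in a one-letter sequence. The function name composition and the toy
sequence are hypothetical and invented for this example; the toy string is a
made-up repeat, not a real silk sequence.

```python
from collections import Counter


def composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each residue (one-letter code) in a sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    # Sort by residue letter so the output order is reproducible.
    return {residue: counts[residue] / total for residue in sorted(counts)}


# Hypothetical toy sequence (an assumption for illustration only):
# ten copies of a Gly/Ala/Ser-rich repeat plus one tyrosine-containing variant.
toy_sequence = "GAGAGS" * 10 + "GAGAGY"
fractions = composition(toy_sequence)

# Print residues from most to least abundant.
for residue, fraction in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{residue}: {fraction:.3f}")
```

In this toy string glycine accounts for half of the residues. Applied to a real
sequence from a database, the same tally would show how strongly a repetitive
fibrous protein is biased toward a few small amino acids.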
3.3.3 Emerging Patterns from Genome Databases
As genome sequencing projects are completed, interesting findings on protein
sequences also emerge. For example, the genome of the tuberculosis bacterium
(completed in 1998 by the Wellcome Trust Genome Campus of the Sanger
Institute in collaboration with the Institut Pasteur in Paris) surprisingly
revealed that, unlike other bacteria, M. tuberculosis contains repetitive gene
families of glycine-rich proteins; these families, which make up approximately
10% of the protein-coding sequences, are associated with gene families involved
in anaerobic respiratory functions.
3.3.4 Sequence Similarity
Sequence Similarity Generally Implies Structure Similarity
As mentioned above, sequence similarity generally implies structural,
functional, and evolutionary commonality. Thus, for example, if we were to scan
the Protein Data Bank (PDB) randomly and find two proteins with low
sequence identity (say less than 20%), we could reasonably propose that they also
have little structural similarity. Such an example is shown in Figure 3.12 for the
cytochrome/barstar pair. Similarly, high sequence similarity generally implies
structural similarity (see the introduction in 2.1.2 of Chapter 2).
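To make the notion of a sequence identity threshold concrete, here is a minimal
Python sketch that computes percent identity for two sequences that have already
been aligned. The function name percent_identity, the gap character, and the toy
alignment are hypothetical choices made for this example. Conventions for the
denominator vary between tools; this sketch assumes identity is counted only
over columns where neither sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they must
    have equal length (gaps included).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real protein sequences).
seq_a = "MKT-AYIAKQR"
seq_b = "MKSLAYI-KQR"

identity = percent_identity(seq_a, seq_b)
# The 20% cutoff mirrors the "say less than 20%" figure used in the text.
is_low_identity = identity < 20.0
print(f"identity = {identity:.1f}%, low identity: {is_low_identity}")
```

The toy pair matches at 8 of the 9 gap-free columns, so it falls well above the
20% cutoff. A pair such as the one in Figure 3.12 would be expected to fall
below it.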
In general, small mutations (e.g., single amino acid substitutions) are well
tolerated by the native structure, even when they occur at critical regions of
secondary structure. The small protein Rop (Repressor of primer), which controls
the mechanism of plasmid replication, provides an interesting illustration of
both this sequence-implies-structure paradigm and exceptions to this rule
(discussed below).
Rop is a dimer, with each monomer consisting of two antiparallel α-helices
connected by a short turn; it dimerizes to form a four-helix bundle as its active
form, as shown in Figure 3.10. (Fold details and motifs are discussed in the next
chapter.) Recall that Rop was used as the basis for solving the Paracelsus
challenge (Chapter 2) because the α-helix motif was thought to be quite stable.
The high stability of Rop emerged, surprisingly, from the experiments of Castagnoli
et al. [205]. When these researchers deleted just a few residues in a key turn region
that produces the overall bundle fold in the native Rop structure, they expected one
long contiguous helix to form. Instead, their tinkering produced a small variation