
pathways. Genealogical relations between species can be inferred from sequence differences between their proteins. We
can even estimate the time at which two evolutionary lines diverged, thanks to the clocklike nature of random mutations.
For example, a comparison of serum albumins found in primates indicates that human beings and African apes diverged
only about 5 million years ago, not 30 million years ago as was once thought. Sequence analyses have opened a new perspective on
the fossil record and the pathway of human evolution.
3. Amino acid sequences can be searched for the presence of internal repeats. Such internal repeats can reveal
information about the history of an individual protein itself. Many proteins apparently have arisen by duplication of a
primordial gene followed by its diversification. For example, calmodulin, a ubiquitous calcium sensor in eukaryotes,
contains four similar calcium-binding modules that arose by gene duplication (Figure 4.28).
4. Many proteins contain amino acid sequences that serve as signals designating their destinations or controlling their
processing. A protein destined for export from a cell or for location in a membrane, for example, contains a signal
sequence, a stretch of about 20 hydrophobic residues near the amino terminus that directs the protein to the appropriate
membrane. Another protein may contain a stretch of amino acids that functions as a nuclear localization signal, directing
the protein to the nucleus.
5. Sequence data provide a basis for preparing antibodies specific for a protein of interest. Careful examination of the
amino acid sequence of a protein can reveal which sequences will be most likely to elicit an antibody when injected into
a mouse or rabbit. Peptides with these sequences can be synthesized and used to generate antibodies to the protein. These
specific antibodies can be very useful in determining the amount of a protein present in solution or in the blood,
ascertaining its distribution within a cell, or cloning its gene (Section 4.3.3).
6. Amino acid sequences are valuable for making DNA probes that are specific for the genes encoding the corresponding
proteins (Section 6.1.4). Knowledge of a protein's primary structure permits the use of reverse genetics. DNA probes that
correspond to a part of the amino acid sequence can be constructed on the basis of the genetic code. These probes can be
used to isolate the gene encoding the protein so that the entire sequence of the protein can be determined. The gene in turn can
provide valuable information about the physiological regulation of the protein. Protein sequencing is an integral part of
molecular genetics, just as DNA cloning is central to the analysis of protein structure and function.
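Because most amino acids are specified by more than one codon, a stretch of amino acid sequence corresponds to many possible DNA sequences, and probes are usually designed from the least ambiguous stretch. The following is a minimal sketch of that choice, assuming the standard genetic code; the function names and the example peptide are hypothetical and serve only as illustration.

```python
from math import prod

# Number of codons that specify each amino acid in the standard genetic code
# (one-letter amino acid symbols; the 61 sense codons in total).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Return how many distinct DNA sequences could encode the peptide."""
    return prod(CODON_COUNTS[residue] for residue in peptide)


def least_degenerate_window(sequence: str, length: int) -> tuple[int, str, int]:
    """Find the window of `length` residues encoded by the fewest DNA sequences.

    Returns (start index, peptide, degeneracy); ties go to the earliest window.
    """
    if length > len(sequence):
        raise ValueError("window is longer than the sequence")
    windows = [
        (start, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]
    start, peptide = min(windows, key=lambda w: degeneracy(w[1]))
    return start, peptide, degeneracy(peptide)


# Hypothetical 10-residue sequence; a 5-residue window corresponds to a 15-base probe.
print(least_degenerate_window("LSRMWDKCAG", 5))  # (3, 'MWDKC', 8)
```

Stretches rich in methionine and tryptophan, each specified by a single codon, give the least ambiguous probes, whereas leucine, serine, and arginine, each with six codons, are best avoided.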
4.2.3. Recombinant DNA Technology Has Revolutionized Protein Sequencing
Hundreds of proteins have been sequenced by Edman degradation of peptides derived from specific cleavages.
Nevertheless, heroic effort is required to elucidate the sequence of large proteins, those with more than 1000 residues.
For sequencing such proteins, a complementary experimental approach based on recombinant DNA technology is often
more efficient. As will be discussed in Chapter 6, long stretches of DNA can be cloned and sequenced, and the
nucleotide sequence directly reveals the amino acid sequence of the protein encoded by the gene (Figure 4.29).
Recombinant DNA technology is producing a wealth of amino acid sequence information at a remarkable rate.
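The way in which a nucleotide sequence reveals an amino acid sequence can be sketched in a few lines. The example below assumes the standard genetic code and a coding-strand sequence without introns (such as a cDNA); the function name and the short example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each
# position; "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Returns the deduced sequence in one-letter symbols, or an empty string
    if the sequence contains no ATG.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the reading frame
            break
        residues.append(amino_acid)
    return "".join(residues)


# Hypothetical sequence: ATG GCT TGT AAA TGG followed by the stop codon TAA.
print(translate("GGATGGCTTGTAAATGGTAACC"))  # MACKW
```

A real analysis would also have to consider the other reading frames and the complementary strand; this sketch reads only the frame set by the first ATG.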
Even with the use of the DNA base sequence to determine primary structure, there is still a need to work with isolated
proteins. The amino acid sequence deduced by reading the DNA sequence is that of the nascent protein, the direct
product of the translational machinery. Many proteins are modified after synthesis. Some have their ends trimmed, and
others arise by cleavage of a larger initial polypeptide chain. Cysteine residues in some proteins are oxidized to form
disulfide links, connecting either parts within a chain or separate polypeptide chains. Specific side chains of some
proteins are altered, for example by phosphorylation or glycosylation. Amino acid sequences derived from DNA sequences are rich in information, but they do not
disclose such posttranslational modifications. Chemical analyses of proteins in their final form are needed to delineate
the nature of these changes, which are critical for the biological activities of most proteins. Thus, genomic and proteomic
analyses are complementary approaches to elucidating the structural basis of protein function.
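One simple way in which the two kinds of data complement each other is in detecting a trimmed amino terminus: the first residues of the isolated protein, found by Edman degradation, are located within the sequence deduced from the DNA. The following is a minimal sketch under that assumption; the function name and both sequences are hypothetical.

```python
from typing import Optional


def residues_trimmed(deduced: str, mature_n_terminus: str) -> Optional[int]:
    """Count residues removed from the amino terminus of the nascent protein.

    `deduced` is the sequence read from the DNA; `mature_n_terminus` is the
    stretch determined directly from the isolated protein. Returns None if
    the stretch does not occur in the deduced sequence.
    """
    position = deduced.find(mature_n_terminus)
    return None if position == -1 else position


# Hypothetical example: the isolated protein begins at residue 13 of the
# deduced sequence, so 12 amino-terminal residues were removed.
print(residues_trimmed("MKLLVALLACSAFPGDE", "FPGD"))  # 12
```

Such a comparison reveals only trimming of the chain; disulfide links and altered side chains must still be identified by chemical analysis of the protein itself.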