Protein Structure Modeling
400 for SwissProt; hence, the average protein is expected
to have two structural domains that must be
examined.
4. If no domains can be detected, one can resort to identifying
“block structures” in a multiple sequence alignment. The
multiple sequence alignment can be generated using BLAST or
PSI-BLAST from the NCBI webpage,
http://blast.ncbi.nlm.nih.gov/Blast.cgi. The alignment of a
longer protein sometimes has a “blocky” appearance, in which
one part of the sequence has numerous homologs that do not
cover the other parts. These blocks are indicative of domains,
so putative domains can be identified from the block
boundaries.
5. The online databases are quite comprehensive, but newly
sequenced proteins are, for obvious reasons, not present.
However, because all the tools presented here are available via
web services, it is possible to model these proteins too.
6. Most of these techniques fail for proteins that belong to
less-studied protein families, because the tools presented
herein depend on knowing something about homologs of the
protein of interest.
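The block inspection described in Note 4 can be approximated in code. The following is a minimal sketch, not part of the original protocol. It assumes the BLAST or PSI-BLAST hits have already been parsed into (start, end) query coordinates (1-based, inclusive), and that hits end cleanly at block edges. The function names and the thresholds min_jump and ratio are hypothetical choices for illustration.

```python
def coverage_profile(query_len, hits):
    """Count how many hits cover each query position (1-based, inclusive)."""
    diff = [0] * (query_len + 2)  # difference array, index 0 unused
    for start, end in hits:
        diff[start] += 1
        diff[end + 1] -= 1
    profile = []
    running = 0
    for pos in range(1, query_len + 1):
        running += diff[pos]
        profile.append(running)
    return profile


def block_boundaries(profile, min_jump=5, ratio=2.0):
    """Return 1-based positions where coverage changes sharply.

    A position is reported when the coverage of two neighbouring residues
    differs by at least min_jump hits and by at least the given ratio.
    Both thresholds are illustrative assumptions, not published values.
    """
    boundaries = []
    for i in range(1, len(profile)):
        low = min(profile[i - 1], profile[i])
        high = max(profile[i - 1], profile[i])
        if high - low >= min_jump and high >= ratio * max(low, 1):
            boundaries.append(i + 1)  # first residue of the new block
    return boundaries


# Toy example: 20 homologs cover residues 1-120 only, 3 cover the full length.
example_hits = [(1, 120)] * 20 + [(1, 300)] * 3
example_profile = coverage_profile(300, example_hits)
example_boundaries = block_boundaries(example_profile)
print(example_boundaries)
```

In real alignments hit ends are ragged, so coverage usually changes over several residues rather than at a single position. Smoothing the profile or comparing windows on either side of each position would be a reasonable refinement, and any boundary found this way remains only a putative domain boundary.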