Some of Eric's Favorite
Protein Structure Literature
An annotated bibliography by Eric Martz
to accompany Protein Explorer
This document is on-line at http://proteinexplorer.org/favlit.htm
Please tell me about your favorite
protein structure papers, or suggest additions to this list:
- Dictionary of Bioinformatics and
Computational Biology (2004) edited by
John M. Hancock and Marketa J. Zvelebil,
Wiley-Liss, 636 pages.
Includes over 600 words, phrases, and concepts.
Most entries have the detail of encyclopedia entries,
and include weblinks as well as literature references.
- Structural Bioinformatics (2003) edited by
Philip E. Bourne and Helge Weissig,
Wiley-Liss, 649 pages.
Complete contents at
Reviewed by Eric Martz in Biochem.
Mol. Biol. Education 31:370-1, 2003.
A goldmine of fundamental and practical information:
- Section I introduces bioinformatics, fundamentals of protein
structure, crystallography, NMR, electron microscopy, and
visualization in a bit over a hundred pages.
- Section II: the PDB format, mmCIF, the Protein Data Bank,
the Nucleic Acid Database, and other databases in a bit over
a hundred pages.
- Section III: Structural Classification of Proteins (SCOP), CATH
(Class, Architecture, Topology, Homology), structural quality assurance,
comparison and alignment in about a hundred pages.
- Section IV: assignment of secondary structure, identifying
domains, and inferring function from structure in about fifty pages.
- Section V: prediction of protein-protein interactions from
evolution, and electrostatic interactions in about twenty pages.
- Section VI: proteins as drug targets in about thirty pages.
- Section VII: protein structure prediction in about a hundred pages.
- Section VIII: twenty pages on structural genomics.
- Introduction to Bioinformatics - A Theoretical and Practical
Approach (2003) edited by
Stephen A. Krawetz and David D. Womble,
Humana Press. 746 pages plus CD.
- Part I, Biochemistry, Cell and Molecular Biology provides
an introduction to the underlying science in about a hundred pages.
- Part II, Molecular Genetics introduces genomics and
clinical human genetics in about a hundred pages.
- Part III, The UNIX operating system introduces UNIX and
Solaris, general management of bioinformatics servers, and GCG
in about a hundred pages.
- Part IV, Computer Applications introduces software
resources for DNA sequences, genome databases, multiple sequence
alignment, 3D visualization, and gene expression microarray tools
in about three hundred pages.
- Nature's Robots: A History of Proteins, Charles Tanford
& Jacqueline Reynolds (2001),
Oxford University Press. 304 pages.
A fascinating, meticulous and scholarly
account of the history of protein science
embroidered with details of the personalities and events in the lives
of those involved. Begins in the 18th century, before the word
"protein" was established, and traces protein science through
the discovery of the genetic code in the 1960's.
- Introduction to Protein Architecture -
The structural biology of proteins, Arthur M. Lesk (2000),
Oxford University Press. 360 pages, numerous illustrations.
- Introduction to Protein Structure, Carl Branden and
John Tooze, second edition (1999),
Taylor and Francis. 410 pages.
Lavishly illustrated and eminently readable. A wonderful overview
of protein structure.
- An introduction to hydrogen bonding, George A. Jeffrey,
Oxford University Press, 1997. 314 pages.
- See also
Journal Articles (most recent first)
- Articles on these topics have been gathered on separate pages:
Articles that don't fall in the above topics are listed below, most recent first.
Conformational diversity and protein evolution -- a 60-year-old hypothesis
Leo C. James and Dan S. Tawfik.
Trends Biochem. Sci. 28:361.
A "new view" of proteins is presented based on the notion that conformational
diversity for a single sequence may support multifunctionality, and that this in turn fosters relatively
rapid evolution of function from existing folds.
A spontaneous pre-equilibrium (pre-ligand equilibrium) may exist
between conformers supporting different functions (Monod-Wyman-Changeux model).
Typically, ligand for a given function may stabilize the conformer that
binds it at higher affinity, thus pulling the equilibrium towards the
This contrasts with induced fit in which ligand binding induces the high-affinity
conformer de novo from a stable low-affinity form (Koshland-Nemethy-Filmer model).
Experimental evidence for pre-equilibrium conformational diversity, and its
links to functional diversity, are reviewed.
A pre-equilibrium, multifunction hypothesis proposed for antibodies
in the 1930's by Landsteiner and Pauling now has some experimental support.
A glossary is included with
terms such as promiscuity (a different function employing the original
active site), and moonlighting (a different function employing a site
distinct from the original active site).
Intrinsically disordered proteins show function without a stable fold, perhaps
providing a clue to the evolutionary precursors of primordial folds.
The "new view" is that
function co-evolves with fold, rather than fold implausibly having to evolve before
function. However, "a direct proof for the role of functional diversity
in protein evolution is yet to be obtained".
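The pre-equilibrium idea summarized above can be illustrated with a minimal two-state sketch. It assumes one low-affinity and one high-affinity conformer in spontaneous equilibrium, each binding the same ligand with its own dissociation constant. The function name, parameter names, and numerical values are illustrative assumptions, not taken from James and Tawfik.

```python
def fraction_high_affinity(ligand, l0, kd_low, kd_high):
    """Fraction of protein (free plus bound) in the high-affinity conformer.

    ligand  : free ligand concentration (same units as the Kd values)
    l0      : [high]/[low] conformer ratio in the absence of ligand
    kd_low  : dissociation constant of the low-affinity conformer
    kd_high : dissociation constant of the high-affinity conformer
    """
    # Statistical weight of each conformer, free plus ligand-bound.
    low = 1.0 + ligand / kd_low
    high = l0 * (1.0 + ligand / kd_high)
    return high / (low + high)


# Hypothetical numbers: 1% of molecules pre-exist in the high-affinity form.
for conc in (0.0, 1e-6, 1e-5, 1e-4):
    print(conc, round(fraction_high_affinity(conc, 0.01, 1e-3, 1e-6), 3))
```

With no ligand the high-affinity conformer is rare; adding ligand shifts the population toward it without any new conformer being created, which is the distinction from induced fit.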
All-atom contacts: a new approach to structure validation.
Jane S. Richardson.
Chapter 15 in Structural Bioinformatics (2003).
Protein crystallographers have, for the most part, traditionally
omitted hydrogen atoms in their models. Richardson and her team
have developed software to insert hydrogens, checking for steric
fit and optimizing hydrogen bonding. The steric clashes they
detect reveal occasional correctable errors in crystallographic
models, such as side chains of Asn, Gln, and His flipped
180° (where the best rotamer typically cannot be
discerned from electron density alone), or other rotamer errors
such as for Thr or Val (where the best rotamer can be discerned from the
electron density map only if it has adequate local resolution).
van der Waals clashes are clearly and easily visualized as
Kinemages. The resulting corrections, gathered from high-resolution structures,
enabled development of improved rotamer libraries and better-defined favorable
regions in Ramachandran plots, which are available for use by
crystallographers during model development.
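The notion of a steric clash can be made concrete with a much-simplified sketch: once hydrogens are present, two non-bonded atoms clash when their van der Waals spheres overlap by more than some cutoff. This is not the Richardson group's algorithm or software. The radii, the 0.4 Å cutoff, the atom records, and the function name are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

# Approximate van der Waals radii in angstroms (illustrative values only).
VDW_RADIUS = {"H": 1.2, "C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8}


def find_clashes(atoms, bonded=frozenset(), min_overlap=0.4):
    """Return (name_a, name_b, overlap) for non-bonded atom pairs that overlap.

    atoms       : list of (name, element, (x, y, z)) tuples
    bonded      : set of frozenset({i, j}) index pairs to skip (covalent bonds)
    min_overlap : assumed cutoff in angstroms for reporting a clash
    """
    clashes = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded:
            continue
        dist = math.dist(a[2], b[2])
        overlap = VDW_RADIUS[a[1]] + VDW_RADIUS[b[1]] - dist
        if overlap >= min_overlap:
            clashes.append((a[0], b[0], round(overlap, 2)))
    return clashes


# Hypothetical coordinates: two hydrogens only 1.5 angstroms apart.
example = [
    ("HD21", "H", (0.0, 0.0, 0.0)),
    ("HB2", "H", (1.5, 0.0, 0.0)),
    ("O", "O", (5.0, 0.0, 0.0)),
]
print(find_clashes(example))
```

A flipped Asn, Gln, or His side chain would typically show up as a cluster of such overlaps that disappears when the group is rotated 180°.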
"All-atom contact analysis" is a powerful new method for structure
validation and improvement that complements
The resulting improvements in the model often reduce the
The software is free, can be downloaded
for local (secure) operation, or you can use it via the
with large genomes
Evolution of the protein repertoire.
Cyrus Chothia, Julian Gough, Christine Vogel, Sarah A. Teichmann
Can be very
"A [protein] domain ... is an evolutionary unit whose coding sequence can be
duplicated and/or undergo recombination.
... Domains typically have 100-250 residues.
... we only clearly know the family relationships and domain structures
of those proteins that either have a known structure, or are homologous to
proteins of known structure. At present, close to 50% of the sequences in the
currently known genomes are homologous to proteins of known structure."
(This is based on
hidden Markov model methods that find about twice as many relationships as
do pairwise sequence comparisons,
Gough et al., 2001.)
"Rough estimates indicate that two-thirds of prokaryote proteins have
two or more domains. In eukaryotes ... about four-fifths of proteins
are multidomain. ...
The evolutionary relationships of domains in proteins of known structure
are described in the
Structural Classification of Proteins (SCOP) database.
... [evidence] suggests that [most] of the protein repertoire is formed
by members of families that go back to the origin of eukaryotes or the
origin of the different kingdoms.
... [and] that it is much easier to evolve new binding
sites than new catalytic mechanisms."
Computational design of receptor and sensor proteins with novel functions.
Loren L. Looger, Mary A. Dwyer, James J. Smith, and Homme W. Hellinga
In this breakthrough in computational design of protein binding sites for
small ligands, existing binding sites were successfully
redesigned to accommodate three unrelated and quite different
ligands. When empirically tested, half of the redesigned proteins
bound their targeted ligands with Kd's of <10 micromolar
(some close to the affinities for the original ligands).
impressive specificity when challenged with analog decoys.
Five members of the E. coli periplasmic
binding protein superfamily that bound monosaccharides or amino acids
were redesigned to bind TNT, L-lactate, or serotonin. These target ligands
are neutral, anionic, and cationic respectively.
Only L-lactate is chiral. Successes with
all three and multiple original specificities exclude the results being
merely a lucky fluke. Ligand-contacting residues were mutated computationally,
while the ligand underwent translation and rotation restricted to a volume
roughly occupied by the natural ligand. The combinatorial problem
(10^53 to 10^76 cases) was solved with a novel
algorithm based upon dead-end elimination theorems. Each redesign took about
three days on a 20-processor computer cluster.
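To show why dead-end elimination makes such a search space tractable, here is a minimal sketch of the classic singles criterion: a candidate (for example, a side-chain rotamer) at one position is discarded when even its best-case energy is worse than the worst-case energy of some alternative at the same position. This is the textbook criterion, not the novel algorithm of Looger et al.; the function name, data layout, and toy energies are assumptions.

```python
def dee_singles(self_e, pair_e):
    """Prune candidates with the classic dead-end elimination singles test.

    self_e[i][r]          : self energy of candidate r at position i
    pair_e[(i, j)][r][s]  : pair energy of r at i with s at j, stored for i < j
    Returns the surviving candidate indices for each position.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def bound(i, r, pick):
        # pick=min gives the best case for r, pick=max the worst case.
        return self_e[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j])
            for j in range(n) if j != i
        )

    changed = True
    while changed:  # repeat until a full pass eliminates nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_r = bound(i, r, min)
                if any(best_r > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return [sorted(a) for a in alive]


# Toy problem: two positions, two candidates each (hypothetical energies).
self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.5, 0.0]]}
print(dee_singles(self_e, pair_e))
```

Each elimination is provably safe, so the global minimum is never lost; candidates that survive still have to be resolved by further criteria or explicit search.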
In a further tour de force, redesigned sugar chemotaxis receptors
were shown to enable E. coli to respond to TNT and L-lactate (using
a re-engineered signal transduction pathway that increased
β-galactosidase gene expression). Potential
biosensor applications include locating the sources of underwater TNT plumes
leaching from unexploded military ordnance, or locating landmines.
Biosensors for L-lactate or serotonin have potential clinical applications.
Chirally selective receptors could aid the "preparation of optically
pure pharmaceuticals from racemic mixtures".
Finally, the methods may be extended to design of enzymes, with the
advantage that the virtual transition-state intermediate may more accurately
reflect the true transition state than the chemically stable transition
state analogs heretofore used to select catalytic antibodies.
- Membrane proteins: the 'Wild West' of structural biology.
Jaume Torres, Tim J. Stevens, and Montserrat Samsó
Trends Biochem. Sci. 28:137.
Membrane proteins are greatly underrepresented in the Protein Data Bank
because they are less amenable to the classical methods of crystallography
and solution NMR.
This article reviews recently developed experimental and predictive methods, their pros
and cons, and successes for membrane proteins. It touches upon
electron crystallography, single particle methods, atomic force microscopy,
solid state NMR (SSNMR), oriented samples NMR (OS NMR), magic-angle spinning
NMR, site-directed spin labelling electron paramagnetic resonance, site-specific
infrared dichroism, and global search molecular dynamics simulations coupled
with evolutionary conservation data. With the latter,
"a model for the transmembrane domain of glycophorin, <1 Å root-mean-square
deviation away from the structure painstakingly determined using NMR,
was obtained in just a few days of simulations, without any experimental
data". See Integral Membrane Proteins
in the Atlas of Macromolecules.
Intramolecular interactions at protein surfaces and their impact
on protein function.
Robertson, Andrew D. (2002)
Trends Biochem. Sci. 27:521.
"... in [soluble] globular proteins, 33% of surface residues are apolar ..."
and these are often in small clusters.
Data mining the Protein Data Bank: residue interactions.
Oldfield, Thomas J. (2002)
Oldfield has created software to identify recurrent 3D structural motifs
of interactions of 3 or more amino acids, using a completely
objective method that is minimally biased by expectation.
This paper does not report any surprising new 3D interactions, but rather concentrates
on showing that the expected and well-known interactions are properly detected,
in order to validate the methods. Future work may reveal novel
He started by reducing the PDB to about 1,700 nonredundant, high quality
domain structures (PDB file fragments, mean 130 residues). This method,
dictated by the necessity for computational feasibility, precludes
identification of interdomain 3D motifs. He detected 237 3D interactions of
3 to 7 residues (ignoring Gly, Ala, Leu, Ile, Val) that occurred more than
five times. These were collated into least-squares aligned results and
templates of 1,972 common geometrical configurations. The template
coordinates can be used to search the entire PDB, or a single PDB file, for
occurrences. The results included the expected metal binding sites, ligand
binding sites, catalytic triads, 3-residue salt bridges, polar uncharged
interactions, and aromatic interactions (most edge to ring plane).
Discriminating between homodimeric and monomeric proteins in the crystalline state.
Ponstingl, H., K. Henrick, and J. M. Thornton (2000).
Extent and nature of contacts between protein molecules in crystal
lattices and between subunits of protein oligomers.
Dasgupta, S., G. H. Iyer, S. H. Bryant, C. E. Lawrence, and J. A. Bell.
Dasgupta et al. studied 58 oligomers and 223 nonvirus crystals with
good resolution and completeness. They found that "crystal contact patches
are frequently smaller than patches involved in oligomer interfaces".
Contact patches of 10 to 100 atoms were common at crystal contacts. Patches
involving 100-1,000 atoms were common in oligomer interfaces but rarely
seen in crystal contacts. Nevertheless, the total number of atoms involved
in crystal contacts vs. oligomer interfaces was about the same; that is,
the larger number of small crystal contact patches involved about the same
number of atoms as the smaller number of large oligomer interfaces. They
also observed that crystal contacts tend to involve more polar
interactions, while nonpolar interactions tend to predominate in oligomer
interfaces. "Hydrophobic interactions lead to disordered precipitates, and
not to crystals."
See also Crystal Contacts
Probable Quaternary Structures.
Books and Journal Articles
Alphabetical by Author
- Hendrickson, Wayne A. Synchrotron crystallography.
2000. Trends Biochem. Sci. 25:637-43.
Overview of the previous quarter century of crystallography by a leader
in the field, emphasizing breakthroughs that matured in the 1990's.
"Six prime developments have
evolved into maturity, and they completely
change the scene for rapid structure
determination. These are undulators,
charge-coupled device (CCD)
detectors, cryopreservation, MAD phasing,
selenomethionyl proteins and structure-solving automation."
Each is explained in some detail.
- Kleywegt, GJ, AT Brünger. 1996. Checking your
imagination: applications of the free R value. Structure 4:897-904.
Kleywegt, G. J., and T. A. Jones. 1996. Phi/psi-chology: Ramachandran
revisited. Structure 4:1395-1400.
Kleywegt, G. J., and R. J. Read. 1997. Not your average density.
- Kleywegt, GJ. 2000. Validation of protein crystal structures.
Acta Crystallogr. D Biol. Crystallogr. 56:249-265.
- Laskowski, Roman A. 2003. Structural quality assurance.
Chapter 14 in Structural Bioinformatics (2003).
- Rhodes, Gale. Crystallography made crystal clear.
second edition, 1999.
Highly readable introduction recommended for
an overview of the method and interpretation of the results,
regardless of whether you plan to do crystallography yourself.
- Richardson, Jane S.
All-atom contacts: a new approach to structure validation.
See reference and precis above.