Syllabus for
Protein 3D Structure Visualization & Structural Bioinformatics

Graduate School of Frontier Biosciences, Osaka University (Japan), May 9-13, 2011

Lead Instructor: Professor Eric Martz
(Author of FirstGlance in Jmol and Protein Explorer; member of the Proteopedia Development Team.)
Professor Emeritus, University of Massachusetts, Amherst.

Organizer and Co-Instructor: Professor Keiichi Namba.
Co-Instructors: Tohru Minamino and Shigehiro Nagashima.
Teaching Assistants: Akihiro Kawamoto, Fumiaki Makino, Noritaka Hara, and Akira Hida.
Thanks to Kana Moriya for arrangements.

Goals: This course will prepare students to understand 3D macromolecular structure and incorporate it into their research and teaching. The principles of protein structure, including noncovalent bonds, will be reviewed. Structural bioinformatics and genomics will be introduced. Students will learn what percentage of proteins have known 3D structures, and why crystallographic models are generally more reliable than homology or theoretical models.

In the computer laboratory, students will learn how to find 3D molecular models of the proteins they study, and how to use Proteopedia.Org and FirstGlance in Jmol (adopted by Nature) to investigate key structural features.
Protein structure will be related to function, evolutionary conservation, multiple sequence alignments, and drug design. Specific oligomers will be constructed automatically by quaternary structure servers and visualized. Students will learn how to prepare customized publication-quality molecular images and animations for Powerpoint slides, and how to communicate structure-function relationships effectively and intuitively with online molecular scenes in the Proteopedia.Org wiki. Students will be challenged with molecular structure problems, and each student will prepare Powerpoint slides capturing examples of the skills they have learned. All the software is web-browser-based, easy to use, works on Windows and Macintosh, requires no installation, is free and open-source, and should be available for years to come.

    I. Proteopedia: The best place to begin understanding a molecule's structure.
    Protein Data Bank & PDB Codes
    Crystallographic Resolution
  1. Primary vs. Derived 3D Macromolecular Structure Databases:
  2. PDB identification code examples:
    • 1pgb: a small protein domain, one chain.
    • 2hhd: four protein chains with ligands.
    • 1d66: protein and DNA.
    • 104d: DNA/RNA hybrid.
    • 9ins: protein hormone.

  3. Proteopedia.Org (Part I). The Best Place to Start Looking at a Molecule!
    1. Main page: green links, ligands, functional sites.
    2. Has a page for each of the >70,000 PDB entries, showing it in Jmol.
    3. Highlights ligands, functional sites, and non-standard residues, giving their full names (example: 1h6m).
    4. Title, Abstract, Publication (visit the PDB codes listed above).
    5. Shows resolution (see below).
    6. Identify any atom by pointing at it with the mouse (a "hovering" report names it).
    7. Pop up any molecular scene.
    8. Save any page for off-line projection.
    9. Explanations of structural biology terms and concepts (e.g., asymmetric unit, Protein Data Bank, hydrogen bonds, temperature value) are collected at About Macromolecular Structure.

  4. X-Ray Crystallography and Resolution
    • 85% of models in the PDB come from X-ray crystallography experiments.
    • X-ray crystallography produces an electron density map (EDM).
    • The level of detail in an EDM is expressed by its resolution, in Ångstroms (smaller values mean finer detail and less uncertainty):
      • 1.2 Å Excellent -- backbone and most sidechains very clear. Some hydrogens resolved.
      • 2.5 Å Good -- backbone and many sidechains clear.
      • 3.5 Å OK -- backbone and bulky sidechains mostly clear.
      • 5.0 Å Poor -- backbone mostly clear; sidechains not clear.
      • See the MOVIE.

    II. Finding published molecules of interest.
    Begin Powerpoint Slides.

  1. Finding published molecules of interest:
    Each student: please find the PDB code of a molecule related to your research or interests.
    You will use the PDB structure that you select for the rest of the class, and for your Powerpoint report.
      Recommended characteristics:
    1. X-ray crystal structure with good resolution (≤ 3.0 Å).
    2. Published in a very good journal (Nature, Science, Cell, Biochemistry, Biochem. J., EMBO J., Eur. J. Biochem., J. Biol. Chem., J. Mol. Biol., Nat. Struct. Mol. Biol., Nucleic Acids Res., PLoS journals, Proc. Natl. Acad. Sci. USA, Proteins, Protein Sci., Structure, etc.).
    3. Ligand(s) present.

    4. Searching (for a molecule from your research or team)
      Use the sequence of your protein to search for 3D structures.
      Write down the PDB code(s)!
      • At the PDB-USA site, click Advanced Search. Choose the query type "Sequence Features: Sequence", paste your sequence into the large box, and click Submit Query. Powerful but sometimes difficult to use, and the help is sometimes inadequate; offers the most detailed information about hits.
      • PDBsum: useful short summaries of hits.
      • OCA: powerful and straightforward, with a useful results table.
      • PDB-Europe

      Find molecules by name. (Not for sequence searching.)

    5. Browsing (if you don't have a molecule in mind)

  2. What if your molecule has no published structure?
    • Try making a comparative (homology) model: submit your sequence to Swiss-Model and use Automated Mode.
    • See if a structure of your molecule is in the Structural Genomics pipeline. Submit your sequence to the SG TargetDB. (Ask for help interpreting the results.)

  3. Begin your Powerpoint slides (later, you will email them to Prof. Martz). Complete slides 1-3 now; see Required content for your slides.
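Optional, for students comfortable with scripting: once you have written down your PDB code, you can check its format and download the file yourself. The sketch below is a minimal Python example, not part of the course software. It assumes the classic four-character PDB ID format (a digit 1-9 followed by three letters or digits) and RCSB's current download address (https://files.rcsb.org/download/), which postdates this 2011 course. The function names are hypothetical.

```python
import re
import urllib.request

# Classic PDB IDs: a digit 1-9 followed by three letters or digits (e.g. 1pgb, 104d).
# Assumption: longer, extended-format IDs are not handled by this sketch.
PDB_ID_PATTERN = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def is_valid_pdb_id(code: str) -> bool:
    """Return True if `code` looks like a classic 4-character PDB ID."""
    return bool(PDB_ID_PATTERN.match(code.strip()))


def pdb_download_url(code: str) -> str:
    """Build a download URL for a PDB-format file (hypothetical helper)."""
    code = code.strip()
    if not is_valid_pdb_id(code):
        raise ValueError(f"Not a valid PDB ID: {code!r}")
    # Assumption: RCSB's current file-download address pattern.
    return f"https://files.rcsb.org/download/{code.upper()}.pdb"


def fetch_pdb_text(code: str) -> str:
    """Download the PDB file as plain text (requires network access)."""
    with urllib.request.urlopen(pdb_download_url(code)) as response:
        return response.read().decode("utf-8")
```

For example, pdb_download_url("1pgb") builds the address for 1PGB.pdb, while "abcd" is rejected because real PDB IDs begin with a digit.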

    III. Review of Protein Chemistry and Structure.
    Introduction to Structural Bioinformatics.

  1. Central Dogma: DNA → mRNA → Protein.     DNA structure in Jmol (also available in Spanish)
  2. 20 Amino acids
  3. Polypeptide chain geometry and steric restrictions
  4. Covalent and non-covalent chemical bonds
  5. Typical hydrogen bond within a protein: the donor atom is covalently bonded to the hydrogen; the acceptor atom is not.
  6. Secondary Structure
  7. Folding: hydrophobic collapse
  8. Protein folds cannot be reliably predicted from sequence alone (using ab initio theory).

  9. Introduction to Structural Bioinformatics
    • Why do we care about 3D macromolecular structure?
    • What are 3D structure data?
    • Where do 3D structure data come from?
    • How much 3D structure knowledge do we have?
    • Primary and Derived 3D Structure Databases

    IV. FirstGlance in Jmol for exploring any macromolecule.
    FirstGlance in Jmol (Part I).

  1. Go to 1pgb in Proteopedia, then under Resources, click on FirstGlance.
    In FirstGlance, try these:
    1. Introduction
    2. Top 2 rows of views
      1. Hydrophobic core
      2. Challenge: hydrophobic/polar -- 1bl8
      3. Orientations of Proteins in Membranes.
      4. Amphipathic helices and strands: 1icw. (In FirstGlance, hide the range of the helix or strand, then invert so that only that element remains.)
    3. Vines
    4. Buttons
    5. Center Atom
    6. Reset

  2. Go to 1hho in Proteopedia, then click on FirstGlance. Explore these features:
    1. Ligands button
    2. Hide

  3. Go to 1icw in Proteopedia, then click on FirstGlance.
    1. Find: view Gly, Pro, Ala, Glu, and Phe one at a time and explain their distributions; check the hydrophobic core.

  4. Go to 3kwb in Proteopedia, then click on FirstGlance. Under More Views.., show all Disulfide Bonds and Cysteines.

  5. Continue preparing your Powerpoint Slides 4-6.

    V. Introduction to Multiple Sequence Alignment (MSA) and Conservation
    ConSurf Server
    Structure of Atomic Coordinate ("PDB") Files
  1. Evolutionary conservation identifies functional sites in protein molecules.
    1. In Proteopedia, show Evolutionary Conservation.
    2. Enzyme example: ConSurf-colored sequence -- enolase 4enl in Proteopedia -- enolase in Wikipedia.

    3. Multiple sequence alignments reveal conservation: MSA for 4ENL in black and white (printed handout).
    4. Detail of MSA with color

    5. ConSurf mechanism (details of mechanism).
    6. There are two ConSurf Servers:
      1. ConSurfDB (DataBase)
        • Pre-calculated for every chain in the PDB.
        • Results are shown in Proteopedia.
        • Multiple Sequence Alignments typically include proteins of more than one function, so some conservation may be hidden.
      2. ConSurf
        • Set up each job by hand.
        • Easily select sequences for a single protein function, revealing conservation (within a family of proteins performing a single function) that may be hidden in ConSurfDB.

  2. Atomic Coordinate Files
    • Formats
      • Crystallographers: "PDB format" (human-readable; based on a 1970s system that used paper punch cards of a 1928 design)
      • Protein Data Bank: mmCIF (macromolecular Crystallographic Information File)
      • US National Center for Biotechnology Information: ASN.1
    • Structure of PDB Files: What are 3D structure data?
    • PDB files are plain text -- they can be edited with a text editor.
    • Examine the PDB file for 4phv at
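Because PDB files are plain text with fixed columns, a few lines of code can read them. The following minimal Python sketch extracts ATOM and HETATM records using the column positions of the PDB format (for example, x, y and z in columns 31-38, 39-46 and 47-54, and the temperature value in columns 61-66). The Atom class and function names are hypothetical; real files can contain alternate locations, multiple models, and other record types that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str         # "ATOM" or "HETATM"
    serial: int
    name: str           # atom name, e.g. "CA"
    res_name: str       # residue name, e.g. "GLY" or "HEM"
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float  # the "temperature value" (B factor)
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using fixed PDB-format columns."""
    # Python slices are 0-based and end-exclusive: columns 31-38 become [30:38].
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


def parse_pdb_text(text: str) -> list[Atom]:
    """Return all ATOM and HETATM records; other record types are skipped."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]
```

Reading a file by fixed columns (rather than splitting on spaces) matters because adjacent fields, such as large negative coordinates, can run together with no space between them.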

    VI. Evolutionary Conservation with ConSurf-DB
    Authoring Molecular Scenes in Proteopedia
    Publication-Quality Images & Animations for Powerpoint
    As you complete each section today, record your results in your Powerpoint Slides.

  1. Evolutionary Conservation: Follow the instructions for Slide 8 to show conservation in your PDB code.
  2. If you have a serious research interest in the conservation pattern of your molecule (not required):
    1. You will want to do a ConSurf run where you limit the multiple sequence alignment to proteins with the same function as your molecule. Instructions.
    2. Using FirstGlance from ConSurf, you can see the conservation levels of amino acids contacting a moiety of interest.

    FirstGlance in Jmol (Part II).
  3. Noncovalent bonds (with 1hho): Contacts to HEM
    • Selecting a target (HEM).
    • Four views of noncovalent interactions.
    • Showing subsets of 7 kinds of noncovalent interactions.
    • Measuring interatomic distances.
    • 2vaa chain P has 1 salt bridge and 2 cation-pi interactions.
  4. Specific Oligomers:
  5. Missing amino acids: No coordinates in the model due to disorder in the crystal. Use under Key Resources. Example: 2ACE.

  6. Author two scenes in Proteopedia.Org (Part II):
    1. See the help and movies under Want to Contribute? at the Main Page of Proteopedia.

    2. Login as "student". Ask for the password.
    3. Go to the page Sandbox Reserved NN, where NN is the number assigned to you. For example, if you are assigned number 12, go to the page titled Sandbox Reserved 12.
    4. Click the edit this page tab at the top.
    5. Keep the {{Template:...}} at the top, but delete anything else that you did not put in this page.

    6. Click the 3D button (above the box) to insert a Jmol.
    7. Put your PDB code in the load parameter of the applet tag.
    8. Save the page (click Save page twice). You should see your molecule.

    9. Edit again, and show the Scene authoring tools.
    10. Use the load molecule tab to load your PDB code.
    11. Customize your scene: select, represent (display), color, label as you wish.
    12. Option: If you wish, you may copy a scene from FirstGlance into Proteopedia: Instructions.
    13. Use the save scene tab to save your scene.
    14. Paste the scene tag into the box above (the page text).
    15. Save the page (click Save page twice).

    16. Try the green link you made.
    17. Put a snapshot into a Powerpoint slide.
    18. Create a second scene and green link for your second Proteopedia Powerpoint slide.

      If you would like to contribute permanent content to Proteopedia, please apply for an account and password: click on request account.

  7. Continue preparing your Powerpoint Slides 7-12.
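The noncovalent-bond exercise in item 3 (contacts to HEM, measuring interatomic distances, salt bridges) rests on simple geometry. As an illustration only, the Python sketch below measures interatomic distances and flags salt bridges as pairs of oppositely charged side-chain atoms (Asp/Glu carboxylate oxygens vs. Lys/Arg nitrogens) lying within 4.0 Å. The 4.0 Å cutoff, the atom sets, the tuple representation of atoms, and the function names are all assumptions of this sketch; FirstGlance's own criteria may differ.

```python
import math

# Hypothetical atom representation: (residue name, atom name, (x, y, z)) in Angstroms.
# Assumption: only Asp/Glu (negative) and Lys/Arg (positive) side chains are considered;
# His is treated as uncharged here.
ACIDIC_ATOMS = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC_ATOMS = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def distance(p, q):
    """Euclidean distance between two (x, y, z) coordinate triples."""
    return math.dist(p, q)


def is_salt_bridge(atom_a, atom_b, cutoff=4.0):
    """True if one atom is acidic, the other basic, and they lie within `cutoff` Angstroms."""
    key_a, key_b = atom_a[:2], atom_b[:2]
    opposite = (key_a in ACIDIC_ATOMS and key_b in BASIC_ATOMS) or (
        key_a in BASIC_ATOMS and key_b in ACIDIC_ATOMS
    )
    return opposite and distance(atom_a[2], atom_b[2]) <= cutoff


def find_salt_bridges(atoms, cutoff=4.0):
    """Return (atom, atom, distance) for every qualifying pair (brute force, O(n^2))."""
    pairs = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if is_salt_bridge(a, b, cutoff):
                pairs.append((a, b, round(distance(a[2], b[2]), 2)))
    return pairs
```

A brute-force comparison of every pair is fine for one chain; for whole structures, programs typically use spatial grids to avoid checking distant atoms.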

    VII. FirstGlance in Jmol -- Part III
        Salt Bridges & Cation-Pi Orbital Interactions
        Color by Uncertainty ("Temperature")
        Gaps in the Model due to Disorder
    Intrinsically Unstructured Proteins
    FirstGlance in Jmol (Part III)
  1. Under More Views:
  2. Solution Nuclear Magnetic Resonance (NMR)
    • Gives an ensemble of multiple models consistent with the data.
      Examples: 1abt, 1cfc, 1jsa.
    • Differences between models can reflect flexible motion in solution, or simply uncertainty due to insufficient data. Nothing in the PDB file tells you which is the case; you need to contact the authors.
    • Nothing in the PDB file measures the reliability of an NMR model. (In contrast, for X-ray data, resolution, R, and R-free measure reliability.)
  3. Charge:
    • 1d66.
    • Challenge: how can protein charge be changed in seconds, without changing the pH?
    • Calculate the isoelectric point (pI) and charge at pH 7 for one chain of your protein:
      1. Show your PDB code in Proteopedia, and click on OCA.
      2. At your PDB code in OCA, scroll down to Sequence-derived information (near the bottom).
      3. Click on the link for the one-letter amino acid sequence for one chain.
      4. Copy the sequence and paste it into the large box at the Protein Calculator.
      5. Check the three boxes under Charge at the right, and click Submit Query.

  4. Intrinsically Unstructured / Natively Disordered Proteins
      About 10% of proteins are thought to be fully disordered, with the disorder supporting their functions, and about 40% of eukaryotic proteins have at least one long disordered region. Examples.
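The Protein Calculator's charge estimate in item 3 can be approximated with the Henderson-Hasselbalch equation: each basic group contributes +1/(1 + 10^(pH - pKa)), each acidic group contributes -1/(1 + 10^(pKa - pH)), and the pI is the pH at which the sum is zero. This minimal Python sketch assumes one illustrative set of pKa values (published scales differ, and the Protein Calculator may use others) and treats every group as independent, so its results will differ somewhat from the calculator's. The function names are hypothetical.

```python
# Illustrative pKa values (assumption: published scales differ, and the
# Protein Calculator may use different values).
POSITIVE_PKA = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a one-letter sequence at `ph`, treating each group independently."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection works here because the net charge falls steadily as the pH rises, so there is exactly one pH where it crosses zero.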

    VIII. Flagellar Assembly
    Structural Bioinformatics and Genomics.
    Homology (Comparative) Modeling
    Assessing Model Quality
  1. Introduction to bacterial flagellar assembly:

  2. Structural Genomics: Worldwide Protein 3D Structure Knowledge
    1. How are 3D macromolecular structures obtained? Crystallography, NMR, and homology modeling.
    2. What fraction of the human proteome has known structure? A few percent.
    3. Is Structural Genomics the answer? Not in the next few years.
    4. Intrinsically unstructured proteins.

  3. Modeling vs. Visualization

  4. Homology (comparative) modeling: Introduction.
    1. Automated homology modeling: submit sequences to Swiss-Model (click on Automated Mode).
    2. Compare homology models from various methods at LOMETS. Here is an example study comparing multiple homology models; see especially the comparisons in supplementary figures S4-S6.
    3. See if a structure of your molecule is in the Structural Genomics pipeline. Submit your sequence to the SG TargetDB. (Ask for help interpreting the results.)

  5. Model Quality: X-ray > NMR >> homology modeling >> ab initio theoretical modeling.
      X-ray crystallographic models:
    • Resolution is very important.
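    • R value measures disagreement between the model and the data. It should be < 0.20, and not greater than resolution/10 (e.g., ≤ 0.20 at 2.0 Å). The R value can be fooled, for example by overfitting the model to the data.
    • Free R (R-free) tests for major errors; it is calculated from a small set of reflections that were left out of refinement. It should be < (R + 0.05). Most important! It cannot be fooled if calculated correctly.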
    • The above are averages. Examine the temperature values for local quality variation (in FirstGlance under More Views..: 1pgb).

      NMR, homology, or theoretical models:
    • There is nothing easily available to assess quality.
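The rules of thumb for X-ray models in item 5 are easy to apply mechanically. The Python sketch below encodes them, together with the resolution tiers from Section I. How resolutions that fall between the listed tiers are classified (e.g., 2.0 Å counted as "good") is an assumption of this sketch, and the function names are hypothetical. Passing these checks does not guarantee local quality; temperature values still need to be examined.

```python
def resolution_tier(resolution: float) -> str:
    """Map a resolution (in Angstroms) onto the tiers listed in Section I.
    Assumption: values between listed tiers are assigned to the next tier up."""
    if resolution <= 1.2:
        return "excellent"
    if resolution <= 2.5:
        return "good"
    if resolution <= 3.5:
        return "ok"
    return "poor"


def check_xray_model(resolution: float, r: float, r_free: float) -> dict[str, bool]:
    """Apply the syllabus rules of thumb for X-ray crystallographic models."""
    return {
        "R < 0.20": r < 0.20,
        "R <= resolution/10": r <= resolution / 10,
        "R-free < R + 0.05": r_free < r + 0.05,
    }
```

For example, a 2.0 Å model with R = 0.18 and R-free = 0.22 passes all three checks, while R = 0.22 at 2.0 Å fails both R rules.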

    IX. Publication Quality Images and Animations with Polyview-3D
    Finishing Powerpoint Slides
    Animation from Polyview-3D.
  1. Generate publication quality images easily with Polyview-3D.
    1. PyMOL: popular with crystallographers. Beautiful views, but not user-friendly.
    2. Example (at right): 1d66, with Gal4's recognition of the CGG sequence in the DNA highlighted.
    3. Demonstration of how to use Polyview-3D for customized scenes.
    4. Create a static ("single image") scene of your molecule, size 500 pixels.
    5. Create a small (300 pixel) rotating view of your molecule, and paste it into a Powerpoint slide. Limit rotation to 30 degrees in 2 degree steps!

  2. You are now prepared to finish your Powerpoint Slides. Please email the completed PPT file to
    emartz AT microbio DOT umass DOT edu.

Additional Resources.
    We probably will not have time in class for these resources. Links are provided here in case you would like to look at them later.
  1. Save any molecule you see! (as a PDB file)
    • Jmol: click on Jmol, then the top item on the menu, then the bottom item on the submenu.
    • Upload saved molecule (PDB file) to

  2. Example: Gramicidin channel in a lipid bilayer.

  3. Jmol in Scientific Journals
    1. FirstGlance in Jmol is used in Nature and other journals.
    2. "Jmolized" Interactive Journal Figures: Biochemical Journal and ACS Chemical Biology.
    3. Jmolize your own figures: Frieda Reichsman -- MoleculesInMotion.Com
    4. Toolkit for Jmolizing journal figures provided by the International Union of Crystallography.
    5. Note that Proteopedia is easier than "Jmolizing": see Supplementary Materials in Proteopedia.

    Simplified SV40 Virus Capsid.
  4. Specific Oligomers vs. Crystal Contacts

    Lac repressor bending the DNA at the lac operator.
  5. Animations & Morphing

    For Teachers and Future Teachers

  6. BioMolecular Explorer 3D (for students ages 15-19). All Jmol!

  7. High School Teacher's Resources.

  8. Bird Flu: N1 vs. Tamiflu Lesson Plan:
    • See links to background, lesson plan, morph animations of induced fit, and a cavity near Tamiflu at Proteopedia: Eric Martz's Favorites

  9. MolviZ.Org
    1. DNA, Hemoglobin, Antibody
    2. Lipid Bilayers and Gramicidin Channel
    3. Collagen
    4. Water & Ice & hydrogen bonding
    5. Toobers in Science Education

  10. World Index of Molecular Visualization Resources
    1. Hundreds of tutorials indexed by macromolecule (most in Chime, some in Jmol)
    2. Sources of atomic coordinate (PDB) files (metabolites, inorganic crystals, lipid micelles, etc.)
    3. Galleries, Molecular Sculpture and Physical Models, Software

  11. About Protein Structure
  12. Building a web page that shows your favorite molecules for research or teaching.
  13. T-shirts and mugs!

Keep in touch!