Guide to Using MSA3D
in Protein Explorer -- by Eric Martz, August 2000

Portions of PE's MSA3D routines were generously contributed by Paul Stothard.

Walk-Through for First-Time Users: Enolase
Thanks to Garry Duncan, Nebraska Wesleyan College, for providing the enolase alignment.

  1. Please start by reading carefully the overview on the "MSA3D Procedure" page (accessed from Advanced Explorer).

  2. Now click on the link "MSA3D ALIGNMENT FORM" and carefully read the "MSA3D Form" page. You can skip the "Advanced Options" section at the bottom. If portions of this page are unclear, they will become clear as we proceed.

  3. Now, in the "MSA3D Form" window, please find the Ready-Made Examples and click the link "Enolase". Accept the offer to fetch 4enl.pdb via Internet, or else you will need to load a local copy. Notice that clicking "Enolase" caused the relevant alignments to be pasted into both boxes.

  4. Press the button "Color Alignment & Molecule". A new window will appear containing the "MSA3D Alignment Listing". Read carefully the explanation at the top, and see also the summary counts and percentages at the bottom of this window.

  5. After you have scrutinized the Listing, click on the molecule to bring it to the foreground. Notice that the alignment colors have been applied to the molecule.

  6. Click on the links Identical, Similar, Different to spacefill the residues in these categories. The catalytic site is marked by a bound sulfate ion, and deeper, a brown zinc ion. Notice that the entire area around the active site is "Identical" in an evolutionary span from Archaebacteria through man!

Pasting in an alignment and correcting mismatches with "sliding".
Thanks to Gabe McCool, University of Massachusetts Amherst, for acquainting me with these molecules.

  1. Instructions are given for making an alignment in Biology Workbench in a later section below. Before you do that, the prepared alignment in this section will give you some useful experience in using the MSA3D feature. This is an alignment between chain B of tubulin and the bacterial cell division protein FtsZ. These two proteins have less than 20% sequence identity, but a high level of structural similarity. (For more information on these proteins, please see "Exploring structure and function of FtsZ, a prokaryotic cell division protein and tubulin-homologue" by Gabe J. McCool.) The alignment below was done by ClustalW in Biology Workbench, using default settings. Given the low level of sequence identity, the alignment may not be very meaningful, but it is useful to illustrate some features of MSA3D.
    >1TUB_B; Tubulin from Sus scrofa, electron diffraction
    MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQL--ERINVYYNEAAGNKYVPRAILVDLEP
    GTMDSVRSGPFGQIFRPDNFVFGQSGAGNNWAKGHYTEGAELVDSVLDVVRKESESCDCLQGFQLTHSLG
    GGTGSGMGTLLISKIREEYPDRIMNTFSVVPSPKVSDTVVEPYNATLSVHQLVENTDETYCIDNEALYDI
    CFRTLKLTTPTYGDLNHLVSATMSGVTTCLRFPGQLNADLRKLAVNMVPFPRLHFFMPGFAPLTSRGSQQ
    YRALTVPELTQQMFDAKNMMAACDPRHGRYLTVAAVFRGRMSMKEVDEQMLNVQNKNSSYFVEWIPNNVK
    TAVCDIPPRGLKMSATFIGNSTAIQELFKRISEQFTAMFRRKAFLHWYTGEGMDEMEFTEAESNMNDLVS
    EYQQYQD
    
    >1FSZ; Methanococcus jannaschii
    -------------SPEDKELLEYLQQTKAKITVVGCGGAGNNTI--TRLKMEG--------IEGAKTVAINT
    DAQQLIRTKADKKILIGKKLTRG-LGAG-----GNPKIGEEAAKESAEEIKAAIQDSDMVF---ITCGLG
    GGTGTGS-APVVAEISKKIG---ALTVAVVTLPFVMEGKVRMKNAMEGLERLKQHTDTLVVIPNEKLFEI
    VPN--MPLKLAFKVADEVLINAVKGLVELITKDGLINVDFADVKAVMN---NGGLAMIGIG--ESDSEKR
    AKEAVSMALNSPLLDVD-----IDGATGALIHVMGPED--LTLEEAREVVATVSSR--------------
    ----------LDPNATIIWG--------ATIDENLENTVRVLLVITGVQSR----IEFTDTGLKRKKL--
    -------
    

  2. Load 1fsz.pdb.

  3. In the "MSA3D Form" window, press the "Clear Form" button and OK the confirmation. Block and paste the alignment above into the top "Alignment Box" on the MSA3D Form window. (Don't worry about the spaces at the beginning of each line -- spaces will be ignored.)

  4. Block the 1FSZ portion of the above alignment and paste it into the lower "3D Sequence" box.

  5. Uncheck "Apply colors to molecule". This is optional but will save some time until we get the mismatches fixed.

  6. Click on "Color Alignment & Molecule". (The molecule won't be colored, however, since we unchecked that option.) Notice that nearly all residues are red, signifying mismatches with the aligned 3D sequence.

  7. In the "Alignment Listing" window, touch the N-terminal Ser with the mouse and notice (in the status bar) that it is residue 23 in the sequence of 1fsz.pdb. This causes 22 dots to be prefixed, representing the missing 22 residues (presumably disordered and unresolved in the crystal). These types of gaps are typically closed up in aligned sequences. Notice that the leading sequences labeled 1FSZ and 1fsz.pdb agree, but are offset by 22 residues. To make them match, we need to slide the PDB file sequence 22 residues to the left. To instruct MSA3D to do this, enter "-22" in the slot labeled "slide the PDB file sequence to the right positions". Now press the "Color Alignment & Molecule" button again. There are now 0 mismatches (check summary line at the bottom of the Listing window).

  8. Here is a more complicated example. Bring the main Protein Explorer window to the foreground, click on the link "MSA3D Procedure". Enter 1tub (tubulin) into the slot near "Load" at item 4 on the "MSA3D Procedure" page.

  9. Bring the "MSA3D Form" window to the foreground, and replace the contents of the bottom box "3D Sequence" with the aligned sequence 1TUB_B. (Leave the contents of the top box unchanged.)

  10. Delete the "-22" in the slot.

  11. Uncheck "Apply colors to molecule".

  12. Press the "Color Alignment & Molecule" button. Examining the listing will reveal that about 90% of the 1tub.pdb sequence is mismatched, and there is no obvious offset that would correct this. The problem is that we did not specify which chain is in the alignment, so chain A was used by default. There is not much sequence similarity between the two chains in tubulin. Enter "b" in the "Apply colors to chain(s)" slot. Press the "Color Alignment & Molecule" button again.

  13. The first 44 residues are matched, but a 2-residue gap causes a mismatch thereafter. Touching the first dot in the gap reports in the status line that it is position 45. Therefore we must slide the PDB file sequence 2 positions to the left starting at position 45. In the "slide the PDB file sequence to the right" slot, enter "-2@45". Press the "Color Alignment & Molecule" button again.

  14. Mismatches are now avoided up to an 8-residue gap beginning with a dot at position 361. In the "slide the PDB file sequence to the right" slot, enter "-2@45;-8@361". Press the "Color Alignment & Molecule" button again. Zero mismatches -- hooray!

  15. Now check "Apply colors to molecule", and press the "Color Alignment & Molecule" button again. Pull the main Protein Explorer window to the foreground so you can see the molecule. The Ready/Busy indicator below the molecule will be busy while the colors are applied, and again while the Identical/Similar/Different buttons are generated.

  16. The main purpose of the above exercise was to make clear what mismatches mean, and how to correct them when sliding is needed.
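The reasoning in the two sliding examples above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not MSA3D's own code: the function name `slide_entries_from_dots` is hypothetical, and it assumes that the PDB file sequence is given as it appears in the Alignment Listing (dots for missing residues, with the first character being position 1) and that each run of dots needs its own leftward slide equal to the length of the run.

```python
import re


def slide_entries_from_dots(listing_seq):
    """Suggest a slide-slot entry for a PDB file sequence containing dots.

    listing_seq: the PDB file sequence as shown in the listing, where each
    dot stands for one missing residue and the first character is taken
    to be position 1 (an assumption of this sketch).
    """
    entries = []
    for run in re.finditer(r"\.+", listing_seq):
        length = len(run.group())
        position = run.start() + 1  # position of the first dot in the run
        if run.start() == 0:
            # Leading gap: slide the whole sequence, no "@position" needed.
            entries.append(f"-{length}")
        else:
            entries.append(f"-{length}@{position}")
    return ";".join(entries)


# 1fsz-like case: 22 missing residues before the N-terminal Ser.
print(slide_entries_from_dots("." * 22 + "SPEDKELLEY"))

# 1tub chain B-like case: a 2-residue gap at 45 and an 8-residue gap at 361
# (the letter "A" is just a placeholder for observed residues).
tub_like = "A" * 44 + ".." + "A" * 314 + "." * 8 + "A" * 10
print(slide_entries_from_dots(tub_like))
```

The two calls print "-22" and "-2@45;-8@361", the same entries used in the walk-through. Always confirm the result by checking that the summary line in the Listing window reports 0 mismatches.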

Characteristics of alignments suitable for MSA3D

  1. The multiple sequence alignment must be for amino acids. It would be possible to adapt MSA3D to handle RNA alignments as well. If you wish to use MSA3D for RNA, please contact Eric Martz.
  2. The alignment must be in FASTA format. PIR formatted alignments can be used with minor editing (see example below).
  3. The alignment cannot exceed about 30,000 bytes because larger blocks of text are truncated to that size when pasted into a browser form box. A mechanism has been designed and tested (but not released) that can handle much larger alignments. If you need this, please contact Eric Martz.
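Before pasting a large alignment, it may help to check the format and size requirements above offline. The following Python sketch is one way to do that. The names `parse_fasta` and `check_alignment` are hypothetical, the 30,000-byte figure is the approximate limit quoted above, and the checks are deliberately minimal (they do not verify that the residues are amino acids).

```python
MAX_BYTES = 30_000  # approximate form-box limit quoted in the text


def parse_fasta(text):
    """Return a list of (label, sequence) pairs from FASTA-formatted text."""
    records = []
    label, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        elif label is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.replace(" ", ""))
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def check_alignment(text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    size = len(text.encode("utf-8"))
    if size > MAX_BYTES:
        problems.append(f"alignment is {size} bytes; limit is about {MAX_BYTES}")
    try:
        records = parse_fasta(text)
    except ValueError as err:
        problems.append(str(err))
        return problems
    if not records:
        problems.append("no FASTA records found")
    return problems


print(check_alignment(">seq1\nMREIV--HIQ\n>seq2\nMKE-VAAHLQ\n"))
```

An empty list from `check_alignment` only means that none of these particular problems were detected; MSA3D itself remains the final judge of whether an alignment is acceptable.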

Ready-made alignments from HOMSTRAD

  1. The Homologous Structure Alignment Database (HOMSTRAD) offers alignments of families of sequences within the Protein Data Bank. That is, the only sequences included are those corresponding to 3D structure entries in the Protein Data Bank. This has the advantage that the alignments usually contain fewer than a dozen sequences, and hence fit easily into MSA3D's form and are processed rapidly.

  2. For example, go to HOMSTRAD and search for "recombinase". Several families are displayed, with 2 to 5 sequences per alignment. Click on the family Bacterial DNA recombination protein, RuvA: holliday junction DNA helicase RuvA.

  3. You will see a table containing (at the time I tried it) only two PDB codes, 1cuk and 1bvs.

  4. Click on the link pir in the bottom line of the table. This displays the alignment in PIR format. PIR format is very close to FASTA format; however, PIR has two lines of comments preceding the sequence, while FASTA has only one. This confuses MSA3D, so you need to edit the alignment to reduce the two lines to one. This can be done simply by deleting the carriage return to join the first two lines into one long one. (It will wrap in the form, but still be processed as one line.) Alternatively, you can delete the extra text so that only the PDB code and description remain, as in the example below. This is tedious if you have a lot of sequences in your alignment, but enables the labels to show in the MSA3D Alignment Listing. (Labels are truncated at the first semicolon.) Example: change this
      >P1;1cuk
      structureX:1cuk: 1 : : 203 : : holliday junction DNA helicase RuvA:Escherichia coli: 1.9: 20.9
      MIGRLRGIIIEKQPPLVLIEVGGVGYEVHMPMTCFYELPEAGQEAIVFTHFVVREDAQLLYGFNNKQERTLFKEL
      IKTNGVGPKLALAILSGMSAQQFVNAVEREEVGALVKLPGIGKKTAERLIVEMKDRFKGLHGDLFTPTDDAEQEA
      VARLVALGYKPQEASRMVSKIARPDASSETLIREALRAAL--*
    to this
      >1cuk holliday junction DNA helicase RuvA:Escherichia coli
      MIGRLRGIIIEKQPPLVLIEVGGVGYEVHMPMTCFYELPEAGQEAIVFTHFVVREDAQLLYGFNNKQERTLFKEL
      IKTNGVGPKLALAILSGMSAQQFVNAVEREEVGALVKLPGIGKKTAERLIVEMKDRFKGLHGDLFTPTDDAEQEA
      VARLVALGYKPQEASRMVSKIARPDASSETLIREALRAAL--*

  5. Copy the PIR alignment from HOMSTRAD and paste it into a text editor (Word, WordPad, BBEdit, etc.). Edit it as explained above. Now it is ready to paste into the MSA3D form and use to color the 3D image. We'll assume you've done the sections on How to Use MSA3D, so you know how to proceed.
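If the alignment contains many sequences, the hand edit described above can be automated. The Python sketch below is one possible approach, not part of Protein Explorer: the function name `pir_to_fasta` is hypothetical, and it assumes that every record consists of a ">" line followed by exactly one description line, as described in the text. It removes the "P1;" prefix (so the label is not truncated at the semicolon) and joins the description onto the ">" line. The trailing "*" is left in place, as in the hand-edited example.

```python
def pir_to_fasta(pir_text):
    """Join each PIR '>' line with its description line into one FASTA header.

    Assumes exactly one description line follows every '>' line.
    """
    lines = pir_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line.startswith(">"):
            # ">P1;1cuk" becomes "1cuk": keep only what follows the semicolon.
            code = line[1:].split(";", 1)[-1].strip()
            description = lines[i + 1].strip() if i + 1 < len(lines) else ""
            out.append(f">{code} {description}".rstrip())
            i += 2  # skip the description line that was just merged
        else:
            if line:
                out.append(line)
            i += 1
    return "\n".join(out)


example = """>P1;1cuk
structureX:1cuk: 1 : : 203 : :holliday junction DNA helicase RuvA
MIGRLRGIIIEKQPPLVLIEVGGVGYEVHMPMTCFYELPEAG
VARLVALGYKPQEASRMVSKIARPDASSETLIREALRAAL--*
"""
print(pir_to_fasta(example))
```

The resulting header is longer than the hand-trimmed one shown above, but it occupies a single line and begins with the PDB code, which is what the form needs.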

Preparing an alignment in Biology Workbench (BW).

    BW is a very flexible and powerful system. The method described below is only one of many ways it could be used to prepare a multiple protein sequence alignment. BW is not very user friendly, but the fact that it saves your sessions makes it worth the trouble. Once you get the hang of it, you can try variations on the method below.

    The instructions below were written for Biology Workbench. After they were written, Biology Workbench for Students became available. It has a more limited set of options, and is a bit more user friendly, and it also saves your sessions (indeed, shares them with Biology Workbench!). Students may prefer to use it, while researchers will prefer the additional options in the full Workbench. In the Student Workbench, the process is similar to the one outlined below for the full Biology Workbench -- a few details are different, but you should have no trouble adapting the procedure below.

    Caveat: I have little experience preparing alignments! If you know of a tutorial with better or more complete advice, please tell me about it.

  1. Go to the Biology Workbench (BW).

  2. If you have not used BW before, click "Setup a free account". It takes only a few minutes. The advantage is that your sessions will be saved, so you can easily resume one.

  3. After you enter BW, click the [Session Tools] button.

  4. Select "Start New Session", and press the [Run] button.

  5. Enter a session description, such as the name of the molecule of interest. Press the [Start New Session] button.

    SELECTING SEQUENCES
    Sequences can be selected to address different questions. Often, one wants to know which residues are conserved over a broad range of phylogenetic distance. Another question is which residues have mutated in closely related molecules, for example wild type human hemoglobin vs. sickle cell hemoglobin.
     

    The Sickle Hemoglobin Mutation in MSA3D

    If you want to visualize the difference between wild type vs. sickle human hemoglobin using MSA3D, view 2HBS (human sickle hemoglobin) in Protein Explorer, and use an alignment between the sequences of the beta chains (chain B in each case) of 2HBS and 2HHD (human wild type hemoglobin). You can use the steps below to search for these two PDB ID codes ("2HBS or 2HHD") and align chains B, or if you're impatient, here is that alignment ready to paste into MSA3D's form:

    >2HHD_B
    VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
    FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA
    LAHKYH
    
    >2HBS_B
    VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
    FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA
    LAHKYH

    Be sure to enter BDFH (all the beta chains) in the slot "Apply colors to chain(s)" on the MSA3D form. Otherwise the alignment will be compared with the alpha chain, and you'll get mostly mismatches!
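As a quick check independent of MSA3D, the two aligned beta-chain sequences above can be compared position by position. The Python sketch below (the function name `differences` is hypothetical) simply lists the columns where the two sequences differ. Because this alignment contains no gaps, the column number equals the position in the sequence.

```python
WILD_TYPE = (  # 2HHD_B, copied from the alignment above
    "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA"
    "FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA"
    "LAHKYH"
)
SICKLE = (  # 2HBS_B, copied from the alignment above
    "VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA"
    "FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA"
    "LAHKYH"
)


def differences(seq_a, seq_b):
    """Return (column, residue_a, residue_b) for every differing column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return [
        (column, a, b)
        for column, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(differences(WILD_TYPE, SICKLE))  # [(6, 'E', 'V')]
```

The single difference, glutamate to valine at position 6, is the E6V mutation discussed in the next paragraph.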

    If you want to find out what the mutant valine residue (E6V) is contacting in order to aggregate the two hemoglobin molecules:

    • At the MSA3D Result, click Different to spacefill just the mutant valine. (If the Identical residues are spacefilled, click Hide after the word Identical.)
    • Go to Advanced Explorer, then Quickviews.
    • SELECT Clicked, one residue per click, and select the one valine that is touching the adjacent hemoglobin molecule.
    • Click Stop at the top of the middle frame, to stop selection.
    • DISPLAY Contacts.
    • Zoom to enlarge the view.
    • To tell which chain each contacting residue belongs to, enter this command in the command entry slot: select not balled.
    • COLOR Chain.
    • To find out more, visit this Hemoglobin Tutorial. You'll learn that the residues contacting the mutant valine in 2HBS are not the same ones involved in sickle cell disease. Nevertheless, the principle is similar: the additional surface hydrophobicity of the mutant valine "precipitates" the hemoglobin.

  6. Press the [Protein Tools] button.

  7. Select "Ndjinn - Multiple Database Search", and press the [Run] button.

  8. Check only one database: PDBFINDER. (Look for the first blue line, OMIM; just below it is PDBFINDER.) This guarantees that there is a 3D structure (PDB file) for everything you find.

  9. Enter the name of the molecule of interest in the slot at the top, and press the [Search] button.

  10. A list of sequence names is displayed. Each sequence name is prefixed with "PDBFINDER" (meaning it has a published 3D structure). You need to select at least one of these to include in your alignment. More than one is also OK. Use the [Show Records] button to get more information about the checked sequences.

  11. After you have checked the desired sequences, and unchecked others, press the [Import Sequences] button.

  12. At this point, you may already have enough sequences to try an alignment. If so, skip ahead to ALIGNMENT. Even two sequences sometimes give an informative result.

    GETTING MORE SEQUENCES

  13. Method I: Searching by Name. Go to Protein Tools and use the Ndjinn search as in the preceding steps, but this time check the SWISSPROT database (instead of only PDBFINDER). You'll get a lot more sequences. (If you can't find it in the long list of databases, use Netscape's Edit, Find in Page to look for "swiss".)

  14. Method II: Searching by Sequence Similarity. Go to Protein Tools and check just one sequence name, the one representing the PDB file you want to color in Protein Explorer. (You may want to preview the PDB files you've selected in Protein Explorer to select the best one for 3D viewing.)

  15. Select BLASTP, then press Run.

  16. On the BLASTP page, select the SWISSPROT database.

  17. Just below the databases list is a slot for "Expectation value". The default (10) is much too high. Enter 0.1.

  18. Scroll to the bottom and press the Submit button.

  19. Now you need to select some of the sequences in this list for importing. Use the [Show Records] button to help your selection.

  20. Use the [Import Sequence(s)] button to import sequences you wish to align.

    ALIGNMENT

  21. Now you have a list of sequences with checkboxes. Select "Select All Sequences" and press [Run].

  22. Uncheck any sequences you don't want to include in the alignment.

  23. Make sure you check at least one sequence for which a 3D structure is available. All PDBFINDER sequences have 3D structures. SWISSPROT sequences usually don't.

  24. Scroll down in the list of operations at the top until you find CLUSTALW (near the middle of the list). Select it and [Run]. On the next screen titled CLUSTALW, press [Submit].

  25. Examine the alignment carefully. An alignment that has very few identities, or very few differences, may not be informative. If you wish to exclude one or more sequences, press the [Return] button and rerun the alignment.

  26. Once you are satisfied with the alignment, press [Import Alignment].

  27. You should now see a list of all alignments you have made (initially just one), each with a checkbox. Notice that you are now in the Alignment Tools, no longer in Protein Tools.

  28. Now we need to get the alignment in FASTA format. Check the checkbox for the desired alignment. Select "Edit Aligned Sequences", press [Run].

  29. Find the Format menu, and change it to "Fasta".

  30. Block and copy the alignment. Paste it directly into Protein Explorer's MSA3D form. Optionally, also paste into a word processor and save it as a file for later use.

  31. Select the one sequence that matches the 3D structure PDB file you wish to view and color. Copy that sequence into the bottom "3D Sequence" box on the MSA3D Form. Load the corresponding PDB file. Assuming you have done the tutorial above, you will now know how to proceed.

  32. Sometimes the PDB file sequence does not begin anywhere in the alignment listing -- the PDB file sequence line is all dashes. If this happens, use the PE Site Map to open the Sequences display. Note the number of the first residue (we'll call it "N1"). Enter the value -(N1 - 1) in the slot on the MSA3D Alignment Form labeled "slide the PDB file sequence". For example, if N1 is 389, enter -388 in the slot, then press the [Color Alignment and Molecule] button.

  33. Occasionally, CLUSTALW will fail to align a sequence correctly with other sequences. If the alignment is important to you, inspect it carefully in the MSA3D Alignment Listing. Look for crucial sequence motifs known for this family of molecules and make sure they are aligned. (If the alignment is incorrect, I don't know how to fix it. Please send suggestions to me.)
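The slide value described in step 32 is simple arithmetic, shown here as a tiny Python sketch (the function name `slide_for_first_residue` is hypothetical). It applies the rule -(N1 - 1), where N1 is the number of the first residue in the PDB file.

```python
def slide_for_first_residue(n1):
    """Return the slide-slot value for a PDB file whose first residue is n1."""
    return -(n1 - 1)


print(slide_for_first_residue(389))  # -388, the example from step 32
print(slide_for_first_residue(23))   # -22, as in the 1fsz walk-through
```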

Error avoidance and design features of the MSA3D tool.

  1. MSA3D refuses to proceed unless all the aligned sequences have the same length. Were an unaligned sequence of the loaded PDB file to be pasted into the lower box, most likely the length would differ from the alignment, and hence this would be caught.

  2. When a residue in the PDB file sequence is not identical to the residue in that position in the aligned "3D Sequence", both residues will be colored "mismatch" in the listing, and the mismatch color will also be applied to the 3D structure.

  3. The sequence of the PDB file can be longer or shorter than the alignment, and vice versa.

  4. The residue counts (and percentages, and total residues) in the summary at the bottom of the alignment listing include only the portion of PDB file residues that fit underneath the alignment. If sliding to the left causes residues in the PDB file to be skipped, they will neither be listed nor included in the summary counts.

  5. If the PDB file sequence is longer than the alignment, residues beyond the end of the alignment will be listed in the "No Info" color, colored "No Info" in the 3D structure, and excluded from the summary counts at the bottom of the listing window.

  6. When the first residue in the PDB file has a negative or zero sequence number, the negative or zero numbers will be reported in the MSA3D Alignment Listing window (in the status line, when the residues are touched with the mouse).

  7. Gaps, including a leading gap, in the PDB file sequence will be represented by dots (periods) and will require sliding corrections to avoid mismatches.
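Items 1, 2 and 5 above can be illustrated with a short Python sketch. It is not MSA3D's actual code, and the function names are hypothetical. It assumes that the alignment gaps ("-") in the aligned "3D Sequence" are removed before the comparison, that no sliding is applied, and that PDB file residues beyond the end of the alignment are labeled "no info" and left out of the counts.

```python
def same_length(aligned_seqs):
    """Item 1: True if every aligned sequence has the same length."""
    return len({len(seq) for seq in aligned_seqs}) <= 1


def classify_pdb_residues(aligned_3d, pdb_seq):
    """Items 2 and 5: label each PDB residue match, mismatch, or no info."""
    ungapped = aligned_3d.replace("-", "")  # assumption: gaps are dropped
    labels = []
    for index, residue in enumerate(pdb_seq):
        if index >= len(ungapped):
            labels.append("no info")  # beyond the end of the alignment
        elif residue == ungapped[index]:
            labels.append("match")
        else:
            labels.append("mismatch")
    return labels


def summary_counts(labels):
    """Count matches and mismatches, excluding 'no info' residues."""
    return {
        "match": labels.count("match"),
        "mismatch": labels.count("mismatch"),
    }


print(same_length(["AC-DE", "ACQDE", "A--DE"]))
labels = classify_pdb_residues("AC-DE", "ACDFGH")
print(labels)
print(summary_counts(labels))
```

In this toy case the fourth PDB residue (F) disagrees with the aligned sequence (E), and the last two residues lie beyond the alignment, so the summary reports three matches and one mismatch.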