Fewer or Single Chains
by Eric Martz, April 2001; revised September 2002.
a resource within Protein Explorer

Many PDB files contain multiple chains, but it is often useful to create a PDB file containing only one of those chains, or a specified subset of them. For example, if you zoom in closely on a center of interest, extra chains may cause the center to slide out of view when rotating. Obtaining a single-chain PDB file usually prevents this (as recommended in the Zoom help of QuickViews). Here are several methods and their limitations.

  1. The PQS site will provide you with each chain individually, but only if the contacts between chains are crystal contacts (not specific oligomeric contacts). Examples are 104l (lysozyme mutant) and 1b6b (serotonin N-acetyltransferase).
    • Check PQS first, because if it provides what you need, the chains will include all nearby solvent and ligand atoms, usually including some solvent molecules not in the original PDB file, based on crystal symmetry.
    • The major disadvantage is that often the interactions between chains are not merely crystal contacts, and in this case, PQS does not offer single chains.
    • A possible disadvantage is that PQS offers only single chains, or complete oligomers -- it will not offer any arbitrary subset of the chains.

  2. ExPASy (Expert Protein Analysis System) has a service called ExPDB that offers any single chain.
    • Limitation: solvent and ligand atoms are lost, unless the ligand atoms are designated with the same chain identifier.
    • Limitation: ExPDB is updated infrequently, so recently published PDB files may not be available.
    • Only single chains are offered -- ExPDB will not provide arbitrary subsets of two or more chains.
    Go To ExPDB
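
    The first limitation above follows from selecting atoms purely by chain identifier. The sketch below illustrates that effect in Python; it is not how ExPDB itself is implemented, and the function name and the four example records are hypothetical. It assumes only that the chain identifier sits in column 22 of ATOM and HETATM records, as in the standard PDB format.

```python
def filter_chain(pdb_lines, chain_id):
    """Keep ATOM/HETATM records whose chain identifier matches chain_id."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Column 22 of a PDB record (string index 21) is the chain identifier.
            if len(line) > 21 and line[21] == chain_id:
                kept.append(line)
    return kept


# Hypothetical records: a protein atom in chain A, one in chain B,
# a ligand atom labelled chain A, and a water with a blank chain identifier.
example = [
    "ATOM      1  CA  LEU A   5      19.508  20.056  -0.144  1.00 85.38",
    "ATOM      2  CA  GLY B   7      10.000  11.000  12.000  1.00 40.00",
    "HETATM    3 FE   HEM A 142       5.000   6.000   7.000  1.00 20.00",
    "HETATM    4  O   HOH   201       1.000   2.000   3.000  1.00 30.00",
]

chain_a = filter_chain(example, "A")
for record in chain_a:
    print(record)
```

    The ligand atom survives only because it carries the identifier A; the water, which has a blank chain identifier, is dropped even though it may lie next to chain A.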

  3. RasMol can be used to write a PDB file containing any desired subset of the original.
    • RasMol enables creation of a PDB file containing any arbitrary subset of chains including the nearby solvent and ligand atoms.
    • For single chains, when PQS offers single chains, the PQS result is better because of the symmetry-included solvent and ligand atoms.
    • You have to download RasMol and learn a little about how to use it.
    Here is the procedure:

    1. Viewing the molecule in PE, write down the one-character name(s) of the chain(s) you want. (Find out the names by checking Sequences, from the PE Site Map, or by clicking on the chains and watching the reports in the message window.)

    2. Write down the names of the chains you don't want.

    3. If you don't have it already, download and install RasMol. (This is really easy.)

    4. Create a new folder (directory).

    5. Put a copy of the PDB file in question in the new folder. (See downloading PDB files.)

    6. Put a copy of the RasMol application (not a shortcut) in the folder. (Macintosh: drag the RasMol application file into the folder. An alias will not work.)

    7. Run RasMol (double click on the application).

    8. RasMol has two windows: a black molecule window, and a white command entry window. Windows: the white window is initially minimized; open it from the taskbar, and move it to below the black window. Macintosh: the white window may be hidden behind the black window; move it so you can reach it easily.

    9. Open RasMol's File menu, and Open your PDB file. If your PDB file does not appear on the menu, you have not followed the instructions above.

    10. Now you need to select the chains of interest, and all nearby molecules, excluding the chains you don't want. This is done by entering a single "select" command in the white window. For example, assume you want chains A and C, but not B and D. The command would be
        select within(3.5, (:a,:c)) and not (:b,:d)
      • Notice the crucial colon before each chain name!
      • "3.5" is a distance in Å, recommended because it is approximately the maximum donor-acceptor distance for energetically significant hydrogen bonds. If you want to include hetero atoms within a different distance, you can change this value, but be sure to include a decimal point. (For example, "4." will work, while "4" (no decimal point) will be interpreted by RasMol as being in RasMol units of 1/250 Å rather than in Å.)

    11. Save the new PDB file, giving it a suitable filename. For example, if our PDB identification code was 2hhd, and we saved chains A and C, this command would be appropriate:
        save pdb 2hhd_ac.pdb

    12. Optional (but a very good idea): Edit the HEADER record (first line in the PDB file) to indicate what you have done. The HEADER and COMPND records are shown in PE's Features of the Molecule control panel. It is a good idea if this information shows what is contained in the PDB file, making it clear that this is not the original PDB file as published at the Protein Data Bank! Here are the original first two lines in 1EWQ.PDB:
      HEADER    REPLICATION/DNA                         26-APR-00   1EWQ
      TITLE     CRYSTAL STRUCTURE TAQ MUTS COMPLEXED WITH A HETERODUPLEX
      The classification portion of the HEADER line (REPLICATION/DNA in this example) is not very important, so it can be replaced with a message stating what subset you have selected. Here are the first three lines as RasMol saved the file:
      HEADER    REPLICATION/DNA                         13-JUL-93   1EWQ
      TITLE     CRYSTAL STRUCTURE TAQ MUTS COMPLEXED WITH A HETERODUPLEX
      ATOM      1  CD2 LEU A   5      19.508  20.056  -0.144  1.00 85.38
      Notice that RasMol changed the date in the first line from the deposition date of the PDB file in the Protein Data Bank, 26-APR-00 in the original HEADER line, to a meaningless date in 1993. You should restore it to the correct deposition date. Also, you must be sure to keep the date and PDB identification code, which are at the end of the HEADER line, intact and aligned properly (not shifted to the left or right). RasMol, Chime and other programs that read PDB files expect these pieces of information to be in specific columns of the HEADER line, and will misread them if they are shifted. I saved only the DNA and everything within 10 Å of the DNA from 1EWQ. Here are the first three lines after my edit:
      HEADER    SUBSET OF 1EWQ WITHIN 10A OF DNA        26-APR-00   1EWQ
      TITLE     CRYSTAL STRUCTURE TAQ MUTS COMPLEXED WITH A HETERODUPLEX
      ATOM      1  CD2 LEU A   5      19.508  20.056  -0.144  1.00 85.38

    13. You are now ready to use the new PDB file in PE.

    14. If you plan to put your new PDB file on the web for Chime, please gzip it! (RasMol cannot understand gzipped files.)
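
    The "select" command of step 10 and the "save" command of step 11 can also be assembled mechanically. Here is a minimal Python sketch, assuming only the command forms shown in those steps; the function names are hypothetical and nothing here calls RasMol itself. It always writes the distance with a decimal point, so that RasMol reads it as Å.

```python
def rasmol_select(keep, drop, distance=3.5):
    """Build the step-10 select command for wanted and unwanted chains."""
    if not keep:
        raise ValueError("at least one chain to keep is required")
    # str(float(...)) yields a decimal point for ordinary distances (4 -> "4.0").
    dist = str(float(distance))
    # Each chain name needs its leading colon.
    keep_expr = ",".join(f":{c.lower()}" for c in keep)
    command = f"select within({dist}, ({keep_expr}))"
    if drop:
        drop_expr = ",".join(f":{c.lower()}" for c in drop)
        command += f" and not ({drop_expr})"
    return command


def rasmol_save(pdb_id, keep):
    """Build the step-11 save command, naming the file after the kept chains."""
    suffix = "".join(c.lower() for c in keep)
    return f"save pdb {pdb_id.lower()}_{suffix}.pdb"


print(rasmol_select(["A", "C"], ["B", "D"]))
print(rasmol_save("2hhd", ["A", "C"]))
```

    The two printed lines can be typed or pasted into RasMol's white command window in that order.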

    Note: the above procedure takes advantage of the fact that RasMol, unlike Chime, saves only the currently selected atoms. Chime always saves the entire PDB file, just as it was received, including the header. Notice also that the RasMol-saved PDB file lacks the header, and that the atomic coordinates are unconditionally recentered. More information is available at the PDB Tools page.
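
    The column alignment required in step 12 is easy to get wrong when editing by hand. The following Python sketch builds a HEADER line with fixed-width fields; it assumes the standard PDB layout (classification in columns 11-50, deposition date in columns 51-59, identification code in columns 63-66), and the function names are hypothetical.

```python
def make_header(classification, dep_date, id_code):
    """Return a HEADER line with every field in its fixed columns."""
    if len(classification) > 40:
        raise ValueError("classification must fit in columns 11-50 (40 characters)")
    if len(dep_date) != 9 or len(id_code) != 4:
        raise ValueError("expected a date like 26-APR-00 and a 4-character ID code")
    # "HEADER" padded to column 10, classification padded to column 50,
    # date in columns 51-59, three blanks, ID code in columns 63-66.
    return f"{'HEADER':<10}{classification:<40}{dep_date}   {id_code}"


def replace_header(pdb_text, classification, dep_date, id_code):
    """Replace the first line of pdb_text if it is a HEADER record."""
    lines = pdb_text.splitlines()
    if lines and lines[0].startswith("HEADER"):
        lines[0] = make_header(classification, dep_date, id_code)
    return "\n".join(lines) + "\n"


header = make_header("SUBSET OF 1EWQ WITHIN 10A OF DNA", "26-APR-00", "1EWQ")
print(header)
```

    Padding the classification to a fixed width is what keeps the date and identification code from shifting when the message is longer or shorter than the original text.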


Feedback to Eric Martz.