The following information applies to RasMol version 2.6-beta-2a and Chime 1.0 or 2.0. It does not address any relevant changes which may have been made in RasMol 2.7 or later.
The bottom line is that, by default, for macromolecules, RasMol and Chime use internal algorithms to determine the placements of all covalent bonds. For macromolecules, in nearly all cases, the CONECT records in the PDB file are ignored. However, this default behavior can be adjusted as explained below.
What are the values used by RasMol and Chime for van der Waals radii in the spacefill rendering?
The radii for common elements are listed below. If the PDB file contains hydrogen atoms, the van der Waals radii are used; if not, slightly larger 'united atom' radii are used for C, N, O, and S as detailed below.
Chime or RasMol can be forced to use van der Waals radii (instead of united atom radii) for spacefill and dots by adding a single hydrogen atom to the PDB file with a text editor, using a line such as:
HETATM 9999  H                   0.000   0.000   0.000  1.00  1.00
(If this places the atom away from the molecule, you can give it the same coordinates as any oxygen atom, which will place the hydrogen inside the oxygen. Of course, you need not display the hydrogen; it can be hidden with the command 'restrict not hydrogen'.)
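The radius-selection rule described above can be summarized in a short Python sketch. It is not RasMol code: the names `spacefill_radius` and `file_has_hydrogens` are hypothetical, the radii are copied from the table of radii in this answer, and only H, C, N, O and S are included.

```python
# Radii in Angstroms, copied from the table of radii in this answer.
VDW_RADIUS = {"H": 1.100, "C": 1.548, "N": 1.400, "O": 1.348, "S": 1.808}
UNITED_ATOM_RADIUS = {"C": 1.872, "N": 1.507, "O": 1.400, "S": 1.848}


def spacefill_radius(element: str, file_has_hydrogens: bool) -> float:
    """Return the spacefill/dots radius under the rule described above.

    United atom radii apply only to C, N, O and S, and only when the
    PDB file contains no hydrogen atoms; otherwise the van der Waals
    radius is used.
    """
    if not file_has_hydrogens and element in UNITED_ATOM_RADIUS:
        return UNITED_ATOM_RADIUS[element]
    return VDW_RADIUS[element]


print(spacefill_radius("C", file_has_hydrogens=False))  # 1.872
print(spacefill_radius("C", file_has_hydrogens=True))   # 1.548
```

The choice depends only on whether any hydrogen is present, not on how many, which is why a single added hydrogen atom is enough to switch every carbon back to its van der Waals radius.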
Roger Sayle states
Most protein structures in the PDB do not contain explicit hydrogens, as these are not resolved by X-ray crystallography. As a result, for molecules without hydrogen atoms, RasMol uses 'united atom' van der Waals radii for carbon, nitrogen, oxygen and sulphur. This is the effective size of an atom plus its hydrogens (commonly used in forcefields and a better approximation of the molecule's surface). Note that the increase for carbon is larger than for oxygen; this is caused both by the average number of additional hydrogens and by how closely they're bound to their parent atom.
Unfortunately, RasMol does not maintain a table of ionic radii for each possible charge state of an element, and so the values used by RasMol for displaying salts or similar structures are incorrect. Complete and reliable tables of ionic radii are both rare and harder to encode in computer software. For example, the CRC handbook lists five ionic radii for selenium: -2 (1.91A), -1 (2.32A), +1 (0.66A), +4 (0.50A) and +6 (0.42A). This series fits well with the VdW radius (as used by RasMol) of 0.9A in the uncharged state.
Element | Covalent Radius (Angstroms) | van der Waals Radius (Angstroms) | United Atom Radius, includes hydrogens (Angstroms) |
H | 0.320 | 1.100 | (not applicable) |
C | 0.720 | 1.548 | 1.872 |
N | 0.680 | 1.400 | 1.507 |
O | 0.680 | 1.348 | 1.400 |
P | 1.036 | 1.880 | (not applicable) |
S | 1.020 | 1.808 | 1.848 |
Ca | 0.992 | 1.948 | (not applicable) |
Fe | 1.420 | 1.948 | (not applicable) |
Zn | 1.448 | 1.148 | (not applicable) |
Cd | 1.688 | 1.748 | (not applicable) |
I | 1.400 | 1.748 | (not applicable) |
The complete set of radii may be found in RasMol's source code, in the file abstree.h. The values are given in RasMol units (for example, 387 for the van der Waals radius of carbon); one Angstrom equals 250 RasMol units.
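Converting between RasMol units and Angstroms is a single division or multiplication by 250. A minimal Python sketch; the function names are hypothetical.

```python
RASMOL_UNITS_PER_ANGSTROM = 250  # one Angstrom equals 250 RasMol units


def rasmol_units_to_angstroms(units: int) -> float:
    """Convert a value stored in RasMol units (as in abstree.h) to Angstroms."""
    return units / RASMOL_UNITS_PER_ANGSTROM


def angstroms_to_rasmol_units(angstroms: float) -> int:
    """Convert Angstroms to the nearest whole RasMol unit."""
    return round(angstroms * RASMOL_UNITS_PER_ANGSTROM)


print(rasmol_units_to_angstroms(387))    # 1.548 (carbon, van der Waals)
print(angstroms_to_rasmol_units(1.100))  # 275 (hydrogen, van der Waals)
```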
How do RasMol and Chime determine which atoms in a PDB file are covalently bonded?
The first decision made by RasMol/Chime is whether the PDB file itself specifies the bonds. Bonds are specified in PDB files with CONECT records.
Covalent Bond | Typical Bond Length (Angstroms) | Maximum Length Allowed by RasMol/Chime (Angstroms; <255 atoms or connect true) |
C-H | 1.06-1.11 | 1.60 |
C-C | 1.5 | 2.00 |
C-N | 1.3-1.5 | 1.96 |
C-O | 1.4 | 1.96 |
C=O | 1.2 | 1.96 |
"P-P" | | 2.632 |
P-O | | 2.276 |
S-S | 2.04 | 2.60 |
S-O | | 2.26 |
Ca-O | | 2.232 |
Zn-S | | 3.028 |
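The maxima in this table follow a simple pattern: each equals the sum of the two atoms' covalent radii from the table of radii, plus 0.56 Angstroms (for C-C, 0.720 + 0.720 + 0.56 = 2.00). The Python sketch below reproduces the table from that rule. The 0.56 Angstrom tolerance is inferred from the tabulated numbers, not read from RasMol's source; the real Testbonded() in molecule.c uses radii stored in RasMol units and may apply further checks not modeled here. The helper names are hypothetical.

```python
# Covalent radii in Angstroms, copied from the table of radii in this answer.
COVALENT_RADIUS = {
    "H": 0.320, "C": 0.720, "N": 0.680, "O": 0.680, "P": 1.036,
    "S": 1.020, "Ca": 0.992, "Fe": 1.420, "Zn": 1.448, "Cd": 1.688, "I": 1.400,
}
TOLERANCE = 0.56  # Angstroms; inferred from the table, not from RasMol's source


def max_bond_length(elem1: str, elem2: str) -> float:
    """Longest distance (Angstroms) that would still be drawn as a bond."""
    return COVALENT_RADIUS[elem1] + COVALENT_RADIUS[elem2] + TOLERANCE


def looks_bonded(distance: float, elem1: str, elem2: str) -> bool:
    """Hypothetical helper: True if distance is within the inferred maximum.

    Whether RasMol treats the boundary as bonded (<= vs <) is not
    established here; this sketch includes it.
    """
    return distance <= max_bond_length(elem1, elem2)


for pair in [("C", "H"), ("C", "C"), ("S", "S"), ("Zn", "S")]:
    print("-".join(pair), round(max_bond_length(*pair), 3))
# C-H 1.6
# C-C 2.0
# S-S 2.6
# Zn-S 3.028
```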
For example, if you wish to show hydrogen bonds between atom serial numbers 117-214 and 117-303
(As of August, 1999, set connect save and initscript have not yet been documented in MDL's Chime 2 documentation, nor is the former documented in the Chime 0.99 partial draft manual which documents some of the commands not yet documented by MDLI.)
Hiding arbitrary bonds is a straightforward application of commands adequately documented in the RasMol Reference Manual. One must either select the unwanted bonds and then issue the command wireframe false, or else "restrict not" the unwanted bonds. Bonds are selected or restricted by selecting or restricting the atoms at their ends. Hence restricting bonds will also hide any spheres drawn on those atoms, while selecting bonds and then issuing wireframe off will preserve the spheres.
By default, atoms at both ends of a bond must be selected for a command to operate on that bond. This is equivalent to the command set bondmode and. Alternatively, after the command set bondmode or, commands operate on every bond that has a selected atom at either end.
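The two bond modes amount to a simple set test on a bond's two end atoms. Here is a toy Python model of that logic, not RasMol code; `affected_bonds` is a hypothetical name, and bonds are represented simply as pairs of atom serial numbers.

```python
from typing import Iterable, List, Set, Tuple

Bond = Tuple[int, int]  # a bond as a pair of atom serial numbers


def affected_bonds(bonds: Iterable[Bond], selected: Set[int],
                   bondmode: str = "and") -> List[Bond]:
    """Return the bonds a command such as 'wireframe off' would act on.

    bondmode "and" (the default): both end atoms must be selected.
    bondmode "or": one selected end atom is enough.
    """
    if bondmode == "and":
        return [b for b in bonds if b[0] in selected and b[1] in selected]
    if bondmode == "or":
        return [b for b in bonds if b[0] in selected or b[1] in selected]
    raise ValueError("bondmode must be 'and' or 'or'")


chain = [(1, 2), (2, 3), (3, 4)]
print(affected_bonds(chain, {2, 3}))        # [(2, 3)]
print(affected_bonds(chain, {2, 3}, "or"))  # [(1, 2), (2, 3), (3, 4)]
```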
The hbonds command displays only backbone-to-backbone hydrogen bonds in regions of defined secondary structure, and Watson-Crick inter-nucleotide bonds in DNA double helices. RasMol and Chime have no built-in mechanism to locate and display other types of hydrogen bonds, such as any involving sidechains, inter-chain hbonds, ligand-protein hbonds, etc. The Noncovalent Bond Finder, a tool which employs Chime, can be used to visualize hbonds involving any selected set of atoms. For small ligands, contact surfaces provide an overview which highlights the hbond locations and the atoms involved. A user-friendly method for generating contact surfaces is provided in the Protein Explorer.
For proteins, regions of alpha helix or beta-sheet are identified by application of the algorithm of Kabsch and Sander. Hydrogen bonds are assigned when a distance-dependent electrostatic energy of interaction between donors and acceptors falls below a threshold value. No reference is made to phi or psi angles.
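In the Kabsch and Sander paper, this energy is a point-charge electrostatic term between the N-H group of one residue and the C=O group of another, and an hbond is assigned when it is below -0.5 kcal/mol. The Python sketch below implements that published formula; RasMol's CalcProteinHBonds() may differ in details such as units, rounding and additional distance cutoffs, and the function names are hypothetical.

```python
import math

# Kabsch & Sander (1983) point charges: +/-0.42e on C and O, +/-0.20e on N
# and H, with dimensional factor f = 332 (kcal/mol * Angstrom).
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; an hbond is assigned when E is below this


def hbond_energy(n, h, c, o):
    """Energy (kcal/mol) between N-H of one residue and C=O of another.

    Each argument is an (x, y, z) tuple in Angstroms.
    """
    d = math.dist
    return Q1 * Q2 * F * (1 / d(o, n) + 1 / d(c, h) - 1 / d(o, h) - 1 / d(c, n))


def is_hbond(n, h, c, o):
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF


# Idealized collinear N-H...O=C arrangement, N...O distance 2.9 Angstroms.
n, h, o, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(n, h, c, o), 2))  # -2.9
print(is_hbond(n, h, c, o))                # True
```

As the text says, only interatomic distances enter the calculation; no backbone dihedral angles are used.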
For DNA, Watson-Crick double helix internucleotide bonding rules are applied without reference to actual distances between donors and acceptors. This can lead to gross errors. For example, in 1d66.pdb, cytosine 28 was positioned erroneously (Ronen Marmorstein, personal communication). Two impossible hbonds are drawn, disregarding the actual geometry. One hbond is drawn from G11.N2 to C28.O2 across one intervening bond and one intervening atom, a distance of 7 Angstroms! (Hbond donor-acceptor distances are generally less than 3.5 Angstroms.)
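A geometric check of the kind the nucleic-acid rule skips only needs the donor-acceptor distance. A minimal Python sketch with a hypothetical function name; the coordinates in the example are made up to give a 7 Angstrom separation and are not taken from 1d66.pdb.

```python
import math

MAX_DONOR_ACCEPTOR = 3.5  # Angstroms, the typical upper bound quoted above


def implausible_hbonds(hbonds, coords, limit=MAX_DONOR_ACCEPTOR):
    """Return (donor, acceptor, distance) for each drawn hbond longer than limit.

    hbonds: iterable of (donor_name, acceptor_name) pairs
    coords: dict mapping atom names to (x, y, z) tuples in Angstroms
    """
    flagged = []
    for donor, acceptor in hbonds:
        dist = math.dist(coords[donor], coords[acceptor])
        if dist > limit:
            flagged.append((donor, acceptor, round(dist, 2)))
    return flagged


# Made-up coordinates chosen to give a 7 Angstrom separation.
coords = {"G11.N2": (0.0, 0.0, 0.0), "C28.O2": (7.0, 0.0, 0.0)}
print(implausible_hbonds([("G11.N2", "C28.O2")], coords))
# [('G11.N2', 'C28.O2', 7.0)]
```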
The full specifications for CONECT and all other records in the PDB format are available at the PDB. (While the mmCIF format was recently adopted as the new international standard for atomic coordinate files, neither RasMol 2.6 nor Chime 2.0 can read mmCIF. RasMol 2.7 can read mmCIF but is not documented here.) According to the PDB format, CONECT records are not required for the 20 standard amino acids nor for the 5 standard nucleotides. (Alas, the PDB residue naming standard makes no distinction between the ribo- and deoxyribo- forms of nucleotides.) CONECT records are required only for hetero atoms. Since this means that the number of bonds specified by CONECT records in PDB files obtained from the Protein Data Bank is always far less than the number of atoms for proteins and/or nucleic acids, RasMol and Chime always ignore the CONECT records unless instructed to do otherwise with a set connect false or set connect save command prior to loading.
The record identifier is spelled "CONECT" (rather than having two N's) because the identifier word is limited to 6 characters. Take for example:
CONECT    2    3
CONECT    3    2    4    8   33
The first line means that atom serial number 2 is connected to atom 3. The second line means that atom 3 is connected to atoms 2, 4, 8, and 33. (The second line does not mean that atom 2 is connected to atom 4!) Note that the connection between atoms 2 and 3 is specified twice, once in each direction. This double specification does not mean a double bond; rather, it is the officially correct way to specify every single bond. However, almost all PDB files contain exceptions to the single-bond double-specification rule. Evidently the Protein Data Bank has never enforced this rule, and therefore no existing software which reads published PDB files can depend on it. RasMol and Chime, in fact, ignore all CONECT-specified bonds in which the serial number of the second atom is less than the serial number of the first. Thus, in the second line above, the 3-2 specification will be ignored.
The following record means exactly the same as the first one above:
CONECT    2    0    3    0    0
The zeros are optional. Each atom serial number occupies a 5-character slot filled with leading spaces.
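The slot layout and RasMol's ordering rule can be sketched in Python as follows. This illustrates the behavior described in this answer and is not RasMol's parser; the function names are hypothetical. It reads only the first six partner slots (record positions 2 through 7), matching how RasMol and Chime are described as interpreting CONECT fields.

```python
def conect_slots(line: str) -> list:
    """Return the numbers in a CONECT record, one per 5-character slot.

    Slot 0 (columns 7-11) is the atom itself; later slots are partners.
    Blank slots and zeros become None, so slot positions are preserved.
    """
    if not line.startswith("CONECT"):
        raise ValueError("not a CONECT record")
    body = line.rstrip("\n")[6:]
    slots = []
    for i in range(0, len(body), 5):
        text = body[i:i + 5].strip()
        slots.append(int(text) if text and int(text) != 0 else None)
    return slots


def rasmol_style_bonds(records) -> set:
    """Collect covalent bonds the way this answer describes RasMol/Chime doing it.

    Only partner slots 1-6 (record positions 2-7) are read, and a pair is
    kept only when the partner's serial number is higher than the atom's.
    """
    bonds = set()
    for line in records:
        atom, *partners = conect_slots(line)
        if atom is None:
            continue
        for partner in partners[:6]:
            if partner is not None and partner > atom:
                bonds.add((atom, partner))
    return bonds


records = ["CONECT    2    3", "CONECT    3    2    4    8   33"]
print(sorted(rasmol_style_bonds(records)))  # [(2, 3), (3, 4), (3, 8), (3, 33)]
```

The 3-2 entry in the second record is dropped, but the 2-3 bond survives because the first record supplies it in ascending order.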
The PDB format standard makes no provision for specifying double or triple bonds. RasMol and Chime, however, support a nonstandard mechanism for specifying double or triple bonds in CONECT records.
According to the PDB format standard, only the second through fifth numbers in a CONECT record designate covalent bonds. The sixth and later numbers designate hydrogen bonds and salt bridges. However, RasMol and Chime interpret as covalent bonds up to six numbers per record, namely those in the second through seventh places. RasMol and Chime ignore any hydrogen bonds or salt bridges specified in the eighth position through the eleventh and final position.
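The difference between the two interpretations can be made concrete with a small Python sketch that takes the eleven numbers of one CONECT record (atom serial first, 0 for an empty slot) and reports which partners each reader treats as covalent. The function name and the example record are hypothetical.

```python
def classify_conect_fields(numbers):
    """Split the numbers of one CONECT record by how each reader treats them.

    numbers: up to 11 integers, atom serial first, 0 for an empty slot.
    Returns (atom, covalent per PDB standard, covalent per RasMol/Chime,
    ignored by RasMol/Chime).
    """
    atom, partners = numbers[0], list(numbers[1:])
    pdb_covalent = [p for p in partners[0:4] if p]        # positions 2-5
    rasmol_covalent = [p for p in partners[0:6] if p]     # positions 2-7
    ignored_by_rasmol = [p for p in partners[6:10] if p]  # positions 8-11
    return atom, pdb_covalent, rasmol_covalent, ignored_by_rasmol


# Hypothetical record: two covalent partners, then a partner (50) in
# position 6 and another (60) in position 8.
print(classify_conect_fields([10, 11, 12, 0, 0, 50, 0, 60, 0, 0, 0]))
# (10, [11, 12], [11, 12, 50], [60])
```

Atom 50 sits in position 6, which the PDB standard reserves for hydrogen bonds and salt bridges but which RasMol and Chime would draw as a covalent bond.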
Information in this document was gathered from the history of the RasMol EMail List (mostly from information contributed by Roger Sayle), from tests of RasMol and Chime, and from RasMol's C source code.
Kabsch, W. and Sander, C. (1983). Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 22, 2577-2637.
Carey, F. A. (1996). Organic Chemistry, Third Edition. McGraw-Hill, New York.
Engh, R. A. and Huber, R. (1991). Accurate Bond and Angle Parameters for X-ray Protein Structure Refinement. Acta Cryst. A47, 392-400.
ProCheck (Biotech Validation Suite for Protein Structures).
Source code references
The function which assigns bonds to pairs of atoms is Testbonded() in molecule.c.
Covalent and van der Waals radii are in the structure ElemStruct in abstree.h.
For protein backbone hbonds, see CalcProteinHBonds() in molecule.c, or search for "Kabsch".