The Protein Data Bank (PDB) Format for Atomic Coordinate Files

The Protein Data Bank was established in 1971 and was maintained at Brookhaven National Laboratory (BNL), Long Island, New York, USA, until 1999. (The Brookhaven web site is now closed.) In 1999, operation of the Protein Data Bank was taken over by the Research Collaboratory for Structural Bioinformatics (RCSB). Below is a short overview of the components of PDB files used by RasMol. A comprehensive technical PDB specification is available from BNL. A graphic display of how the atoms are numbered in amino acids and nucleotides, PDB Atom Definitions, is available from the University of California, San Francisco.

PDB files are plain text (ASCII) files made up of one-line, 80-character 'records'. (Originally each record was an IBM punch card.) RasMol is chiefly concerned with only two of the record types, ATOM and HETATM, although it also uses several others if they are present. HETATM records give the positions of hetero atoms, which are atoms that are not part of the primary molecule; an example is an atom of a water molecule within a protein crystal.
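Because the record type always occupies the first six columns of a record, picking out the ATOM and HETATM records that RasMol cares about takes only a few lines. The following is a minimal sketch, not RasMol's own code; the function names are hypothetical, and it assumes water is labeled with the residue name HOH, the usual PDB convention. The HETATM line in the example is made up for illustration, since 1FDL has none.

```python
from __future__ import annotations


def record_name(line: str) -> str:
    """Return the record type, which occupies columns 1-6 of a PDB record."""
    return line[0:6].strip()


def split_atoms(pdb_text: str) -> tuple[list[str], list[str]]:
    """Separate ATOM records from HETATM records; all other record types are skipped."""
    atoms, hetatms = [], []
    for line in pdb_text.splitlines():
        name = record_name(line)
        if name == "ATOM":
            atoms.append(line)
        elif name == "HETATM":
            hetatms.append(line)
    return atoms, hetatms


def is_water(line: str) -> bool:
    """True if the residue name (columns 18-20) is HOH, the usual PDB code for water."""
    return line[17:20] == "HOH"


text = "\n".join([
    "ATOM      1  N   ASP L   1       4.060   7.307   5.186  1.00 51.58      1FDL  93",
    # Hypothetical water oxygen, not taken from 1FDL (which has no HETATM records).
    "HETATM 3306  O   HOH A 401      10.000  10.000  10.000  1.00 20.00",
    "END",
])
atoms, hetatms = split_atoms(text)
waters = [line for line in hetatms if is_water(line)]
print(len(atoms), len(hetatms), len(waters))  # 1 1 1
```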

Here are the first two PDB file lines for a protein molecule. The first line is for the N-terminal nitrogen; the second is for the alpha carbon, both in an aspartic acid (ASP) residue. The column header line beginning with RTyp is provided here for reference and is not part of the PDB file. (Some spaces have been omitted, so the spacing is not the exact column count required by the PDB format.)

RTyp  Num  Atm Res Ch  ResN X       Y       Z      Occ  Temp   PDB   Line
ATOM    1  N   ASP L   1    4.060   7.307   5.186  1.00 51.58  1FDL  93
ATOM    2  CA  ASP L   1    4.042   7.776   6.553  1.00 48.05  1FDL  94

RTyp: Record Type
Num: Serial number of the atom.  Each atom has a unique serial number.
Atm: Atom name (IUPAC format).
Res: Residue name (IUPAC format).
Ch: Chain to which the atom belongs (in this case, L for light chain
    of an antibody).
ResN: Residue sequence number.
X, Y, Z: Cartesian coordinates specifying atomic position in space.
Occ: Occupancy factor.
Temp: Temperature factor, also called the B-factor (atoms that are
    disordered in the crystal have high temperature factors).
PDB: The PDB data file unique identifier.
Line: Line (record) number in the data file.
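Because every field sits at a fixed column range, an ATOM or HETATM record is best read by slicing columns rather than by splitting on spaces; adjacent fields can run together (for example, a large negative coordinate), and the abbreviated table above, with spaces removed, could not be read this way at all. The sketch below assumes the column positions given for the ATOM record in the wwPDB format specification (version 3.3). The class AtomRecord and the function parse_atom are hypothetical names, and the sketch ignores the alternate-location and insertion-code columns as well as everything after column 66.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record: str         # "ATOM" or "HETATM"
    serial: int         # atom serial number
    name: str           # atom name, e.g. "CA"
    res_name: str       # residue name, e.g. "ASP"
    chain: str          # chain identifier, e.g. "L"
    res_seq: int        # residue sequence number
    x: float            # Cartesian coordinates
    y: float
    z: float
    occupancy: float
    temp_factor: float  # temperature factor (B-factor)


def parse_atom(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM record by fixed columns (1-based column ranges in comments)."""
    return AtomRecord(
        record=line[0:6].strip(),      # cols 1-6
        serial=int(line[6:11]),        # cols 7-11
        name=line[12:16].strip(),      # cols 13-16
        res_name=line[17:20].strip(),  # cols 18-20
        chain=line[21],                # col 22
        res_seq=int(line[22:26]),      # cols 23-26
        x=float(line[30:38]),          # cols 31-38
        y=float(line[38:46]),          # cols 39-46
        z=float(line[46:54]),          # cols 47-54
        occupancy=float(line[54:60]),  # cols 55-60
        temp_factor=float(line[60:66]),  # cols 61-66
    )


line = "ATOM      2  CA  ASP L   1       4.042   7.776   6.553  1.00 48.05      1FDL  94"
atom = parse_atom(line)
print(atom.name, atom.res_name, atom.chain, atom.res_seq, atom.temp_factor)  # CA ASP L 1 48.05
```

Slicing by column also keeps the parser independent of how many spaces happen to separate the values, which is exactly what the fixed-width format guarantees.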

Here are the first two complete amino acid residues of this protein, this time with the exact column spacing of the PDB format. Hydrogen atoms are absent because their positions usually cannot be resolved by X-ray diffraction.

ATOM      1  N   ASP L   1       4.060   7.307   5.186  1.00 51.58      1FDL  93
ATOM      2  CA  ASP L   1       4.042   7.776   6.553  1.00 48.05      1FDL  94
ATOM      3  C   ASP L   1       2.668   8.426   6.644  1.00 49.84      1FDL  95
ATOM      4  O   ASP L   1       1.987   8.438   5.606  1.00 50.83      1FDL  96
ATOM      5  CB  ASP L   1       5.090   8.827   6.797  1.00 50.57      1FDL  97
ATOM      6  CG  ASP L   1       6.338   8.761   5.929  1.00 54.09      1FDL  98
ATOM      7  OD1 ASP L   1       6.576   9.758   5.241  1.00 56.90      1FDL  99
ATOM      8  OD2 ASP L   1       7.065   7.759   5.948  1.00 51.06      1FDL 100
ATOM      9  N   ILE L   2       2.249   8.961   7.803  1.00 45.48      1FDL 101
ATOM     10  CA  ILE L   2       0.920   9.547   7.949  1.00 38.04      1FDL 102
ATOM     11  C   ILE L   2       0.950  11.039   7.634  1.00 39.85      1FDL 103
ATOM     12  O   ILE L   2       1.800  11.770   8.153  1.00 39.76      1FDL 104
ATOM     13  CB  ILE L   2       0.438   9.271   9.402  1.00 37.16      1FDL 105
ATOM     14  CG1 ILE L   2       0.290   7.766   9.577  1.00 31.41      1FDL 106
ATOM     15  CG2 ILE L   2      -0.884   9.974   9.690  1.00 34.89      1FDL 107
ATOM     16  CD1 ILE L   2       0.141   7.273  11.009  1.00 32.83      1FDL 108
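PDB coordinates are given in ångströms, so grouping these records by residue and measuring distances between atoms yields bond lengths directly. The sketch below is illustrative only: the function group_by_residue is a hypothetical name, it slices the fixed columns of the wwPDB ATOM record layout, and it uses Python's standard math.dist for the Euclidean distance.

```python
import math
from collections import defaultdict

# The 16 ATOM records shown above (first two residues of 1FDL).
RECORDS = """\
ATOM      1  N   ASP L   1       4.060   7.307   5.186  1.00 51.58      1FDL  93
ATOM      2  CA  ASP L   1       4.042   7.776   6.553  1.00 48.05      1FDL  94
ATOM      3  C   ASP L   1       2.668   8.426   6.644  1.00 49.84      1FDL  95
ATOM      4  O   ASP L   1       1.987   8.438   5.606  1.00 50.83      1FDL  96
ATOM      5  CB  ASP L   1       5.090   8.827   6.797  1.00 50.57      1FDL  97
ATOM      6  CG  ASP L   1       6.338   8.761   5.929  1.00 54.09      1FDL  98
ATOM      7  OD1 ASP L   1       6.576   9.758   5.241  1.00 56.90      1FDL  99
ATOM      8  OD2 ASP L   1       7.065   7.759   5.948  1.00 51.06      1FDL 100
ATOM      9  N   ILE L   2       2.249   8.961   7.803  1.00 45.48      1FDL 101
ATOM     10  CA  ILE L   2       0.920   9.547   7.949  1.00 38.04      1FDL 102
ATOM     11  C   ILE L   2       0.950  11.039   7.634  1.00 39.85      1FDL 103
ATOM     12  O   ILE L   2       1.800  11.770   8.153  1.00 39.76      1FDL 104
ATOM     13  CB  ILE L   2       0.438   9.271   9.402  1.00 37.16      1FDL 105
ATOM     14  CG1 ILE L   2       0.290   7.766   9.577  1.00 31.41      1FDL 106
ATOM     15  CG2 ILE L   2      -0.884   9.974   9.690  1.00 34.89      1FDL 107
ATOM     16  CD1 ILE L   2       0.141   7.273  11.009  1.00 32.83      1FDL 108
"""


def group_by_residue(text):
    """Map (chain, residue number, residue name) -> {atom name: (x, y, z)}."""
    residues = defaultdict(dict)
    for line in text.splitlines():
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip blank lines and non-coordinate records
        key = (line[21], int(line[22:26]), line[17:20].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[key][line[12:16].strip()] = xyz
    return dict(residues)


residues = group_by_residue(RECORDS)
asp = residues[("L", 1, "ASP")]
ile = residues[("L", 2, "ILE")]

# Coordinates are in angstroms, so the distances are too.
n_ca = math.dist(asp["N"], asp["CA"])    # bond within residue 1
peptide = math.dist(asp["C"], ile["N"])  # peptide bond joining residues 1 and 2
print(f"N-CA {n_ca:.2f} Å, C-N {peptide:.2f} Å")  # N-CA 1.45 Å, C-N 1.34 Å
```

Both values are close to typical lengths for an N–CA bond (about 1.46 Å) and a peptide C–N bond (about 1.33 Å), a quick sanity check that the columns were read correctly.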

PDB files also contain many other kinds of information. The file 1FDL.PDB, for example, contains many of the different record types, although it happens to lack HETATM records.
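To see which record types a particular file contains, and whether it has any HETATM records at all, a tally of record names (columns 1-6) is enough. This is a minimal sketch with a hypothetical function name; the excerpt pairs two real ATOM records from 1FDL with bare TER and END lines added for illustration.

```python
from collections import Counter


def count_record_types(pdb_text: str) -> Counter:
    """Tally records by their record name (columns 1-6), ignoring blank lines."""
    return Counter(line[0:6].strip() for line in pdb_text.splitlines() if line.strip())


# Hypothetical excerpt: two ATOM records from 1FDL followed by minimal TER and END lines.
excerpt = "\n".join([
    "ATOM      1  N   ASP L   1       4.060   7.307   5.186  1.00 51.58      1FDL  93",
    "ATOM      2  CA  ASP L   1       4.042   7.776   6.553  1.00 48.05      1FDL  94",
    "TER",
    "END",
])
counts = count_record_types(excerpt)
print(counts)                 # Counter({'ATOM': 2, 'TER': 1, 'END': 1})
print(counts["HETATM"] == 0)  # True: this excerpt has no HETATM records
```

Looking up a missing key in a Counter returns 0 without adding it, so the HETATM check needs no special case.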

Further details on PDB files are in the RasMol Reference Manual.


This page is maintained by emartz@microbio.umass.edu