Hydrogens in PDB files.

X-ray crystallography cannot resolve hydrogen atoms in most protein crystals, so most PDB files contain no hydrogen atoms. Sometimes hydrogens are added by modeling. Hydrogens are almost always present in PDB files resulting from NMR analysis, and are usually present in theoretical models. For a brief introduction to X-ray crystallography, resolution, and NMR, see Nature of 3D Structural Data.

In proteins, the average number of hydrogens per non-hydrogen atom, weighted by the frequencies of the amino acids, is 1.01. Thus, hydrogens make up ~50% of all atoms in proteins. Nucleic acids have fewer, ~35%. High-resolution protein crystallography (1.2 Å or less) can assign some hydrogen positions from the electron density map. For example, the X-ray model of a tyrosine kinase SH2 domain, 1lkk, at 1.0 Å resolution contains 902 hydrogens and 923 non-hydrogen protein atoms (ratio 0.98; hydrogens are 49% of the atoms), so nearly all of the hydrogens actually present are assigned positions.
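The conversion between the hydrogens-per-non-hydrogen ratio and the percentage of all atoms that are hydrogen is simple arithmetic. The following Python sketch reproduces the figures quoted above; the function names are illustrative and not taken from any library.

```python
def hydrogen_fraction(h_per_heavy):
    """Fraction of all atoms that are hydrogen, given the number of
    hydrogens per non-hydrogen atom (r / (1 + r))."""
    return h_per_heavy / (1.0 + h_per_heavy)


def ratio_and_fraction(n_hydrogen, n_heavy):
    """Return (hydrogens per non-hydrogen atom, hydrogen fraction of all atoms)
    from raw atom counts."""
    ratio = n_hydrogen / n_heavy
    fraction = n_hydrogen / (n_hydrogen + n_heavy)
    return ratio, fraction


if __name__ == "__main__":
    # Weighted average for proteins quoted in the text: 1.01 hydrogens per heavy atom.
    print(f"{hydrogen_fraction(1.01):.1%}")
    # Atom counts quoted in the text for 1lkk.
    ratio, fraction = ratio_and_fraction(902, 923)
    print(f"ratio {ratio:.2f}, fraction {fraction:.1%}")
```

A ratio close to 1 therefore corresponds to hydrogens being about half of all atoms.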

NMR methods also determine some hydrogen positions. Typically, all hydrogens are modeled in before the molecule is folded to fit the NMR interatomic distance restraints; hence, all hydrogens are usually present in NMR models submitted to the PDB. The calmodulin ensemble of 25 NMR models, 1cfc, contains 1096 protein hydrogens and 1166 non-hydrogen protein atoms per model (ratio 0.94; hydrogens are 48.5% of the atoms), so nearly all of the hydrogens actually present are assigned positions.
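Counts like those above can be obtained by tallying ATOM records in a PDB file. The sketch below is a minimal example under stated assumptions, not a complete parser: it reads only the first model of an ensemble, takes the element symbol from columns 77-78, and, for older files that leave those columns blank, falls back to a heuristic based on the atom name in columns 13-16. The function name is hypothetical. HETATM records (waters, ligands) are skipped, deuterium is not counted as hydrogen, and alternate locations are counted as separate atoms.

```python
def count_protein_atoms(pdb_lines):
    """Return (hydrogens, non_hydrogens) for ATOM records of the first model."""
    n_hydrogen = 0
    n_heavy = 0
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ENDMDL":
            # Stop after the first model of a multi-model (e.g. NMR) file.
            break
        if record != "ATOM":
            # Skip HETATM, TER, REMARK, and every other record type.
            continue
        # Element symbol, columns 77-78 (may be missing in older files).
        element = line[76:78].strip().upper()
        if not element:
            # Heuristic fallback: atom name (columns 13-16) with leading
            # digits removed; names such as "1HB" or "HA" denote hydrogens.
            name = line[12:16].strip().lstrip("0123456789")
            element = name[:1].upper()
        if element == "H":
            n_hydrogen += 1
        else:
            n_heavy += 1
    return n_hydrogen, n_heavy
```

To use it, pass the lines of an open file, for example the result of reading a locally saved copy of an entry such as 1cfc (the file location is up to the reader). Restricting the fallback to ATOM records avoids misreading heteroatom names such as HG (mercury) as hydrogens.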

Most macromolecular crystals do not provide enough resolution to detect hydrogen positions. The X-ray model in PDB file 1hho for oxyhemoglobin (2.1 Å resolution) contains no hydrogens, while the X-ray file 1lfa (1.8 Å resolution; an integrin adhesion protein domain) contains 312 waters, each with 2 hydrogens, and 645 protein hydrogens for 2,941 non-hydrogen protein atoms. These 645 hydrogens account for only 22% of the hydrogens actually present in this protein. They consist of one hydrogen on each backbone nitrogen (three on the amino-terminal nitrogen), plus the hydrogens on sidechain oxygens or nitrogens in Ser, Thr, Tyr, Lys, Arg, His, Asn, and Gln. (None of the hydrogens covalently bonded to carbons are present.) The hydrogens that are present are required for the molecular dynamics stages of refinement of the X-ray model in the popular crystallographic refinement program X-PLOR; some authors strip them out before submitting a PDB file and others leave them in. (The Protein Data Bank accepts X-ray models either way, according to the preference of the depositor.)
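The 22% figure follows from comparing the hydrogens present in the file with the number expected from the average of 1.01 hydrogens per non-hydrogen protein atom. A minimal Python sketch of that estimate is given here; the function and constant names are illustrative, and the result is only approximate for any single protein because 1.01 is an average weighted over many sequences.

```python
# Average hydrogens per non-hydrogen protein atom (value quoted in the text).
H_PER_HEAVY_ATOM = 1.01


def hydrogen_coverage(n_hydrogen, n_heavy, h_per_heavy=H_PER_HEAVY_ATOM):
    """Estimate the fraction of a protein's hydrogens that a model contains.

    n_hydrogen: protein hydrogens present in the file
    n_heavy:    non-hydrogen protein atoms present in the file
    """
    expected_hydrogens = n_heavy * h_per_heavy
    return n_hydrogen / expected_hydrogens


if __name__ == "__main__":
    # Atom counts quoted in the text for 1lfa.
    print(f"1lfa: {hydrogen_coverage(645, 2941):.0%} of expected hydrogens present")
```

The same estimate applied to a file with no hydrogens, such as 1hho, gives zero.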

Adding Hydrogens

If you wish to add hydrogens to a PDB file, see methods listed at hydrogen.


References & Acknowledgements

The average number of protein hydrogens per non-hydrogen protein atom is weighted by the average frequencies of amino acids in 1,021 unrelated proteins of known sequence. The weights are tabulated on page 5 of Thomas E. Creighton's book "Proteins: Structures and Molecular Properties", 2nd ed., 1993, W. H. Freeman and Co.

Thanks to John Badger for contributing important information included in this document.