What percentage of the human proteome has known structure?
  1. ~40,000 genes in the human genome.

  2. What counts as "known"? Drug companies solve a large number of structures, but most are never deposited in the Protein Data Bank.

  3. ~81,000 entries in the Protein Data Bank:
    1. ~27,000 entries remain after removing those with 50% or more sequence identity to one another.
    2. ~7,000 of these are human.
    3. These entries are mostly single domains or fragments of proteins. (Divide by 2?)
Answer (empirical): ~7,000/~40,000 = ~20%; ~20%/2? = ~10%
Answer (homology modeling): ~40% of domains
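
The empirical answer can be checked with a few lines of arithmetic. This is only a sketch: it assumes the ~20% figure comes from dividing the ~7,000 non-redundant human entries by the ~40,000 genes, and it treats the tentative "divide by 2?" as a fixed factor of 2. The variable and function names are illustrative, not part of the original estimate.

```python
# Rough figures quoted in the list above; all are approximations.
human_genes = 40_000        # ~40,000 genes in the human genome
human_pdb_entries = 7_000   # ~7,000 human entries below 50% sequence identity
fragment_factor = 2         # assumed: the tentative "divide by 2?" for domains/fragments


def percent_with_structure(entries, genes, factor):
    """Return (raw, adjusted) percentages of genes with a known structure.

    raw      = entries per gene, as a percentage
    adjusted = raw divided by the fragment factor, since most entries
               cover only a single domain or fragment of a protein
    """
    raw = 100 * entries / genes
    return raw, raw / factor


raw, adjusted = percent_with_structure(human_pdb_entries, human_genes, fragment_factor)
print(f"raw: {raw:.2f}%  adjusted: {adjusted:.2f}%")
```

With these inputs the raw figure is 17.5% and the adjusted figure is 8.75%, which round to the ~20% and ~10% given above. Neither figure corrects for redundancy among the genes.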

Solution: Structural Genomics?

This estimate does not take into account redundancy among the ~40,000 human genes. If you know how to estimate that redundancy, please tell me (emartz@microbio.umass.edu).


by Eric Martz, University of Massachusetts, July 2003 (revised February 2004, September 2005, October 2006, April 2007, May 2012)


Rough estimates in agreement with the 2006 numbers were stated by Kevin Karplus in October 2006.