What percentage of the human proteome has known structure?
  1. ~40,000 genes in the human genome.

  2. What counts as "known"? Drug companies solve many structures, but most are never deposited in the Protein Data Bank.

  3. ~45,000 entries in the Protein Data Bank:
    1. ~7,000 sequence-distinct entries of good quality.
    2. ~1,500 of these are human.
    3. These entries are mostly single domains or fragments of proteins.
Answer (empirical): ~2%
Answer (homology modeling): ~40% of domains, so ~20% of whole proteins?
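
The arithmetic behind these two answers can be sketched as follows. This is only a rough illustration: the gene count, entry count, and 40% domain coverage are the figures quoted above, while the 0.5 "whole protein" discount is an assumption inferred from the step from ~40% of domains to ~20% of whole proteins; it is not a measured value.

```python
# Approximate figures quoted in the text above.
HUMAN_GENES = 40_000
HUMAN_PDB_ENTRIES = 1_500        # sequence-distinct, good-quality, human
MODELABLE_DOMAIN_FRACTION = 0.40  # domains reachable by homology modeling

# ASSUMPTION (hypothetical, not stated in the text): most entries cover
# only a domain or fragment, so discount by one half to approximate
# whole proteins. This mirrors the ~40% -> ~20% step in the text.
WHOLE_PROTEIN_FACTOR = 0.5

# Upper bound: treat each human entry as one fully solved protein.
raw_fraction = HUMAN_PDB_ENTRIES / HUMAN_GENES

# Discounted estimates for whole proteins.
empirical = raw_fraction * WHOLE_PROTEIN_FACTOR
modeled = MODELABLE_DOMAIN_FRACTION * WHOLE_PROTEIN_FACTOR

print(f"entries per gene (upper bound): {raw_fraction:.2%}")
print(f"empirical, whole proteins:      {empirical:.1%}")
print(f"homology modeling, whole:       {modeled:.0%}")
```

With these inputs the undiscounted ratio is 3.75%, and the discounted value rounds to the ~2% given above. Changing HUMAN_GENES or WHOLE_PROTEIN_FACTOR shows how sensitive the estimate is to either guess.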

Solution: Structural Genomics?

This estimate does not take into account redundancy among the ~40,000 human genes. If you know how to estimate that redundancy, please tell me (emartz@microbio.umass.edu).


by Eric Martz, University of Massachusetts, July 2003 (revised February 2004, September 2005, October 2006, April 2007)


Similar rough estimates were stated by Kevin Karplus in October 2006.