Fold: A group of superfamilies that share similar secondary structural
features and topologies, but for which there is little or no evidence
to suggest a common evolutionary origin.
Superfamily: A group of families with common structural features
or functions that imply a common evolutionary origin.
Family: A group of sequences with a clear common evolutionary origin.
John-Marc Chandonia and Steven E. Brenner (2006).
Science 311:347-351.
The first structure solved in a protein family provides a
template for comparative modeling of other members, and may
suggest function. In 2004, about half of such new structures came
from structural genomics (SG) centers, even though these centers
contributed only 20% of all new structures; the other half came from
traditional crystallography or NMR laboratories.
In the year ending May 2004, 16% of new structures from SG (PSI) centers represented new families,
compared with only 4% of new structures from traditional labs.
The average cost of a structure solution from a traditional lab is very roughly
$250,000, while that from an SG (PSI) lab is $138,000.
The cost of a novel family structure is several million dollars by traditional
methods, but several fold less from PSI centers.
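The "several million dollars" figure can be roughly reproduced from the numbers quoted above by dividing the average cost per structure by the fraction of structures that represent new families. The sketch below is only a back-of-the-envelope illustration, not the authors' own calculation: it assumes the 4% and 16% rates apply uniformly to the quoted average costs, and the function name is hypothetical.

```python
def cost_per_new_family(cost_per_structure, new_family_fraction):
    """Naive estimate of the average spend needed to obtain one
    structure that represents a new family (assumption: every
    structure costs the quoted average, novel or not)."""
    return cost_per_structure / new_family_fraction


# Figures quoted in the notes above.
traditional = cost_per_new_family(250_000, 0.04)  # traditional labs
psi = cost_per_new_family(138_000, 0.16)          # SG (PSI) centers
ratio = traditional / psi

print(f"traditional: ${traditional:,.0f}")
print(f"PSI:         ${psi:,.0f}")
print(f"ratio:       {ratio:.1f}x")
```

Under these assumptions the estimate is about $6.25 million per new family for traditional labs against roughly $0.86 million for PSI centers, a difference of about seven-fold.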
The cost of solving a new structure at the most efficient SG center in the US
is one quarter that of a solution by traditional methods. However,
more than 50% of the SG solutions are members of previously solved families
(at >30% sequence identity), so the cost per new family is higher than
the cost per structure.
"... The efficiency of the top structural biology labs -- even
though they work on very challenging structures, is comparable to
that of SG centers. Moreover, traditional structural biology
papers are cited significantly more often, suggesting greater
impact. ... Publication is a bottleneck not easily adapted to high-throughput
methods."
As of February 1, 2005, 36% of the 7,677 Pfam protein families contain
a member with known structure.
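For a sense of scale, the percentage above can be converted into approximate family counts. This is a minimal sketch that assumes the quoted 36% is a rounded value, so the counts are approximate; the variable names are illustrative only.

```python
total_families = 7_677      # Pfam families as of February 1, 2005 (from the text)
fraction_with_structure = 0.36  # rounded percentage quoted in the text

# Approximate counts; the exact figures depend on the unrounded percentage.
with_structure = round(total_families * fraction_with_structure)
without_structure = total_families - with_structure

print(f"families with a known structure: ~{with_structure}")
print(f"families without one:            ~{without_structure}")
```

This gives roughly 2,764 families with at least one known structure and about 4,913 without.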
"Without a formal definition of its goals, structural genomics was adopted
as a broad research goal by a loosely structured coalition of researchers
around the world."
Although the primary goal for structural genomics is comparative modeling
to provide structural coverage of all known proteins, "comparative modeling
may not be able to predict small details and the error margin
is still too large for applications such as drug design."
The structural genomics strategy "leaves out several groups of proteins,
such as the membrane proteins."
Estimated coverage of the Thermotoga maritima genome with templates
for comparative modeling varies from 26% (by sequence similarity) to
70% (by advanced profile-profile methods). Coverage is much lower for
eukaryotic genomes.
Further, 10-15% of genes have no detectable homologs (called "ORFans").
Initial optimism has been tempered by the slow rate of solving new folds --
tens of thousands (an accurate estimate is not available) are needed
for complete coverage of the protein universe. But 90% of the 600-model
output of structural genomics to date is homologous to previously solved
structures. The percentage of novel folds solved by structural genomics
groups is only slightly higher than among solutions by traditional
structural biology groups.
"... the original goal of structural genomics, to cover the entire protein
space with accurate models, appears to be moving farther and farther away.
At the same time, for some model systems, such as selected bacterial
genomes, complete structural coverage is getting excitingly close."