The following PDB files and job conditions
were tested by Eric Martz in ConSurf 3 before it was released.
ConSurf job parameters not
specified below (in tan color)
remained at the defaults:
- Bayesian = Method
- ConSurf generates MSA and tree
- Swiss-Prot provides sequence homologs
- 50 sequence homologs used
- 1 PSI-BLAST iteration
- 0.001 = E-value cutoff
- JTT = Model of substitution for proteins
The term gap is used below to mean one of two things:
- A physical gap in the model (a break in the continuity of the main
chain), due to amino acids present in SEQRES
but lacking coordinates (typically due to disorder).
- A gap in the numbering of residues, where numbers are missing
between the sequence numbers of two adjacent amino acids
that are connected by a peptide bond.
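The two meanings of gap can be told apart mechanically. The following is a minimal Python sketch, not code from ConSurf or Protein Explorer. It assumes a hypothetical input list with one entry per SEQRES position, holding the residue's sequence number when coordinates exist and None when they do not. The function name find_gaps and the tuple layouts are illustrative, and insertion codes are ignored for simplicity.

```python
from itertools import groupby


def find_gaps(resnums):
    """Classify gaps in one chain.

    resnums: one entry per SEQRES position, in chain order.
             An int is the residue's sequence number (coordinates present);
             None means the residue has no coordinates.
    Returns (physical, numbering):
      physical:  list of (kind, start_index, length), kind being
                 "leading", "embedded" or "trailing"
      numbering: list of (number_before, number_after, missing_count)
    """
    physical = []
    numbering = []
    total = len(resnums)

    # Physical gaps: runs of SEQRES positions that lack coordinates.
    index = 0
    for is_missing, run in groupby(resnums, key=lambda r: r is None):
        length = len(list(run))
        if is_missing:
            if index == 0:
                kind = "leading"
            elif index + length == total:
                kind = "trailing"
            else:
                kind = "embedded"
            physical.append((kind, index, length))
        index += length

    # Numbering gaps: adjacent residues, both with coordinates,
    # whose sequence numbers differ by more than 1.
    for before, after in zip(resnums, resnums[1:]):
        if before is not None and after is not None and after - before > 1:
            numbering.append((before, after, after - before - 1))

    return physical, numbering


# Made-up chain: 2 leading residues without coordinates, one embedded
# physical gap, a numbering jump from 8 to 12, and 1 trailing residue
# without coordinates.
physical, numbering = find_gaps([None, None, 3, 4, 5, None, 7, 8, 12, 13, None])
print(physical)
print(numbering)
```

In this sketch a jump in numbering counts as a numbering gap only when no coordinate-free SEQRES position lies between the two residues, which mirrors the requirement that the two amino acids be joined by a peptide bond.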
- 2VAA chain A,
150 sequence homologs
(274 AA, no gaps).
No insufficient data with 150 sequences.
This result is in the ConSurf Gallery.
It is a satisfying result with known functions for the highly variable groove
region, and the highly conserved anchor pocket, CD8 binding site, and
inter-chain contact region. Other highly conserved surface patches have
no identified function known to me.
- 2VAA chain A (274 AA, no gaps, 16 insufficient data with default
50 homologs). Provides an
example having insufficient data without sequence complications such
as gaps or insertions.
- 1D66 chain A (66 AA, 5 insufficient data). Tests handling of
leading (7) and trailing (2) physical gaps. No embedded gaps.
Tests handling of two identical chains.
- 2ACE chain "none" (537 AA, 11 insufficient data). Tests handling of a single unnamed
chain. Tests handling of a leading (3) gap, a single embedded physical
gap (5, at position 485), and a trailing gap (2).
- 1FDL chain H (281 AA), no gaps. The entire VH domain
has insufficient data.
- 1QKZ chain L (214 AA including 5 inserted, 119 insufficient data including
one entire domain). Tests handling
of sequence insertions, and embedded gaps. One gap is a numbering gap;
the other is a physical gap.
- 1CBN, single chain with no name (46 AA; 37 insufficient data). Only 3 homologs in Swiss-Prot (too few
for ConSurf),
so UniProt must be used.
With UniProt, 7 unique homologs are found; ConSurf
warns that at least 10 are recommended.
Tests handling of sequence microheterogeneity.
[Possible microheterogeneity is detected when
sum of consurf_grade_freqs_isd (46)
!= number of groups w/ coordinates (48).]
- 1TAB chain E (223 AA). There are 6 numbering gaps totalling 25
residues, plus one leading physical gap. Three insertions.
- 1FLO chain A (405 AA, only 9 sequence homologs found in
UniProt;
34% insufficient
data). Four identical chains.
The first residue in ATOM records is
given sequence number 2, yet it matches the first residue in SEQRES.
Thus, SEQRES does not begin at sequence number 1.
- 1IGY chain B (435 AA): (Some entire domains have insufficient data.
For discussion, see 1IGY in the
Examples.)
First residue in SEQRES is sequence number 2.
Two insertions at 52 and 82. The inserted block at 82 is LSSL (two pairs of
identical amino acids), and the members of each pair may have different conservation
grades (depending on details of the job).
This is a case where ConSurf 3 fails to color the second members of
these pairs correctly in the molecular view.
See the next item below for more on this bug.
- 1HAG chain E (295 AA), 1C1W chain L (36 AA):
Numerous insertions, including an insertion at position 1. Some insertions
have multiple copies of the same amino acid in the same inserted
block, with different conservation grades. Some of these residues
are colored incorrectly in the 3D view, but they are colored
correctly in the ConSurf Seq3D listing. (For example, in 1HAG:E, Glu14H
should be yellow.) Others are colored correctly (for example
Asp14 and Asp14L in 1HAG:E). This bug is on the list to be fixed.
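The microheterogeneity check quoted in the 1CBN item above compares two counts. The following Python sketch shows one way such a comparison could be made; it is not Protein Explorer's code. It assumes that a "group with coordinates" is a distinct combination of residue number, insertion code and residue name in the ATOM records of one chain (read from the standard fixed columns of the PDB format), and that graded_count stands in for the sum of consurf_grade_freqs_isd. Both function names are hypothetical.

```python
from collections import defaultdict


def residue_names_by_position(pdb_lines, chain_id):
    """Map (residue number, insertion code) -> set of residue names.

    Reads only ATOM records of the given chain. Use " " as chain_id
    for an unnamed chain. Multiple models are not handled.
    """
    names = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith("ATOM  ") or len(line) < 27:
            continue
        if line[21] != chain_id:          # column 22: chain identifier
            continue
        number = int(line[22:26])         # columns 23-26: residue number
        icode = line[26].strip()          # column 27: insertion code
        res_name = line[17:20].strip()    # columns 18-20: residue name
        names[(number, icode)].add(res_name)
    return dict(names)


def possible_microheterogeneity(graded_count, names_by_position):
    """True when the number of graded residues differs from the number
    of residue groups that have coordinates (the assumed definition)."""
    groups_with_coordinates = sum(len(n) for n in names_by_position.values())
    return graded_count != groups_with_coordinates
```

Under these assumptions, a position occupied by two different residue names contributes two groups but only one graded residue, so the counts disagree, as in the 46 versus 48 reported for 1CBN.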
The following tests did not add important information beyond the tests listed above:
- 1UCY chain K (259 AA; 10 insertions). Chains H and K start with residue
numbered 16, which is the first residue in SEQRES.
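Several of the cases above involve inserted residues, whose identifiers combine a sequence number with an insertion code (for example 14H and 14L in 1HAG:E). The Python sketch below illustrates why the number alone does not identify such a residue. It is only an illustration of the identifier format and makes no claim about the cause of the coloring bug described above. The function names are hypothetical.

```python
import re

# A sequence number (possibly negative) followed by an optional
# one-letter insertion code, e.g. "14", "14H", "-3".
_RESIDUE_ID = re.compile(r"^(-?\d+)([A-Za-z]?)$")


def parse_residue_id(text):
    """Split a residue identifier into (number, insertion_code).

    The insertion code is "" when the residue has none.
    """
    match = _RESIDUE_ID.match(text.strip())
    if match is None:
        raise ValueError("not a residue identifier: %r" % text)
    return int(match.group(1)), match.group(2)


def group_by_number(residue_ids):
    """Map each sequence number to the insertion codes that share it."""
    blocks = {}
    for residue_id in residue_ids:
        number, icode = parse_residue_id(residue_id)
        blocks.setdefault(number, []).append(icode)
    return blocks


# Three distinct residues share the number 14.
print(group_by_number(["14", "14H", "14L", "15"]))
```

Any lookup that keys on the sequence number alone would treat the three residues numbered 14 as one, so the pair (number, insertion code) is the safer key in this sketch.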
Known Limitations
Rare PDB files are not handled correctly by
Protein Explorer, mostly due to sequence format irregularities.
PE's Clickable ConSurf-Colored Sequence 3D will not work correctly
for most of these files. For details, please see
PDB Files Handled Poorly or Incorrectly
by Protein Explorer.
Feedback to Eric Martz.