Sequence Irregularities

Incomplete sequences
Gaps in sequences
Multiple residues assigned one sequence number

Some sequence irregularities are handled poorly or incorrectly by Protein Explorer's sequence display.

Incomplete sequences.

The sequences displayed are for residues with 3D positions assigned (atomic coordinates given in the PDB file). Often these sequences represent only a portion of the molecule. They are not necessarily identical to the sequences listed in the SEQRES records, visible in the PDB file header. Nonstandard residues may be shown as gaps even when their positions are assigned (see below).
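
The comparison between the SEQRES records and the residues that actually have coordinates can also be scripted. The Python sketch below is not part of Protein Explorer, and its function names are hypothetical. It assumes the standard fixed-column PDB file format: the chain identifier in column 12 of SEQRES records and column 22 of ATOM records, and the residue number plus insertion code in columns 23-27 of ATOM records.

```python
def seqres_lengths(pdb_lines):
    """Return {chain: number of residues declared in SEQRES records}."""
    lengths = {}
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            chain = line[11]                      # column 12
            names = line[19:70].split()           # residue names, columns 20-70
            lengths[chain] = lengths.get(chain, 0) + len(names)
    return lengths


def modeled_lengths(pdb_lines):
    """Return {chain: number of distinct residues that have ATOM coordinates}."""
    seen = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]                      # column 22
            residue_key = line[22:27]             # residue number + insertion code
            seen.setdefault(chain, set()).add(residue_key)
    return {chain: len(keys) for chain, keys in seen.items()}


def unmodeled_counts(pdb_lines):
    """Return {chain: SEQRES residues without ATOM coordinates}.

    The count includes nonstandard residues stored as HETATM records,
    because only ATOM records are examined here.
    """
    declared = seqres_lengths(pdb_lines)
    modeled = modeled_lengths(pdb_lines)
    return {chain: n - modeled.get(chain, 0) for chain, n in declared.items()}
```

A nonzero count for a chain suggests that the displayed sequence covers only part of that chain; it does not say where the missing residues lie.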

Gaps in sequences.

Gaps in sequences are represented by periods (.....).
A gap may represent any of three different things, which can be distinguished by examining the 3D structure:
  1. Disorder leading to inadequate resolution in the electron density map may preclude the assignment of positions of some residues known to be present. If large enough, the gap will be apparent in the backbone trace. (This kind of gap occurs only in X-ray diffraction results; NMR studies typically model all residues.)  
    The ends of chains may be disordered in crystals, so it is common for the leading or trailing residues in a chain to be missing in the 3D structure. When a chain is numbered according to the complete chain sequence, and the leading residues are missing, Protein Explorer will show a leading gap as a series of dots (.....). Examples may be seen in chains 2 and 4 of 1FOD. Protein Explorer's sequence display will not show trailing gaps, due to limitations in the sequence reporting mechanism of Chime. To determine whether the trailing end was disordered and unresolved, compare Protein Explorer's Sequences listing with the SEQRES records in the PDB file header (available in the Features of the Molecule control panel, at the link "See the entire header of this PDB file").

  2. Numbering the sequence to align with a reference sequence may require gaps. In such a case, the backbone trace will not have a physical gap.

  3. Nonstandard residues produce apparent gaps in Protein Explorer's sequence display. Residues other than the 20 standard amino acids are supposed to be designated as "hetero" atoms. Chime does not report heteroatoms in its "show sequence" report, which is what Protein Explorer depends upon to construct its sequence display. Therefore, Protein Explorer's sequence display will show gaps where hetero residues occur.

    A single hetero residue will generally not make a gap in the backbone trace, but its alpha carbon position will not occupy a bend in the trace, since backbone positions are assigned only to non-hetero residues.
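
The display behavior described in this section can be imitated with a short Python sketch. It is not Protein Explorer's or Chime's code. It assumes the standard fixed-column PDB format, that chain numbering starts at 1, and that each missing sequence number is drawn as one period; the function name `dotted_sequence` is hypothetical. Like the display described above, it reads only ATOM records, so hetero residues appear as gaps and trailing gaps are not shown.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def dotted_sequence(pdb_lines, chain_id):
    """Return a one-letter sequence for one chain, with periods for gaps."""
    residues = []          # (sequence number, one-letter code), in file order
    seen = set()
    for line in pdb_lines:
        # HETATM records are skipped on purpose, so hetero residues become gaps.
        if line.startswith("ATOM") and line[21] == chain_id:
            key = line[22:27]                 # residue number + insertion code
            if key in seen:
                continue                      # another atom of the same residue
            seen.add(key)
            letter = THREE_TO_ONE.get(line[17:20], "X")
            residues.append((int(line[22:26]), letter))

    pieces = []
    expected = 1                              # assumption: numbering starts at 1
    for number, letter in residues:
        if number > expected:
            pieces.append("." * (number - expected))   # one period per missing number
        pieces.append(letter)
        expected = number + 1
    return "".join(pieces)
```

The output alone cannot tell the three kinds of gaps apart; that still requires examining the 3D structure and the PDB file header.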


Multiple residues assigned one sequence number.

Two or more residues may be assigned the same sequence number. Such positions are indicated by a green asterisk (*). This can represent either of two different things:
  1. If the numbering of the sequence is according to alignment with a reference sequence, and if certain residues do not occur in the reference sequence, these may be treated as "inserted" residues and given the same sequence number. An example is chain B in 1IGT, which contains three such blocks.

  2. In some cases the protein may have been heterogeneous, having alternative residues at one or more positions. Since the alternatives occur at a single position, they are given the same sequence number.

To distinguish which of the above is the case, read the PDB file header or the journal article that describes the structure.
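
Positions that share a sequence number can be located in the coordinate records before turning to the header. The Python sketch below is only a heuristic and is not how Protein Explorer works. It assumes the standard fixed-column PDB format, in which inserted residues carry distinct insertion codes (column 27), and it assumes that alternative residues at one position appear as different residue names with the same number and insertion code. The function name and the labels it returns are hypothetical.

```python
from collections import defaultdict


def shared_numbers(pdb_lines):
    """Return {(chain, sequence number): label} for numbers used by 2+ residues."""
    variants = defaultdict(set)
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]                  # column 22
            number = int(line[22:26])         # columns 23-26
            icode = line[26]                  # column 27, blank if unused
            res_name = line[17:20]            # columns 18-20
            variants[(chain, number)].add((icode, res_name))

    report = {}
    for position, found in variants.items():
        if len(found) < 2:
            continue                          # ordinary position, one residue
        icodes = {icode for icode, _ in found}
        # Several insertion codes: likely inserted residues (assumption).
        # One insertion code, several names: likely alternative residues.
        report[position] = "insertion" if len(icodes) > 1 else "heterogeneity"
    return report
```

The labels are guesses drawn from the coordinate records alone and should be confirmed against the PDB file header or the journal article.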