Journal of Molecular Biology
Volume 274, Issue 4, 12 December 1997, Pages 530-545
Regular article
Sequence profiles of immunoglobulin and immunoglobulin-like domains 1

https://doi.org/10.1006/jmbi.1997.1432

Abstract

Immunoglobulins (Ig) are highly modular proteins, consisting of variable and constant domains, which have clear, conserved sequence patterns. These sequence patterns have allowed T-cell receptor (TCR) and major histocompatibility complex (MHC) molecule domains, as well as some cell adhesion, cell surface receptor and muscle protein domains, to be identified as forming a superfamily of related proteins together with the Ig-domains. The domains of these proteins have been grouped into four sets: variable (V-set), constant-1 (C1-set), constant-2 (C2-set) and intermediate (I-set). X-ray and NMR studies have shown that these domains form a Greek-key β-sandwich structure with the sets differing in the number of strands in the β-sheets as well as in their sequence patterns. The conserved sequence elements in the major sets of Ig and Ig-like molecules have previously been reported as general sequence profiles. This work examines the variability within these sets. Detailed sequence profiles and consensus sequences for these sets and groups have been constructed and a novel form of presentation has been developed to overcome some of the drawbacks of current methods of presenting consensus sequences. The profiles that were constructed allow a comparison of the similarities and differences among the sets of Ig and Ig-like sequences and provide a means by which sequences can be tested for compatibility with Ig-like sequence motifs. As well, the sequence separations of the main residues in the characteristic “pin” structure of Ig-like molecules were examined for variation among the groups. From the profiles constructed here, measures of the degree of conservation within the groups of molecules were determined. These measures were used to assist in a reconsideration of possible evolutionary pathways between the major structural groups of the Ig-superfamily.

Introduction

Immunoglobulin (Ig) and Ig-like domains, because of their fundamental importance in the immune system and in intercellular interactions, have been extensively studied and reviewed. Major reviews by Williams and Barclay 1988, Hunkapiller and Hood 1989 delineated the Ig superfamily into major sequence sets and defined means to identify sequences as members of these sets. These classifications used the separation between the two “invariant” cysteine residues, which make the predominantly conserved disulfide bridge, and sequence patterns specific to each set Williams 1987, Williams and Barclay 1988, Hunkapiller and Hood 1989. More recent reviews have been able to use the more extensive structural data from X-ray and NMR studies to make more detailed evaluations of various aspects of Ig superfamily sequence/structure relationships and evolution Hsu and Steiner 1992, Jones 1993, Wagner and Wyss 1994, Bork et al 1994, Vaughn and Bjorkman 1996.

Ig superfamily domains were initially classified into variable (V) or constant (C) like sets with the C-set being divided into the C1-set (Ig-related molecules) and the C2-set (Williams & Barclay, 1988) or H-set (Hunkapiller & Hood, 1989). V-set domains occur in Ig, T-cell receptor (TCR), cell surface receptor and cell adhesion molecule proteins; C1-set domains are found in Ig, TCR and major histocompatibility complex (MHC) proteins; and C2- (or H-) set domains are from non-Ig related molecules. Structural studies have shown that Ig superfamily domains are antiparallel, Greek-key, β-sandwich proteins Richardson 1981, Hutchinson and Thornton 1993 and the structures are reviewed by Bork et al. (1994). The sets differ by having varying numbers of strands in each of the β-sheets that form the sandwich. By convention, the strands have been labelled a to g in sequence (based on the C1 domain structure) with the two strands present between the c and d strands in V domains being labelled c′ and c″. One β-sheet consists of strands a, b, e and possibly d while the other contains strands c, f, g and possibly c′ and c″. In addition, the C-terminal ends of strands a and g may form a small stretch of parallel β-sheet, disrupting the original strands and giving rise to strands a′ and/or g′. Bork et al. (1994) classified the folds into v-type (for the V-set) having all strands, c-type (for the C1-set) lacking the c′ and c″ strands, s-type (for the C2-set) having the c′ strand but not the c″ or d strands and a fourth group, the h-type fold, which lacks the c″ strand.
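The strand composition of the four fold types described above can be written down as a small lookup table. The following Python sketch is illustrative only: the names ALL_STRANDS, MISSING and strands are hypothetical, the primes are written in ASCII (c' and c''), and the optional a′ and g′ strands are ignored.

```python
# Strand labels in sequence order, following the convention in the text.
# ASCII primes stand in for the typographic primes (assumption for this sketch).
ALL_STRANDS = ("a", "b", "c", "c'", "c''", "d", "e", "f", "g")

# Strands absent from each fold type, as stated in the classification
# attributed to Bork et al. (1994) in the text.
MISSING = {
    "v-type": set(),          # V-set: all strands present
    "c-type": {"c'", "c''"},  # C1-set
    "s-type": {"c''", "d"},   # C2-set: has c' but not c'' or d
    "h-type": {"c''"},        # fourth group: lacks only c''
}


def strands(fold_type):
    """Return the strands present in a fold type, in sequence order."""
    missing = MISSING[fold_type]
    return [s for s in ALL_STRANDS if s not in missing]


def strand_count(fold_type):
    """Number of strands in a fold type (a' and g' are not counted)."""
    return len(strands(fold_type))


if __name__ == "__main__":
    for fold in MISSING:
        print(fold, strand_count(fold), " ".join(strands(fold)))
```

Under this encoding the h-type fold differs from the v-type only by the absence of c'', which matches its description as intermediate between the V-set and the shorter C-type domains.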

As the body of sequence data accumulated on non-Ig molecules that were members of the Ig superfamily, doubts were raised about whether some domains were members of the C2 or V-sets. In their analysis of fasciclin III, Grenningloh et al. (1990) quote A. F. Williams as saying that the Ig-like domains of fasciclin III “could be long C2 types but may fold as a V type”. With the availability of the structure of the muscle protein telokin (Holden et al., 1992), Harpaz & Chothia (1994) re-examined the data on the C2-set and determined that some previously claimed members of this set formed a new set, which they called the I-set (intermediate). This corresponds to the h-type (hybrid) structural fold defined by Bork et al. (1994) and has sequence features of the V-set while having a much shorter distance between the cysteine residues of the conserved disulfide bridge. Later, the structures of titin (Pfuhl & Pastore, 1995), vascular cell adhesion molecule domain 1 (Jones et al., 1995) and neural cell adhesion molecule domain 1 (Thomsen et al., 1996) confirmed this analysis and descriptions of the I-set have been given by Harpaz and Chothia 1994, Thomsen et al 1996.

An examination of early X-ray structures of Ig domains by Lesk & Chothia (1982) showed that the nearly invariant cysteine residues that form the disulfide bridge (located in the b and f strands) and the nearly invariant tryptophan residue (c strand) form a structural motif. This motif, where the tryptophan packs against the disulfide bridge, was called the pin (Lesk & Chothia, 1982). Portions of six other residues are also included in the pin (Lesk & Chothia, 1982); however, here the pin will be referred to in terms of the nearly invariant cysteine and tryptophan residues that make a dominant contribution to it. Bourgois (1975) also noted the particular conservation of these Cys and Trp residues and their structural arrangement and suggested that the Trp residue might facilitate the dimerisation of the primordial half domains of his hypothesis. While the separation of the cysteine residues and the placement of the invariant tryptophan corresponding to the pin structure are a clear signature of an Ig-like domain, they are not sufficient to determine that a sequence forms an Ig-like domain (Williams, 1987). Consequently, several sequence profiles of V, C1 and C2(H)-set domains Williams and Barclay 1988, Hunkapiller and Hood 1989, Hunkapiller et al 1989, Bork et al 1994, Vaughn and Bjorkman 1996 and of I-set domains (Harpaz & Chothia, 1994) have been published to assist in identifying sequences as members of the Ig superfamily. Also, Xue & Wong (1995) have investigated the sequence separations between the Cys-Trp and Trp-Cys residues of the pin, specifying a sequence motif that is compatible with the pin structure.
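The idea of measuring sequence separations between the Cys, Trp and Cys residues of the pin can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any of the cited studies: the names Pin and candidate_pins are hypothetical, the upper bound on the Cys-Cys separation is left to the caller because no numerical ranges are given here, and the example sequence in the test is an artificial toy string. As the text notes, finding such a triple is not sufficient to conclude that a sequence forms an Ig-like domain.

```python
from typing import Iterator, NamedTuple


class Pin(NamedTuple):
    """0-based indices of a candidate Cys-Trp-Cys triple (hypothetical type)."""
    cys1: int
    trp: int
    cys2: int

    @property
    def cys_trp(self) -> int:
        """Separation between the first Cys and the Trp."""
        return self.trp - self.cys1

    @property
    def trp_cys(self) -> int:
        """Separation between the Trp and the second Cys."""
        return self.cys2 - self.trp

    @property
    def cys_cys(self) -> int:
        """Separation between the two Cys residues."""
        return self.cys2 - self.cys1


def candidate_pins(seq: str, max_cys_cys: int) -> Iterator[Pin]:
    """Yield every C...W...C triple whose Cys-Cys separation is <= max_cys_cys.

    The bound is a caller-supplied assumption; the function only enumerates
    candidates and makes no claim about domain membership.
    """
    seq = seq.upper()
    cys = [i for i, r in enumerate(seq) if r == "C"]
    trp = [i for i, r in enumerate(seq) if r == "W"]
    for i in cys:
        for k in cys:
            if not (i < k <= i + max_cys_cys):
                continue
            for j in trp:
                if i < j < k:
                    yield Pin(i, j, k)
```

Collecting the cys_trp, trp_cys and cys_cys values over a set of aligned domains gives the kind of separation distribution that can then be compared between groups.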

Much interest has been expressed in the evolution of the vertebrate immune system (e.g. see Beck et al 1994, Hsu and Steiner 1992). Underlying the gene duplications and divergence believed to create the multi-domain molecules seen now Williams and Barclay 1988, Hunkapiller and Hood 1989, Hunkapiller et al 1989 is the idea of one of the domain sets being primordial and giving rise to the other sets Williams and Barclay 1988, Hunkapiller and Hood 1989, Hunkapiller et al 1989. Williams & Barclay (1988) favour the V-set as the ancestral domain while Hunkapiller & Hood (1989) argue for the H(C2)-set. Kabat and colleagues have collected and continue to maintain a database of Ig-related sequences (Kabat et al., 1991; http://immuno.bme.nwu.edu/) which are, for the most part, aligned. From these aligned sets of sequences it is possible to perform many analyses bearing on the divergence of Ig-related sequences, such as calculating the variability index (Wu & Kabat, 1970; see the web site above and links therein). Means to check the validity of sequences of Ig variable domains and perform searches on this database have been provided by Martin (1996).
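The variability index of Wu & Kabat (1970) is usually stated as the number of different amino acids observed at an aligned position divided by the frequency of the most common amino acid at that position. A minimal Python sketch of that calculation follows; the function names are hypothetical, and skipping gap characters ("-") is an assumption of this sketch rather than part of the original definition.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Variability of one aligned position.

    column: iterable of one-letter residues; '-' (gap) entries are skipped,
    which is an assumption made for this sketch.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        raise ValueError("column contains no residues")
    counts = Counter(residues)
    n_different = len(counts)
    # Frequency (0..1) of the most common residue at this position.
    freq_most_common = counts.most_common(1)[0][1] / len(residues)
    return n_different / freq_most_common


def variability_profile(alignment):
    """Variability at every column of an alignment of equal-length strings."""
    return [wu_kabat_variability(col) for col in zip(*alignment)]
```

For a fully conserved position the index is 1; for 20 residue types it reaches its maximum of 400 when all 20 occur equally often.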

Currently available sequence profiles of Ig and Ig-like sequences (see above) provide only a limited amount of information on the highly conserved sequence positions. If more complete data, considering the variability at each sequence position, can be presented in a clear manner then a finer examination of the components of variability within and among sequence groups can be made and sequences can be tested for group membership with greater reliability. Accordingly, this work searches through the Kabat database (Kabat et al., 1991) to create profiles and consensus sequences for the major groups of sequences (Ig, TCR and MHC; the Ig-related sequences) within each of the previously identified domain sets. This provides a much greater level of detail on the conservation and variability within a domain set and allows the groups within each set to be compared. A collection of cell adhesion and cell surface receptor molecules (non-Ig-related sequences) previously identified as containing Ig-like domains Barclay et al 1993, Pigott and Power 1993 was used to create profiles for the non-Ig-related V, C2 and I-sets. As information from sequence profiles and consensus sequences is difficult to present, a method to visualise the degree of conservation in sequence profiles was developed. It is demonstrated how this technique can assist in defining and examining the variability between groups of related molecules. A more detailed consideration of the separation of the two cysteine residues that form the conserved disulfide bridge, and how this delineates the sets, is presented. This is done by considering, for each of the groups of sequences, the separations among the positions of the residues that correspond to the three dominant residues of the pin structure described by Lesk & Chothia (1982). These three residues are Cys-Trp-Cys in sequential order and are referred to here as positions 1, 2 and 3 of the pin, respectively. 
Occasionally other residues are found in the pin positions and sequences where this occurs are included in the analysis. As the measures of variability commonly used for identifying hypervariable regions in Ig-related sequences Wu and Kabat 1970, Jores et al 1990 are not suitable for examining levels of conservation, measures based on information theory (e.g. see Schneider & Stephens, 1990) have been used to examine conservation within the sequence sets. The data collected here are used to re-examine the potential evolutionary pathways between the sets of Ig-related domains.
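A conservation measure of the information-theoretic kind mentioned above can be illustrated as the difference between the maximum possible entropy for 20 amino acids and the observed Shannon entropy of an aligned position. The Python sketch below uses this uncorrected form, log2(20) minus the entropy in bits; it omits any small-sample correction, treats gaps by skipping them, and uses a hypothetical function name. It is not claimed to be the exact measure used in this work.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=20):
    """Information content (bits) of one aligned position.

    Computed as log2(alphabet_size) - H, where H is the Shannon entropy of
    the observed residue frequencies. Gaps ('-') are skipped and no
    small-sample correction is applied (both are assumptions of this sketch).
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        raise ValueError("column contains no residues")
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def conservation_profile(alignment):
    """Information content at every column of equal-length aligned strings."""
    return [information_content(col) for col in zip(*alignment)]
```

Unlike a variability index, this value is highest at invariant positions (about 4.32 bits for 20 residue types) and falls towards zero as the residue distribution approaches uniform, which makes it convenient for comparing levels of conservation between groups.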

Results

Sequences of V-like domains and I or C2-set domains for non-Ig related molecules were taken from the sources mentioned in Materials and Methods. After a multiple alignment Hogeweg and Hesper 1984, Smith 1986, the tree generated by the alignment process was examined and clusters of sequences that formed around a sequence of known structure were selected and manually realigned to take account of the structural data and any errors in the automatic alignment. The clusters were then merged. Due to

Profiles

Most of the previously published profiles (or consensus sequences) for Ig and Ig-related domains that are available consist of a one-line description of the major sets into which the domains are classified. The profiles specify between 11 and 32 of the approximately 110 sequence positions and include some regions of potential sequence length variations Hunkapiller and Hood 1989, Bork et al 1994, Harpaz and Chothia 1994, Vaughn and Bjorkman 1996. An elaborated approach was used by Harris &

Sequences and alignments

Sequences were taken from the Kabat Data Base (October, 1996) of sequences of immunological interest (http://immuno.bme.nwu.edu/) and from the SWISS-PROT Data Base (Bairoch & Apweiler, 1997) for all the molecules listed by Barclay et al 1993, Pigott and Power 1993 as containing Ig-like domains. Where sequences were taken from the SWISS-PROT Data Base, all available species were used and if a sequence contained multiple Ig-like domains all of these were included. However, identical sequences and

Acknowledgements

We thank Professor J. T.-Z. Wong for discussions and a critical reading of the manuscript. This work was supported in part by a grant to H. X. from the Hong Kong Department of Industry.

References (48)

  • W.S Reznikoff et al.

    E. coli promoters

  • B Rost et al.

    Prediction of protein secondary structure at better than 70% accuracy

    J. Mol. Biol.

    (1993)
  • M.A Seeger et al.

    Characterization of amalgam: a member of the immunoglobulin superfamily from Drosophila

    Cell

    (1988)
  • L.E Stanfel

    A new approach to clustering the amino acids

    J. Theor. Biol.

    (1996)
  • W.R Taylor

    Identification of protein sequence homology by consensus template alignment

    J. Mol. Biol.

    (1986)
  • D.E Vaughn et al.

    The (Greek) key to structures of neural adhesion molecules

    Neuron

    (1996)
  • G Wagner et al.

    Cell surface adhesion receptors

    Curr. Opin. Struct. Biol.

    (1994)
  • A.F Williams

    A year in the life of the immunoglobulin superfamily

    Immunol. Today

    (1987)
  • H Xue et al.

    Interferon induction of human tryptophanyl-tRNA synthetase safeguards the synthesis of tryptophan-rich immune-system proteins: a hypothesis

    Gene

    (1995)
  • A Bairoch et al.

    The SWISS-PROT protein sequence data bank and its supplement TrEMBL

    Nucl. Acids Res.

    (1997)
  • A Bairoch et al.

    The PROSITE database, its status in 1997

    Nucl. Acids Res.

    (1997)
  • A.N Barclay et al.

    The Leucocyte Antigen Facts Book

    (1993)
  • G Beck et al.

    Editors of Primordial immunity: foundations for the vertebrate immune system

    Ann. N. Y. Acad. Sci.

    (1994)
  • G Grenningloh et al.

    Molecular genetics of neuronal recognition in Drosophila: evolution and function of immunoglobulin superfamily cell adhesion molecules

    Cold Spring Harbor Symp. Quant. Biol.

    (1990)

1 Edited by I. A. Wilson
