Introduction

Nasopharyngeal carcinoma (NPC) is a malignancy of the head and neck region that arises from the epithelial cells lining the nasopharynx.1 Its incidence is low in most parts of the world, but it is one of the most common cancers in some geographical regions, including Southeast Asian countries, with a high incidence of 15–50 per 100 000. Intermediate incidence rates ranging from 15 to 20 per 100 000 persons have also been reported in Alaskan Eskimos and in the Mediterranean basin including Southern Italy, Greece and Turkey.2 In Peninsular Malaysia, NPC ranked second among men and twelfth among women in terms of cancer incidence in 2003. The Chinese had a higher incidence than the other ethnic groups, namely Malays and Indians.3 The age-standardized incidence in Chinese men and women was 17.0 and 6.6 per 100 000 individuals, respectively, in 2003–2005 in Peninsular Malaysia.4 A high risk of NPC has also been observed among indigenous groups, particularly the Bidayuh of Sarawak in East Malaysia.5

Similar to most types of cancer, NPC is a complex disease, and it is thought to arise from interactions of multiple factors, including chronic Epstein–Barr virus (EBV) infection, environmental factors and genetic susceptibility. Latent EBV infection is observed in nearly all NPC tumor cells, and EBV DNA is detected in all undifferentiated NPC cells, suggesting that EBV has a critical role in the pathogenesis of NPC.2, 6 Significant environmental factors, including exposure to various carcinogens such as nitrosamines and polycyclic hydrocarbons, and dietary factors, including a high consumption of salted preserved foods such as salted fish and salted eggs, have been suggested to increase the risk of NPC in Malaysian Chinese.7

NPC shows remarkable differences in ethnic and geographical distributions. Strong evidence from familial aggregation of NPC, migrant studies and case–control association studies of specific markers in certain ethnic groups suggests that genetic factors contribute significantly to NPC development.8, 9, 10 Most genetic linkage analyses conducted among Chinese show that the HLA alleles A2, B14, B46, AW19 and BW17 are associated with an increased risk of NPC.11, 12 Deletion of chromosome 3p was the most common genetic alteration reported in NPC.13, 14 A recent study of familial NPC mapped a susceptibility locus to a 13.6-cM region on chromosome 3p21.31–21.2. Investigation of this region has identified many candidate tumor suppressor genes, for example, DLC1, RASSF1A, CACNA2D2 and FUS1.15 In addition, Feng et al.16 have also provided evidence of susceptibility loci for NPC on chromosome 4p15.1–q12 from a whole-genome scan in families at high risk of NPC from Guangdong Province, China.

Many recent studies have attempted to search for NPC susceptibility genes through a case–control candidate-gene approach. A possible association of genetic polymorphisms in enzymes that are involved in carcinogen metabolism, such as GSTM1 (glutathione S-transferase M1), CYP2E1 (cytochrome P450 2E1) and CYP2A6 (cytochrome P450 2A6), with NPC in Taiwanese and Thais has been shown.17, 18 GSTM1 is a phase II enzyme known to detoxify several carcinogens, including those found in tobacco smoke, and homozygous deletions of GSTM1 are associated with an increased risk of NPC.19 The CYP2E1 and CYP2A6 enzymes are known to activate nitrosamines and related carcinogens, and are possibly involved in the development of this disease. Certain variants of the CYP2E1 and CYP2A6 genes are thought to be more highly expressed than others and thus induce higher levels of cellular damage.

Association with NPC development has also been reported for genetic variations in the DNA repair enzymes XRCC1 (X-ray repair cross complementing group 1)20 and hOGG1 (8-oxoguanine glycosylase 1).21 In addition, single-nucleotide polymorphisms (SNPs) in some other genes, namely, PLUNC (palate, lung and nasal epithelial clone),22 MMPs (matrix metalloproteinases),23 CCND1 (Cyclin D1),24 and TLR4 and TLR10 (toll-like receptors 4 and 10),25, 26 may also influence susceptibility to NPC. Although a number of genetic variants have been reported to be associated with NPC susceptibility, many of these associations could not be replicated in subsequent studies in other populations. The genetic factors that determine an individual's susceptibility to NPC remain unclear, and the interactions between NPC susceptibility genes and other risk factors are not yet understood.

SNPs are the most abundant DNA sequence variations. The construction of a large body of SNP information through the International HapMap project27, 28 and rapid technological advances have made it possible to perform genome-wide association studies (GWAS) routinely to identify the genetic determinants of many complex diseases. In this study, we aimed to identify multiple and novel susceptibility loci of NPC through a population-based case–control GWAS, using high-throughput SNP genotyping technologies.

Materials and methods

Participants

All the individuals who participated in this SNP association study were unrelated Chinese from different states of Peninsular Malaysia. For an initial screening, a total of 111 NPC patients and 260 healthy volunteers (Panel A) were recruited from the University Malaya Medical Centre (UMMC) and from the NCI Cancer Hospital. For a replication study, a second set of samples, consisting of 168 NPC patients and 252 control individuals, was later recruited from UMMC, the NCI Cancer Hospital and the Tung Shin Hospital. Of the 279 cases, 68.5% were male, and their ages ranged from 14 to 79 years. Of the 512 controls, 70.5% were male, and their ages ranged from 18 to 60 years. All participants gave their written informed consent. The study was approved by the ethics committees of both the Yokohama Institute, The Institutes of Physical and Chemical Research (RIKEN), Yokohama, Japan and UMMC.

Genotyping of SNPs

Genomic DNA was extracted from peripheral blood leukocytes using a conventional method. A genome-wide analysis of Panel A samples was conducted using the Illumina HumanHap550v3 Genotyping BeadChip (Illumina, San Diego, CA, USA) according to the manufacturer's protocols. To validate the Illumina BeadChip genotyping results, we genotyped the top 200 SNPs showing the smallest P-values using multiplex-PCR-based Invader assays (Third Wave Technologies, Madison, WI, USA)29 or by direct sequencing of PCR products using a 96-capillary 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA), and compared the data obtained by the two platforms. SNPs that passed validation were further evaluated using the second set of samples. The call rates for the landmark SNP in the genome-wide scan, rs2212020 of ITGA9, were 0.99 and 1.00 in cases and controls, respectively. The P-values for the Hardy–Weinberg equilibrium (HWE) test in cases and controls (Panel A) were 0.761 and 0.223, respectively.
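The HWE check reported above can be illustrated with a short sketch. The paper does not state which HWE test was applied, so the chi-square goodness-of-fit test with one degree of freedom used below is an assumption. The function name `hwe_chi_square` and the genotype counts in the example are hypothetical and are not study data.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts (AA, AB, BB) for one biallelic SNP.
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expectation (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(150, 90, 20)
print(f"chi2={chi2:.3f}, P={p_value:.3f}")
```

A large P-value, as reported for rs2212020 in both cases and controls, indicates no detectable departure from HWE. For SNPs with small genotype counts an exact test would usually be preferred over this approximation.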

To further analyze SNPs within the 40-kb linkage disequilibrium (LD) region including the landmark SNP (rs2212020), 19 tSNPs (squared correlation coefficient between two SNPs (r²) of >0.8, minor allele frequency (MAF) of >0.05) were selected from the International HapMap project database (http://www.hapmap.org/index.html.en) and genotyped in 279 cases and 512 controls by multiplex-PCR-based Invader or TaqMan assay (Applied Biosystems). The LD plot was annotated from the International HapMap Project database.
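The two selection criteria named above, r² and MAF, can be made concrete with a minimal sketch. In the study these values were taken from the HapMap database rather than computed by the authors, so the code below only illustrates the definitions. It assumes phased haplotype counts are available for the r² calculation; the function names and all counts are hypothetical.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF of one biallelic SNP from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1.0 - p)


def ld_r2(n11, n12, n21, n22):
    """Squared correlation (r^2) between two biallelic SNPs.

    nij is the number of phased haplotypes carrying allele i at the first
    SNP and allele j at the second SNP.
    """
    n = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / n  # allele 1 frequency at the first SNP
    q1 = (n11 + n21) / n  # allele 1 frequency at the second SNP
    d = n11 / n - p1 * q1  # linkage disequilibrium coefficient D
    denom = p1 * (1.0 - p1) * q1 * (1.0 - q1)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


# Hypothetical counts, for illustration only.
r2 = ld_r2(40, 5, 3, 52)
maf = minor_allele_frequency(10, 20, 70)
print(f"r2={r2:.3f}, MAF={maf:.3f}, passes={r2 > 0.8 and maf > 0.05}")
```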

Statistical analyses

We carried out statistical analyses for association and HWE. For association, the allele and genotype distributions in cases and controls were compared under allelic, dominant- and recessive-inheritance models by two-tailed Fisher's exact test. Statistical analyses were carried out for the data obtained from both Panel A and the combined set of samples. SNPs were ranked by their lowest P-value across these three models in the combined set of samples.
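The association test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the two-sided P-value sums the probabilities of all tables no more likely than the observed one (a common convention, assumed here), the helper names `fisher_exact_two_sided` and `model_tables` are hypothetical, and the genotype counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def model_tables(cases, controls):
    """Collapse genotype counts (n_AA, n_Aa, n_aa) into 2x2 tables.

    'a' is the tested allele. Each table is (case_with, case_without,
    control_with, control_without) under the named model.
    """
    def split(genotypes, model):
        n_AA, n_Aa, n_aa = genotypes
        if model == "allelic":      # count alleles, not individuals
            return (2 * n_aa + n_Aa, 2 * n_AA + n_Aa)
        if model == "dominant":     # carriers of at least one 'a'
            return (n_aa + n_Aa, n_AA)
        return (n_aa, n_AA + n_Aa)  # recessive: 'aa' homozygotes only

    return {m: split(cases, m) + split(controls, m)
            for m in ("allelic", "dominant", "recessive")}


# Hypothetical genotype counts, for illustration only.
cases, controls = (50, 40, 10), (70, 25, 5)
p_by_model = {m: fisher_exact_two_sided(*t)
              for m, t in model_tables(cases, controls).items()}
best = min(p_by_model, key=p_by_model.get)
print(best, p_by_model[best])
```

Taking the minimum P-value over the three models mirrors the ranking described in the text.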

Results

Case–control GWA genotyping

We conducted a case–control whole-genome association study to identify NPC susceptibility genes. We first genotyped 111 NPC and 260 control individuals (Panel A) by means of a genome-wide SNP analysis. More than 500 000 SNPs derived from the International HapMap project were examined. The overall call rates of all individuals were 0.98 or higher. Of the 554 496 SNPs genotyped, 21 448 SNPs had no genotyping data and were removed from the study, and 533 048 SNPs on autosomal chromosomes were further analyzed. We first generated a quantile–quantile (Q–Q) plot to inspect possible population stratification effects by comparing the distribution of the observed P-values with the distribution expected under the null hypothesis of no population stratification. The Q–Q plot of observed versus expected statistics showed no evidence of population stratification (Figure 1). The observed P-values matched the expected P-values under the null distribution over the range of 1<−log₁₀(P)<5. The distribution departed from the null at the extreme tail, at −log₁₀(P) of >5, suggesting that the associations of these SNPs with NPC are likely to be true rather than artifacts of population stratification.

Figure 1

Log quantile–quantile (Q–Q) P-value plot showing the distribution of observed statistics by allelic test for all utilized 533 048 single-nucleotide polymorphisms from a genome-wide association study of 111 NPC patients and 260 controls of a Malaysian Chinese population (Panel A). The diagonal line shows the values expected under the null hypothesis.
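The coordinates of a Q–Q plot like the one in Figure 1 can be derived as in the following sketch. The plotting position (i − 0.5)/n for the expected quantiles is an assumption, since the paper does not specify one, and the function name `qq_points` and the example P-values are hypothetical.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(P) for a Q-Q plot.

    Expected values are the quantiles of a uniform distribution on (0, 1),
    which is the distribution of P-values under the null hypothesis.
    """
    n = len(p_values)
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    observed = sorted(-log10(p) for p in p_values)
    expected = sorted(-log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical P-values, for illustration only.
points = qq_points([0.91, 0.42, 0.03, 0.0004, 0.27, 0.66])
for expected, observed in points:
    print(f"{expected:.3f}\t{observed:.3f}")
```

Points close to the diagonal (observed ≈ expected) across the bulk of the distribution indicate little inflation, whereas points rising above the diagonal only in the extreme tail correspond to the departure described in the text.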

Validated genotyping

To validate the genotyping results of the Illumina assay, we re-genotyped all individuals in Panel A for the top 200 SNPs showing the smallest P-values in the initial GWA study by multiplex-PCR-based Invader assay or direct sequencing. We compared the genotype frequencies and allele frequencies for each of the 200 SNPs obtained from the two assays (Illumina vs Invader assay or direct sequencing). When the concordance rate of genotype calls between the two assays was less than 98% for a SNP, we excluded that SNP from further analysis. Therefore, the possibility of less accurate genotyping data affecting the GWAS analysis in this study was extremely low.
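The concordance filter described above can be sketched as follows. The 98% threshold comes from the text; the handling of missing calls (samples without a call on either platform are ignored) is an assumption, and the function name `concordance_rate` and the genotype calls are hypothetical.

```python
def concordance_rate(calls_a, calls_b):
    """Fraction of samples with identical genotype calls on two platforms.

    Each argument is a list of genotype calls in the same sample order.
    A call of None means no genotype was obtained; such samples are skipped.
    """
    pairs = [(x, y) for x, y in zip(calls_a, calls_b)
             if x is not None and y is not None]
    if not pairs:
        raise ValueError("no samples were called on both platforms")
    return sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical calls for one SNP, for illustration only.
beadchip = ["AA", "AG", "GG", "AG", None, "AA"]
invader = ["AA", "AG", "GG", "AG", "GG", "AG"]
rate = concordance_rate(beadchip, invader)
print(f"concordance={rate:.2f}, keep SNP={rate >= 0.98}")
```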

Replicated genotyping

After confirmation of the GWAS data, we further evaluated the top 200 SNPs showing the smallest P-values in an independent set of samples consisting of 168 NPC patients and 252 controls by multiplex-PCR-based Invader assay or direct sequencing for their association with NPC. The association of NPC and each SNP was analyzed under the three genetic models (allelic, recessive- and dominant-inheritance models) using the combined set of cases (n=279) and controls (n=512). Statistical analysis (Fisher's exact test) of the top 200 SNPs in the combined set identified 10 SNPs with P-values ranging from 10⁻⁵ to 10⁻⁷, showing possible associations with NPC. Of these, one SNP, rs2212020 (P=8.27 × 10⁻⁷), was associated with NPC with an odds ratio (OR) of 2.24 (95% confidence interval (CI), 1.59–3.15) (Table 1). rs2212020 is an intronic SNP in the ITGA9 (Integrin-α 9) gene.
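An odds ratio and its confidence interval can be obtained from a 2×2 table as in the sketch below. The paper does not state how the interval was computed, so the Woolf (logit) method used here is an assumption. The reported interval for rs2212020 is roughly symmetric around the OR on the log scale, which is consistent with, but does not confirm, an interval of this type. The function name `odds_ratio_ci` and the counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (logit) confidence interval for a 2x2 table.

    a, b: cases with and without the risk allele or genotype.
    c, d: controls with and without the risk allele or genotype.
    z=1.96 gives an approximate 95% interval. All cells must be non-zero.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("the Woolf interval requires non-zero cell counts")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(60, 140, 45, 255)
print(f"OR={odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```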

Table 1 Relationship between SNPs in ITGA9 and NPC susceptibility in Malaysian Chinese

SNPs on ITGA9 gene associated with NPC susceptibility

The ITGA9 gene contains 28 exons and spans approximately 367.5 kb. LD blocks were plotted using the data of Han Chinese in the International HapMap Project database (r2>0.8) (Figure 2), and we found that the LD block including the landmark SNP is located within one gene, ITGA9. To further define a genomic region of interest, we analyzed 58 SNPs that cover an approximately 0.25-Mb genomic region, including a part of ITGA9 and C3orf35, but found no SNP to be significantly associated with NPC (Figure 2).

Figure 2

Case–control association plots, linkage disequilibrium (LD) map and the genomic structure of the ITGA9 region on chromosome 3p21. Closed and open diamonds represent −log₁₀(P-value) obtained from the genome-wide association study (Panel A) and fine mapping (combined samples), respectively. Pairwise LD (r²) was based on the genotype data of Han Chinese in the International HapMap Project database.

The landmark SNP is located in intron 3 of the ITGA9 gene and is included in a 40-kb LD block that covers 2.4 kb of its promoter region and extends up to intron 4 (Figure 2). From the Han Chinese genotype data in the International HapMap Project database, 19 tSNPs with an MAF of >0.05 were identified within this 40-kb LD block. We subsequently genotyped these 19 tSNPs in 279 cases and 512 controls and found two SNPs in intron 1, rs197721 and rs149816, that showed some level of association (P<10⁻⁵) (Table 1). We then genotyped five SNPs in strong LD with rs197721 and rs149816 (r²>0.8), and confirmed their strong association with NPC (P<10⁻⁶) (Table 1). Among the eight associated SNPs identified, rs189897 revealed the most significant association with NPC (P=6.85 × 10⁻⁸, OR=3.18, 95% CI=1.94–5.21).

Discussion

We report here the association of SNPs in ITGA9 with the susceptibility to NPC in Malaysian Chinese through a GWA analysis. Although no study has so far addressed the functional consequences of the SNPs we identified, one particular SNP or some haplotype may function as an enhancer and cause considerable difference in the expression and activities of ITGA9, which in turn determine individual susceptibility to NPC. We also cannot exclude the possibility that other unidentified candidate functional polymorphisms are present in ITGA9.

ITGA9 is also known as ITGA4L (integrin-α 4-like) and ALPHA-RLC (alpha related to the development of lung cancer). The gene is located in the chromosomal 3p22–21.3 segment, which is known to be commonly deleted in various types of carcinoma including NPC. Loss of heterozygosity on 3p is observed in almost all primary NPCs.14, 30 A linkage study also mapped an NPC susceptibility locus to chromosome 3p21.31–21.2, indicating that the genes in this region are crucial for the formation of NPC.15 Hence, ITGA9 might be a novel NPC susceptibility gene.

ITGA9 encodes an integrin, α9 subunit. Integrins constitute a superfamily of integral membrane glycoproteins bound to extracellular matrix ligands, cell surface ligands and soluble ligands, which mediate cell–cell and cell–matrix adhesion. Each integrin consists of an alpha (α) and a beta (β) chain. In humans, 18 α-subunits and 8 β-subunits are known, forming 24 integrin heterodimers.31 In addition to mediating cell adhesion, signals from integrins are now known to modulate cell behaviors including proliferation, survival or apoptosis, maintenance of polarity, shape, motility, haptotaxis, gene expression, differentiation and malignant transformation.31, 32, 33

Integrin α9 is a 1035-amino-acid polypeptide that contains a large N-terminal extracellular domain with seven conserved repeats, a transmembrane segment and a short C-terminal cytoplasmic tail. The α9 subunit associates only with the β1 chain to form a single integrin, α9β1, which is widely expressed, including in human airway epithelial cells, and binds to diverse ligands such as tenascin, VCAM-1, osteopontin, uPAR, plasmin, angiostatin, ADAM, thrombospondin-1, VEGF-C and VEGF-D.31

At present, the functional relevance of α9 on NPC development is still unclear. The risk of NPC associated with multiple variants in ITGA9 may be under the influence of other genetic factors and/or environmental factors. Herpes viruses have been shown to exploit integrin for cell entry and infection. Their outer envelope glycoproteins contain a consensus motif that can bind to integrin to promote cell entry and infection.34 The apparent molecular mechanisms by which EBV invades and infects epithelial cells are still largely unknown, although a surface molecule, which is antigenically related to the CD21 receptor, has been described and may serve as a receptor for virus internalization.35, 36 Alternatively, it has been suggested that EBV may enter nasopharyngeal cells through IgA-mediated endocytosis.37 Therefore, it is tempting to hypothesize that a consensus domain of the EBV capsid protein may interact with the α9β1 expressed on the surface of epithelial cells for attachment and/or cell entry. Ligation of α9β1 by EBV may subsequently elicit potent signaling responses that could promote cell proliferation and viral pathogenesis.

This study was carried out on a relatively small sample containing 279 cases and 512 controls. Although our approach did not have sufficient statistical power to identify all the genetic determinants, it should have the power to identify a gene(s) having a relatively large effect, such as ITGA9. To our knowledge, this is the first report of a GWA approach to identify genetic risk factors of NPC. Replication studies with a larger number of samples, as well as in other independent populations, need to be carried out to confirm the association of ITGA9 with NPC.

In conclusion, we suggest that ITGA9 on chromosome 3p21 may be a genetic risk factor for NPC development in Malaysian Chinese. Our finding is of great interest as it sheds new light on the association between integrin α9β1 and NPC. Further studies are needed to elucidate the causal SNP or haplotype and to provide a plausible biological mechanism for the observed association between polymorphisms and susceptibility to NPC. The identification and characterization of NPC susceptibility genes help us to understand the pathogenesis of NPC and could possibly lead to an improvement of treatment in the future.