Abstract
Background/Aim: Deletions in chr9p22.1-21.3 locus have been related to the development of several types of cancer, mainly due to the presence of CDKN2A and CDKN2B genes. However, there are several other genes in the region with potential importance in tumorigenesis. We, therefore, aimed to analyze in silico the potential prognostic significance of alterations in copy number and expression of genes present in the chr9p22.1-21.3 locus in 33 TCGA datasets (approximately 10,000 patients). Materials and Methods: We analyzed which of the 27 genes are expressed in the datasets. Additionally, we associated the deletion of the locus with survival (log rank analysis) and hazard ratio (HR) (univariate cox regression). Finally, we performed univariate, multivariate, and overall survival analyses in 13 datasets considering the expression of 10 genes present in the locus. Results: We identified 10 genes of the chr9p22.1-21.3 locus expressed in the datasets (MLLT3, FOCAD, PTPLAD2, KLHL9, IFNE, MTAP, CDKN2A, CDKN2B, DMRTA1 and ELAVL2). Moreover, we found that deletion in at least 1 of these genes was associated with poor survival and increased HR in 13 datasets: adrenocortical carcinoma (ACC), glioblastoma (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), low-grade glioma (LGG), lung adenocarcinoma (LUAD), mesothelioma (MESO), pancreatic adenocarcinoma (PAAD), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), sarcoma (SARC) and uterine corpus endometrial carcinoma (UCEC). 
Finally, we found an association of survival/HR and altered expression of MLLT3 in the MESO dataset, of FOCAD in the READ dataset, of PTPLAD2 in the KIRP dataset, of KLHL9 in the LGG and UCEC datasets, of IFNE in ACC, GBM, KIRC and LUAD datasets, of MTAP in LGG, LUAD and MESO datasets, of CDKN2A in the HNSC, KIRC and MESO datasets, of CDKN2B in the LGG and READ datasets, of DMRTA1 in SARC datasets and of ELAVL2 in the LGG dataset (p<0.01 for all associations). Conclusion: Besides CDKN2A and CDKN2B, numerous other genes are possibly related to cancer development, requiring further investigation.
Cancer is the second leading cause of death worldwide. According to the World Health Organization (WHO), 10 million people died of cancer in 2020 (1), while in the USA alone 609,000 people are likely to die of cancer in 2022 (2). Its high mortality is due to several factors, including diagnosis at late stages, which was aggravated by the COVID-19 pandemic, and a lack of effective treatments.
Treatment of cancer depends on tumor type, stage, and molecular profile, although the standard treatment is usually surgery, followed by chemotherapy and radiotherapy (3). Chemotherapy uses different types of agents to target cancer cells and induce cell death; however, there is still a lack of FDA-approved chemotherapeutic treatments targeting specific molecules (4). Therefore, there is a crucial need to identify molecular markers for cancer in order to improve personalized treatment and diagnosis. In addition, understanding the underlying biology of a tumor can provide a new therapeutic target for that tumor.
The progression of cancer involves several steps related to genetic changes in multiple genes or chromosomes. One of the common genetic alterations observed in cancer is homozygous deletion of recessive cancer genes and fragile sites (5). Our research group found that about 50% of glioblastoma patients have a deletion of the 22.1-21.3 region of the short arm of chromosome 9 (6). This region has been associated with the development of several types of cancer. The deletion of genes on chromosome 9p is described as an early event in the development of cancers, and the frequency of loss is similar in both non-invasive and invasive tumors, which can indicate the presence of important tumor suppressor genes in the locus (7-35). This region harbors two important tumor suppressors that have been widely studied: CDKN2A (p16) and CDKN2B (p15) (7, 14, 25, 26, 28, 31, 33, 35-38). CDKN2A is known to be a tumor suppressor gene, which inhibits the formation of CDK4 and CDK6 complexes, inducing cell-cycle arrest in G1 and G2 phases (35). Activation of CDK kinases is also prevented by p15, which is a cell growth regulator that controls cell-cycle G1 progression. Genetic and epigenetic changes in CDKN2A and CDKN2B have been described in the development of cancer, metastasis, recurrence, and poor prognosis in several types of tumors (35, 38). However, the chr9p22.1-21.3 region harbors 25 more genes that are less studied and may have a role in cancer development. The Cancer Genome Atlas (TCGA) is an outstanding initiative of the National Institutes of Health (NIH) aiming to describe the main genetic changes found in several types of cancers (39).
Therefore, the aim of the study was to identify the role of the genes present in the locus chr9p22.1-21.3 in 33 types of cancers from the TCGA, using in silico tools. Moreover, we intended to associate these data with clinicopathological features and describe potential new driver genes with clinical impact in carcinogenesis.
Materials and Methods
This study was approved by the Barretos Cancer Hospital Ethics Committee (protocol number 1394/2017) and all methods were performed according to relevant guidelines and regulations. The results published here are based on data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Data import and selection of potentially important genes. The overall design of the in silico analyses is depicted in Figure 1. In order to test which genes of the chr9p22.1-p21.3 locus are expressed in cancer, normalized RSEM (RNA-Seq by Expectation-Maximization) RNA sequencing data for 27 genes in the chr9p22.1-p21.3 locus (MLLT3, FOCAD, SNORA30, PTPLAD2, IFNB1, IFNW1, IFNA21, IFNA4, IFNA10, IFNA16, IFNA7, IFNA17, IFNA14, IFNA5, KLHL9, IFNA6, IFNA13, IFNA2, IFNA8, IFNA1, IFNE, MTAP, C9orf53, CDKN2A, CDKN2B, DMRTA1, ELAVL2) were imported from 33 different TCGA datasets (9,659 samples) (Table I). These values were transformed to a logarithmic scale (log10+1). In order to select genes with potential impact on the datasets, the expression of genes present in the chr9p22.1-p21.3 locus across all patients was analyzed in the 33 datasets.
Experimental design depicting the in silico experiments performed using data from The Cancer Genome Atlas (TCGA).
Datasets from The Cancer Genome Atlas (TCGA) showing the type of cancer and number of samples.
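The log transformation and expression-based gene selection described above can be illustrated with a minimal Python sketch. It is not the authors' R code. It assumes that "(log10+1)" means log10(x + 1), and the median cutoff `min_median` as well as the toy expression values are hypothetical, since the text does not state a numeric threshold.

```python
import math
from statistics import median


def log_transform(rsem_value: float) -> float:
    """Assumed reading of '(log10+1)': log10(x + 1), so a zero count maps to 0."""
    return math.log10(rsem_value + 1.0)


def expressed_genes(expression: dict[str, list[float]], min_median: float = 1.0) -> list[str]:
    """Return genes whose median log-scale expression reaches min_median.

    min_median is a hypothetical cutoff chosen for illustration only.
    """
    kept = []
    for gene, values in expression.items():
        logged = [log_transform(v) for v in values]
        if median(logged) >= min_median:
            kept.append(gene)
    return kept


# Toy normalized RSEM values for four patients (invented numbers).
toy = {
    "CDKN2A": [120.0, 85.0, 0.0, 300.0],
    "IFNA4": [0.0, 0.0, 1.0, 0.0],
}
print(expressed_genes(toy))
```

A gene with reads close to 0 in most patients, such as the toy `IFNA4` row, falls below the cutoff and is dropped, which mirrors the exclusion logic described in the text.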
Copy number alteration analysis. First, it was determined whether each gene was expressed in the datasets. Then, the genes most prominently expressed in the datasets were selected. The 10 selected genes were MLLT3, FOCAD, PTPLAD2, KLHL9, IFNE, MTAP, CDKN2A, CDKN2B, DMRTA1 and ELAVL2. The GISTIC2 results from CGH Custom Microarray 2x415K data from TCGA were obtained to analyze the copy number variation in the chr9p22.1-p21.3 locus in 10,815 samples across the 33 datasets. The patients were distributed into 2 groups, depending on the presence of a deletion in the chr9p22.1-p21.3 locus. If there was at least 1 gene with homozygous deletion in the region, the patient was allocated to the “Deleted” group. Otherwise, the patient was considered to have no deletion. In order to determine the prognostic impact of the deletion in the chr9p22.1-p21.3 locus among the cancer types, a univariate Cox analysis was performed. The datasets presenting potential prognostic significance were considered for gene expression analyses. Then, overall survival analysis was performed in the adrenocortical carcinoma (ACC), glioblastoma (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), low-grade glioma (LGG), lung adenocarcinoma (LUAD), mesothelioma (MESO), pancreatic adenocarcinoma (PAAD), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), sarcoma (SARC) and uterine corpus endometrial carcinoma (UCEC) datasets. Finally, the individual percentage of deletion of each gene of the locus was determined.
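The grouping rule above (a patient is "Deleted" if at least 1 gene in the locus is homozygously deleted) can be sketched in Python as follows. This is an illustration rather than the authors' R code. It assumes that the GISTIC2 thresholded gene-level calls were used, in which -2 denotes a deep (homozygous) deletion; the patient identifiers and calls are invented.

```python
HOMOZYGOUS_DELETION = -2  # GISTIC2 thresholded call for a deep (homozygous) deletion


def deletion_group(gene_calls: dict[str, int]) -> str:
    """Label one patient from per-gene GISTIC2 thresholded calls (-2, -1, 0, 1, 2)."""
    if any(call == HOMOZYGOUS_DELETION for call in gene_calls.values()):
        return "Deleted"
    return "Not deleted"


def deletion_frequency(patients: dict[str, dict[str, int]]) -> float:
    """Percentage of patients allocated to the 'Deleted' group."""
    if not patients:
        return 0.0
    deleted = sum(deletion_group(calls) == "Deleted" for calls in patients.values())
    return 100.0 * deleted / len(patients)


# Hypothetical patients with calls for three of the ten selected genes.
patients = {
    "patient_1": {"MTAP": -2, "CDKN2A": -2, "CDKN2B": -2},
    "patient_2": {"MTAP": -1, "CDKN2A": -1, "CDKN2B": 0},  # hemizygous loss only
    "patient_3": {"MTAP": 0, "CDKN2A": 0, "CDKN2B": 0},
    "patient_4": {"MTAP": 0, "CDKN2A": -2, "CDKN2B": 0},
}
print(deletion_frequency(patients))
```

Under this rule a hemizygous loss (call of -1) does not place a patient in the "Deleted" group, which is one reason homozygous deletion frequencies are lower than LOH frequencies reported elsewhere.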
Gene expression analysis. The 13 datasets with potential prognostic significance in the copy number analysis were selected for gene expression studies. The Z-score was calculated for all genes obtained from RNA-Seq analysis. Gene expression was considered High if Z-score ≥1.5 or Low if Z-score ≤−1.5. Otherwise, gene expression was considered Normal. Univariate Cox analysis was performed for each gene and dataset. Those presenting p≤0.1 were selected for multivariate Cox analysis per dataset. Finally, the genes/datasets with statistical significance after multivariate analysis were submitted to overall survival analysis.
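The Z-score classification above can be sketched as follows. The sketch assumes that the Z-score of a gene is computed across all patients of one dataset using the sample standard deviation; the text does not specify the reference population, and the expression values are invented.

```python
from statistics import mean, stdev


def z_scores(values: list[float]) -> list[float]:
    """Z-score of each value relative to the cohort mean and sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def classify(z: float, cutoff: float = 1.5) -> str:
    """High if Z >= 1.5, Low if Z <= -1.5, otherwise Normal (cutoffs from the text)."""
    if z >= cutoff:
        return "High"
    if z <= -cutoff:
        return "Low"
    return "Normal"


# Toy log-scale expression of one gene in ten patients (invented numbers).
expression = [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 9.0]
labels = [classify(z) for z in z_scores(expression)]
print(labels)
```

Only the last toy patient exceeds the cutoff and is labeled High; all others are labeled Normal.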
All analyses were performed in R v.4.0.2, using the packages RTCGAToolbox and TCGAbiolinks (40, 41). HR and 95% confidence interval (95%CI) were generated using uni- and multivariate Cox analysis, and survival analysis was performed by constructing Kaplan-Meier curves using the survminer package. The group contrasts were considered statistically significant when p≤0.05.
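For readers unfamiliar with the survival methods named above, the following pure-Python sketch shows a Kaplan-Meier estimator and a two-group log-rank test. It is a didactic re-implementation under simple assumptions (right-censored data, event coded as 1 and censoring as 0), not the survminer workflow used in the study, and the follow-up times are invented.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)        # at risk, both groups
        n_a = sum(1 for x in times_a if x >= t)    # at risk, group A
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Toy follow-up in months; 1 = death observed, 0 = censored (invented numbers).
deleted_t, deleted_e = [5, 8, 12, 20], [1, 1, 1, 0]
intact_t, intact_e = [15, 30, 40, 50], [1, 0, 1, 0]
print(kaplan_meier(deleted_t, deleted_e))
print(log_rank_p(deleted_t, deleted_e, intact_t, intact_e))
```

With only eight toy patients the p-value carries no biological meaning; the example only shows how the curves and the group contrast are computed.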
Results
Ten genes of the chr9p22.1-p21.3 locus were found expressed among 33 TCGA datasets. In order to select genes that are expressed in the studied tumors, the expression of each of the 27 genes present in the locus was analyzed across all datasets. Overall, among the genes analyzed, 10 genes were found expressed across all datasets (MLLT3, FOCAD, PTPLAD2, KLHL9, IFNE, MTAP, CDKN2A, CDKN2B, DMRTA1 and ELAVL2) (Figure 2). The remaining genes presented a number of reads close to 0 in most patients and datasets; therefore, they were excluded from subsequent analyses.
Expression of the 27 genes located in chr9p22.1-21.3 locus using RNA sequencing data of 9,659 cancer patients from The Cancer Genome Atlas (TCGA).
Deletion in chr9p22.1-p21.3 locus is a prognostic factor in 13 TCGA datasets. Once expressed genes present in all datasets were selected, we analyzed the frequency of homozygous deletion of the chr9p22.1-p21.3 region across the 33 datasets. We found 7 datasets with more than 30% of patients presenting deletion in the chr9p22.1-p21.3 locus: BLCA (33.3%), DLBC (31.3%), ESCA (41.8%), GBM (58.6%), HNSC (32.2%), MESO (47.1%) and SKCM (33.2%) (Figure 3).
Percentage of deletion of the chr9p22.1-21.3 locus in the 33 The Cancer Genome Atlas (TCGA) datasets. A patient was considered as having the deletion if there was at least one gene deleted in the locus.
Univariate Cox analysis of copy number results points to a potential prognostic significance of chr9p22.1-p21.3 deletion in 13 datasets (ACC, GBM, HNSC, KIRC, KIRP, LGG, LUAD, MESO, PAAD, PRAD, READ, SARC and UCEC). Interestingly, LGG results showed a 7.93-fold higher risk of death in patients presenting the deletion (p<0.0001, Figure 4).
Hazard ratio (HR) and p-values of univariate cox analysis of deletion in the chr9p22.1-21.3 locus of The Cancer Genome Atlas (TCGA) datasets.
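To make the hazard ratios above concrete, the sketch below fits a Cox model with a single binary covariate (1 = deletion, 0 = no deletion) by Newton-Raphson on the partial likelihood, using the Breslow convention for tied event times, and reports HR = exp(beta) with a Wald 95% confidence interval. It is a didactic sketch with invented data, not the R routine used in the study, and the function name `cox_univariate` is hypothetical.

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model; return (HR, lower 95% bound, upper 95% bound)."""
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        gradient = 0.0
        info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored patients contribute only through risk sets
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(weights)
            s1 = sum(w * x_j for w, x_j in zip(weights, risk))
            s2 = sum(w * x_j * x_j for w, x_j in zip(weights, risk))
            gradient += x_i - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info == 0:
            raise ValueError("covariate does not vary within any risk set")
        step = gradient / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Toy cohort: follow-up in months, 1 = death, deletion status (invented numbers).
times = [5, 8, 12, 20, 15, 30, 40, 50]
events = [1, 1, 1, 0, 1, 0, 1, 0]
deleted = [1, 1, 1, 1, 0, 0, 0, 0]
hr, lower, upper = cox_univariate(times, events, deleted)
print(hr, lower, upper)
```

An HR above 1 means a higher instantaneous risk of death in the group coded 1; with a cohort this small the confidence interval is very wide, which is why the study relies on the full TCGA sample sizes.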
Overall survival analysis shows the deletion associated with poorer prognosis in ACC (p=0.012), GBM (p=0.006), HNSC (p=0.0021), KIRC (p<0.0001), KIRP (p<0.0001), LGG (p<0.0001), LUAD (p=0.016), MESO (p<0.0001), PAAD (p=0.094), PRAD (p=0.0094), READ (p=0.02), SARC (p=0.0021) and UCEC (p=0.027) (Figure 5).
Overall survival of patients according to deletion of chr9p22.1-21.3 locus in 13 The Cancer Genome Atlas (TCGA) datasets.
We further evaluated which genes present in the locus are most frequently deleted in the region. Except for UCEC, in most patients of the other datasets, the deletion in the region harbors mainly MTAP, CDKN2A, and CDKN2B genes (Figure 6).
Percentage of deletion of each gene present in the chr9p22.1-21.3 locus in 13 The Cancer Genome Atlas (TCGA) datasets.
Gene expression analysis shows potential prognostic significance of chr9p22.1-p21.3 gene expression mainly in LGG and MESO. Uni- and multivariate cox analyses point to potential prognostic significance of 10 genes across 11 datasets, mainly low-grade gliomas (4 genes), and mesotheliomas (3 genes). Low expression of MLLT3, FOCAD, and PTPLAD2 is associated with decreased survival in MESO, READ, and KIRP patients, respectively. Multivariate analysis showed that high KLHL9 expression is associated with decreased survival in LGG and UCEC. High IFNE expression is associated with decreased survival in GBM, KIRC, and LUAD. Low MTAP expression is associated with decreased survival in LUAD, whereas high MTAP expression is associated with decreased survival in LGG, LUAD, and MESO. High CDKN2A expression is associated with decreased survival in KIRC and MESO, whereas low expression is associated with increased survival in HNSC. High expression of CDKN2B is associated with decreased survival in LGG and READ. Low expression of DMRTA1 is associated with decreased survival in SARC. Finally, high expression of ELAVL2 is associated with increased survival in LGG.
Accordingly, overall survival analysis showed that 4 genes have potential prognostic importance in LGG (KLHL9, MTAP, CDKN2B and ELAVL2, p<0.0001), whereas 3 genes present potential prognostic importance in MESO (MLLT3, MTAP and CDKN2A, p<0.001). Two genes were associated with survival in READ (FOCAD, CDKN2B, p<0.05), KIRC (IFNE, CDKN2A, p<0.0001) and LUAD (MTAP, IFNE, p=0.05), while in ACC (IFNE, p<0.0001), KIRP (PTPLAD2, p<0.0001), GBM (IFNE, p<0.0001), HNSC (CDKN2A, p<0.01), SARC (DMRTA1, p<0.001) and UCEC (KLHL9, p<0.05) only one gene was associated with survival (Figure 7).
Overall survival of patients according to expression of 10 genes across 13 The Cancer Genome Atlas (TCGA) datasets.
Discussion
In the present study, we investigated the deletion and expression of genes present in the chr9p22.1-21.3 locus in 33 different types of cancer across TCGA datasets. Overall, we found a potential prognostic significance of chr9p22.1-21.3 deletion in 13 datasets (ACC, GBM, HNSC, KIRC, KIRP, LGG, LUAD, MESO, PAAD, PRAD, READ, SARC and UCEC) and that differential expression of 10 genes (MLLT3, FOCAD, PTPLAD2, KLHL9, IFNE, MTAP, CDKN2A, CDKN2B, DMRTA1 and ELAVL2) is present in at least one of these datasets. Moreover, we found a potential prognostic significance of the genes KLHL9, MTAP, MLLT3 and ELAVL2 in cancer, mainly in LGG and MESO.
Deletion of chr9p22.1-21.3 is associated with poor survival in 13 datasets. Our results demonstrated that deletion of chr9p22.1-21.3 was associated with a poor prognosis in 13 datasets. To the best of our knowledge, loss of heterozygosity (LOH) of the locus has been more frequently analyzed (7, 8, 10, 11, 21, 23, 27, 33, 34, 36, 42-44) than homozygous deletion, which explains the differences in frequency between other studies and ours. The loss of the chr9p22.1-21.3 region is the most described genetic alteration for HNSCC (32, 34). One study found LOH of 9p in 72% of patients with HNSCC (11). In our analysis, we found homozygous deletion in 32.2% of patients in the HNSCC dataset, and a poorer survival of these patients. With regard to lung cancer, Panani et al., using fluorescence in situ hybridization (FISH), showed that 73% of small cell carcinomas, 84% of adenocarcinomas and 100% of large cell carcinomas had the loss present (25). We found homozygous deletion in this locus in 18.2% of lung adenocarcinoma patients, who also presented poorer survival. Another study that used FISH to detect deletion of the locus was that of Luo et al. (45), who found loss of the locus in 50% of patients with pancreatic adenocarcinoma. Another study using single cell comparative genomic hybridization (SCOMP) found a loss of the locus in 31% of patients (42), whereas in our study the homozygous deletion frequency was 8.3%. For clear cell renal cell carcinoma (ccRCC), LOH was detected in 44% of ccRCC patients and the loss was associated with higher stage, larger tumors, necrosis, and microvascular and renal vein invasion (13). Moreover, a study showed that 88% of ccRCC patients without loss of chromosome 9p survived 5 years after diagnosis vs. 43% of patients with 9p loss (8, 14).
In the present study, we demonstrated that 3.0% of clear cell renal cell carcinoma patients presented homozygous deletion in the locus, and less than 50% of those survived 50 months after diagnosis, whereas about 70% of patients without deletion survived this period. Considering kidney cancer, we also found that 4.5% of kidney renal papillary cell carcinoma patients had homozygous deletion in the locus, with a worse survival compared to those with no deletion. Concerning mesotheliomas, Cheng et al. (1993) described loss of the region in 87% of patients and 83% of cell lines, 43% of which were deletions (11). We found deletion of the region in 45.1% of the patients. Interestingly, all patients presenting the deletion died before 60 months after diagnosis, whereas about 25.0% of the patients with no deletion survived longer. For sarcomas, our study found deletion in 14.8% of patients, while a study in the literature described copy number alterations (CNA) in the locus in 15% of cases, comprising 5 homozygous deletions and one hemizygous deletion, and the CNA was associated with poor survival (46). Finally, although at a low frequency, we found homozygous deletion in adrenocortical carcinoma, prostate adenocarcinoma, uterine corpus endometrial carcinoma and rectum adenocarcinoma, and this low frequency has been reported in the literature (47-49). The loss in 0.6% of rectum adenocarcinomas is described for the first time in this study. Importantly, for gliomas, James et al. (1993) found loss of the region in 67% of glioma-derived cell lines and 37% of primary cell lines (18), suggesting that this loss is an early event in the development of gliomas. We found deletion in 13.1% of patients diagnosed with LGG and 58.7% of those with GBM, suggesting that this deletion may also be correlated with the malignancy of this tumor type. Overall, these studies indicate the presence of potential tumor suppressor genes in this region and, in fact, CDKN2A and CDKN2B are well-known tumor suppressor genes (35, 38).
Analysis of the deletion of individual genes in the locus points to recurrent deletion mainly of CDKN2A and CDKN2B (corroborating the literature) and of MTAP. However, the other genes present in the region (MLLT3, FOCAD, PTPLAD2, KLHL9, IFNE, DMRTA1 and ELAVL2) are shown as deleted in at least 20-25% of the samples. Therefore, we analyzed the potential prognostic significance of the expression of these genes in order to investigate whether they can also play a role as tumor suppressors.
Besides CDKN2A and CDKN2B, there are several genes of chr9p22.1-21.3 with potential prognostic value in several tumor types. By using our approach, we found two datasets, in which the differential expression of genes in chr9p22.1-21.3 may be more related to survival and death risk: LGG and MESO. Our data point to these datasets as good candidates for further studies considering the possibility of finding novel tumor suppressor genes in the 9p22.1-21.3 region.
Interestingly, although CDKN2A is a well-recognized tumor suppressor gene, its high expression was correlated with worse survival in 2 datasets. This behavior has been shown in several studies, mainly studies related to viral infection and cancer development, showing the potential of p16 (the protein encoded by CDKN2A) to be a surrogate marker of human papillomavirus infection in cancer (50). This gene, implicated in the cyclin D1/retinoblastoma pathway, is shown to be disrupted in the majority of human hepatocellular carcinomas (51-53). Indeed, its methylation has been associated with several parameters of liver hepatocellular carcinoma (LIHC), including virus infection (54), and CDKN2A expression was similarly correlated with a poor prognosis in the uterine cervix (55). Furthermore, CDKN2A is part of a six-gene signature in UCEC which is related to poor prognosis (56). Lamperska et al. used immunohistochemistry to study the expression of p16 in uveal melanoma (UVM), and their findings showed upregulation of p16, cyclin D1, cyclin 3, as well as abnormal pRB and E2F binding during the development of human UVM (57, 58). High expression of CDKN2A was associated with better overall survival in HNSC patients, corroborating the literature (59).
Of interest, low MTAP expression was associated with poor survival in 3 datasets (LGG, LUAD and MESO). In gliomas, one study found loss of MTAP expression in 27.8% of diffuse astrocytomas, 50.0% of anaplastic astrocytomas, 45.6% of adult glioblastomas and 54.8% of pediatric glioblastomas; however, the loss was not associated with clinicopathological features or survival (60). A bioinformatics study using TCGA glioma cases found loss of MTAP in 25.8% of LGG and 60% of glioblastomas, and the survival of patients with loss of the MTAP gene was shorter than that of other patients (61). Another study found loss of MTAP in 12.2% of grade I and in 62.5% of grade IV gliomas, and the loss was correlated with shorter overall survival (p=0.011) and shorter progression-free survival (p=0.016) (62). For lung adenocarcinoma, only one study described the loss of MTAP (12.9% of MTAP loss in LUAD); however, the study did not associate the loss with survival (63). For MESO, MTAP loss was found in 76.2% of malignant MESO (64) and in 74% of malignant pleural MESO, of which 37.5% were heterozygous and 12.5% homozygous deletions (65). In fact, studies of MTAP expression in MESO are not focused on survival, but on expression, since immunohistochemistry for MTAP is a reliable surrogate for CDKN2A fluorescence in situ hybridization for the diagnosis of malignant MESO (66), and our results suggest that MTAP may be a surrogate marker of the locus loss in other cancer types.
We also found MLLT3, DMRTA1, and ELAVL2 to be interesting subjects of study in specific datasets, since low expression was associated with poor survival or high expression was associated with good survival. The MLLT3 gene encodes the AF9 protein, which is part of an elongation complex required to increase the catalytic rate of RNA polymerase II (67). This complex, in turn, regulates the elongation checkpoint, and its loss of function is associated with the development of hematologic cancers; in fact, we found that this loss led to a worse prognosis. The function of this gene has been well described in common rearrangements found in leukemia (68), and it is important in the maintenance of intermediate precursor brain cells, preventing their premature cell-cycle exit through epigenetic modifications (69). We found low expression of this gene strongly associated with low survival in mesothelioma patients, pointing to a potential role as a tumor suppressor.
DMRTA1, a member of the DMRT family of genes that encode transcription factors involved in sexual development, controls testicular development, including the differentiation of germ cells and Sertoli cells (70). Another study showed a loss of DMRTA1 expression in 27% of bladder cancer cell lines (71). Also, a distinctive intergenic insertion between DMRTA1 and LINC01239 had oncogenic effects through activation of the mammalian target of rapamycin (mTOR)/4EBP/S6K pathway in HBV-infected intrahepatic cholangiocarcinomas (72). The ELAVL family comprises RNA-binding proteins that bind with high specificity to AU-rich instability elements in the 3′UTR region of mRNAs from genes involved in cell growth (73). ELAVL2 binds with high specificity to the 3′UTR region of the c-Myc (74), HIF-1α, PTBP1 and VEGF mRNAs (75). In addition, ELAVL2 binds to the 3′UTR of the MycN mRNA and increases its stability in neuroblastoma (76). ELAVL2 also regulates several aspects of neuronal function, including neuronal excitability and synaptic transmission, both of which are critical for normal brain function in cognition and behavior (77). In cancer, ELAVL2 methylation was correlated with better rates of progression-free and overall survival in oropharyngeal cancer (78). ELAVL2 overexpression is a risk factor for poor response to chemotherapy in patients with esophageal squamous cell carcinoma (79). Of note, we found that low DMRTA1 expression was associated with poor survival in sarcoma, whereas high ELAVL2 expression was associated with increased survival in LGG.
Conclusion
Besides CDKN2A and CDKN2B, we identified several other genes in the chr9p22.1-21.3 locus that are possibly associated with cancer development, namely MLLT3, DMRTA1, MTAP, and ELAVL2. Furthermore, these genes may be related to cell cycle, differentiation, and metabolism, and thus may be important for future studies, especially in gliomas, MESO, and sarcomas. These genes add to the body of evidence regarding cancer development and may contribute towards a better understanding of the underlying molecular biology of cancer.
Acknowledgements
This study was funded by the Fundação de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP), grant number 2016/21727-4 for LTB. PGG was funded by the Fundação de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP) fellowship number 2017/09749-5. Both contributed to the design of the study and collection, analysis, and interpretation of data.
Footnotes
Authors’ Contributions
PGG contributed to the analysis and interpretation of the data and wrote the first draft. RMR contributed to conceptualization of the project, interpretation of data, writing reviewing and editing. LTB contributed to the analysis and interpretation of data, methodology, conceptualization of the project, funding acquisition, final writing, reviewing, and editing.
Availability of Data and Materials
The datasets analyzed during the current study are available in The Cancer Genome Atlas repository [https://tcga-data.nci.nih.gov].
Conflicts of Interest
The Authors declare that they have no competing interests.
- Received July 6, 2022.
- Revision received September 3, 2022.
- Accepted September 14, 2022.
- Copyright © 2022 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.