Abstract
Background/Aim: Advances in whole-genome sequencing have revealed that many RNA genes have important functions in various cellular processes and long non-coding RNAs (lncRNAs) have been attracting increasing attention. In the present study, we characterized a large number of lncRNAs from cancer patients and classified them based on their relationships with immune response-associated genes and cancer prognosis.
Materials and Methods: 435 lncRNAs were identified as differentially expressed genes from 5,013 cancer patients using the limma package and were narrowed down to 80 lncRNAs commonly expressed in more than 4 cancer types. The relationships of these lncRNAs with immune response-associated genes were examined using Pearson’s correlation and the tumor immune scoring system. Through a receiver operating characteristic (ROC) analysis of highly scored lncRNAs with the tumor-infiltrating lymphocyte (TIL) status using QuanTIseq, a cancer prognostic analysis was conducted using a multivariate Cox proportional hazards model.
Results: In a correlation analysis of 80 lncRNA genes and immune response-associated genes, lncRNA genes were scored and classified as high, intermediate, and low immune status score groups based on the Tumor Immune Status Scoring Algorithm (TIMMUSCORA). The ROC analysis of lncRNA genes with a high score (>200) revealed high sensitivity for the identification of TILs, such as CD8+ T cells and B cells. Furthermore, 11 of 80 lncRNA genes were identified as prognostic genes.
Conclusion: Twelve lncRNA genes with a high immune status score were identified as immunogenic and CD8+ T cell infiltration-associated genes, with two being indicators of a positive cancer prognosis. These results indicate the potential of 12 lncRNA genes with a high immune status score as immunogenic biomarkers associated with a positive immune response and prognosis.
- Long non-coding RNA
- whole genome sequencing
- tumor microenvironment
- immune status scoring
- tumor-infiltrating lymphocyte
Introduction
Human genome sequencing has revealed that approximately 93% of DNA is transcribed into RNA, 2% of which comprises protein-coding mRNAs and the remaining 98% is non-coding RNAs (ncRNAs), called dark matter (1, 2). Linear ncRNAs are divided into two groups based on their length: short ncRNAs [<200 nucleotides (nt)] and long ncRNAs (lncRNAs, >200 nt) (3, 4).
Research on lncRNAs showed that many RNA genes play important functions in various cellular processes, including DNA transcription and cell apoptosis and differentiation, through interactions with DNA, RNA, and proteins (5, 6). One of the important mechanisms of action of lncRNAs is specific binding to miRNAs, which may compete with mRNAs, and controlling transcription by regulating intracellular target mRNAs. Furthermore, oncology studies have revealed that a large number of lncRNAs are associated with oncogenic driver or tumor suppressor genes in different cancer types (7-9).
LncRNAs also serve as templates for protein translation. Advances in transcriptome technology, ribosome profiling, and mass spectrometry have led to the identification of specific lncRNA peptides that are not detected by conventional technology (10-12). Functional peptides encoded by lncRNAs have been attracting increasing attention due to their emerging roles in cancer regression or progression mechanisms.
Functional peptides derived from lncRNAs are also of interest to researchers due to their effects on the tumor immune microenvironment (TIME) (13-15). Wu et al. reported that lncRNA-LINC00665 inhibited the infiltration of macrophages and dendritic cells (DCs), suppressed regulatory T cells (Tregs), and prevented T-cell exhaustion by acting as a competing endogenous RNA to its target, FTX (15). However, the immunological functions of the majority of lncRNAs in the tumor microenvironment (TME) have yet to be elucidated.
In the present study, we characterized a large number of lncRNAs obtained from 5,013 cancer patients enrolled in Project HOPE (16) and classified them based on their relationships with immune response-associated genes and cancer prognosis.
Materials and Methods
Study design. The Shizuoka Cancer Center launched Project HOPE in 2014, which consists of multiomics analyses, including whole exome sequencing (WES) and gene expression profiling (GEP). Ethical approval was obtained from the Institutional Review Board at the Shizuoka Cancer Center (authorization number: 25-33). Written informed consent was obtained from all patients enrolled in the study. All experiments using clinical samples were performed in accordance with the Ethical Guidelines for Human Genome and Genetic Analysis Research. This study was conducted in accordance with the Declaration of Helsinki.
Clinical specimens and GEP analysis. Tumor tissue samples weighing ≥0.1 g were dissected from surgical specimens, along with samples of surrounding normal tissue, and tumor tissues were visually confirmed to have a tumor content ≥50%. RNA isolation was previously described (17). Briefly, total RNA was extracted using the miRNeasy Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. RNA samples with an RNA integrity number ≥6.0 were used for microarray analyses. Labeled RNA samples were hybridized to the SurePrint G3 Human Gene Expression 8×60 K v2 Microarray (Agilent Technologies, Santa Clara, CA, USA).
LncRNA gene list. Specific probes for 2,329 lncRNA genes were arranged on the SurePrint G3 Human Gene Expression 8×60 K v2 Microarray tip. The expression levels of lncRNA genes were measured using a DNA Microarray Scanner (Agilent Technologies).
Immune response-associated gene panel. The current immune response-associated gene panel was recently reported (18) and includes 300 immune response-associated genes consisting of the following categories: 123 immune cell genes, 50 cytokine and chemokine signal genes, 47 tumor necrosis factor (TNF) and TNF receptor superfamily genes, 23 regulatory T- cell-associated genes, and 57 interferon-g pathway genes.
Statistical analysis. Identification of differentially expressed genes (DEGs) of lncRNAs in cancers. In total, 5,013 gene expression profiles from both normal and tumor tissues were initially selected from twelve cancer types for analysis. DEG were detected by the limma R package (https://www.bioconductor.org/packages/release/bioc/html/limma.html), with 2329 microarray probes for lncRNAs (19). DEGs were identified with a cut-off of log2 (fold change) >1 and a significant difference was confirmed by a p-value <0.05 (Benjamini-Hochberg correction). To visualize GEP with a volcano plot, adjusted p-values were transformed to -log10 values with the ggplot2 R package (https://cran.r-project.org/web/packages/ggplot2/index.html)
Pearson’s correlation analysis of differentially expressed lncRNA genes with 300 immune response-associated genes. The relationships between the expression of immune response-associated genes and differentially expressed lncRNA genes were examined by Pearson’s correlation coefficient analysis using the stats package (https://search.r-project.org/R/refmans/stats/html/00Index.html). The results obtained were shown in a pie chart of single lncRNA genes with 300 values for the coefficients for immune response-associated genes, with the white slice indicating p-values >0.05. A positive correlation was defined as r>0.3 and a negative correlation as r<−0.3.
Development of the tumor immune status scoring algorithm (TIMMUSCORA). The immune status (categories A, B, C and D) of 5,013 cases was classified by plotting the transcriptional expression values of CD8B against those of CD274 (PD-L1), as previously reported (20). DEG analyses of the 300 immune response-associated genes of category A against categories B, C, and D as well as category C against categories A, B, and D were performed by the limma package (https://www.bioconductor.org/packages/release/bioc/html/limma.html). After subtracting the values of log2 (fold change) of category C from those of category A, variations in the fold change were used to generate a heatmap with clustering by the pheatmap R package (https://cran.r-project.org/web/packages/pheatmap/index.html). Immune genes, aligned by variation values for the fold change, were scored from 1 to 4 in accordance with the quantiles of these values.
TIMMUSCORA was developed to investigate the immune stimulatory status of cancers by combining Pearson’s correlation coefficients and the reference scores for immune response-associated genes. TIMMUSCORA scores were calculated by subtracting the sum of the reference scores of negative correlation genes from that of positive correlation genes.
The equation is expressed as follows:
ScoreTIMM: TIMMUSCORA score
Ref.Scorepositive.corr.gene: Reference score for positive correlative immune genes
Ref.Scorenegative.corr.gene: Reference score for negative correlative immune genes
TIMER2.0 analysis. The TIMER2.0 analysis was performed as previously described (https://compbio.cn/timer2/) (21). Twenty-two differentially expressed lncRNA genes were suitable for the analysis with the available Cancer Genome Atlas (TCGA) dataset. To calculate Spearman’s correlation, the QuanTIseq algorithm in Timer2.0 was selected to evaluate TIMMUSCORA cut-off values. This algorithm is one of the most recent deconvolution tools and is based on a constrained least square regression and novel signature matrix including not only Tregs, but also M1 and M2 macrophages (22). The results shown on the website were downloaded for further analyses. Tile plots and receiver operating characteristic (ROC) curves were drawn by ggplot2 and pROC of R (https://cran.r-project.org/web/packages/pROC/index.html). TCGA study abbreviations are listed on the website (https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations).
Prognostic analysis. Prognostic analyses of differentially expressed lncRNA genes were performed for each cancer type using the survival and survminor packages of R (https://cran.r-project.org/web/packages/survminer/index.html, https://cran.r-project.org/web/packages/survival/index.html)(23). Kaplan-Meier plots and the Log-rank test were performed between two sub-populations divided by the median GEP values of lncRNAs. The multivariate Cox proportional hazard risk was assessed by these packages, and a forest plot was drawn using the forestploter package (https://cran.r-project.org/web/packages/forestploter/index.html). A multivariate analysis of the Cox proportional hazard risk is a type of linear regression model; therefore, the models examined were evaluated by the Akaike Information Criterion provided by the MASS package (https://cran.r-project.org/web/packages/MASS/index.html). At the end of the analysis, the Log-rank test was performed on all eleven prognostic pancancer lncRNA candidates using the survival data of 5,013 profiles. The significance of differences was set as a p-value <0.05.
Results
Screening and profiling of lncRNA genes using the limma package. The screening platform of lncRNA genes derived from 5,013 cancer patients is shown in Figure 1. The different cancer types examined were as follows: breast cancer, n=246; colon cancer, n=1,142; head and neck cancer, n=316; kidney cancer, n=45; liver cancer, n=272; lung cancer, n=890; melanoma, n=34; pancreas cancer, n=119; rectal cancer, n=842; skin cancer, n=30; stomach cancer, n=596; and uterine cancer, n=99. In total, 435 differentially expressed lncRNAs were identified using a volcano plot and narrowed down to 80 lncRNAs referred to as PANCUPlnc80, which were commonly expressed in more than 4 cancer types (Table I). The profiles of differentially expressed lncRNA genes in various cancer types are shown in Figure 2. A high number of DEGs were confirmed in kidney cancer and melanomas, while pancreatic cancer had a low number of DEGs.
Schema of the screening platform for long non-coding RNA (lncRNA) genes as cancer-specific biomarkers from 5,013 cancer patients enrolled in Project HOPE. Briefly, 435 differentially expressed lncRNAs were identified using the limma package and narrowed down to 80 lncRNAs (PANCUPlnc80) across various cancer types. The relationships of these lncRNAs with immune-associated gene expression were analyzed using Pearson’s correlation and our tumor immune scoring system TIMMUSCORA.
List of up-regulated lncRNAs in twelve cancer types.
Profiles of lncRNA genes differentially expressed by more than 2-fold in various cancer types. (A) The number and fold change of lncRNA genes differentially expressed by more than 2-fold in various cancer types are shown, using a dot plot analysis. Each column shows the median and lncRNA genes up-regulated by more than 2-fold are shown in orange. (B) The profile of lncRNA genes up-regulated by more than 2-fold with a p-value <0.05 in many cancer types using a volcano plot analysis is shown. The two vertical dotted lines show log-scaled fold changes of 2.0 and 0.5, respectively. The horizontal dotted line represents a log-scaled p-value of 0.05.
PANCUPlnc80 genes were evaluated for their relationships with immune response-associated genes and the immune status using a specific scoring system or the tumor-infiltrating lymphocyte (TIL)-analyzing algorithm, and a prognostic analysis was then performed according to the screening platform shown in Figure 1.
Relationships of PANCUPlnc80 gene expression with immune response-associated gene expression. Pearson’s correlation analysis revealed that PANCUPlnc80 was positively correlated with 36 immune-related genes, negatively correlated with 17, and showed no significant correlation with 27 genes (Figure 3). The cut-off co-efficient values for positive and negative correlations were >0.3 and <−0.3, respectively.
Correlation analysis of PANCUPlnc80 lncRNAs with the expression of 300 immune response-associated genes using Pearson’s correlation shown in pie charts. Correlations were classified into three categories according to the coefficient value: r>0.3 (p<0.05) in pink, r ≤0.3 (p<0.05) in gray, and r<−0.3 (p<0.05) in blue. The space in white indicates no significance (p>0.05).
Scoring of immune response-associated genes based on the immune status. Based on the previously reported expression profiles of immune response-associated genes, category A exhibited an immunogenic status and category C had an immunosuppressive status (Figure 4). A volcano plot analysis indicated that immune response-associated genes belonging to categories A and C showed up- and down-regulation patterns, respectively. According to a heat map analysis of immune response-associated gene expression data from categories A and C, 300 immune response-associated genes were scored from 1 to 4 based on TIMMUSCORA, with a higher score indicating a stronger immunogenic status (Table II).
Scoring of immune response-associated genes based on the tumor immune status. (A) CD274 and CD8B gene expression profiles of 5,013 cancer patients enrolled in Project HOPE. (B) Immune response-associated gene expression profiles belonging to immune categories A and C. Contrasting expression patterns were observed between categories A and C. (C) A heat map analysis of immune response-associated gene expression data from categories A and C. Based on expression differences between categories A and C, 300 immune response-associated genes were scored from 1 to 4 based on the Tumor Immune Status Scoring Algorithm (TIMMUSCORA).
TIMMUSCORA reference score of immune response-associated genes.
Score ranking of PANCUPlnc80 genes using TIMMUSCORA. Based on the profiles of PANCUPlnc80 genes with immune response-associated gene expression, PANCUPlnc80 genes showed specific scores and were evaluated in terms of the tumor immune status (Figure 5). Twelve lncRNA genes exhibited high scores of >200, while 36 lncRNA genes had low scores of <0 (Figure 5A). Scores for lncRNA genes in various cancer types are shown in Figure 5B, with high scores for skin, kidney, head and neck cancers, and melanomas and low scores for lung and liver cancers.
Score ranking of PANCUPlnc80 genes using the tumor immune status scoring algorithm (TIMMUSCORA). (A) The tumor immune status scores of PANCUPlnc80 lncRNA genes is shown. PANCUPlnc80 genes showed specific scores calculated using TIMMUSCORA. PANCUPlnc80 genes were tentatively classified into three categories: high score >200 (12 genes), intermediate score 0-200 (32 genes), and low score <0 (36 genes). (B) Immune status scores of lncRNA genes in various cancer types, are shown in the columns in pink. PANCUPlnc80 gene numbers are included in each cancer type and are also shown in the columns in blue.
Relationship between the immune status score and the TIL status using QuanTIseq of TIMER2.0. Twenty-three lncRNA genes were selected from PANCUPlnc80 genes according to the TIMER 2.0 database, and the immune status scores obtained from TIMMUSCORA for these lncRNAs were evaluated based on their relationship with the TIL status, such as CD8+ T cells, Tregs, B cells, and macrophages. A ROC analysis of lncRNA genes with a high score (>200) showed high sensitivity with an AUC value >0.8 for the identification of CD8+ T cells and B cells (Figure 6).
Relationships of immune status scores with the TIL status using the QuanTIseq software of TIMER2.0. Twenty-three lncRNA genes were selected out of PANCUPlnc80 genes according to the TIMER 2.0 database. Immune status scores calculated from TIMMUSCORA of lncRNAs were evaluated in terms of their relationship with the TIL status; the X axis of the plot shows TCGA cohorts and the Y-axis shows selected lncRNA genes. (A) CD8+ T cells, (B) regulatory T cells, (C) B cells, (D) myeloid dendritic cells (DC), (E) M1 macrophages, and (F) M2 macrophages. In the ROC analysis, cut-off values of 200 (in red) and 0 (in blue) as immune status scores were used. Each AUC value is shown in the plot. TCGA cancer cohorts are shown as follows: breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), head and neck squamous cell carcinoma (HNSC), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic adenocarcinoma (PAAD), rectal adenocarcinoma (READ), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), uterine corpus endometrial carcinoma (UCEC), and uterine carcinosarcoma (UCS).
Relationship of PANCUPlnc80 gene expression levels with the overall survival of cancer patients. The relationship of PANCUPlnc80 gene expression levels with overall survival was investigated. Overall survival at five years between the positive (higher expression than the median level) and negative (lower expression than the median level) groups was compared using the multivariate Cox proportional hazard model. Eleven of 80 lncRNA genes were identified as prognostic genes (8 genes; high expression was associated with a better prognosis, and 3 genes; high expression was associated with a worse prognosis), with two showing a high immune status score and good prognosis (Figure 7).
Relationship of PANCUPlnc80 gene expression levels with the overall survival of cancer patients. (A) Relationships of PANCUPlnc80 gene expression levels with overall survival using a multivariate Cox proportional hazard model analysis. Eleven of 80 lncRNA genes were identified as prognostic genes (8 genes; high expression was associated with a better prognosis, and 3 genes; high expression was associated with a worse prognosis). (B) Kaplan-Meier curves of 11 prognostic lncRNA genes. Overall survival at five years was compared between the high (higher expression than the median level, in orange) and low (lower expression than the median level, in blue) groups. p<0.05, significant difference. Pie chart data represent relationships with immune response-associated genes.
Discussion
Advances in NGS technology, including scRNA-seq and Ribo-seq, have revealed that ncRNAs and chromosomal instability are the “dark matter” of the human genome. LncRNAs have been shown to exhibit multiple biological functions for the regulation of DNA, RNA, and protein metabolism in various cancer research categories using transcriptomics, metabolomics, and peptidomics. LncRNA gene profiling for pancancer patients has been performed by several research groups that investigated common features in various cancer types using a large amount of genomic data from the TCGA database (7, 24, 25). Li et al. demonstrated that it was possible to characterize specific lncRNAs in terms of the tumor mutation burden, TIL, immune status, and cancer prognosis across cancer types, which was similar to our approach (7). Project HOPE is unique in that it employs WES and whole genome sequencing combined with a multi-omics analysis of clinical specimens from cancer patients in Japan and is performed at a single research facility in a major cancer hospital.
Several lncRNA genes have been identified as cancer-specific biomarkers for the subtype classification and prognostic evaluation of cancers (15, 26, 27). Specifically, several previous studies have reported that specific lncRNAs are associated with cancer prognosis, invasion and metastasis in solid cancers and leukemia (28-33). Interestingly, some researchers demonstrated the significance of lncRNA levels in the exosome and serum (30, 32). Gao et al. identified six lncRNA signatures for the immunophenotype classification of glioblastoma (27). Other lncRNA signatures have been associated with TIME (34) and cancer metabolism (35).
One of the main advantages of lncRNA profiling is that it allows for the identification of novel neoantigens. Although ncRNAs account for the majority of the human genome, few investigations on ncRNA-derived neoantigens have been performed to date due to the lack of integration between current NGS technology and the classical neoantigen-seeking approach using LC/MS. Previous studies that identified neoantigen peptide sequences from a large amount of lncRNA sequencing based on in silico ribosome profiling and mass spectrometry, screened many neoantigen candidates with HLA-class I and II restrictions. Another study showed that specific neoantigen peptides had a prognostic role and exhibited CD8+ T cell stimulatory activity in vitro (36-39). In addition, the use of lncRNA-derived neoantigen peptides in ICI therapy has been reported, and Das et al. demonstrated that combination therapy with an anti-CTLA-4 antibody and anti-PD-1 antibody efficiently up-regulated lncRNA gene expression (40).
Many knowledge databases have recently been developed, driven by the integration comprehensive human whole genome sequencing plus RNA sequencing and lncRNA data into various multi-omics-based investigations, and include the lncRNA spatial atlas of expression [LncSpA (http://bio-bigdata.hrbmu.edu.cn/LncSpA)] (41), a resource for lncRNA and circRNA cancer associations [Lnc2Cancer3.0 (http://www.bio-bigdata.net/lnc2cancer)] (42), the lncRNA modulator atlas in pancancer (LncMAP) (43), and the curated knowledge base of human lncRNA [LncBook (http://bigd.big.ac.cn/lncbook)] (44). These databases are user-friendly tools and particularly LncBook is a comprehensive and efficient tool that is applicable for various purposes.
In the multivariate cancer prognostic analysis of lncRNA genes using a Cox proportional hazard model, 11 lncRNA genes were identified as being prognostic. Moreover, 2 of the 11 prognostic lncRNA genes, MIAT and LINC01094, showed high immune status scores of >200 and were associated with a good prognosis in breast cancer patients. This is in contrast to previous findings (45), which revealed that MIAT was associated with a poor prognosis in many solid cancers, including breast, gastric, and cervical cancers. However, there is a limitation in the present study because our gene expression data were obtained using ver2 of Agilent microarray that can cover only 2,300 lncRNA genes, but the latest ver3 microarray can cover >30,000 lncRNA genes. Therefore, further studies of other cancer tissues in terms of a wide range of lncRNA gene expression are warranted in the future.
In the present study, we characterized a large number of lncRNAs from cancer patients enrolled in Project HOPE and classified them in terms of correlations with immune response-associated genes, immune cell infiltration, and cancer prognosis based on TIMMUSCORA, which was developed in-house. QuanTIseq software from Timer2.0 and other TIL-analyzing software were efficient for the investigation of the TIL status with lncRNA expression levels (21). TIMMUSCORA has been developed as a novel tumor immune status-evaluating tool based on scores for 300 immune response-associated genes. Thus, TIMMUSCORA indicates that a high score (>200) showed an immunogenic tumor immune status and a high level of TIL, such as CD8+ T cells and B cells, and a low score (<0) showed an immunosuppressive status and a low level of TIL; it may be applied to coding RNA genes in the context of the prediction of clinical responses to cancer immunotherapy, which may be beneficial for newly diagnosed cancer patients enrolled in Project HOPE.
Conclusion
Our own specific pancancer lncRNA profiles associated with immune status scores, the TIL status, and cancer survival may be novel biomarkers for immune responses and predictors of cancer outcomes.
Acknowledgements
This research was supported by AMED (Grant Number: JP22k0106689). We thank the members of the Shizuoka Cancer Center Hospital and Research Institute for their support and suggestions. This work was supported by the Shizuoka Prefectural Government, Japan.
Footnotes
Authors’ Contributions
Tomoatsu Ikeya designed the study, drafted the manuscript, and supported all experiments. Yasuto Akiyama organized the project and was responsible for completing the study. Haruo Miyata, Tadashi Ashizawa, and Yasufumi Kikuchi helped to collect genetic data from Project HOPE. Akira Iizuka, Chie Maeda, Akari Kanematsu, and Kazue Yamashita helped to summarize genome data and write the manuscript. Kenichi Urakami and Takeshi Nagashima analyzed DNA sequence data and performed the statistical analysis. Keiichi Ohshima and Ken Yamaguchi reviewed the manuscript. All Authors read and approved the final draft of the manuscript.
Data Availability Statement
The datasets generated and/or analyzed during the present study are available in the NBDC Human Database repository (https://humandbs.dbcls.jp, accession number JGAS000274).
Conflicts of Interest
The Authors declare that they have no conflicts of interest.
Artificial Intelligence (AI) Disclosure
No artificial intelligence (AI) tools, including large language models or machine learning software, were used in the preparation, analysis, or presentation of this manuscript.
- Received June 3, 2025.
- Revision received July 17, 2025.
- Accepted July 18, 2025.
- Copyright © 2025 The Author(s). Published by the International Institute of Anticancer Research.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.













