Abstract
Background/Aim: Breast cancer is a common type of cancer in women, and metastasis frequently leads to therapy failure. Using next-generation sequencing (NGS), we aspired to identify the optimal differentially expressed genes (DEGs) for use as prognostic biomarkers for breast cancer. Materials and Methods: NGS was used to determine transcriptome profiles in breast cancer tissues and their corresponding adjacent normal tissues from three patients with breast cancer. Results: Herein, 15 DEGs (fold change >4 and <0.25) involved in extracellular matrix (ECM)–receptor interaction signaling were identified through NGS. Among them, our data indicated that high HMMR expression levels were correlated with a poor pathological stage (p<0.001) and large tumor size (p<0.001), whereas high COL6A6 and Reelin (RELN) expression levels were significantly correlated with an early pathological stage (COL6A6: p=0.003 and RELN: p<0.001). Multivariate analysis revealed that high HMMR and SDC1 expression levels were significantly correlated with poor overall survival (OS; HMMR: adjusted hazard ratio [aHR] 1.93, 95% confidence interval [CI]=1.10-3.41, p=0.023; SDC1: [aHR] 2.47, 95%CI=1.28-4.77, p=0.007) for breast cancer. Combined, the effects of HMMR and SDC1 showed a significant correlation with poor OS for patients with breast cancer (high expression for both HMMR and SDC1: [aHR] 3.29, 95%CI=1.52-7.12, p=0.003). Conclusion: These findings suggest that HMMR and SDC1 involved in the ECM–receptor interaction signaling pathway could act as effective independent prognostic biomarkers for breast ductal carcinoma.
Breast cancer is the most common type of cancer in women worldwide. Although the incidence of breast cancer in some Western countries has recently fallen, its incidence and associated mortality rate continue to increase in Asia (1). Breast cancer is a heterogeneous disease with considerable variation in histological and biological features. Depending on the gene expression signature, breast cancers are classified into four subtypes, namely luminal A, luminal B, epidermal growth factor receptor 2 (HER2), and basal like (2, 3). In general, luminal A-type tumors have high estrogen receptor (ER) and progesterone receptor (PR) expression levels and no HER2 expression (ER+/PR+/HER2−), whereas luminal-B type tumors have not only ER and PR expression but also HER2 expression (ER+/PR+/HER2+). Moreover, HER2-type tumors exhibit high HER2 expression but no ER or PR expression (ER−/PR−/HER2+). By contrast, basal-like tumors do not have ER, PR, or HER2 expression (ER−/PR−/HER2−); thus, this type of breast cancer is also called triple-negative breast cancer (TNBC). Studies have revealed that patients with basal-like cancer have the least favorable prognosis and 5-year survival, whereas luminal A, luminal B, and HER2 have more favorable prognoses and a well-differentiated cellular morphology (3). The unfavorable prognosis of patients with breast cancer is strongly associated with metastatic events (4-6). Furthermore, metastasis is the major cause of cancer-related deaths, which has been a major challenge in treating breast cancer. Therefore, understanding the mechanism of metastasis and identifying favorable diagnostic and prognostic biomarkers would facilitate improving the survival rate of patients with breast cancer.
Gene expression signatures are commonly used as biomarkers for predicting breast cancer progression and prognosis (7, 8). Next-generation sequencing (NGS) is a powerful, high-throughput technology for comprehensively profiling gene expression and enables for screening of aberrant genes and determining gene signatures for breast cancer (9, 10). The Cancer Genome Atlas (TCGA) is an online database that collects substantial NGS gene transcriptome data and clinical data related to breast cancer. These clinical data and information were de-identified and released for research use. Zhong et al. identified the six critical gene signatures by analyzing TCGA database. Furthermore, six-gene expression signatures can be used as independent prognostic markers in survival prediction of patients with breast cancer (11). Using TCGA database, Ong et al. successfully identified a powerful biomarker panel for predicting the prognosis progression of patients with head and neck cancer (12). Therefore, by analyzing these big data, researchers can perform prospective clinical trials and evaluate the impact of candidate genes on breast cancer (13).
In this study, the expression profiles of breast cancer tissue and adjacent normal tissue obtained from three patients were determined by using the NGS approach. Through pathway enrichment analysis, several differentially expressed genes (DEGs) involved in the extracellular matrix (ECM)–receptor interaction signaling pathway were identified. By further analyzing TCGA database, it was determined that HMMR and SDC1 deregulation could be an effective independent prognostic biomarker for breast cancer.
Materials and Methods
Clinical samples. Clinical samples were collected from patients with breast cancer who had signed an informed consent form and agreed to undergo a surgical procedure at Kaohsiung Veterans General Hospital. This study was approved by the Institutional Review Board (IRB) of Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan (IRB No. VGHKS16-CT10-08). In this study, a total of six clinical samples, comprising three invasive ductal carcinoma (IDC) and three corresponding adjacent normal tissue samples, obtained from three patients with breast cancer were subjected to NGS.
Laser capture microdissection. Fresh-frozen breast cancer specimens were embedded in ornithine carbamyl transferase (Thermo Fisher Scientific Inc., Waltham, MA, USA) stored at −80°C and were then sectioned at a thickness of 10 μm and mounted onto MMI-Membrane Slides (Leica Microsystems, Wezlar, Germany). The frozen sections were fixed with ethanol for 30 sec and then stained with hematoxylin and eosin staining before dehydration (30 sec each in 95% and 100% ethanol). After air drying, the sections were laser microdissected using an MMI Microdissection Instrument (Olympus, Tokyo, Japan). Laser capture microdissection (LCM) caps were used to collect captured cells from the slices until the caps were filled with tumor cells obtained from a single sample from each patient.
Extraction of RNA from LCM-captured cells. The total RNA of LCM-captured clinical samples was extracted using the miRNeasy Micro Kit (Qiagen, Hilden, Germany) as directed in the instruction manual. Briefly, tissue samples were homogenized in 700 μl of QIAzoI Lysis Reagent at room temperature (15°C-25°C) for 5 min and mixed with 140 μl of chloroform to extract the DNA. The upper aqueous phase was transferred to a new collection tube, and 525 μl of 100% ethanol was added and mixed thoroughly through pipetting. In total, 700 μl of sample was added into a RNeasy MinElute spin column and then centrifuged at ≥8,000 × g for 15 sec. RNA integrity and quantity were evaluated using a Nanodrop 8000 platform (Thermo Fisher Scientific Inc.).
NGS process. Three high-purity N-T paired tissues were individually separated through LCM and were transferred to an RNA extraction kit (Qiagen) for RNA extraction. The library preparation and NGS processes were performed by a biotechnology company (Genomics, Taipei, Taiwan). Briefly, total RNA was treated with DNase I, and mRNA was further isolated using Oligo(dT) primer. Subsequently, mRNA was further fragmented into small templates for cDNA synthesis. A single nucleotide A (adenine) was then added to the terminal end of the fragments and ligated with adapters. Suitable fragments were selected for polymerase chain reaction (PCR) amplification. After the library preparation was completed, the library quality and concentration were examined using an Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System. Subsequently, the library was sequenced using an Illumina HiSeq 4000 system.
Identification of DEGs. After NGS analysis, raw sequences were obtained from Illumina GA Pipeline software CASAVA Version 1.8 and were expected to generate more than 30 million reads per sample. After low-quality data were filtered, qualified reads were analyzed using TopHat/Cufflinks to estimate gene expression, which was calculated in fragments per kilobase of transcript per million mapped reads (FPKM). For differential expression analyses, CummeRbund was used to statistically analyze the gene expression profiles. The reference genome and gene annotations were retrieved from the Ensembl database. DEGs were screened with the following thresholds: false discovery rate (FDR)<0.001 and log2 (fold-change) >2 or <−2.
Functional annotation of DEGs. DEGs (tumor versus normal >4 and <0.25) were selected from NGS data and then mapped onto the Kyoto Encyclopedia of Genes and Genomes pathways using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Subsequently, the hypergeometric test was performed to identify significantly enriched pathways.
Expression data from TCGA. From TCGA (https://cancer-genome.nih.gov) data set, 1215 pieces of transcriptome data were downloaded and the corresponding clinical data, including those related to 113 normal cases and 1,102 breast cancer cases. Among these transcriptome profiles, those of 753 cases were included in the clinical pathological features and survival analysis in this study (the expression data of patients were excluded if the patient was male or the tissue type was ductal carcinoma in situ [DCIS]).
Statistical analysis. The paired t-test, repeated-measures analysis of variance (ANOVA), and Wilcoxon matched-pairs signed-ranks test were used to compare transcriptome expression levels between the IDC tissues and the adjacent normal tissues. The Mann–Whitney U-test and Kruskal–Wallis one-way ANOVA were used for comparing transcriptome expression levels between different clinicopathologic groups in the tumor tissues. In addition, the log-rank test and Cox proportional hazards model were used to evaluate the effects of different transcriptome expression levels (high vs. low) on breast cancer patients' overall survival (OS), disease-free survival, and development of multiple primaries. The difference was considered significant at p<0.05.
Fifteen differentially expressed genes involved in extracellular matrix–receptor interaction are shown.
Results
Transcriptome profiles of breast cancer. The transcriptome profiles of three patients with breast cancer were obtained through NGS (Figure 1A). Briefly, the subtype of the three cases of breast cancer was determined to be triple-negative type, and the cell morphology was IDC. Three IDC tissues and their corresponding adjacent normal tissues were precisely collected from the three patients through LCM. After RNA extraction and library preparation, the transcriptome profiles of the six samples were determined using the Illumina HiSeq 4000 platform. More than 30 million clean reads were obtained in each library. After mapping the clean reads to the genome, the gene expression abundances were quantified and presented in FPKM. Total of 19,170 genes were identified on average from the sample ([FPKM]>1). Most of these genes were consistently expressed between the adjacent normal group and breast cancer group.
Identification of DEGs. After obtaining the transcriptome profiles, DEGs were identified with the following thresholds: FDR<0.001 and fold change (in FPKM values between tumor tissues and the corresponding adjacent normal tissues) >4 or <0.25. As shown in Figure 1A, our data revealed that 3219 genes were dysregulated (fold change >4 or <0.25) in tumor tissues compared with normal tissues obtained from patient #1; among these, 1,914 and 1,305 genes were up-regulated and down-regulated, respectively. Moreover, 5158 genes were dysregulated (fold change >4 or <0.25) in tumor tissues compared with normal tissues obtained from patient #2; among these, 3,256 and 1,902 genes were up-regulated and down-regulated, respectively (Figure 1A). Additionally, 3,824 genes were dysregulated (fold change >4 or <0.25) in tumor tissues compared with normal tissues obtained from patient #3; among these, 1,392 and 2,432 genes were up-regulated and down-regulated, respectively (Figure 1A). Combining these results revealed that 330 genes were synchronously up-regulated (fold change >4 or <0.25 and FDR<0.001) and that 401 genes were down-regulated (fold change >4 or <0.25) in breast tissues compared with the corresponding adjacent normal tissues (Figure 1B).
Analysis of putative biological function of DEGs through pathway enrichment analysis. The aforementioned data revealed that 330 and 401 genes were synchronously up-regulated and down-regulated in the three breast cancers, respectively. Subsequently, these DEGs (up-regulation: 330; down-regulation: 401) were subjected to pathway enrichment analysis using the DAVID database. As presented in Figure 1C, these DEGs were significantly enriched in 19 signaling pathways: the ECM–receptor interaction signaling pathway (p<0.001), malaria (p=0.013), staphylococcus aureus infection (p=0.002), lysosome (p=0.003), ABC transporters (p=0.003), focal adhesion (p=0.005), the PPAR signaling pathway (p=0.01), axon guidance (p=0.011), regulation of lipolysis in adipocytes (p=0.012), protein digestion and absorption (p=0.016), phagosome (p=0.019), hematopoietic cell lineage (p=0.036), biosynthesis of amino acids (p=0.047), the PI3K-Akt signaling pathway (p=0.058), proteoglycans in cancer (p=0.064), the Rap1 signaling pathway (p=0.088), the adipocytokine signaling pathway (p=0.092), biosynthesis of antibiotics (p=0.093), and the AMPK signaling pathway (p=0.095). Among them, the ECM–receptor interaction signaling pathway plays a crucial role in modulating breast cancer metastasis (14-16). Therefore, DEGs in the ECM–receptor interaction signaling pathway were prioritized for further examination in this study. As shown in Table I, eight genes (COL5A1, COL5A2, COL11A1, COMP, FN1, HMMR, SDC1, and SPP1) were significantly up-regulated in breast cancer and enriched in ECM–receptor interaction, whereas seven genes (CD36, COL6A6, ITGA7, ITGA10, LAMC3, RELN, and TNXB) were significantly down-regulated.
The expression levels of 15 candidate genes were examined in an independent cohort by analyzing TCGA database. A total of 1,215 transcriptome expression profiles and pieces of clinical pathological data were downloaded from TCGA database, comprising 113 adjacent normal tissues and 1,102 breast cancer tumors. As presented in Figure 2 and Table II, the expression levels of the eight genes, namely COL5A1, COL5A2, COL11A1, COMP, FN1, HMMR, SDC1, and SPP1, were significantly increased (fold change≥3 and p<0.001) in breast cancer tissues compared with the adjacent normal tissues. Furthermore, the expression levels of seven genes, CD36, COL6A6, ITGA7, ITGA10, LAMC3, RELN, and TNXB, were significantly decreased (fold change<0.5 and p<0.001) in breast cancer compared to adjacent normal tissues. In summary, our data revealed that the expression levels of all DEGs (15 out of 15) in breast cancer that were determined through TCGA database analysis were consistent with our NGS data.
Next-generation sequencing profiles of three N-T-paired breast cancers. (A) Flowchart of identification of differentially expressed genes through next-generation sequencing. Genes with differential expression in breast cancer compared to the adjacent normal samples were filtered with the following thresholds: fold change ≥4 or fold change <0.25 and FDR <0.001. The numbers of differentially expressed genes in the three patients with breast cancers are respectively showed in the figure panels. (B) Venn diagrams of the number of up-regulated and down-regulated gene candidates in the three N-T-paired breast cancers. (C) Protein coding genes with differential expression were subjected to pathway enrichment analysis (bottom panels).
Expression levels of differentially expressed genes involved in the extracellular matrix–receptor interaction signaling pathway were examined by analyzing The Cancer Genome Atlas. (A) and (B) Expression data of eight up-regulated (COL5A1, COL5A2, COL11A1, COMP, FN1, HMMR, SDC1, SPP1) and seven down-regulated genes (CD36, COL6A6, ITGA7, ITGA10, LAMC3, RELN, TNXB) in 1,102 breast cancers compared to 113 corresponding adjacent normal tissues were obtained from The Cancer Genome Atlas. Boxplots of gene expression values (RPKM) for the up-regulated genes in breast cancer are shown.
Expression levels of differentially expressed genes analyzed in 1102 breast cancer cases (113 normal vs. 1102 tumor) using The Cancer Genome Atlas.
Evaluation of association between DEG expression and survival for breast cancer by Kaplan–Meier plotter. Overall, 15 DEGs involved in the ECM–receptor interaction signaling pathway were identified. To further evaluate the clinical impacts of these DEGs, an online survival analysis tool, the Kaplan–Meier plotter, was used to rapidly assess the effect of the 15 DEGs on breast cancer survival. This database collects the correlation between the survival curve and individual gene expression by analyzing the microarray data of 1809 patients (17). As shown in Table III, the expression levels of four genes, HMMR (hazard ratio [HR] 1.69, 95% confidence interval [CI]=1.36-2.1, p<0.001), SDC1 (HR=1.06, 95%CI=1.05-1.63, p<0.001), COL6A6 (HR=0.57, 95%CI=0.41-0.79, p<0.001), and RELN (HR=0.77, 95%CI=0.62-0.95, p=0.016), were significantly correlated with the OS curve of breast cancer (log-rank test p<0.05).
Evaluation of impact of 15 differentially expressed genes on overall survival of breast cancer by using Kaplan–Meier plotter.
Evaluation of clinical impact of DEGs through TCGA database analysis. The clinical impacts of HMMR, SDC1, RELN, and COL6A6 were further examined by analyzing TCGA database. A total of 753 cases were used in this study to evaluate the correlation between survival and DEG expression (data on male individuals and DCIS were excluded). Using this independent cohort, we confirmed the clinical effects of these four genes in breast cancer. As presented in Table IV, high HMMR expression levels were significantly correlated with a poor American Joint Committee on Cancer (AJCC) pathological stage (p<0.001) and large tumor size (p<0.001), whereas no correlation was observed between clinical pathological features and SDC1 expression. Low COL6A6 expression levels were significantly associated with a high AJCC pathological stage (p=0.003) and large tumor size (p<0.001; Table V). High RELN expression levels were significantly correlated with the age of patients with breast cancer (<40, 40-59, vs. ≥60 years; p<0.001). Furthermore, low RELN expression levels were highly associated with a poor pathological stage (p<0.001; Table V). Notably, our data indicated that the expression levels of HMMR were significantly higher in IDC with the triple-negative subtype compared with those with the luminal A and luminal B subtypes (Table IV and Figure 3). Conversely, higher SDC1 expression levels were observed in breast cancer with Her+ compared with the luminal A, luminal B, and triple-negative subtypes (Table IV and Figure 3). In addition, the expression levels of COL6A6 and RELN did not correlate with any breast cancer subtypes (Table V and Figure 3). These results imply that the expression levels of HMMR and RELN might facilitate classifying breast cancer molecular subtypes.
Correlation of HMMR and SDC1 expression with clinicopathological characteristics of patients with breast cancer.
HMMR and SDC1 expressions as independent survival biomarkers. The association of the expression levels of four genes, namely HMMR, SDC1, COL6A6, and RELN, with survival was also evaluated in patients with breast ductal carcinoma. Kaplan–Meier analysis revealed that high HMMR and SDC1 expression levels were significantly correlated with a poor OS curve (HMMR on OS: p=0.009 and SCD1 on OS: p=0.002; Figure 4A and B). A multivariate Cox regression model revealed a significant association between high HMMR and SDC1 expression levels with poor OS (HMMR: adjusted HR [aHR] 1.93, 95%CI=1.10-3.41, p=0.023; SDC1: aHR=2.47, 95%CI=1.28-4.77, p=0.007; Table VI).
In addition, high COL6A6 expression levels exhibited a borderline significant association with a satisfactory OS curve of breast cancer (p=0.049; Figure 4C), whereas those of RELN exhibited no significant association (p=0.219, Figure 4D). The multivariate Cox regression model revealed no significant association between COL6A6 and RELN expression and the OS of patients with breast cancer (COL6A6: aHR=0.57, 95%CI=0.29-1.11, p=0.097; RELA: aHR=0.76, 95%CI=0.42-1.40, p=0.385; Table VI). Furthermore, the effects of HMMR and SDC1 expression were combined to assess the correlation with survival in breast cancer. High HMMR and SDC1 expression levels were defined as a risk for increasing breast cancer deaths. As shown in Figure 5, high expression levels of both HMMR and SDC1 were associated with significantly shorter survival for patients with breast cancer (OS: p<0.001), even after adjustment for clinicopathologic factors (OS: aHR=3.29, 95%CI=1.52-7.12, p=0.003; Table VII).
Discussion
In the present study, 330 up-regulated and 401 down-regulated genes were identified in three TNBC samples through NGS. These DEGs were enriched in several cancer-related signaling pathways. Finally, we focused on ECM–receptor interaction signaling. The ECM comprises proteins and proteoglycans/glycosaminoglycans that provide structural support in human cells. Studies have shown that dysregulation of ECM components plays a crucial role in promoting cancer cell invasion and progression (14-16). HMMR are major components of the ECM and are involved in regulated cancer cell growth, migration, and invasion (16). In this study, our data reveal that HMMR overexpression was highly correlated with poor pathological stages and poor survival rates. Previous studies have revealed that HMMR is a multifunctional extracellular and intracellular protein and can regulate cell motility and the cell cycle (18). Stevens et al. reported that HMMR overexpression in primary lung adenocarcinoma was associated with poor prognosis (19). Akent'eva et al. showed that breast cancer cell RHAMM-selective peptides can induce apoptosis and necrosis of breast cancer cells, implying that HMMR-selective peptides have potential as a target therapy for cancer (20). Wang et al. reported that RHAMM overexpression was observed in breast cancer and that high RHAMM expression was significantly associated with poor prognostic value (21).
Analysis of the expression levels of HMMR, SDC1, COL6A6, and RELN in specific subtypes of breast cancer. A total of 653 pieces of breast cancer data were downloaded from The Cancer Genome Atlas, comprising of 385 luminal A, 103 luminal B, 33 Her2 overexpression, and 132 TRBC, respectively. The expression levels of (A) HMMR, (B) SDC1, (C) COL6A6, and (D) RELN were evaluated with RPKM by analyzing The Cancer Genome Atlas.
Correlation of COL6A6 and RELN expression with clinicopathological characteristics of patients with breast cancer.
SDC1 belongs to the syndecan family and comprises cell surface heparan sulfate proteoglycans (HSPGs) (22). SDC1 has been demonstrated to play a critical role in regulating ECM architecture and facilitating breast cancer invasion and progression (23-25). Numerous studies have revealed that SDC1 was frequently down-regulated in most human cancers, including colon cancer, head and neck cancer, and gastric cancer (26-29). However, the expression levels of SDC1 have been reported to increase in breast cancer and pancreatic cancer (30, 31). A previous study reported that SDC1-overexpression was involved in promoting cell growth and cell invasion ability in breast cancer (32). Furthermore, SDC1 knockdown could interrupt colony and spheroid formation by regulating cancer stem cell–related genes in breast cancer (33). In this study, SDC1 was significantly up-regulated in breast cancer and high SDC1 expression was positively associated with poor survival of patients with breast cancer. Our results revealed that SDC1 might play an oncogenic role in regulating breast cancer progression. Previous studies have reported similar results that SDC1 was significantly up-regulated in breast cancer and that high SDC1 expression was correlated with poor prognostic value for breast cancer (31, 34, 35). Nguyen et al. reported that high SDC1 expression was significantly associated with a higher histological grade and had a negative effect on both OS and disease-free survival for patients with breast cancer (35). In addition, soluble SDC1 was reported to be detectable in sera, and SDC1 levels were increased in the sera of patients with breast cancer compared to healthy controls; moreover, high SDC1 levels were highly associated with breast tumor sizes (36). Overall, SDC1 could not only be a diagnostic biomarker, but also a prognostic biomarker for breast cancer.
Analysis of the prognostic significance of HMMR, SDC1, COL6A6, and RELN in breast cancer. (A-D) A total of 743 overall survival curves were assessed depending on the HMMR, SDC1, COL6A6, and RELN expression levels in breast cancer tissues.
Univariate and multivariate Cox regression analysis of gene expression for overall survival of patients with breast cancer patient
Combination of HMMR and SDC1 expression levels to analyze their prognostic significance in breast cancer. Overall survival was compared according to the combined HMMR and SDC1 expression in breast cancer.
Univariate and multivariate Cox's regression analysis of HMMR and SDC1 for overall survival of breast cancer patients.
In conclusion, the present study determined gene expression profiles and identified 15 DEGs involved in the ECM–receptor interaction signaling pathway. Using the TCGA database, further evaluation was executed and it was observed that two genes, HMMR and SDC1, can be used as an effective independent prognostic biomarkers in survival prediction for breast ductal carcinomas.
Acknowledgements
This work was supported by funding from the Ministry of Science Technology (MOST 106-2320-B-075B-004) and Kaohsiung Veterans General Hospital (VGHKS 105-072 and VGHKS105-135).
Footnotes
Conflicts of Interest
The Authors declare that there are no conflicts of interest to disclose.
- Received May 7, 2018.
- Revision received June 13, 2018.
- Accepted June 20, 2018.
- Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved