Abstract
Background: Cancer genomic signatures may vary using different platforms. We compared the differential gene expression in non-small cell lung cancer (NSCLC) between two platforms in order to find the most relevant genomic signatures of tumor recurrence. Materials and Methods: We analyzed gene expression in frozen lung cancer tissue from 59 selected patients who had undergone surgical resection of NSCLC. These patients were divided into two groups: group R, patients who had a tumor recurrence within four years, n=37; group NR, patients who remained disease-free four years following initial surgery, n=22. Each RNA sample was assayed twice using both Affymetrix and Illumina GeneChip. Data were analyzed by principal component analysis and leave-one-out cross-validation. Results: Using the same filtering criteria, 13 genes that were differentially expressed between R and NR were identified by Affymetrix, while 21 genes were identified by Illumina GeneChip. In common, a total of six genes were detected by both systems. Using univariate analysis, four (lipocalin 2, LCN2; parathyroid hormone-like hormone, PTHLH; ras-related protein Rab-38, RAB38; and four jointed box 1, FJX1) of these six genes were associated with survival. A risk score of survival was calculated according to the four-gene expression. There was a significant difference in overall survival between low- and high-risk groups. Conclusion: A four-gene signature is associated with survival among patients with early-stage NSCLC. Further validation of these findings is warranted.
Lung cancer is the leading cause of cancer death. Non-small cell lung cancer (NSCLC) accounts for 80% to 85% of all lung cancer cases. Most patients present with advanced disease (1). Despite recent advances in multi-modal therapy, the overall 5-year survival rate for NSCLC remains of the order of 8 to 12%. Surgery is still the first choice of treatment for localized NSCLC if the patient's physical condition allows it. However, the result of surgical treatment remains unsatisfactory, and 35-50% of such patients will experience disease relapse within 5 years.
Microarray techniques, first described in 1994 by Drmanac and Drmanac (2), use hybridization of large-scale cDNA with a mass probe to identify the expression of individual genes. The Affymetrix GeneChip method utilizes silicon chips where more than 400,000 oligonucleotides can be synthesized on a single 1.6-cm² microscopic glass slide (3). The major advantage of DNA microarrays is the screening of large numbers of genes with greater sensitivity using a smaller amount of sample. Currently, up to 30,000 cDNA probes can be placed on a small microscope glass slide, with the future goal of screening the entire human genome (approximately 30,000 expressed genes, which cover almost the entire genome) in one experiment. Microarrays will be a powerful tool for studying the effects of changes made in cellular signaling pathways and identifying the changes that occur in the function of other genes downstream of various genetic alterations. Gene expression analysis using microarrays, combined with supervised clustering data analysis, could provide insight into the molecular signature of lung cancer and its treatment and prognosis. This would provide the basis for more effective therapeutic intervention.
There have been several advances in the molecular prediction of individual clinical outcome by microarray technologies. In breast cancer, the success was manifested in the commercial gene tests, such as Oncotype DX (4) and MammaPrint (5, 6). In current studies of biomarker identification, genes are ranked according to their association with the clinical outcome. Nevertheless, each individual gene selection algorithm has different strengths and limitations. A modified model combining multiple gene selection might provide a better method to identify novel biomarkers. In 2007, Chen et al. used computer-generated random numbers to assign 185 frozen specimens for microarray analysis, real-time reverse-transcriptase polymerase chain reaction (RT-PCR) analysis, or both (7). They studied gene expression in frozen specimens of lung cancer tissue from 125 randomly selected patients who had undergone surgical resection of NSCLC and evaluated the association between the level of expression and survival. A five-gene signature (including dual-specificity phosphatase 6, DUSP6; monocyte-to-macrophage differentiation associated protein, MMD; signal transducer and activator of transcription 1, STAT1; v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3, ERBB3; and lymphocyte-specific protein tyrosine kinase, LCK) was an independent predictor of relapse-free and overall survival. However, there is no fully-validated and clinically applied model for predicting lung cancer recurrence after surgery.
To our knowledge, ours is the first study to use two microarray platforms (Affymetrix and Illumina) to analyze the genomic signatures of tumor recurrence in resectable lung cancer.
Materials and Methods
Selection of adequate samples from the Chang Gung Memorial Hospital (CGMH) tissue bank. We used the lung tumor tissues collected by the CGMH tissue bank according to its standard procedures, including written informed consent obtained from patients. A pathologist reviewed a tissue section of the archived fresh frozen tissue. Only samples containing at least 50% cancer cells were used. We collected 549 lung tumor tissues from the CGMH tissue bank during the period between January 2004 and December 2007. According to the clinical database obtained from the Cancer Registry in CGMH-Linkou Medical Center, we further selected 183 of these lung tumor specimens for RNA extraction. A total of 59 acceptable samples of RNA were isolated for microarray analysis with Affymetrix and Illumina GeneChip.
Identification of clinical information. The accuracy of the clinical database was confirmed by two doctors from the Division of Thoracic Surgery and Chest Medicine in CGMH. All cases were first selected by one doctor and then cross-confirmed by the other; both followed the same standard operating procedure (Table I) to confirm that the required criteria were met.
Tumor classification based on tumor recurrence within four years and RNA extraction. The tumor and non-tumor parts of each sample were identified and sampled by a pathologist. The patients were divided into two groups: group R, patients who had a tumor recurrence within four years of surgery; group NR, patients who remained disease-free four years following initial surgery. Tumor samples from the 37 recurrent (R) and 22 non-recurrent (NR) NSCLC cases were utilized. A pathologist reviewed the tissue section of the archived fresh frozen tissue. Only samples containing at least 50% cancer cells were used. One section was stained by hematoxylin and eosin (H&E) to confirm the adequacy of the tumor tissue. The rest of the samples were used for total RNA extraction. Following mechanical tissue disruption, total tumor RNA was extracted using the RNeasy Mini kit (QIAGEN, Hilden, Germany). RNA integrity was assessed with the total RNA Nano Chip Assay on an Agilent 2100 Bioanalyzer (Agilent Technologies GmbH, Berlin, Germany). The RNA integrity number (RIN score) was generated for each sample on a scale of 1-10 as an indication of RNA quality. Total RNA with RIN score >5 was used for microarrays. Prior to array analysis, one round of T7 promoter-based RNA amplification was performed.
Microarrays, microarray data processing and normalization. Affymetrix® HG-U133 Plus 2.0 mRNA expression arrays (Affymetrix, Santa Clara, CA, USA) were used in order to determine the expression of 47,400 transcripts, corresponding to 38,500 human genes (8). These arrays have shown high reproducibility for mRNA expression analysis (9). Briefly, 1-15 μg total RNA was reverse-transcribed into cDNA, followed by RNase H-mediated second-strand cDNA synthesis. The double-stranded cDNA was purified and served as a template in the subsequent in vitro transcription (IVT) reaction. The IVT reaction was carried out in the presence of T7 RNA polymerase and a biotinylated nucleotide analog/ribonucleotide mix for complementary RNA (cRNA) amplification and biotin labeling. The biotinylated cRNA targets were then cleaned up, fragmented, and hybridized to GeneChip expression arrays. A hybridization cocktail was prepared, including the fragmented target, probe array controls, bovine serum albumin (BSA) and herring sperm DNA. It was then hybridized to the probe array during a 16-hour incubation. Specific experimental information was defined using Affymetrix® Microarray Suite on a PC-compatible workstation.
Illumina BeadChip Ref-6 (Ambion, Inc., Austin, TX, USA) containing 48,804 probes was used to perform a whole human gene expression profile. Biotin-labeled cRNA for hybridization was generated by in vitro transcription based on the Eberwine protocol using Illumina's recommended commercial kits. Briefly, 500 ng total RNA was reverse-transcribed into cDNA, followed by linear amplification steps according to the Illumina TotalPrep RNA Amplification kit for Illumina. Hybridization was performed with 1.5 μg biotin-labeled cRNA in each array of BeadChip Ref-6. After 16 hours' incubation at 58°C, BeadChips were washed, stained and scanned according to the manufacturer's manual. Scanning used the Illumina BeadArray Reader software together with the Illumina BeadStation 500 platform. Processing and analysis of the microarray data were performed with the Illumina BeadStudio v3.3.7 software. The data were normalized using the quantile option.
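As an illustration of the normalization step, the following is a minimal sketch of quantile normalization, assuming that the normalization option referred to above is the standard quantile method. It is not the BeadStudio implementation; the function name and the toy intensities are hypothetical and chosen only for demonstration.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length intensity lists (one per array)."""
    n = len(columns[0])
    # Sort each array's intensities, then average across arrays rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered by intensity within this array
        # (ties are broken by probe position, a simplification).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: three arrays, four probes each (hypothetical values).
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array shares the same set of intensity values, so differences between arrays reflect only the rank order of probes within each array.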
Table I. Recurrence definition.
Table II. Top 20 differentially expressed genes in the non-recurrence group by the Affymetrix platform in 16 cases.
Identification of predictive transcripts by leave-one-out cross-validation. To identify predictive transcripts, a leave-one-out process was used. Predictive gene signatures were generated using the expression profiles and sensitivity data of all 59 test tumors as a training set. Leave-one-out cross-validation (LOOCV) involved removing a single tumor from the original training set of 59 tumors, using the remaining 58 tumors as the training set and the removed tumor for validation. This procedure was repeated in such a way that each tumor in the original training set was used once for validation.
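The leave-one-out procedure can be sketched as follows. This is an illustrative example only: the nearest-centroid classifier, the function names and the toy expression values are assumptions, since the classifier used within each fold is not specified here.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean profile is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(vals) / len(rows) for vals in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(x, y):
    """Hold out each sample once; train on the rest and predict the held-out one."""
    predictions, sizes = [], []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        sizes.append(len(train_x))  # always n - 1
        predictions.append(nearest_centroid_predict(train_x, train_y, x[i]))
    return predictions, sizes


# Toy data: six tumors, two genes (hypothetical log2 expression values).
x = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
y = ["NR", "NR", "NR", "R", "R", "R"]
predictions, sizes = leave_one_out(x, y)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
```

With n tumors, each of the n folds trains on n - 1 tumors; for the 59 tumors studied here, each training set would contain 58.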
Network visualization and analysis. Network analyses of differentially expressed genes were performed using the MetaCore Analytical Suite (GeneGo Inc., St Joseph, MI, USA; http://www.genego.com) (10, 11). MetaCore is a web-based computational platform for analyzing a cluster of genes in the context of regulatory networks and signaling pathways. For the network analysis of a group of genes, MetaCore can be used to calculate the statistical significance (p-value) based on the probability of assembly from a random set of nodes (genes) of the same size as the input list (11, 12).
Results
Functional analysis of gene expression signature by the Affymetrix platform. After the differentially expressed genes were selected under the criteria (p<0.05 and fold change>2), principal component analysis (PCA) on the resulting 97 genes could clearly separate R tumors from NR tumors. All 97 genes were ranked by fold change and the top 20 genes are listed in Table II. The differentially expressed genes listed in Table II are those that were expressed at higher levels in NR tumors, i.e., they were down-regulated in R tumors. In order to explore the potential biological functions of these genes, we uploaded these 20 genes into the MetaCore database and derived functional networks that contained 7 root genes (denoted as blue circles in Figure 1). The up-regulated genes in NR tumors included lipocalin 2 (LCN2), which is involved in induction of the epithelial to mesenchymal transition.
Figure 1. Functional networks of 16 individual lung cancer tumors. Some genes are labeled by aliases. The genes (blue circles) that were up-regulated in non-recurrent tumors (also listed in Table II) were generalized into the networks using the MetaCore program.
Comparison and correlation between Affymetrix and Illumina platforms. Following validation by LOOCV, we analyzed those genes which were either up-regulated or down-regulated by two-fold using Student's t-test (NR vs. R, p<0.05, |NR-R|>1). A total of 13 genes were selected and identified with the Affymetrix platform (Figure 2A). Furthermore, using the same filtering criteria, we were able to identify 21 genes in the parallel Illumina database (Figure 2B). There were six common genes identified in both the Affymetrix and Illumina platforms. Using univariate analysis, four (lipocalin 2, LCN2; parathyroid hormone-like hormone, PTHLH; ras-related protein Rab-38, RAB38; and four jointed box 1, FJX1) out of these six genes were associated with survival.
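The filtering and cross-platform comparison can be sketched as below. This is a minimal illustration that assumes the criterion |NR-R|>1 refers to a difference of group means on the log2 scale (equivalent to a two-fold change) and that t-test p-values have already been computed. The gene labels and values are placeholders, not study data.

```python
def select_genes(stats, p_cutoff=0.05, min_abs_diff=1.0):
    """Return genes passing both filters.

    stats maps gene -> (mean_NR, mean_R, p_value), with means on the log2 scale.
    """
    return {
        gene
        for gene, (mean_nr, mean_r, p) in stats.items()
        if p < p_cutoff and abs(mean_nr - mean_r) > min_abs_diff
    }


# Placeholder per-platform summaries (hypothetical genes and values).
affymetrix = {
    "GENE_A": (9.5, 8.0, 0.01),
    "GENE_B": (7.0, 6.5, 0.001),  # significant, but difference below cutoff
    "GENE_C": (6.0, 8.2, 0.03),
    "GENE_D": (10.0, 8.5, 0.20),  # large difference, but not significant
}
illumina = {
    "GENE_A": (10.1, 8.9, 0.02),
    "GENE_C": (5.5, 7.0, 0.04),
    "GENE_D": (9.0, 7.0, 0.01),
    "GENE_E": (8.0, 8.1, 0.50),
}

affy_hits = select_genes(affymetrix)
illumina_hits = select_genes(illumina)
common = affy_hits & illumina_hits  # genes detected by both platforms
```

Applying the same thresholds to each platform and intersecting the resulting sets corresponds to the step that yielded the six common genes.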
Survival prediction using the four-gene prognostic model. A four-gene signature was further analyzed for correlation of gene expression with disease-free survival (DFS) and overall survival (OS) from initial surgery by Kaplan-Meier analysis. The risk score was generated for each sample on a scale of 0-4 as an indication of expression of these four genes. A score of 1 was assigned for each gene if it was up-regulated in the tumor. A tumor risk score ≥3 was identified as high-risk (patients who had low tumor expression of these four genes) and ≤2 as low-risk (those who had high tumor expression of these four genes). As a result, a four-gene signature was identified, which was well-associated with DFS and OS. The four-gene signature stratified patients into high- and low-risk groups with distinct postoperative survival in Kaplan-Meier analyses (log-rank p<0.05). The median DFS (Figure 3A) was not reached in the low-risk group and was 20.4 months (95% confidence interval (CI)=3.6 to 37.2 months) in the high-risk group (p=0.001). The median OS (Figure 3B) was not reached in the low-risk group and was 37 months (95% CI=2.6 to 71.4 months) in the high-risk group (p=0.014). The four-gene risk score was also correlated with tumor recurrence. Thirty-one (78%) of 40 high-risk patients developed a recurrence, as compared to 6 (32%) of 19 low-risk patients (p=0.001). There was thus a significant difference in overall survival between the low- and high-risk groups using the four-gene prediction model.
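The scoring scheme can be expressed as a short sketch. The direction of expression that earns a point is not fully clear from the description above, so the per-gene flag is left as a generic boolean supplied by the caller; the function names and the example patient are hypothetical. The recurrence counts are those reported above.

```python
GENES = ("LCN2", "PTHLH", "RAB38", "FJX1")


def risk_score(flags):
    """Sum one point per flagged gene; `flags` maps gene symbol -> bool (score 0-4)."""
    return sum(1 for gene in GENES if flags[gene])


def risk_group(score):
    """Scores of 3 or 4 are high-risk; scores of 0 to 2 are low-risk."""
    return "high" if score >= 3 else "low"


# Hypothetical patient with three of the four genes flagged.
example_flags = {"LCN2": True, "PTHLH": True, "RAB38": False, "FJX1": True}
example_score = risk_score(example_flags)
example_group = risk_group(example_score)

# Recurrence proportions from the reported counts.
high_rate = 31 / 40  # high-risk patients with recurrence
low_rate = 6 / 19    # low-risk patients with recurrence
```

The two proportions computed at the end correspond to the reported 78% and 32% recurrence rates in the high- and low-risk groups.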
Figure 2. Differential gene expression between recurrence (R) and non-recurrence (NR) groups in Affymetrix (A) and Illumina (B) platforms. There were 13 genes selected and identified in the Affymetrix platform, while 21 genes were found in the Illumina platform. Student's t-test (p<0.05, |NR-R|>1) was used.
Discussion
Lung cancer continues to be a major deadly malignancy. The mortality caused by this disease could be reduced by improving the ability to predict cancer patients' survival. It is also important to identify clinically relevant prognostic biomarkers in order to develop personalized treatment. In this study, we identified a four-gene signature, differentially expressed between the R and NR groups in NSCLC, using both Affymetrix and Illumina microarray platforms. The identification of four genes that can predict the clinical outcome in patients with NSCLC may reveal targets for the development of therapy for lung cancer. LCN2, a member of the lipocalin family that transports small lipophilic ligands, has gained recent attention as both a potential biomarker and a modulator of human cancer (13). LCN2 has been shown to induce the epithelial to mesenchymal transition in breast cancer cells and to promote breast tumor invasion (14). PTHLH is an important chondrogenic regulator; however, the gene has not been directly linked to human disease (15). RAB38 is a member of the RAB small GTPase family that regulates intracellular vesicle trafficking (16). RAB proteins and their effectors coordinate multiple stages of membrane transport, such as vesicle formation, motility, and tethering to their target compartments. These proteins can also be found in both membrane-bound and cytosolic forms and are prenylated at their carboxyl termini (17).
Several studies of NSCLC have reported expression signatures that group subjects according to their survival outcomes. However, most studies are small and typically collect data from a single institution. Shedden et al. reported a large, training-testing, multisite blinded validation study aiming to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas (18). They examined whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), can be used to predict OS of patients with lung cancer. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early-stage lung cancer. They also provided the largest available set of microarray data, including four institutions (University of Michigan Cancer Center (UM), Moffitt Cancer Center (HLM), Memorial Sloan-Kettering Cancer Center (MSK) and the Dana-Farber Cancer Institute (DFCI)), with extensive pathological and clinical annotation for lung adenocarcinomas. They used several methods to analyze these data sets, including gene clustering (method A), univariate testing (methods B to G) and analysis on a mechanistic basis (method H). Method A, which worked with all tumor samples or with stage I samples alone, both with and without clinical covariates, showed the best overall predictive ability. In the MSK test, if sensitivity was 0.9, specificity was 0.3. In the DFCI test, if sensitivity was 0.9, specificity was 0.2. To date, a microarray model with high sensitivity and specificity to predict tumor recurrence and patient survival has yet to be developed.
Figure 3. Kaplan-Meier plot of disease-free survival (DFS) (A) and overall survival (OS) (B) for patients with high-risk versus low-risk tumors. The median DFS was 20.4 (95% CI=3.6 to 37.2) months and not reached, respectively. The median OS was 37 (95% CI=2.6 to 71.4) months and not reached, respectively.
In this study, we utilized two microarray platforms in order to analyze the correlation of genomic signature and tumor recurrence. Although there were six genes common to both systems, a further selected four-gene signature was able to predict the DFS and OS of patients who might have good prognosis from initial surgery. This four-gene signature also stratifies stage II lung adenocarcinoma patients into two distinct DFS (log-rank p=0.014) and OS (log-rank p=0.036) groups (data not shown). The next step is the validation of the prediction of tumor recurrence within four years using this scoring system (four-gene signature) in the clinic.
In conclusion, we identified a four-gene signature of differentially expressed genes between R and NR groups in NSCLC by both Affymetrix and Illumina platforms. The selected four-gene signature was able to predict the DFS and OS from surgical therapy. Further investigation of functional analysis and clinical validation is warranted.
Acknowledgements
The authors are grateful to Dr. Lee Yun-Shien for data evaluation and Dr. Hong Ji-Hong for administrative support. This study was supported by a grant to John W.C. Chang from CGMH (CMRPG380761 and CMRPG3B0081), Taiwan.
- Received January 16, 2012.
- Revision received February 23, 2012.
- Accepted February 27, 2012.
- Copyright© 2012 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved