Abstract
Background/Aim: We report on survival data of 595 patients with stage I-III lung cancer with respect to TNM classification. Materials and Methods: We constructed a basic model consisting of stage and grade, and assessed the improvement of survival prediction after adding comorbidity data, spirometric data, clinical and laboratory parameters. Results: Body mass index (BMI) and presence of a cardiac disease reached statistical significance for prediction of overall survival in a Cox regression model. In addition to BMI (<25 kg/m2) and the presence of cardiovascular disease, the spirometric variable (FEV1) predicted early death (less than five months postoperatively). When the survival random forest method was employed to predict disease outcome, creatinine levels and VO2 max became additional variables of interest for predicting survival. Conclusion: We propose that our lung cancer database may help to identify variables (aside from histomorphological variables) that are suitable for identifying patients at risk of death after surgical treatment of lung cancer.
Lung cancer, even during early stages, generally has a poor prognosis (1, 2), especially if all patients are considered (3). Only 20-30% of all patients with lung cancer are suitable for tumor resection. When a diagnosis of lung cancer is established, three consecutive decisions must be made: (i) whether to operate immediately, (ii) whether to attempt down-staging, and (iii) whether to choose a non-surgical treatment. The standard variables used to describe a particular case of lung cancer (which are also necessary for these decisions) are tumor size (T), number of affected lymph nodes (N), presence/absence of distant metastasis (M), histological type, grade, carcinomatous lymphangiosis (L), carcinomatous angiosis (V) and residual status. Three of these parameters, T, N, and M, are used to calculate tumor stage (TNM classification, sixth and seventh editions) (4, 5). Tumor stage is the basis for these far-reaching treatment decisions (6-10). However, other variables may influence disease outcome, including age, sex, comorbidities, biochemical variables such as creatinine level, and lung function tests (7-10). These data are usually available, but are only rarely taken into consideration by mathematical models (8). Surgical and treatment decisions are binary. If such a binary decision must be made, models that estimate the risk of death after the operation may be helpful for all of the participants in treatment decisions, including the physician, the patient, and the patient's family. In this study, we tested whether well-defined comorbidities, any exemplary laboratory parameter, or any spirometric variables have effects on long-term survival and early lethality (death between 1 and 24 months after the operation) in a post-resection population of lung cancer patients.
Materials and Methods
Patients. A total of 595 patients were included in the present database. Tumor diagnosis was performed between 1993 and 2007, and all pathomorphological diagnoses were performed in one Department of Pathology. Patients with small-cell lung cancer (SCLC), non-small cell lung carcinoma (NSCLC) without further specification, carcinoid tumors, or M1 disease were excluded. The mean follow-up time was 2.68±2.01 years. The last update of survival data took place in May 2008.
The following clinical parameters were taken into consideration: status (alive, dead, unknown), sex (female/male or not applicable), smoking history (yes, no, ex-smoker, unknown), pack-years, age at the time of diagnosis, and time-to-death. In the patient population, 456 patients underwent a lobectomy, 80 underwent a pneumonectomy, and 37 underwent a segment resection; the operative method was not documented in 22 patients.
The body mass index (BMI) was calculated using the following formula: BMI=m/l² (kg/m²), where m=mass in kg and l=height in m. The range for a normal BMI was 18.5-25 kg/m² for men and 19-24 kg/m² for women. Grade I obesity was characterized by a BMI of 30-35 kg/m², grade II by a BMI of 35-40 kg/m², and grade III by a BMI of >40 kg/m². Low-underweight was characterized by a BMI of 17-18.5 kg/m², moderate-underweight by a BMI of 16-17 kg/m², and severe-underweight by a BMI of <16 kg/m².
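The BMI formula and the categories listed above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names are hypothetical, the handling of values that fall exactly on a boundary is an assumption, and BMI values that lie in none of the ranges named in the text (for example 25-30 kg/m2) are returned as "unclassified".

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: mass divided by height squared."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


def bmi_category(value: float, sex: str) -> str:
    """Map a BMI value to the categories named in the text.

    Boundary handling (which side a limit belongs to) is an assumption.
    """
    # Normal range differs by sex: 18.5-25 for men, 19-24 for women.
    low, high = (18.5, 25.0) if sex == "male" else (19.0, 24.0)
    if value < 16.0:
        return "severe underweight"
    if value < 17.0:
        return "moderate underweight"
    if value < 18.5:
        return "low underweight"
    if value > 40.0:
        return "obesity grade III"
    if value >= 35.0:
        return "obesity grade II"
    if value >= 30.0:
        return "obesity grade I"
    if low <= value <= high:
        return "normal"
    # Ranges the text does not name (e.g. 25-30 kg/m^2).
    return "unclassified"


if __name__ == "__main__":
    value = bmi(70.0, 1.75)
    print(round(value, 1), bmi_category(value, "male"))
```

A patient of 70 kg and 1.75 m has a BMI of about 22.9 kg/m2 and falls into the normal range for either sex.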
Histological classification. The following variables were recorded (TNM sixth edition): pT (levels 1-4), pN (levels 0-3), M (levels 0 and 1), grade (levels 1-3), L (levels 0 and 1), V (levels 0 and 1), and residual status (levels 0-2). All descriptors are part of the TNM system (5). The elements of the described sets were changed in the TNM 7th edition (4). Although we were unable to take these changes into consideration in our retrospective analysis, all variables were taken from the reports of the Department of Pathology responsible for tumor classification. The same holds true for the histological classification, where SCLC and NSCLC were distinguished. NSCLC was classified into the following subsets: squamous cell carcinoma, adenocarcinoma, carcinoma, bronchiolo-alveolar carcinoma, and NSCLC not otherwise specified. SCLC, carcinoid tumor, and NSCLC with non-specified tumor type were excluded from the present study. Stage was calculated from the TNM data (stages I-IV). As M1 cases were excluded from our study, stage IV lung cancer patients are not part of our study.
Comorbidity. The following diagnoses were extracted from the medical reports: presence of diabetes (yes, no, unknown), presence of chronic lung disease (yes, no, unknown), and presence of cardiac disease (yes, no, unknown). No further specification of the comorbidities was possible.
Spirometric data (Table IV). We evaluated the following spirometric data: inspiratory vital capacity (IVC), forced expiratory volume in one sec (FEV1), their quotient (FEV1/IVC), and maximal O2 uptake (VO2 max) in relation to reference values. The units were litres for IVC and FEV1, none for FEV1/IVC, and percent for VO2 max. In the survival analysis, we calculated the individual cut-off values using the following formulas for men and women (11), where H is body height in m, A is age in years and ± is the standard error for the results: IVC=6.10 H - 0.028 A - 4.65±0.92 for men; FEV1=4.30 H - 0.029 A - 2.49±0.84 for men; FEV1/IVC (%)=−0.18 A + 87.21±11.8 for men; IVC=4.66 H - 0.024 A - 3.28±0.69 for women; FEV1=3.95 H - 0.025 A - 2.60±0.62 for women; FEV1/IVC (%)=−0.19 A + 89.10±10.7 for women. VO2 max is expressed as a percentage of the reference value for both men and women.
Based on these individual cut-off values, we classified each spirometric term as normal or reduced. For VO2 max, we examined different chosen cut-off points ranging from 60% to 90% of the reference value.
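The classification into normal or reduced can be illustrated with a short sketch. Two assumptions are made that the text does not confirm: the leading coefficient of the IVC and FEV1 formulas is taken to multiply body height in metres (as in the widely used European reference equations), and the individual cut-off is taken to be the predicted value minus the ± term. All names are hypothetical.

```python
# (height coefficient, age coefficient, intercept, "+/-" term) per sex and variable.
# Assumption: the height coefficient multiplies body height in metres.
REFERENCE = {
    "male": {
        "IVC": (6.10, -0.028, -4.65, 0.92),
        "FEV1": (4.30, -0.029, -2.49, 0.84),
        "FEV1/IVC": (0.0, -0.18, 87.21, 11.8),
    },
    "female": {
        "IVC": (4.66, -0.024, -3.28, 0.69),
        "FEV1": (3.95, -0.025, -2.60, 0.62),
        "FEV1/IVC": (0.0, -0.19, 89.10, 10.7),
    },
}


def predicted(sex: str, variable: str, height_m: float, age_years: float) -> float:
    """Reference value from the linear formula for the given sex and variable."""
    h_coef, a_coef, intercept, _ = REFERENCE[sex][variable]
    return h_coef * height_m + a_coef * age_years + intercept


def classify(sex: str, variable: str, measured: float,
             height_m: float, age_years: float) -> str:
    """Return 'reduced' if the measurement lies below the individual cut-off.

    Assumption: cut-off = predicted value minus the "+/-" term.
    """
    tolerance = REFERENCE[sex][variable][3]
    cutoff = predicted(sex, variable, height_m, age_years) - tolerance
    return "reduced" if measured < cutoff else "normal"


if __name__ == "__main__":
    print(round(predicted("male", "FEV1", 1.75, 62), 2))
    print(classify("male", "FEV1", 2.0, 1.75, 62))
```

For a 62-year-old man of 1.75 m, the sketch gives a predicted FEV1 of about 3.24 litres, so a measured FEV1 of 2.0 litres would be classified as reduced under these assumptions.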
Laboratory data. We used each patient's medical report to assess the serum creatinine level; this value was used as a numerical variable in the survival analysis.
Database structure. The data were stored in a PostgreSQL database (version 9.0, PostgreSQL Global Development Group) with a Tomcat management system (version 7.0, Apache Software Foundation). The software was written in Java (version 6.17, Oracle Corporation), JavaScript (version 1.8.5, Oracle) and HTML (version 5, World Wide Web Consortium). For statistical analysis, the data were exported as comma-separated values for easy availability in R (12).
Statistical analysis. Unless otherwise noted, data are presented as mean±standard deviation (SD). Data analysis was performed in R, version 2.15.1 (12). Binary logistic regression was adopted to find predictors of early mortality. Survival data were analyzed using the Kaplan-Meier method. Cox regression analysis was applied to study the dependency of survival on a set of factors. Model selection was executed using Akaike's information criterion (13), starting with a basic model including stage and grade. V, L, and residual status were not included in the basic model because the proportion of missing values exceeded 10%. Recently, random survival forests have proven to be a very flexible non-parametric method to predict survival time, given a high-dimensional set of covariates (14). This algorithm is implemented in the R package randomSurvivalForest, version 3.6.3 (15). It outputs, among other quantities, an estimate of the prediction error rate and two measures of the predictiveness of each variable: minimal depth (smaller values corresponding to a more predictive variable) and variable importance (VIMP, larger values corresponding to a more predictive variable). In addition, the partial effect of a predictor variable on survival is visualized after averaging out the effects of all other variables. Minimal depth was adopted to select variables. p-Values <0.05 were considered statistically significant and values <0.001 were considered highly significant. Reported confidence intervals (CI) are assumed to have a coverage probability of 95%.
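For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines. This is an illustrative stand-alone implementation with hypothetical names and invented toy data, not the R code used for the analysis.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data (not from the study): five patients, two censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Censored patients contribute to the number at risk until their last follow-up, which is why the estimate at later time points rests on fewer patients and has wider confidence intervals.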
Results
Basic data. Basic data are summarized in Table I. The mean follow-up time for patients was 3.21±2.77 years (median=2.51 years, range=0.0065-14.1 years); 39.5% of all patients died. The cause of death was not known. Age at disease onset did not differ significantly between females (61.8±11.1 years) and males (62.3±8.3 years, p=0.63).
Pathomorphological findings. The pathomorphological findings are summarized in Table II. Most patients had T2, N0 (60.2%) disease and were classified as G2 (50%); 53.9% had stage I lung cancer. The proportions of squamous carcinoma and adenocarcinoma were nearly equal.
Comorbidities. Comorbidity data are summarized in Table III. Forty percent of the study patients were obese, 11.8% had a BMI >30 kg/m2 and 3.6% had a BMI exceeding 35 kg/m2. In all, 14.4% of the study patients were diabetic and approximately 20% had an additional diagnosis of chronic bronchitis. Almost one-half (48.8%) of the study patients had a cardiovascular diagnosis, and 5.6% had a creatinine level >1 mg/dl.
Table I. Characteristics of patients.
Spirometric data. All spirometric parameters were transformed into factors with only two possible levels (normal or decreased). Univariate analysis did not uncover a significant difference in survival for IVC (log-rank=2.7, p=0.10), FEV1 (log-rank=0.4, p=0.51), or FEV1/IVC (log-rank=0.5, p=0.50). We observed no statistically significant difference between patients with low or high VO2 max values, independent of the cut-off point (for the cut-off point of 60%, log-rank=1.0, p=1.0).
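The two-group log-rank comparison used for such binary factors can be sketched as follows. The implementation is a minimal illustration with hypothetical names; it assumes that the reported log-rank value is a chi-square statistic with one degree of freedom, which the text does not state explicitly.

```python
import math


def chi2_1df_p_value(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events_*: 1 if death was observed, 0 if the patient was censored.
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    death_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        # Patients still under observation just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("no informative death times")
    statistic = (observed_a - expected_a) ** 2 / variance
    return statistic, chi2_1df_p_value(statistic)


if __name__ == "__main__":
    # Under the one-degree-of-freedom assumption, a statistic of 2.7
    # corresponds to a p-value of about 0.10.
    print(round(chi2_1df_p_value(2.7), 2))
```

Under this assumption the p-value reported for IVC (log-rank=2.7, p=0.10) is reproduced by the sketch.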
Laboratory values. The preoperative creatinine value (cut-off value=1 mg/dl) was not a univariate predictor of worse survival in our cohort (hazard ratio (HR)=1.0026, 95% CI=0.999-1.005, p=1.0) (see Table III).
Smoking habits. Smoking is clearly related to the development of lung cancer. Among the squamous lung cancer samples, non-smokers comprised 18.9% of the female group and 13.0% of the male group (p=0.53). Non-smokers comprised 47.1% of the female adenocarcinoma samples and 20.2% of the male adenocarcinoma samples (p=0.000014).
Disease outcome. One-year survival was 86.9% (95% CI=84.1%-89.7%; 467 patients at risk). Three-year survival was 64.7% (95% CI=60.6%-69.2%; 249 patients at risk). Five-year survival was 53.9% (95% CI=49.2%-59.1%; 123 patients at risk). Ten-year survival (36.7%, 95% CI=9.9%-45.2%; 17 patients at risk) was only of limited value because of the very limited number of at-risk patients. Stage-related survival data are reported in Table V.
Table II. TNM classification of patients.
Table III. Univariate risk factors for non-survival.
Univariate survival analysis. N was found to predict disease outcome to a highly significant degree, while T, grade, and L were significant predictors of overall survival (OS). Stage, which is calculated using N, was a highly significant prognostic factor (HR=1.75 for stage I/II and 2.97 for stage I/III) (Figure 1). Tumor grade and the presence of lymphangiosis were weakly significant for OS (HR=2.23 for G1/G2, p=0.04 and HR=3.10 for G1/G3, p=0.0039). Carcinomatous angiosis was not reported frequently enough to calculate an HR for it. The histological subtype was of no prognostic interest except for the adenosquamous subtype, which had an HR of 2.257 (p=0.038). Using the variables that were significant in univariate analysis (Table II), we fitted a Cox regression model and took the variables that remained significant in the multivariate setting as the basic model for survival. All other models were tested against this basic model to determine whether they further improved the prediction of disease outcome. The basic model consisted of stage and tumor grade.
Table IV. Spirometric variables.
Table V. Survival data.
Height and body weight did not predict OS. However, survival probability decreased significantly with decreasing BMI. A cut-off point of 25 kg/m2 exhibited only a trend towards reduced survival probability. When we used cut-off points from 20 to 24 kg/m2, the p-values in the log-rank test were 0.0375, 0.0231, 0.0035, 0.0033, and 0.0575, respectively. Therefore, the optimal cut-off for a binary system was a BMI of 21 kg/m2. Diabetes was observed in 15.1% of all patients under study. Patients with diabetes as a comorbidity tended to experience better survival (5-year survival of non-diabetic patients was 52.7%, 95% CI=48.1%-58.9%; 5-year survival of diabetic patients was 60.7%, 95% CI=48.4%-76.0%; Figures 2 and 6). However, this difference did not reach statistical significance (HR=0.787; p=0.292). Chronic bronchitis and cardiovascular disease (Figure 3) did not influence long-term disease outcome (HR=1.129, p=0.508 and HR=1.161, p=0.307, respectively). Spirometric parameters did not discriminate significantly between patients with good or worse disease outcome (Figure 4). Increased creatinine level had a non-significant impact on OS (HR=1.0026, p=0.066).
Multivariate survival analysis. When Cox regression analysis was applied, we noted that stage (calculated using T and N) and tumor grade were significant multivariate predictors of disease outcome, with z values of 2.18 for the stage I/II step (p=0.030, 95% CI=1.04-2.43), 5.67 for the stage I/III step (p<0.0001, 95% CI=1.97-4.05), and 2.23 for the G1/G2 step (p=0.022, 95% CI=1.15-6.12). The presence of cardiac disease and creatinine >1 mg/dl each exhibited a trend toward worse survival, with z values of 1.80 (p=0.072, 95% CI=0.99-1.59) for the presence of cardiac disease and 0.099 for creatinine (p=0.99, 95% CI=1.0-1.01). For patients with diabetes (not further classified), we noticed a trend toward better survival (z=1.64, p=0.094, 95% CI=0.49-1.01). Neither the histology nor any of the four spirometric values yielded significant p-values in a multivariate Cox regression model for the whole time period (Table VI).
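The relation between the reported z values and confidence intervals can be made explicit with a short sketch. It assumes that the intervals are Wald intervals for the hazard ratio, i.e. exp(β ± 1.96·SE) with z=β/SE, which is the usual Cox regression output but is not stated in the text; the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_from_ci(lower: float, upper: float) -> dict:
    """Recover hazard ratio, standard error, z and p from a 95% Wald CI.

    Assumption: the interval is exp(beta +/- Z_95 * se) for a Cox coefficient beta.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2.0           # midpoint on the log scale
    se = (log_upper - log_lower) / (2.0 * Z_95)    # half-width divided by the quantile
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))         # two-sided normal p-value
    return {"hr": math.exp(beta), "se": se, "z": z, "p": p}


if __name__ == "__main__":
    for label, ci in {"stage I/II": (1.04, 2.43), "stage I/III": (1.97, 4.05)}.items():
        result = wald_from_ci(*ci)
        print(label, {key: round(value, 3) for key, value in result.items()})
```

Applied to the intervals for the stage I/II and stage I/III steps, the sketch returns z values of roughly 2.1 and 5.6, close to the reported 2.18 and 5.67; the small differences are consistent with rounding of the published interval limits.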
Cox regression revealed two additional significant predictors of survival when we used an automated Cox regression model selection procedure (function stepAIC, taken from R library MASS): the presence of cardiovascular disease (z=2.09, p=0.037) and BMI (z=-2.09, p=0.037) (Table VI).
Risk of early death. Variables identifying patients who died early after lung cancer resection are provided in Table VII. For the probability of surviving the first months after resection, higher BMI, normal FEV1, and absence of cardiovascular disease predict a low risk of early mortality. After six months, only tumor stage (stage III and to some extent stage II) and G3 are predictors of death.
Random survival forest method. The top variables in the random forest method were (in order of decreasing VIMP) tumor stage (depth=1.982, VIMP=0.103), grade (depth=4.574, VIMP=0.005), cardiovascular disease (depth=4.928, VIMP=0.005), BMI (depth=2.218, VIMP=0.016), age (depth=2.73, VIMP=0.008), creatinine (depth=2.114, VIMP=0.002), and VO2 max (depth=2.317, VIMP=0.015) (Figure 5). For the basic model, which consisted of stage and grade, we obtained an estimated error rate (EER) of 41.62%. After adding the clinical variables BMI, cardiac disease, chronic bronchitis, and diabetes to the basic model, the EER was reduced to 39.33%. When the spirometric parameters FEV1 and FEV1/IVC were added, the EER was 37.11%. When all thirteen variables were included, the EER was 36.17%. When minimal depth was used to select variables, the resulting model consisted of seven variables, with the top predictor being stage (depth=2.1) followed by BMI (depth=2.11), creatinine (depth=2.16), VO2 max (depth=2.31), age (depth=2.56), grade (depth=4.35), and cardiac disease (depth=4.68), as well as an EER of 39.8%. Owing to missing values in predictor variables, models with a larger number of predictors were based on a smaller number of cases, which in turn inflated the EER.
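In random survival forests, the prediction error is commonly reported as one minus Harrell's concordance index, so that 50% corresponds to random guessing and 0% to perfect ranking. Assuming that the EER reported here is this quantity, it can be sketched as follows; the function name and the toy data are hypothetical.

```python
def concordance_error(times, events, risk):
    """Error rate defined as 1 - Harrell's concordance index.

    times:  follow-up times
    events: 1 if death was observed, 0 if censored
    risk:   predicted risk score (higher = earlier death expected)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable if patient i died before patient j's follow-up ended.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return 1.0 - concordant / usable


if __name__ == "__main__":
    times, events = [1, 2, 3, 4], [1, 1, 1, 1]
    print(concordance_error(times, events, [4, 3, 2, 1]))  # perfect ranking
    print(concordance_error(times, events, [0, 0, 0, 0]))  # uninformative score
```

Read this way, error rates between 36% and 42% indicate that the models rank patients only moderately better than chance.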
Discussion
Three questions must be addressed before attempting a detailed analysis of our data: Are the data reliable for the population of lung cancer patients after surgical intervention? Can we exclude the possibility of significant bias from the data collection? For which patient populations are the data valuable?
Regarding the first question, we conclude that our data are comparable with the findings of other published studies. The 1-, 2-, and 3-year OS rates were 86%, 75%, and 72%, respectively, in a study by Pijl et al. (10), versus 83.4%, 72.5%, and 63.5% in our study. A study by Brundage et al. (1) reported very similar survival data. For example, in stage IIA, we observed 74% 3-year survival, as opposed to the 66% published by Brundage and co-workers (1). Five-year survival was 64.8% in the present study, compared with 55% published by Brundage et al. (1). In a study by Birim et al. (9), the 1-year survival was 77% (95% CI=74-81%), compared with 83.4% (95% CI=81.3-85.7%) in our study patients. Therefore, we conclude that our data are in agreement with published survival data (1, 9, 10). The 30-day early lethality in our study (2.9%) was also comparable to that reported by Pijl et al. (10).
Table VI. Cox regression model for multivariate survival analysis.
Regarding the second question, whether there is some bias in our data concerning the standard variables (TNM variables) or the additional disease-modifying variables, we found no hint of bias in our data. The proportion of adenocarcinoma exceeded the proportion of squamous carcinoma by a factor of 1.18 (400/340; Table II); in most published studies, squamous carcinoma is the most frequent subtype of NSCLC (16). The prevalence of diabetes (prevalence in patients aged 60-64 years without considering sex: 10.7%, versus 13.6% in our study) and of chronic bronchitis (prevalence 15-25% in older, predominantly male patients, mean age of 62 years) is roughly within the range expected for a group of patients without lung cancer (17). However, the percentage of smokers in our lung cancer population clearly exceeds the proportion of smokers in the general population. This holds true for squamous lung carcinoma across the sexes; however, it is not true among female patients with adenocarcinoma. This observation is in agreement with recent publications of Charloux and co-workers (18) and Nordquist and co-workers (19). Currie et al. (20) recently published the unexpected observation of better survival among diabetic patients.
The basic model for survival analysis (in our data) did not include L, V, or histology as multivariate predictors of disease outcome. The histological classification in our retrospective data analysis was the WHO classification as published by Travis and co-workers (2); newer classification systems may change this situation (21). However, using alternative analysis methods (binary logistic regression for early postoperative death) adds new information about distinct covariables that may be of clinical value, namely BMI, creatinine, VO2 max, FEV1, and cardiovascular disease.
Table VII. Early mortality from lung cancer. Analysis by binary logistic regression.
Figure 1. Overall survival (OS) according to lung cancer stage, N=595.
Figure 2. Overall survival of resected lung cancer patients with and without diabetes (hazard ratio 0.787, p=0.292), N=578.
Figure 3. Overall survival of lung resection patients according to cardiac disease, N=576.
Figure 4. Overall survival according to forced expiratory volume in one sec (FEV1) (hazard ratio 1.246, p=0.107), N=575.
Figure 5. To predict survival, a random survival forest consisting of 1,000 survival trees was fitted to the data (n=412) to achieve an estimated error rate of 39.33%. The sequence of figures depicts the partial influence of each considered predictor on mortality after adjusting for all other predictors. For discrete variables, boxplots summarize estimates for partial values where whiskers extend out two standard errors from the mean. For continuous variables, red points indicate estimates of partial influence smoothed by dashed black lines, with dotted red lines indicating ± two standard errors. Ticks on the horizontal axis show observed values of respective variables. The figures are sorted according to the variables' relative influence on predicting mortality.
Our data are valuable post-resection for patients with stage I to stage III NSCLC. All cases with histology not specified beyond NSCLC are eliminated, as are all carcinoid tumors. Of the clinical and spirometric data, only the BMI was able to discriminate between somewhat better survival and worse survival. The presence of diabetes was not a significant predictor of survival; however, patients with diabetes exhibited somewhat better 5-year survival, in agreement with recent observations by Currie et al. (20).
Treatment decisions regarding surgical intervention are difficult in some cases, when the risk of negative consequences of an operation does not differ greatly from the benefit of intervention. In this situation, both patient and physician are over-burdened, and this problem may increasingly become part of the patient/physician/family interaction if the consequences of personalized medicine are part of the decision-making process. Given the previous and present TNM classifications of lung cancer, we can discriminate between the following numbers of levels: T [4/5], N [4], M [2], L [2], V [2], R [3], y [2], and G [3/4]. The numbers in brackets are the numbers of levels of each TNM descriptor; where two numbers are given, they refer to the 6th and 7th editions, respectively (4, 5). Because all levels can be combined, there are 2,304 possibilities of lung cancer in the 6th edition of the TNM classification and 3,840 in the 7th edition of the TNM classification (4, 5). To this set of possibilities, we can add the morphological subtypes of lung cancer (2), which raises the number of combinations to 13,824 (2,304 × 6), as we have differentiated only six histological subtypes in the present evaluation instead of 35. Additional risk factors or confounders (Table III) can be added to these possibilities, which adds another 2^9 combinations [taking only the levels normal (not present) or abnormal (present) into consideration]. Therefore, it is evident that dimension-reducing models are required, and should help to find optimal treatment decisions. Higher stage, grade, BMI, and the presence of cardiac disease are multivariate predictors of disease outcome, independent of the type of analysis (Table VI). However, creatinine >1 mg/dl and VO2 max were weak predictors of survival in the random forest method (respectively: depth=2.239, VIMP=0.013; and depth=2.35, VIMP=0.012).
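The combination counts quoted above follow directly from multiplying the numbers of levels of the descriptors, as the following sketch shows. The dictionary names are hypothetical, and the number of binary risk factors passed to the helper function is an illustrative assumption.

```python
from math import prod

# Number of levels per TNM descriptor, as listed in the text.
LEVELS_6TH = {"T": 4, "N": 4, "M": 2, "L": 2, "V": 2, "R": 3, "y": 2, "G": 3}
# In the 7th edition, T has five levels and G has four.
LEVELS_7TH = {**LEVELS_6TH, "T": 5, "G": 4}

HISTOLOGICAL_SUBTYPES = 6  # subtypes differentiated in the present evaluation


def combinations(levels: dict) -> int:
    """Number of distinct descriptor combinations if all levels can be combined."""
    return prod(levels.values())


def with_binary_factors(base: int, n_factors: int) -> int:
    """Multiply by 2 for each additional present/absent risk factor."""
    return base * 2 ** n_factors


if __name__ == "__main__":
    print(combinations(LEVELS_6TH))                          # 2304
    print(combinations(LEVELS_7TH))                          # 3840
    print(combinations(LEVELS_6TH) * HISTOLOGICAL_SUBTYPES)  # 13824
    # Illustrative assumption: nine binary risk factors.
    print(with_binary_factors(13824, 9))
```

Each additional binary covariable doubles the number of distinguishable patient profiles, which is why a cohort of a few hundred patients cannot populate more than a small fraction of them.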
In addition to clinical and/or spirometric confounders of survival, molecular biological variables such as mutations of the epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene (KRAS), and polymorphisms of matrix metalloproteinases also enter into models for the prediction of disease outcome (22, 23). With increasing numbers of potential predictors and increasing acceptance of personalized medicine, mathematical models that facilitate the use of these confounders are urgently needed to enable better treatment decisions (24). We show in this study that the clinical information about risk of early post-resection death is better demonstrated by approaches using logistic regression, and that the random forest method allows a better inclusion of the covariables of comorbidity, spirometric data, and laboratory data. However, our study, like those of others (25-28), does not include enough patients for a final evaluation of the clinical input of the examined covariables. In the light of ever-increasing numbers of covariables, we propose to make these data available in an anonymized form to non-profit institutions and WHO institutions to enable them to better evaluate which variables beyond those provided by the TNM system predict short- and long-term survival of patients with lung cancer.
Acknowledgements
Supported by the Robert Bosch Stiftung (Stuttgart) and the Sabine Dörges Stiftung (Ludwigsburg).
We thank Dr. S. Klenk and M. Graser for skillful assistance in software development.
Footnotes
- This article is freely accessible online.
- Received January 13, 2013.
- Revision received March 3, 2013.
- Accepted March 4, 2013.
- Copyright© 2013 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved