Abstract
Background/Aim: We explored the prediction of programmed cell death ligand 1 (PD-L1) expression level in non-small cell lung cancer using a machine learning approach with positron emission tomography/computed tomography (PET/CT)-based radiomics. Patients and Methods: A total of 312 patients (189 adenocarcinomas, 123 squamous cell carcinomas) who underwent F-18 fluorodeoxyglucose PET/CT were retrospectively analysed. Imaging biomarkers with 46 CT and 48 PET radiomic features were extracted from segmented tumours on PET and CT images using the LIFEx package. Radiomic features were ranked, and the top five best feature subsets were selected using the Gini index based on associations with PD-L1 expression in at least 50% of tumour cells. The areas under the receiver operating characteristic curves (AUCs) of binary classifications afforded by several machine learning algorithms (random forest, neural network, Naïve Bayes, logistic regression, adaptive boosting, stochastic gradient descent, support vector machine) were compared. The model performances were tested by 10-fold cross validation. Results: We developed and validated a PET/CT-based radiomic model predicting PD-L1 expression levels in lung cancer. Long run high grey-level emphasis, homogeneity, mean Hounsfield unit, long run emphasis from CT, and maximum standardised uptake value from PET were the five best feature subsets for positive PD-L1 expression. The Naïve Bayes model (AUC=0.712), with a sensitivity of 75.3% and specificity of 58.2%, outperformed all other classifiers. It was followed by the neural network model (AUC=0.711), random forest (AUC=0.700), logistic regression (AUC=0.673) and adaptive boosting (AUC=0.604). Conclusion: PET/CT-based radiomic features may help clinicians identify tumours with positive PD-L1 expression in a non-invasive manner using machine learning algorithms.
Lung cancer remains the leading cause of cancer-related deaths worldwide, and non-small cell lung cancer (NSCLC) accounts for approximately 85% of primary lung cancers (1, 2). Despite the development of treatment strategies for NSCLC such as thoracoscopic surgery, chemotherapy, radiotherapy, and targeted therapy, the overall 5-year survival rate is still poor (3). Recently, immune checkpoint inhibitors targeting programmed cell death protein 1 (PD-1) or programmed death ligand 1 (PD-L1) have shown better survival outcomes than conventional chemotherapy in patients with advanced NSCLC (4, 5); thus, PD-L1 inhibitors have become one of the standard treatments for NSCLC. However, PD-1 and PD-L1 expression is evaluated by immunohistochemistry, which makes tumour tissue obtained by biopsy mandatory in clinical practice. Biopsy is invasive and time-consuming, and it does not reflect changes in PD-L1 expression over the course of treatment.
F-18 fluorodeoxyglucose positron emission tomography/computed tomography (FDG PET/CT) is a standard imaging modality for lung cancer evaluation such as staging, treatment planning, and monitoring of therapy response. In addition, glucose metabolism in the tumour measured by FDG PET/CT can be a significant biomarker for characterising lung cancer. Metabolic PET parameters can provide information about biological features of lung cancer such as necrosis, histological subtype, and epidermal growth factor receptor (EGFR) mutational status (6-8). Several studies have investigated the relationship between PET parameters and PD-L1 expression (9-11).
Radiomics is the extraction and analysis of quantitative features from diagnostic images. Radiomics using FDG PET/CT can provide phenotype information, is non-invasive, can be repeated, and can measure changes in the tumour throughout treatment, thus allowing a personalised assessment of the disease. Recently, radiomics with a machine learning approach has been applied to FDG PET/CT for lung cancer. These studies have shown promising results in the prediction of disease-free survival (12), histological subtype (13), or EGFR mutational status (14). However, few studies have reported the prediction of PD-L1 expression. One recent study using a predictive model investigated the potential value of a radiomic features-derived approach for assessing the PD-L1 expression status in NSCLC (15). However, the study included a heterogeneous population of both operable and inoperable NSCLC. It also provided inadequate information on the tumour specimens used for PD-L1 testing, which may undermine the reliability of the PD-L1 status. Therefore, we evaluated the predictive performance of a machine learning approach with PET/CT-based radiomics using the PD-L1 expression level analysed in surgically resected primary tumours of operable NSCLC patients.
Patients and Methods
Patients. We reviewed NSCLC patients who underwent pretreatment FDG PET/CT between January 2013 and December 2018 in our hospital. The inclusion criterion of our study was operable stage at initial presentation. The exclusion criteria were set as follows: 1) other histologic subtypes besides adenocarcinoma or squamous cell carcinoma and 2) small tumours inappropriate for texture analysis. The flow chart of patient selection is shown in Figure 1. This retrospective study was approved by the ethics committee of our institution (approval no. AJIRB-MED-MDB-20-209); the requirement for informed consent was waived given the retrospective nature of the work.
Figure 1. Flow chart of patient selection. AD, Adenocarcinoma; FDG PET/CT, F-18 fluorodeoxyglucose positron emission tomography/computed tomography; NSCLC, non-small cell lung cancer; SCC, squamous cell carcinoma.
FDG PET/CT acquisition and radiomic analysis. FDG PET/CT was performed using a Discovery ST or STE PET/CT scanner (GE Healthcare, Milwaukee, WI, USA). All patients fasted for at least 6 h before FDG PET/CT; their blood glucose levels at the time of FDG injection were <150 mg/dL. Unenhanced CT was performed 60 min after the injection of 5 MBq/kg of FDG using a 16-slice helical CT scanner (120 kVp; 30-100 mA in the AutomA mode; voxel size 1.17×1.17×3.75 mm; matrix 512×512). The reconstruction kernel setting was standard for the GE PET/CT scanner. Emission PET data were acquired from the thigh to the head for 3.0 min per frame in three-dimensional mode with voxel size of 4.68×4.68×2.79 mm and matrix size of 128×128. Attenuation-corrected PET images (CT data were used for correction) were reconstructed using an ordered-subset expectation maximisation algorithm (20 subsets, 2 iterations).
The volume of interest (VOI) of the lung lesion was defined on PET images with a threshold of 2.5 of the maximum standardised uptake value (SUVmax). The VOI on the CT images was manually delineated in the lung window setting and segmented slice-by-slice. Imaging biomarkers with 46 CT and 48 PET radiomic features were extracted from segmented tumours on PET and CT images using the LIFEx package (ver. 4.0, http://www.lifexsoft.org). LIFEx calculates textural features only for VOIs of at least 64 voxels. When the PET VOI contained fewer than 64 voxels, we considered the lesion a small tumour inappropriate for texture analysis. For PET images, intensity discretisation was automatically adjusted by the software with 64 grey-level bins, and intensity rescaling bounds were defined from 0 to 20. For CT images, intensity discretisation was adjusted with 400 grey-level bins, and intensity rescaling bounds were from −1,000 to 3,000 HU. The radiomic features included first-order textural features based on histogram statistics (maximum, mean, minimum, quartiles, standard deviation, skewness, kurtosis, entropy, uniformity), shape features (total lesion glycolysis, sphericity, compacity, volume) and higher-order textural features [grey-level co-occurrence matrix (GLCM), grey-level run length matrix (GLRLM), neighbourhood grey-level difference matrix (NGLDM), and grey-level zone-length matrix (GLZLM)].
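The intensity discretisation settings reported above can be illustrated with a short sketch. It assumes a simple linear mapping of intensities between the rescaling bounds onto integer grey levels, with out-of-range values clamped; the function name `discretise` and the rounding convention are hypothetical and are not taken from LIFEx, whose exact implementation may differ.

```python
def discretise(values, lower, upper, n_bins):
    """Map intensities in [lower, upper] onto grey levels 1..n_bins (illustrative only)."""
    levels = []
    for v in values:
        # Assumption: voxels outside the rescaling bounds are clamped to the bounds.
        v = min(max(v, lower), upper)
        # Linear rescaling to [0, 1], then conversion to a 1-based bin index.
        frac = (v - lower) / (upper - lower)
        level = min(int(frac * n_bins) + 1, n_bins)
        levels.append(level)
    return levels


# PET settings reported in the text: 64 bins between 0 and 20.
pet_levels = discretise([0.0, 2.5, 10.0, 20.0], lower=0.0, upper=20.0, n_bins=64)

# CT settings reported in the text: 400 bins between -1,000 and 3,000 HU.
ct_levels = discretise([-1000.0, 0.0, 3000.0], lower=-1000.0, upper=3000.0, n_bins=400)
```

With these settings each PET bin spans 20/64 ≈ 0.31 SUV units and each CT bin spans 10 HU, which is why the two modalities yield texture matrices of very different granularity.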
Machine learning approach and statistical analysis. A total of 94 radiomic features were used to predict the positive PD-L1 expression status in lung cancer. Selection of the appropriate features plays an important role in improving the discriminative power of a trained classifier. The input variables that have the strongest relationship with the target variable were selected using a ranking-based method by the Gini index (16). The number of selected features was set as 5, which was a proper feature selection size to lead to the best discriminative performance regardless of the machine learning methods in a previous study (13). Finally, the top five best feature subsets were consistently used as input for all machine learning algorithms according to the definition of the PD-L1 status.
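A ranking-based selection of this kind can be sketched as follows. This is a minimal illustration, assuming that each feature is scored by the largest decrease in Gini impurity obtainable from a single threshold split; the helper names (`gini_impurity`, `gini_gain`, `rank_features`) and the toy data are hypothetical, and the scoring implemented in the software used in this study may differ (for example, in how continuous features are binned).

```python
def gini_impurity(labels):
    """Gini impurity of a list of binary labels (0/1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def gini_gain(feature, labels):
    """Largest impurity decrease over all single-threshold splits of one feature."""
    pairs = sorted(zip(feature, labels))
    ys = [y for _, y in pairs]
    n = len(ys)
    parent = gini_impurity(ys)
    best = 0.0
    for i in range(1, n):
        # Only split between distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left, right = ys[:i], ys[i:]
        child = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(table, labels, top_k=5):
    """Return the names of the top_k features ordered by decreasing Gini gain."""
    scores = {name: gini_gain(col, labels) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data, made up for illustration only (not study data).
toy = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # separates the classes perfectly
    "feature_b": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],  # weakly informative
}
toy_labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(toy, toy_labels, top_k=2)
```

In the study setting, `table` would hold the 94 radiomic features and `top_k` would be 5; the sketch ranks each feature independently and therefore does not account for redundancy between correlated features.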
The following seven machine learning algorithms for binary classification were evaluated: random forest, neural network, Naïve Bayes, logistic regression, adaptive boosting, stochastic gradient descent, and support vector machine. Receiver operating characteristic (ROC) curves were used to compare the discriminative performances of the seven machine learning classifiers according to the definitions of PD-L1 positivity. The performance evaluation measure was the area under the ROC curve (AUC). The model performances were tested via a 10-fold cross validation method.
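The two ingredients of this evaluation, the AUC and the 10-fold partition, can be sketched in plain Python. This is an illustration under stated assumptions rather than the procedure implemented in the software used here: the AUC is computed as the proportion of positive/negative pairs ranked correctly (ties counted as one half), and the folds are a simple shuffled, unstratified split; the function names and the random seed are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# 312 patients split into 10 folds; each fold serves once as the held-out test set.
folds = k_fold_indices(312, k=10)

# Toy scores, made up for illustration: three of four positive/negative pairs are ordered correctly.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

In each round a classifier would be trained on nine folds and scored on the remaining one, so that every patient receives exactly one out-of-sample prediction before the AUC is computed.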
The machine learning approach was performed using the Orange ver. 3.25 software (Bioinformatics Laboratory at the University of Ljubljana, Slovenia), an open-source data-mining and visualisation package (17).
Clinical variables and SUVmax were compared between subgroups divided by different cutoffs of the PD-L1 expression or H-score using the t-test for continuous variables and the chi-squared test for dichotomous variables. Clinical variables included age, sex, smoking status, histology, and size of lung tumour. Binary logistic regression analysis was performed to yield predicted probabilities of PD-L1 positivity for significant clinical variables, SUVmax, and selected PET/CT radiomic features, respectively. These predicted probabilities yielded an AUC as an index of diagnostic performance of the logistic models (18). Finally, AUCs were compared using the method derived by Hanley and McNeil (19). All tests were two-sided with the significance level set at 0.05 and were performed with MedCalc 15.5 (MedCalc, Mariakerke, Belgium).
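For readers unfamiliar with the Hanley and McNeil approach, the calculation can be sketched as follows. The standard error uses the published approximation based on the AUC and the numbers of positive and negative cases, and two AUCs are compared with a z-test. The correlation `r` between the two AUCs is a placeholder that must be supplied separately; the default of 0 treats the AUCs as independent, which is an assumption made here for illustration and will not reproduce the p-values reported in this study. The function names are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC (Hanley and McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


def compare_aucs(auc1, auc2, n_pos, n_neg, r=0.0):
    """Two-sided z-test for the difference between two AUCs from the same cases.

    r is the correlation between the two AUC estimates (assumed 0 here).
    """
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)
    se2 = hanley_mcneil_se(auc2, n_pos, n_neg)
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the normal distribution
    return z, p


# Illustrative inputs: AUCs of 0.712 and 0.652 with 61 positive and 251 negative cases.
se_example = hanley_mcneil_se(0.712, n_pos=61, n_neg=251)
z_example, p_example = compare_aucs(0.712, 0.652, n_pos=61, n_neg=251)
```

Because both models are evaluated on the same patients, the true correlation is positive, and supplying a realistic `r` shrinks the denominator and increases the test statistic relative to the independent case.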
Immunohistochemical staining and PD-L1 expression scoring. One board-certified pathologist carefully examined all haematoxylin and eosin (H&E)-stained slides to determine tumour histologic classification, according to the 2015 World Health Organization Classification of Lung Tumors (2). The pathological stage was classified based on the eighth edition of the TNM classification. We created and used tissue microarrays for immunohistochemical (IHC) analysis. Two tumour cores (2 mm in size) were collected per patient for tissue microarrays. PD-L1 antibody (clone SP263) was used to evaluate PD-L1 expression. SP263 was a companion diagnostic assay for OPDIVO® (nivolumab). As the SP263 antibody is a companion diagnostic assay, it was tested using a prediluted (ready-to-use) antibody provided by Roche. The SP263 assay was conducted using a VENTANA BenchMark ULTRA instrument, as recommended by the manufacturer (20). The intensity of PD-L1 staining was graded on a four-point scale [0 (no staining), 1 (light yellow=faint staining), 2 (yellow-brown=moderate staining), and 3 (brown=strong staining)], and the percentage of cells with membrane expression was measured for each intensity (Figure 2). The H-score was used to interpret the PD-L1 staining (21). The H-score, which ranges from 0 to 300, was calculated as [1 × (% cell 1+) + 2 × (% cell 2+) + 3 × (% cell 3+)], that is, by multiplying the percentage of cells at each staining intensity by the intensity score and summing the products. In this study, we defined PD-L1 positivity using PD-L1 expression cutoffs of 1% and 50% and an H-score cutoff of 5 (22, 23).
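The H-score formula and the three definitions of positivity can be written out as a short sketch. The function names are hypothetical, and the breakdown of stained cells by intensity in the example is invented for illustration; it is chosen only so that the totals (90% of cells stained, H-score 230) match one of the representative cases described in this article.

```python
def h_score(pct_1, pct_2, pct_3):
    """H-score = 1 x (% cells 1+) + 2 x (% cells 2+) + 3 x (% cells 3+); range 0-300."""
    if min(pct_1, pct_2, pct_3) < 0 or pct_1 + pct_2 + pct_3 > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_1 + 2 * pct_2 + 3 * pct_3


def pd_l1_status(pct_positive_cells, score):
    """Apply the three positivity definitions used in this study."""
    return {
        "PD-L1 >=1%": pct_positive_cells >= 1,
        "PD-L1 >=50%": pct_positive_cells >= 50,
        "H-score >=5": score >= 5,
    }


# Hypothetical intensity breakdown: 10% faint, 20% moderate, 60% strong staining.
score = h_score(10, 20, 60)
status = pd_l1_status(pct_positive_cells=10 + 20 + 60, score=score)
```

A tumour with only faint staining in 4% of cells would have an H-score of 4 and would therefore be positive under the 1% cutoff but negative under the H-score cutoff, which illustrates why the three definitions classify different proportions of patients as positive.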
Figure 2. Programmed cell death ligand 1 (PD-L1) expression in non-small cell carcinoma. (A) No staining of PD-L1. (B) Faint, (C) moderate, and (D) strong PD-L1 staining.
Results
Patient characteristics. A total of 1,603 NSCLC patients were reviewed in this study. Among these candidates, 1,291 patients were excluded for the following reasons: nonsurgical treatment such as palliative or neo-adjuvant therapy due to inoperable stage at initial presentation (n=1,032), other histologic subtypes besides adenocarcinoma or squamous cell carcinoma (n=49), and small tumours inappropriate for texture analysis (n=210). The remaining 312 patients, including 228 males and 84 females (mean age=66.2±9.1 years), comprised the study group (Figure 1). The average longest diameter measured from the resected tumours was 3.7±1.7 cm (range=1.1-10.0); 189 cases (60.6%) were adenocarcinoma, and 123 cases (39.4%) were squamous cell carcinoma. The primary tumour stage was mainly T1 or T2 (n=239, 76.6%), and the lymph node stage was mainly N0 or N1 (n=245, 78.5%). The mean PD-L1 expression was 17.1±28.9%, and the mean H-score was 43.7±80.3. The clinical characteristics of the patients are summarised in Table I.
Table I. Clinical characteristics of 312 patients.
Associations of variables according to PD-L1 positive expression. We further compared several clinical characteristics between subgroups divided by different cutoffs of the PD-L1 expression or H-score (Table II). The percentage of PD-L1 positive expression was 19.6% (61/312) for PD-L1 ≥50%, 49.4% (154/312) for PD-L1 ≥1%, and 40.7% (127/312) for an H-score ≥5. Sex and histology differed significantly between groups for the PD-L1 ≥1% and H-score ≥5 definitions. No significant difference in PD-L1 expression according to age, smoking history, or tumour size was found. The SUVmax of the primary tumour differed significantly between the groups under all three definitions. Representative PET/CT images according to different PD-L1 expression levels are shown in Figure 3.
Table II. Differences in clinical parameters according to programmed cell death ligand 1 (PD-L1) expression.
Figure 3. Representative positron emission tomography/computed tomography (PET/CT) images of lung cancers with low and high programmed cell death ligand 1 (PD-L1) expression. (A, B) A 67-year-old female patient with invasive adenocarcinoma in the right upper lobe; the maximum standardised uptake value (SUVmax) of the primary tumour was 4.5, PD-L1 expression was 0% and the H-score was 0. (C, D) A 76-year-old male patient with squamous cell carcinoma in the right upper lobe; the SUVmax of the primary tumour was 14.3, PD-L1 expression was 90% and the H-score was 230.
Radiomic feature selection and ROC curve analysis. The PET/CT radiomic features were ranked based on the Gini index (Table III). The top five best feature subsets for PD-L1 expression in at least 50% of tumour cells were CT_GLRLM_Long Run High Grey-Level emphasis (LRHGE), CT_GLCM_Homogeneity, CT_mean Hounsfield Unit (HUmean), CT_GLRLM_Long Run Emphasis (LRE) and PET_SUVmax. The top five best feature subsets for PD-L1 expression in at least 1% of tumour cells were PET_SUVmax, PET_TLG, CT_GLZLM_Short-zone Low Grey-level Emphasis (SZLGE), CT_GLRLM_Run Percentage (RP) and CT_standard deviation HU (HUstd). The top five best predictors of positive PD-L1 expression as defined by the H-score (≥5) were PET_SUVmax, PET_TLG, CT_GLRLM_LRE, CT_NGLDM_Busyness and CT_GLZLM_SZLGE.
Table III. List of the top five best radiomic feature subsets according to the definitions of programmed cell death ligand 1 (PD-L1) positivity.
The discriminative performances of the seven machine learning algorithms were compared according to the definitions of PD-L1 positivity (Table IV). For the prediction of PD-L1 expression in at least 50% of tumour cells, the Naïve Bayes model (AUC=0.712) outperformed all other classifiers followed by the neural network model (AUC=0.711), random forest (AUC=0.700), logistic regression (AUC=0.673), and adaptive boosting (AUC=0.604).
Table IV. Comparison of machine-learning model performances according to the definitions of programmed cell death ligand 1 (PD-L1) positivity.
Comparison of predictive performance for clinical variables and radiomic features. The performance of the PET/CT radiomic model was evaluated using logistic regression analysis and compared with the performance of an SUVmax model and a combination model with clinical variables for discriminating PD-L1 positivity (Table V). Binary logistic regression analysis was performed to yield predicted probabilities using radiomic features, SUVmax, and all features (radiomic features and clinical variables), respectively. The clinical variables were sex and histology, which showed significant differences according to PD-L1 positivity. The PET/CT radiomic features used for analysis were the top five best features according to the definition of PD-L1 positivity. Logistic regression models in the three settings yielded AUCs according to different cutoffs of the PD-L1 expression or H-score. In the PET/CT radiomic model, the AUCs for predicting PD-L1 positive expression were 0.712 for PD-L1 ≥50%, 0.660 for PD-L1 ≥1%, and 0.672 for an H-score ≥5. In the SUVmax-alone model, the AUCs for predicting PD-L1 positive expression were 0.652 for PD-L1 ≥50%, 0.633 for PD-L1 ≥1% and 0.657 for an H-score ≥5. The PET/CT radiomic model had a higher AUC than the SUVmax-alone model for PD-L1 ≥50%, but the difference did not reach statistical significance (p=0.210). The PET/CT radiomic and SUVmax-alone models showed similar predictive performance for PD-L1 ≥1% (p=0.539) and for an H-score ≥5 (p=0.672). Predictive models that comprised significant clinical variables and selected radiomic features demonstrated an AUC of 0.730 for PD-L1 ≥50%, 0.671 for PD-L1 ≥1%, and 0.680 for an H-score ≥5. They did not show significant improvement compared with the PET/CT radiomic model.
Table V. Comparison of model performances using clinical and radiomic features according to the definition of programmed cell death ligand 1 (PD-L1) expression status.
Discussion
PET parameters, such as the SUV indicating metabolic intensity, are strongly associated with tumour biological characteristics (24). Recent research has also attempted to predict tumour-specific biomarkers by using radiomic-based texture features as well as the intensity on FDG PET/CT, where machine learning analysis can be useful to improve the diagnostic performance (15). The present study focused on an FDG PET/CT-based radiomics model for predicting PD-L1 expression in surgically resected primary tumours of operable NSCLC patients. Patients with operable NSCLC and a positive PD-L1 expression may be ideal candidates for neoadjuvant or adjuvant immunotherapy to reduce the risk of cancer recurrence (25). However, the heterogeneity of PD-L1 expression makes it more difficult to select the subset of patients that can benefit most from immunotherapy with monoclonal antibodies against PD-L1 (26). Interestingly, among PET-based radiomic parameters, conventional features such as SUVmax and TLG were strongly associated with PD-L1 expression status. CT-derived textural features also contributed to increasing the diagnostic capacity, especially in NSCLC patients with ≥50% of tumour cells expressing PD-L1.
Jiang et al. recently evaluated the potential value of PET/CT radiomic features for assessing PD-L1 expression level in a cohort of NSCLC patients (15). They used the least absolute shrinkage and selection operator regression as the analysis method and reported better model performance than that in our study. However, they did not clarify where the tumour tissues used for PD-L1 testing were obtained from. Although another recent study showed the usefulness of PET/CT radiomics for PD-L1 prediction, they included both biopsy and surgical specimens of lung tumours (27). The results of PD-L1 testing can vary according to the types (cytology, biopsy, surgical resection) and sites (primary, lymph node, distant) of tumour specimens (28). The selection of a consistent type and site is important for accurate evaluation of PD-L1 expression. In our study, all PD-L1 tests were performed on surgically resected lung specimens from operable NSCLC patients. Compared with the previous study, our study may provide more reliable results of PD-L1 status. While the previous study also included stage I-IV NSCLC patients, we analysed patients at the operable stage. Our results may provide more useful information for additional immunotherapy to reduce the risk of relapse in curable NSCLC patients.
A number of studies have suggested that cancer-intrinsic PD-L1 facilitates cancer glucose metabolism in NSCLC (29, 30). In relation to this, some studies demonstrated that the SUVmax on FDG PET/CT might be a useful predictive marker for PD-L1 positive expression in NSCLC (31, 32). Recently, the potential value of various textural features derived from FDG PET/CT has been the focus for prediction of PD-L1 expression status (15, 33). In our study, SUVmax was significantly different according to PD-L1 positivity, similar to the finding of a previous study (34). It demonstrated predictive performance with AUC of 0.652 for PD-L1 ≥50%, 0.633 for PD-L1 ≥1%, and 0.652 for an H-score ≥5. Compared with the simple SUVmax model, the radiomic model using various PET/CT features showed slightly better diagnostic efficacy, although the difference was not statistically significant. However, SUVmax and TLG were the only PET-derived radiomic parameters included in our top five best feature subsets. In other words, the textural features most strongly associated with the PD-L1 expression level were those extracted from the CT portion. Recent studies have revealed that CT texture parameters could be useful for predicting PD-L1 expression in several cancers (35, 36). Jiang et al. also reported that textural features extracted from CT performed better than those of PET in assessing the expression status of PD-L1 in NSCLC (15). This finding might be explained by the relatively low spatial resolution of PET images compared with that of anatomical imaging (37). Therefore, further investigations using new PET scanners with better resolution are required to identify the potential value of PET-derived textural features in the prediction of PD-L1 expression status.
PD-L1 positivity in NSCLC can be defined in different ways, depending on the positive threshold and scoring system used (23). Recent clinical trials reported that these definitions of PD-L1 positivity each had potential as biomarkers for immunotherapy (38, 39). We thus analysed the performance of PET/CT-based radiomic models for various PD-L1 cutoff values. The percentage of PD-L1 positive expression was 19.6% for PD-L1 ≥50%, 49.4% for PD-L1 ≥1%, and 40.7% for an H-score ≥5. The SUVmax of the primary tumour differed significantly between the groups under all three definitions. In the Naïve Bayes model, the diagnostic performance for predicting PD-L1 expression was better for PD-L1 ≥50% than for PD-L1 ≥1% and an H-score ≥5 (0.712 vs. 0.654 and 0.677). Metabolic and anatomical changes may be more pronounced at higher levels of PD-L1 expression and might therefore be more closely related to imaging features (40). However, the proportion of the positive group is relatively small for the definition of PD-L1 ≥50%. This uneven distribution of binary data might influence the classification performances of machine learning models (41). Thus, our results should be validated, considering the data imbalance problem, in future studies.
The limitations of this study include the possibility of inherent biases associated with a retrospective study design. Although we obtained sufficient tissue volume from surgically resected lung tumours, PD-L1 testing was not performed on whole-tissue sections of the entire tumour specimen. Analysis using whole-tissue sections is the best method to evaluate PD-L1 expression in terms of cancer heterogeneity. However, it may not be feasible to enrol a large enough study population because of cost (42). Consequently, few previous studies have attempted to evaluate the potential value of PET/CT radiomics using PD-L1 expression levels assessed on whole sections. Moreover, although our primary endpoint was the pretreatment PD-L1 expression status, the definition of its positivity remains unsettled, and PD-L1 status sometimes shows insufficient efficacy in predicting response to immunotherapy. For clinical use as a predictive biomarker, the direct relationship between FDG PET/CT-derived parameters and the treatment response to immunotherapy should be verified in future prospective studies.
Conclusion
This study demonstrated that not only SUVmax but also various PET and CT textural features are associated with the expression of PD-L1. The relevant features differed according to the definition of PD-L1 positivity. These PET/CT-based radiomic features may help clinicians identify tumours with positive PD-L1 expression in a non-invasive manner using machine learning algorithms. Supporting more accurate selection of NSCLC patients who require immunotherapy could have survival benefits.
Acknowledgements
This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2020R1F1A064955).
Footnotes
Authors’ Contributions
CHL: Data curation, investigation, writing – original draft, formal analysis. YWK: Conceptualization, investigation, writing – original draft, visualization. SHH: Methodology, supervision, writing – review & editing, validation. SJL: Project administration, software, supervision, writing – review & editing.
Conflicts of Interest
The Authors declare no conflicts of interest.
- Received September 24, 2022.
- Revision received October 13, 2022.
- Accepted October 18, 2022.
- Copyright © 2022 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).