Abstract
Background/Aim: We aimed to investigate the role of radiogenomic and deep learning approaches in predicting the KRAS mutation status of a tumor using radiotherapy planning computed tomography (CT) images in patients with locally advanced rectal cancer. Patients and Methods: After surgical resection, 30 (27.3%) of 110 patients were found to carry a KRAS mutation. For the radiogenomic model, a total of 378 texture features were extracted from the boost clinical target volume (CTV) in the radiotherapy planning CT images. For the deep learning model, we constructed a simple deep learning network that received a three-dimensional input from the CTV. Results: The predictive ability of the radiogenomic score model revealed an AUC of 0.73 for KRAS mutation, whereas the deep learning model demonstrated worse performance, with an AUC of 0.63. Conclusion: The radiogenomic score model was a more feasible approach to predict KRAS status than the deep learning model.
In colorectal cancer, several genomic biomarkers are used as prognostic or predictive tools. According to the National Comprehensive Cancer Network guideline, patients with metastatic colorectal cancer are recommended to undergo tumor genotyping for several mutations, one of which is KRAS mutation, which is involved in early colorectal cancer development (1). In particular, patients with KRAS mutation respond poorly to cetuximab (1) or panitumumab (2); therefore, these treatment modalities are not recommended for patients with KRAS mutation. Identification of this genomic profile requires a tumor specimen obtained by invasive surgery and a qualified clinical laboratory. However, less invasive watch-and-wait strategies or local excision have become options for complete or good responders to preoperative treatment. In such cases, appropriate and qualified genomic testing is unlikely to be performed. Therefore, noninvasive identification of a patient’s tumor characteristics before treatment would be useful.
The radiogenomic approach can be used to reveal tumor characteristics noninvasively by extracting several texture features from the region of interest (ROI) in medical images. This method has been evaluated to predict genotype or phenotype in breast cancer (3), renal cell carcinoma (4), glioma (5), and advanced or metastatic solid tumors treated with immunotherapy (6). In colorectal cancer, several studies have investigated the radiogenomic approach with various imaging modalities to predict KRAS mutation (7-10), prognosis (11), and treatment response (12-14). Most of those studies used rectal MRI or PET to allow accurate tumor segmentation. Nevertheless, the computed tomography (CT) images acquired for radiation therapy (RT) planning in patients with locally advanced rectal cancer can also be used for the radiogenomic approach. In addition to the radiogenomic approach, the deep learning method can be used to predict tumor phenotype. Deep learning is a network structure in which several data processing structures are layered (15). The convolutional neural network (CNN) is a widely used deep learning structure in oncology because of its promising results in medical image classification and decision support. In colorectal cancer, a deep learning method using the CNN structure has been applied to predict KRAS mutation (16, 17); however, these two studies required manual tumor segmentation from the CT images. Given that the CNN generally imitates the human visual cortex, we hypothesized that medical images could be analyzed by a deep learning method without ROI segmentation.
Patients with locally advanced rectal cancer receive neoadjuvant concurrent chemoradiation therapy (CCRT). For radiotherapy, a radiation oncologist delineates the clinical target volume (CTV) for irradiation to the risk areas, which include the gross tumor with margin, mesorectum, presacral nodes, and internal iliac nodes. After irradiation with 45-50 Gy in 25-28 fractions to the pelvic area, an additional 5.4-9.0 Gy in 3-5 fractions is prescribed as tumor boost RT. The boost CTV is relatively small in order to reduce the toxicities to the other pelvic organs. Given that the boost CTV represents the gross tumor and mesorectum, we hypothesized that genomic information could be derived using the radiogenomic approach. Without a separate handcrafted tumor segmentation process, the boost CTV itself can be used to extract radiomic features and for RT planning.
Collectively, we aimed to evaluate both the radiogenomic and deep learning approaches to predict KRAS mutation in patients with locally advanced rectal cancer using the boost CTV in the radiotherapy planning CT.
Patients and Methods
This study was approved by the ethics committee and institutional review board (IRB) of Seoul National University Bundang Hospital (IRB No. B-2101-663-103). The ethics committee and IRB that approved this study waived the need for informed consent.
Study population, image, and ROI. We collected data from patients with locally advanced rectal cancer who were eligible for neoadjuvant CCRT between January 2017 and October 2020. Specifically, patients ≥20 years old who were diagnosed with rectal cancer based on a biopsy sample and who completed neoadjuvant CCRT and total mesorectal excision were included. Patients with evidence of distant metastases or concurrent malignancy on pretreatment workup were excluded from this study. After completion of neoadjuvant CCRT, patients with an available surgical specimen were included. Patients without pathology reports or a molecular profile including KRAS status were excluded. When a patient had a pathologically complete response, we collected the molecular profile from the biopsy sample. For radiotherapy, a planning CT was acquired for all patients. Intensity-modulated radiation therapy (IMRT) or 3D radiotherapy was performed in the supine or prone position with a full bladder, respectively. The CT slice thickness was 3-4 mm, and a contrast material was routinely used. Two radiation oncologists delineated the boost volume for the reduced field (RF) plan for three-dimensional (3D) RT with a total dose of 50.4 Gy in 28 fractions or for the simultaneous integrated boost (SIB) plan for IMRT with a total dose of 52.5 Gy in 25 fractions. For radiomic feature extraction, the ROI was the boost CTV, which included the primary tumor, the high-risk areas in the mesorectum, and lateral lymph nodes (≤2).
KRAS mutation status. All patients underwent surgical resection. To investigate the mutational status of the KRAS gene, pyrosequencing analysis using the PyroMark Q24 MDx platform was performed on the target regions exon 2 and exon 3 (codons 12, 13, and 61). When one of these regions was mutated, we defined the tumor as KRAS mutated. When the institutional panel sequencing was performed, KRAS mutation status was defined as a Tier I or II (18) single nucleotide variation or indel disruption of the KRAS gene with an allele frequency of ≥2% and a depth of ≥100×.
Extraction and selection of radiogenomic features. Radiomic features were extracted using the Computational Environment for Radiological Research (CERR) (19), an open-source software platform based on MATLAB (MathWorks Inc., Natick, MA, USA). CERR extracted the features and calculated scalar values according to the image biomarker standardization initiative guideline (20).
The first order statistics, peak/valley, shape, intensity volume histogram, and the higher order features of the ROI were extracted. For the first order statistics, the following features were extracted: min, max, 10th percentile, 90th percentile, median, mean, range, variance, standard deviation, skewness, kurtosis, energy, total energy, root mean square, mean absolute deviation, robust mean absolute deviation, robust median absolute deviation, interquartile range, quartile coefficient of dispersion, coefficient of variation, and entropy with a bin width parameter of 25. The higher order features included the gray level co-occurrence matrix (GLCM), gray level run length matrix, gray level zone length matrix (GLZLM), neighborhood gray tone difference matrix, and neighborhood gray level dependence matrix. These 3D calculation results were reduced into scalar features for each directional offset. Then, the mean, max, and standard deviation were calculated from these scalar values. Among the radiomic feature classes, the shape features were not calculated, because the CTV, which served as the ROI, was relatively circular and had a homogeneous shape among patients.
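As an illustration of the first order statistics listed above, the following is a minimal sketch of how a few of them (mean, variance, skewness, kurtosis, coefficient of variation, and range) can be computed from the voxel intensities of an ROI. It is not the CERR implementation; the function name `first_order_stats` and the sample intensities are hypothetical, and conventions differ on whether 3 is subtracted from the kurtosis (it is not subtracted here).

```python
import math

def first_order_stats(values):
    """Return a few first-order statistics for a flat list of voxel intensities."""
    n = len(values)
    mean = sum(values) / n
    # Population (biased) central moments of order 2, 3, and 4
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "variance": m2,
        "std": std,
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        # Non-excess kurtosis (3 is NOT subtracted; an assumption of this sketch)
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "coefficient_of_variation": std / mean if mean != 0 else float("nan"),
        "range": max(values) - min(values),
    }

# Hypothetical intensities (made-up values, not patient data)
roi_voxels = [30.0, 42.0, 38.0, 55.0, 41.0, 47.0, 90.0, 36.0]
stats = first_order_stats(roi_voxels)
```

A single high-intensity voxel, as in the sample list, pulls the skewness above zero, which is the kind of asymmetry these descriptors are meant to capture.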
The first order statistics, peak/valley, shape, intensity volume histogram, and the higher order features were calculated from both the original and filtered CT images, which were resized to 0.1×0.1×0.1 cm³ voxels by the linear interpolation method. Then, the Hounsfield unit values were resampled into 400 discrete bins. The filtered images were obtained by 3D wavelet filtering (21). The Haar and Coiflets filtering types were used for normalization. The original CT images were decomposed by high-pass filtering in all directions.
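To make the filtering and discretization steps more concrete, the sketch below shows one level of a one-dimensional Haar decomposition (a low-pass and a high-pass output) and a fixed-bin-count intensity discretization. This is a simplified illustration only: the study used 3D wavelet filtering within CERR, and the function names `haar_step` and `discretize` as well as the intensity limits passed to them are assumptions made for this example.

```python
import math

def haar_step(signal):
    """One level of the 1D Haar transform on an even-length sequence.

    Returns (low_pass, high_pass) coefficient lists, each half as long.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high

def discretize(values, n_bins, lo, hi):
    """Map intensities in [lo, hi] to integer bin indices 0 .. n_bins - 1."""
    width = (hi - lo) / n_bins
    return [min(n_bins - 1, max(0, int((v - lo) // width))) for v in values]

# Hypothetical intensity profile along one image row
row = [4.0, 4.0, 2.0, 2.0]
low, high = haar_step(row)           # high-pass output is zero on flat pairs
bins = discretize([0.0, 50.0, 100.0], n_bins=4, lo=0.0, hi=100.0)
```

In three dimensions, the same low-pass and high-pass steps are applied along each axis in turn, so "high-pass in all directions" corresponds to keeping the high-pass output along every axis.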
In total, 378 features were calculated in each patient. To select the features that were related to the genomic profile, we adopted the Lasso regression method. For each lambda in the grid, nonzero coefficients were estimated. From those lambda values, the optimal lambda was selected by the 10-fold cross-validation method. If the optimal lambda indicated no nonzero coefficients, the next lambda value was selected. Thereafter, the radiogenomic score was calculated by linearly combining the selected features with their nonzero coefficients. The Lasso analysis for model selection and prediction was performed using STATA 16 statistical software (StataCorp, College Station, TX, USA).
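In general form, the selection and scoring steps described above can be written as follows. This is a generic statement of the Lasso, not the exact objective used by the software: the loss function ℓ (for example, squared error or logistic deviance), the scaling of the penalty, and whether an intercept enters the score are not specified in the text and are left open here. The symbols n (patients), p (features), x_ij (feature j of patient i), and y_i (KRAS status) are notation introduced for this sketch.

```latex
(\hat{\beta}_0, \hat{\beta})(\lambda) = \arg\min_{\beta_0,\,\beta}
  \left\{ \frac{1}{n}\sum_{i=1}^{n}
      \ell\!\left(y_i,\; \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad
\text{score}_i = \sum_{j \in S} \hat{\beta}_j \, x_{ij},
\quad
S = \{\, j : \hat{\beta}_j \neq 0 \,\}
```

Larger values of λ shrink more coefficients exactly to zero, so the cross-validated choice of λ determines how many of the p = 378 features remain in the set S.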
Deep learning network. We constructed a simple 3D classification deep learning network (VoxNet), as suggested by a previous study (22). Three 3D-CNN layers and three leaky ReLU layers were arranged alternately, followed by the max pooling, fully connected, and ReLU layers. Finally, the dropout and classification layers were placed to classify the KRAS status (wild type or mutated). Details of the network structure and the parameters are provided in Table I.
After determining the bounding box around the ROI, the relevant volume was cropped. The non-ROI region within the box was set to zero. Thereafter, the box was resized to 160×160×80 voxels, which was the input data for the VoxNet. The network was trained using the rmsprop optimizer with a fixed learning rate of 1e-4. The number of epochs was chosen empirically as the value that yielded the best results. Training was performed using the MATLAB software on an NVIDIA GeForce GTX 1080 Ti GPU system. We trained and tested the network model for the KRAS mutation status; positive cases were defined as KRAS mutation. To evaluate model performance, 10-fold cross validation was adopted for KRAS mutation.
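The cropping and masking steps described above can be sketched as follows. This is an illustrative pure-Python version operating on nested lists indexed as volume[z][y][x]; it is not the MATLAB code used in the study, the function name `crop_to_roi` and the toy arrays are hypothetical, and the subsequent resizing to 160×160×80 is omitted.

```python
def crop_to_roi(volume, mask):
    """Crop volume[z][y][x] to the bounding box of a binary mask.

    Voxels inside the box but outside the mask are set to zero.
    """
    coords = [(z, y, x)
              for z, plane in enumerate(mask)
              for y, row in enumerate(plane)
              for x, m in enumerate(row) if m]
    if not coords:
        raise ValueError("mask is empty")
    z0, z1 = min(c[0] for c in coords), max(c[0] for c in coords)
    y0, y1 = min(c[1] for c in coords), max(c[1] for c in coords)
    x0, x1 = min(c[2] for c in coords), max(c[2] for c in coords)
    return [[[volume[z][y][x] if mask[z][y][x] else 0
              for x in range(x0, x1 + 1)]
             for y in range(y0, y1 + 1)]
            for z in range(z0, z1 + 1)]

# Toy 2x3x3 volume and mask (made-up values)
volume = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[10, 11, 12], [13, 14, 15], [16, 17, 18]]]
mask = [[[0, 0, 0], [0, 1, 1], [0, 0, 1]],
        [[0, 0, 0], [0, 0, 0], [0, 0, 0]]]
cropped = crop_to_roi(volume, mask)
```

Zeroing the non-ROI voxels means the network sees only intensities from inside the delineated volume, while the bounding box preserves its overall extent.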
Evaluation of the radiogenomic score and deep learning network model. After generation of the radiogenomic score, a logistic model was established to estimate the probability of KRAS mutation. The receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC) were calculated in the same study population because we had already performed 10-fold cross validation to determine the optimal lambda value for the Lasso regression analysis.
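For readers less familiar with these quantities, the sketch below shows a logistic transformation of a score into a probability and an AUC computed as the proportion of mutated/wild-type pairs in which the mutated case receives the higher score (ties counted as one half). The function names, default parameters, and example values are hypothetical and are not taken from the study.

```python
import math

def logistic(score, intercept=0.0, slope=1.0):
    """Convert a linear score to a probability with a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = KRAS mutated, 0 = wild type
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(labels, scores)  # 3 of 4 pairs are ranked correctly
```

Because the logistic function is monotonic, the AUC is the same whether it is computed on the raw score or on the fitted probability.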
To evaluate the deep learning network, we performed 10-fold cross validation for KRAS mutation prediction and then calculated the mean AUC value.
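A 10-fold split of a cohort of this size can be sketched as below. Stratifying the folds by KRAS status is an assumption of this example, since the text does not state how the folds were formed; the function names `stratified_folds` and `mean` are likewise hypothetical, and model training is left as a comment.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def mean(values):
    """Arithmetic mean, e.g. of the per-fold AUC values."""
    return sum(values) / len(values)

# Cohort-sized example: 30 mutated (1) and 80 wild-type (0) cases
labels = [1] * 30 + [0] * 80
folds = stratified_folds(labels, k=10)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Train on train_idx, score test_idx, and record that fold's AUC here.
```

With 110 patients, each fold holds 11 cases, so each per-fold AUC rests on only about three mutated cases; averaging across folds reduces, but does not remove, that variability.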
Results
Characteristics. The patient characteristics are described in Table II. For the entire cohort, the median age was 61 years (range=33-92 years). Most patients had clinical T3 (N=70, 63.6%) or T4 (N=33, 30.0%) disease and clinical N1 (N=65, 59.1%) or N2 (N=24, 21.8%) disease. For CCRT, patients were treated with 3D radiotherapy or IMRT, depending on the radiation oncologist’s discretion; therefore, the ROI was derived from the 3D plan (N=78, 70.9%) or the IMRT plan (N=32, 29.1%). After surgical resection, 80 (72.7%) and 30 (27.3%) patients were found to have wild-type and KRAS-mutated rectal tumors, respectively.
Two approaches for KRAS mutation prediction. The radiogenomic and deep learning approaches that we adopted to predict KRAS status are shown in Figure 1. In the radiogenomic approach, the ROI was segmented from the RT planning CT and then subjected to texture analysis, which used the original images and the filtered images derived from wavelet transform with the Haar and Coiflets wavelets. The estimated features were subjected to least absolute shrinkage and selection operator (Lasso) regression analysis in terms of the KRAS status. In the deep learning approach, the ROI was reconstructed into a 3D volume, which was the input for the simple VoxNet deep learning network. Details of the layer, parameter, and optimization process are described in the Methods section.
Generation and evaluation of the radiogenomic score and deep learning model. We extracted 378 radiogenomic features from the original and filtered images. Lasso regression analyses selected four features with nonzero coefficients for KRAS. The correlations of these features with the KRAS mutation are depicted in a heat map (Figure 2A). The gray level co-occurrence matrix (GLCM) Haralick correlation from the original image (Figure 2B) did not differ between the KRAS-mutated and wild-type tumors. However, higher skewness and peak/valley derived from the wavelets (Haar) showed a trend toward association with KRAS-mutated status (Figure 2C, p=0.061 and Figure 2D, p=0.069, respectively). The coefficient of variation from wavelets (Coif1) was not different between the KRAS-mutated and wild-type tumors (Figure 2E, p=0.130). The KRAS radiogenomic score was then generated using the best tuning parameter (λ=0.0661311), as follows:
To evaluate the performance of the radiogenomic score, we used ROC curve analysis and calculated the AUC. The radiogenomic score for KRAS showed an AUC of 0.730 (95% CI=0.637-0.810) (Figure 3A). Then, we identified the cutoff value that minimized the false negative and false positive values. For the KRAS radiogenomic score, the best cutoff value was −0.5462, with a sensitivity of 56.7% and a specificity of 85.0% (Figure 3B). On the other hand, the performance of the deep learning model was evaluated by an internal validation method, resulting in a mean AUC of 0.63 for KRAS mutation (Figure 3C).
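Cutoff selection of this kind can be sketched as below. The criterion used here, maximizing Youden's J (sensitivity + specificity − 1), is an assumption chosen for illustration, since the text does not name the exact rule; the function names and the example scores are hypothetical.

```python
def sens_spec(labels, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for lab, s in zip(labels, scores) if lab == 1 and s >= cutoff)
    fn = sum(1 for lab, s in zip(labels, scores) if lab == 1 and s < cutoff)
    tn = sum(1 for lab, s in zip(labels, scores) if lab == 0 and s < cutoff)
    fp = sum(1 for lab, s in zip(labels, scores) if lab == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(labels, scores):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J."""
    best = None
    for c in sorted(set(scores)):
        se, sp = sens_spec(labels, scores, c)
        j = se + sp - 1.0
        if best is None or j > best[0]:
            best = (j, c, se, sp)
    return best[1], best[2], best[3]

# Made-up example: 1 = KRAS mutated, 0 = wild type
labels = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.5, 0.2, 0.1]
cutoff, sensitivity, specificity = best_cutoff(labels, scores)
```

A cutoff chosen this way trades sensitivity against specificity; a different criterion, or a different weighting of false negatives and false positives, would generally select a different threshold.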
Discussion
We showed that the radiogenomic features extracted from the boost CTV in the radiotherapy planning CT images could predict the KRAS status in patients with locally advanced rectal cancer. Moreover, we showed that the radiogenomic model demonstrated better performance than the deep learning model.
In colorectal cancer, KRAS mutations have been associated with poor response to anti-EGFR monoclonal antibodies (1, 2, 23), and have a reported incidence of approximately 40% (24). This genomic feature is routinely investigated using surgical specimens from patients who have received CCRT for locally advanced rectal cancer. However, tumor response to CCRT can vary, with a reported pathologic complete response rate of 15% to 27% (25, 26). In these cases, pretreatment imaging biomarkers may have a role in predicting the genomic profile.
Most published radiomics studies on colorectal cancer focused on chemoradiation response (12-14) or the related prognostic factors (11), and only a small number of studies used the CT imaging modality. Yang et al. (9) developed a support vector machine (SVM) model based on 346 radiomic features derived from pretreatment contrast-enhanced CT. This SVM model was developed to differentiate between the three-gene mixed mutated group (KRAS, NRAS, or BRAF) and the nonmutated group. Although that model showed an AUC of 0.83 in the validation cohort (N=56), it was used to predict a mixture of genetic mutations rather than KRAS mutation alone. In contrast, our study was dedicated to classifying KRAS mutation, and was based on the largest and most homogeneously treated rectal cancer population (N=110) among radiogenomic studies using CT images.
The strength of our study is that there was no need for labor-intensive manual tumor segmentation, which has commonly accompanied radiomics studies. The studies by Yang et al. (9) and Golia Pernicka et al. (27) were both conducted under the premise that rectal tumor segmentation should be done precisely. Manual segmentation requires experienced radiologists and is inevitably vulnerable to observer variability. Above all, the accuracy of segmentation of rectal tumors using axial CT images is questionable, because most cases of early and locally advanced rectal tumor are assessed by high-resolution MRI (28, 29). In this study, we used the ROI from the radiotherapy boost target. Owing to its high response rate and reduced toxicity, the boost technique is frequently used in neoadjuvant CCRT (30). Radiation oncologists have defined the boost CTV as the high-risk area, including the gross tumor volume and mesorectal bed (31, 32), which represents the tumor burden. This delineation process is performed by radiation oncologists when treating patients with locally advanced rectal cancer who are eligible for preoperative CCRT. Therefore, compared with precise manual tumor segmentation by a radiologist, the delineation process during RT planning is relatively more cost effective.
In addition, we adopted the deep learning approach to predict the genomic profile from the same ROI that was used for the radiogenomic approach. Despite the use of optimized parameters, the deep learning model showed worse performance than the radiogenomic approach. Recent advances in the development of deep learning models depend on the size of the dataset and the computing power that supports the training of many network layers. Therefore, the relatively poor performance of the deep learning approach in the current study might have been related to the model structure, the small dataset, and less optimized hyperparameters. Wu et al. (17) combined deep learning and handcrafted radiomics approaches to predict KRAS mutation status from two-dimensional (2D) CT images. The combined model achieved a C-index of 0.82 and was superior to the radiomics model, which showed a C-index of 0.79. He et al. (16) tested the performance of the ResNet model with three different input dimensions from axial, coronal, and sagittal 2D CT input images. In their test cohort (N=45), the deep learning model showed AUCs of 0.90 for the axial images, 0.75 for the coronal images, and 0.72 for the sagittal images. In general, the resolution of reconstructed coronal and sagittal images is not as high as the resolution of axial images. This may explain the limited performance of our deep learning model, which used 3D images that include the information of coronal and sagittal images. Moreover, the authors of that previous study found that additional expansion of the ROI to include the surrounding tissue may have contributed to the model performance. This is aligned with the rationale of the current study. Notably, both previous studies (16, 17) required separate handcrafted tumor segmentation processing of 2D CT images. Future large-scale research is needed to test the feasibility of the deep learning approach for 3D reconstruction of an ROI.
Overfitting is a concern given the complexity of deep learning models and the relatively small dataset. Nevertheless, the deep learning model in the current study has a shallow, simple, and small network structure despite being a 3D-CNN model. Of a total of 14 layers, only 3 CNN layers were fitted to our dataset, which is very simple for a 3D-CNN network. Rather than focusing on the result of the CNN model, we aimed to benchmark the radiogenomic model against a simple CNN model. This result may help researchers choose an appropriate strategy. Specifically, it may reduce trial and error and help in building a reliable model when using CT images from patients with rectal cancer. Single-institution studies of cancer patients commonly suffer from a small number of eligible patients, particularly when developing machine learning or deep learning models. To allow the model to be further validated in other institutions, the source code has been released in a publicly available repository (33).
This study has several limitations. The developed models were only tested internally, because the study population was small. External validation using a large, multi-institutional dataset is required. Nevertheless, the results of the present study suggest which approach is feasible for a small dataset.
In conclusion, derivation of radiogenomic features from the CTV in RT planning CT could be a feasible approach for noninvasive prediction of KRAS status. Compared with the deep learning network model, the radiogenomic score model showed better performance.
Acknowledgements
This research was funded by the National Research Foundation of Korea, grant number 2020R1C1C1014192.
Footnotes
* These Authors have contributed equally to this work.
Authors’ Contributions
Data curation, Sung-Bum Kang; Formal analysis, Bum-Sup Jang and Changhoon Song; Funding acquisition, Changhoon Song; Investigation, Bum-Sup Jang and Changhoon Song; Methodology, Bum-Sup Jang; Project administration, Jae-Sung Kim; Resources, Sung-Bum Kang; Supervision, Jae-Sung Kim; Writing – original draft, Bum-Sup Jang and Changhoon Song; Writing – review & editing, Jae-Sung Kim.
Conflicts of Interest
The Authors have no conflicts of interest to declare in relation to this study.
- Received June 1, 2021.
- Revision received June 21, 2021.
- Accepted June 22, 2021.
- Copyright © 2021 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.