Anticancer Research
Research Article, Clinical Studies

Predicting Overall Survival Using Machine Learning Algorithms in Oral Cavity Squamous Cell Carcinoma

JIA YAN TAN, JOHN ADEOYE, PETER THOMSON, DILEEP SHARMA, POORNIMA RAMAMURTHY and SIU-WAI CHOI
Anticancer Research December 2022, 42 (12) 5859-5866; DOI: https://doi.org/10.21873/anticanres.16094
1 Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Hong Kong, Hong Kong, S.A.R. (JIA YAN TAN, JOHN ADEOYE, SIU-WAI CHOI)
2 College of Medicine & Dentistry, James Cook University, Queensland, Australia (PETER THOMSON, DILEEP SHARMA, POORNIMA RAMAMURTHY)
For correspondence: SIU-WAI CHOI, htswchoi{at}hku.hk

Abstract

Background/Aim: Machine learning (ML) models are often developed to predict cancer prognosis but rarely consider spatial factors within a region. This study therefore explored ML algorithms that use Local Government Areas (LGAs) in Queensland, Australia to spatially predict the 3- and 5-year prognosis of oral cancer patients. It also aimed to make the outcomes predicted by the ML models clinically interpretable. Patients and Methods: Data from a total of 3,841 oral cancer patients were retrieved from the Queensland Cancer Registry (QCR). The synthetic minority oversampling technique combined with edited nearest neighbours (SMOTE-ENN) was used to pre-process the imbalanced datasets. Five ML models were trained: logistic regression, random forest classifier, XGBoost, Gaussian Naïve Bayes and Voting Classifier. Predictive features were age, sex, LGA, tumour site and differentiation. Outcomes were 3- and 5-year overall survival. Model performance on the test set was evaluated using the area under the curve (AUC) and F1 scores. The SHapley Additive exPlanations (SHAP) method was applied to the best-performing model to interpret its predictions. Results: The Voting Classifier was the best-performing model, with F1 scores of 0.58 and 0.64 for 3- and 5-year overall survival, respectively. Age was the most important feature in the Voting Classifier for both 3- and 5-year prognosis prediction. LGA at diagnosis was among the top three predictive features in both the 3- and 5-year models. Conclusion: The Voting Classifier demonstrated the best overall performance in classifying both 3- and 5-year overall survival of oral cancer patients in Queensland. The SHAP method provided a clinical understanding of the predictive features of the Voting Classifier.

Key Words:
  • Oral cavity cancer
  • machine learning
  • interpretability
  • SHapley values
  • prognosis

Squamous cell carcinoma of the oral cavity is the most common malignancy arising in the head and neck region (1). Globally, 5-year overall survival after diagnosis remains poor, and only around 50% of patients survive for more than five years (1). Treatment has advanced across several modalities, including curative surgery, chemoradiotherapy, immunotherapy and other targeted therapies. Despite this, late diagnosis and aggressive primary tumours have hampered efforts to reduce cancer mortality and morbidity. Early diagnosis and better prognosis prediction remain the ultimate goals for improving overall and disease-specific survival, especially for high-risk oral cancer patients (2, 3). Targeting this group of patients for timely treatment may therefore improve the efficacy of, and response to, oncological treatment (4, 5).

Whilst cancer may be influenced by lifestyle and genetic susceptibility, it may also be affected by environmental exposure (6). Spatial and temporal disease mapping of cancer can explain and predict patterns of disease outcome within a well-defined geographic region. Our group has previously used Bayesian disease mapping to delineate at-risk regions and at-risk populations for oral cancer in Hong Kong (7). Machine learning is increasingly used to predict oral cancer prognosis, yet most of these models to date are based on AJCC TNM staging and clinicopathological profiling (8, 9). Notably, spatial units within a region are rarely taken into account in prognosis prediction algorithms for cancer.

Queensland, in northeastern Australia, is divided into 78 spatial units gazetted as Local Government Areas (LGAs). Identifying at-risk hotspots in Queensland, and especially high-risk oral cancer patients, may help guide budget and resource allocation for future targeted interventions and screening strategies. Hence, we aimed to explore machine learning algorithms utilising LGAs in Queensland to spatially predict the 3-year and 5-year prognosis of oral cancer.

Moreover, there is a strong emphasis on the explainability and traceability of machine learning models to support their proper implementation in future (10, 11). It is crucial to understand how a machine learning model reached its output and to explain this in non-specialist terms (12). Hence, in this study, we used SHapley Additive exPlanations (SHAP) to facilitate interpretation of the best-performing machine learning prediction model (13). It is also useful to know whether the model agrees with experts and clinicians on which predictors are important. Understanding how the model arrives at its decisions may therefore support its adoption in oncology treatment settings.

Patients and Methods

Patient dataset and ethical approval. Approval to conduct this retrospective study was obtained from the James Cook University Human Research Ethics Committee (Ref. H8609), with further approval under the Public Health Act 2005 provided by Queensland Health. The Queensland Cancer Registry (QCR) was accessed for the period from 1982 (when data were first compiled) to 2018 (most recent available data). The dataset was received as a de-identified, password-protected spreadsheet and managed under the Australian Code for the Responsible Conduct of Research.

Data were retrieved for patients diagnosed with oral squamous cell carcinoma (OSCC) between 1st January 1982 and 31st December 2018. Only patients aged 18 years or older at diagnosis with a minimum follow-up of 5 years were included in the development and training of the machine learning models. After data cleaning, 3,841 patients met the inclusion criteria. The parameters included were age at diagnosis, sex, primary tumour site and tumour differentiation, as listed in Table I. Local Government Area (LGA), classified as cities, regions and shires, was used to stratify patients spatially. Key dates included the date of diagnosis and, for patients who had died, the date of death. The censoring date was set as 31st December 2018. The primary outcomes were 3-year and 5-year overall survival. Overall survival was modelled as a binary classification of alive versus dead, where death included both death from cancer and death from other causes.
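The binary outcome definition above can be sketched as a labelling function. This is a minimal illustration under stated assumptions, not the authors' code. The inputs (a diagnosis date and an optional date of death) are hypothetical fields, and a year is approximated as 365.25 days. Only the 31st December 2018 censoring date comes from the text.

```python
from datetime import date, timedelta
from typing import Optional

CENSOR_DATE = date(2018, 12, 31)  # censoring date stated in the study


def os_label(diagnosed: date, died: Optional[date], years: int,
             censor: date = CENSOR_DATE) -> Optional[int]:
    """Binary overall-survival label at a fixed horizon after diagnosis.

    Returns 1 if the patient died (from any cause) within `years` of diagnosis,
    0 if the patient is known to have been alive at the horizon, and None if
    follow-up ended before the horizon (the patient cannot be labelled).
    A year is approximated as 365.25 days (an assumption of this sketch).
    """
    horizon = diagnosed + timedelta(days=round(365.25 * years))
    if died is not None and died <= horizon:
        return 1  # death from cancer or from other causes both count as events
    if died is not None or horizon <= censor:
        return 0  # died after the horizon, or still alive when the horizon was reached
    return None  # alive at the censoring date, but the horizon lies beyond it


# Hypothetical patients: (diagnosis date, date of death or None if alive)
patients = [(date(2000, 1, 1), date(2002, 6, 1)),
            (date(2000, 1, 1), date(2004, 6, 1)),
            (date(2010, 1, 1), None)]
for diag, death in patients:
    print(diag, os_label(diag, death, 3), os_label(diag, death, 5))
```

The second hypothetical patient is labelled 0 (alive) at the 3-year horizon but 1 (dead) at the 5-year horizon, so the 3- and 5-year models are trained on different labels. Returning `None` mirrors the minimum-follow-up inclusion criterion: a living patient whose horizon falls after the censoring date cannot be labelled.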

Table I.

Variables and input features fed into the machine learning models.
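The source does not state how the categorical inputs in Table I (sex, LGA, tumour site and differentiation) were encoded before model fitting. One common choice is one-hot encoding, sketched below in plain Python under that assumption. The record keys and category values are hypothetical placeholders, not the registry's coding.

```python
from typing import Any, Dict, List, Sequence, Tuple


def encode(records: Sequence[Dict[str, Any]], categorical: Sequence[str],
           numeric: Sequence[str] = ()) -> Tuple[List[str], List[List[float]]]:
    """Pass numeric columns through unchanged and one-hot encode categorical ones.

    Category levels are sorted so that the column order is reproducible.
    """
    levels = {c: sorted({str(r[c]) for r in records}) for c in categorical}
    header = list(numeric) + [f"{c}={lvl}" for c in categorical for lvl in levels[c]]
    rows = []
    for r in records:
        row = [float(r[c]) for c in numeric]
        for c in categorical:
            # Exactly one 1.0 per categorical variable, in the column of its level
            row.extend(1.0 if str(r[c]) == lvl else 0.0 for lvl in levels[c])
        rows.append(row)
    return header, rows


# Hypothetical records with the Table I variable types (values are placeholders).
records = [
    {"age": 63, "sex": "M", "lga": "LGA_A", "site": "tongue", "differentiation": "moderate"},
    {"age": 55, "sex": "F", "lga": "LGA_B", "site": "floor of mouth", "differentiation": "poor"},
]
header, rows = encode(records, ["sex", "lga", "site", "differentiation"], ["age"])
```

Under this encoding, each LGA present in the data becomes its own 0/1 column. Because differentiation grades are ordered, an ordinal code is an equally plausible alternative, and the source does not say which was used.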

Machine learning prediction algorithms. Binary classification models, matching our study outcomes (alive or dead at 3 and 5 years post-diagnosis), were developed. Their performance in predicting 3-year and 5-year overall survival of OSCC patients in Queensland, Australia was then assessed. The models evaluated were logistic regression, random forest classifier (14), XGBoost (15) and Gaussian Naïve Bayes (16). Random forest classifier and XGBoost were chosen for comparison as they are ensemble algorithms that combine many decision trees and include mechanisms to control over-fitting. Gaussian Naïve Bayes, on the other hand, has been applied to many medical classification problems. A voting ensemble classifier combining the classifiers with the best performance in the training cohort was also fitted. Herein, we implemented hard voting: the three individual classifiers (logistic regression, random forest classifier and Gaussian Naïve Bayes) each voted, and the majority class was chosen as the final predicted outcome.
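The hard-voting rule described above amounts to a per-patient majority vote over the three base classifiers. The sketch below shows only that combination step and assumes each base model has already produced 0/1 predictions. The prediction lists are hypothetical, and treating 1 as death is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote across classifiers.

    predictions[m][i] is the class that model m predicts for patient i.
    With three binary classifiers a strict majority always exists.
    """
    # zip(*predictions) yields, for each patient, the tuple of votes from all models
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical 0/1 predictions (1 = died) from the three base models for four patients.
logistic = [1, 0, 1, 0]
forest = [1, 1, 0, 0]
naive_bayes = [0, 0, 1, 1]
ensemble = hard_vote([logistic, forest, naive_bayes])
```

In scikit-learn, the same rule is provided by `VotingClassifier(estimators=[...], voting="hard")`, which fits the listed estimators and takes the majority of their predicted labels. With `voting="hard"` the ensemble exposes no `predict_proba`, so probability-based summaries such as ROC curves or SHAP explanations have to work from its 0/1 predictions.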

Model training and internal validation. The data were split 70:30 into training and testing cohorts, with 2,688 patients in the training cohort and 1,153 in the testing cohort. The training cohort was used to train the machine learning models. The testing cohort, unseen during training, was used to validate the performance of the trained models. Class imbalance (i.e., many more patients were still alive than had died) was handled by combining two methods (17). The minority class was oversampled using the synthetic minority oversampling technique (SMOTE), and the majority class was under-sampled using edited nearest neighbours (ENN). SMOTE generates new synthetic examples of the minority class by interpolating between existing minority examples and their nearest neighbours. Although this can improve classifier performance, it may also introduce noisy samples that the algorithm cannot learn from. ENN is therefore applied afterwards to remove observations whose outcome class differs from that of their nearest neighbours, improving label purity.
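The SMOTE-ENN combination described above can be sketched in plain Python to show its two steps. SMOTE adds synthetic minority points on line segments between a minority sample and one of its nearest minority neighbours. ENN then removes samples whose label disagrees with their neighbours. This is an illustrative re-implementation, not the authors' pipeline. In practice, the imbalanced-learn package provides `SMOTEENN` (in `imblearn.combine`), and its default neighbour counts and ENN editing rule may differ from the majority rule and parameters assumed here.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def k_nearest(idx: int, X: Sequence[Sequence[float]], k: int, pool: Sequence[int]) -> List[int]:
    """Indices from `pool` of the k points closest (Euclidean) to X[idx], excluding idx itself."""
    others = [j for j in pool if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def smote(X: Sequence[Sequence[float]], y: Sequence[int], minority: int, n_new: int,
          k: int = 5, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """SMOTE: add n_new synthetic minority samples on segments joining minority neighbours."""
    rng = random.Random(seed)
    pool = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(n_new):
        i = rng.choice(pool)
        j = rng.choice(k_nearest(i, X, k, pool))
        gap = rng.random()  # random position between X[i] (gap=0) and X[j] (gap=1)
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3) -> Tuple[list, list]:
    """ENN: drop samples whose label disagrees with the majority label of their k neighbours."""
    everyone = range(len(X))
    keep = [i for i in everyone
            if Counter(y[j] for j in k_nearest(i, X, k, everyone)).most_common(1)[0][0] == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical 2-D toy data: class 1 is the minority; (0.5, 0.5) is a noisy minority point.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 2], [2, 0], [10, 10], [10, 11], [0.5, 0.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = enn(*smote(X, y, minority=1, n_new=3, k=2))
```

Resampling is applied to the training cohort only, so the test cohort keeps its original class balance. On the split sizes: suppose the 70:30 split was made with scikit-learn's `train_test_split(test_size=0.3)`, which rounds the test-set size up. Then 3,841 patients give ceil(0.3 × 3,841) = 1,153 test and 2,688 training patients, consistent with the counts above.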

Hyperparameters are parameters whose values control the learning process. In this study, hyperparameters were optimised by grid search with 5-fold cross-validation on the training cohort, and the best-performing combination was selected. The internal validation (testing) cohort, unseen during training and cross-validation, was used to assess the performance of the machine learning algorithms. Performance measures obtained on this internal validation dataset formed the basis for comparing the algorithms in this study.
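Grid search with 5-fold cross-validation, as described above, evaluates every combination of candidate hyperparameter values on five train/validation partitions of the training cohort. It keeps the combination with the best mean validation score. The sketch below assumes a hypothetical `fit_and_score` callback that trains a model on one partition and returns its score (for example F1) on the held-out fold. The grid values are illustrative, not the ones the authors searched.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def kfold(n: int, k: int = 5) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(grid: Dict[str, Sequence], n: int,
                fit_and_score: Callable[[dict, List[int], List[int]], float],
                k: int = 5) -> Tuple[dict, float]:
    """Return the parameter combination with the best mean validation score over k folds."""
    folds = kfold(n, k)
    best_params, best_score = {}, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            scores.append(fit_and_score(params, train_idx, val_idx))
        if mean(scores) > best_score:
            best_params, best_score = params, mean(scores)
    return best_params, best_score


def toy_score(params: dict, train_idx: List[int], val_idx: List[int]) -> float:
    """Hypothetical stand-in for 'train on train_idx, return F1 on val_idx'."""
    return -abs(params["max_depth"] - 4) - abs(params["n_estimators"] - 100) / 100


best_params, best_cv_score = grid_search({"max_depth": [2, 4, 6], "n_estimators": [50, 100]},
                                         n=20, fit_and_score=toy_score)
```

scikit-learn's `GridSearchCV(estimator, param_grid, cv=5, scoring="f1")` implements the same procedure. For classifiers, an integer `cv` uses stratified folds, whereas this sketch uses plain contiguous folds for brevity.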

Model performance measures and model prediction explainability. To evaluate the discriminative performance of the machine learning algorithms, the area under the receiver operating characteristic curve (AUC) and accuracy were calculated. Recall and F1 scores were also reported as they are more clinically relevant and applicable (18). Recall (sensitivity) measures the proportion of actual positive cases that the classifier correctly predicted, while F1 is the harmonic mean of recall and precision. In an oncology setting, it is crucial to detect true positives and minimise false negatives so that OSCC patients receive timely and effective treatment. The F1 score was therefore considered the most relevant metric for assessing the 3-year and 5-year overall survival classifications. For better clinical interpretability, SHapley Additive exPlanations (SHAP) values and SHAP interaction values were also presented (13). A SHAP value quantifies how much a variable contributed to an individual prediction relative to the model's baseline output. These values give a better understanding of the predictions made by the machine learning classifiers.
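The metrics above can be computed directly from predicted labels and scores. The plain-Python sketch below is illustrative only. Which outcome class the authors treated as positive is not stated, so it is a parameter here, and the labels and scores are made up. The AUC uses its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn, len(pairs) - tp - fp - fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    tp, fp, fn, _ = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall = sensitivity
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Share of (positive, negative) pairs where the positive is scored higher; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

For these toy labels, precision and recall are both 2/3, so F1 is 2/3. Eight of the 9 positive-negative pairs are ranked correctly, giving an AUC of 8/9. When the scores are hard 0/1 predictions, as from a hard-voting ensemble, the same AUC calculation reduces to the mean of sensitivity and specificity.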

Statistical analysis and computation. Descriptive statistics were performed using SPSS for Windows version 27.0 (IBM Corp., Armonk, NY, USA). All classification algorithms were developed in Python v3.10.4 using the scikit-learn (sklearn) (19) and XGBoost (15) packages. The SHAP library was used to calculate SHAP values and generate interaction plots of the predictive variables in the machine learning models.
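SHAP values are Shapley values computed on a model's output: each feature's contribution is its marginal effect averaged over all subsets of the other features. With only five input features, they can be computed exactly by enumerating the 2^5 = 32 feature subsets. The sketch below assumes absent features are replaced by a single baseline value and uses a hypothetical toy model. The SHAP library instead averages over background data and offers explainers such as `shap.TreeExplainer` and the sampling-based `shap.KernelExplainer`. The source does not state which explainer was used.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model output f at x.

    A feature outside the coalition is set to its baseline value (a single
    reference point; the SHAP library typically averages over background data).
    Cost grows as 2**M, which is manageable for M = 5 features.
    """
    m = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Model output when only the features in `coalition` take the patient's values
        return f([x[i] if i in coalition else baseline[i] for i in range(m)])

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical model with an interaction between features 1 and 2."""
    return 0.02 * z[0] + 0.5 * z[1] * z[2]


x, baseline = [70, 1, 1], [60, 0, 0]
phi = shapley_values(toy_model, x, baseline)
```

For the toy model, the attributions sum to f(x) minus f(baseline). The interaction between the second and third features is split equally between them (0.25 each). Because the code only calls `f`, it applies equally to a classifier's 0/1 output.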

Results

Patient characteristics. The study included 3,841 patients with squamous cell carcinoma of the oral cavity who met the inclusion criteria, as presented in Table II. Of these, 2,490 were male and 1,351 were female. The median age at diagnosis was 63 years (IQR=55-72 years). Over the course of 36 years, the largest number of patients were diagnosed in Brisbane City (972 patients, 25.3%). Brisbane City was followed by Gold Coast City (373 patients, 9.7%) and Whitsunday Regional (309 patients, 0.78%). Most primary lesions involved the anterior tongue (48.1%). By tumour differentiation, 656 (17.1%) patients had well-differentiated tumours and 2,454 (63.9%) had moderately differentiated tumours. The remaining 731 (19.0%) had poorly differentiated or undifferentiated tumours. At the time of censoring, 1,572 patients (40.9%) had died within three years of diagnosis. Five-year overall survival in this cohort was 51%.

Table II.

Demographics and clinical-pathologic characteristics.

Performance of machine learning models. The data were divided into training and validation datasets. Data from 2,688 patients (70%) were used for training and 5-fold cross-validation of the machine learning classifiers. Data from the remaining 1,153 patients (30%), unseen during training, were used for internal validation of the algorithms. To address class imbalance, the training cohorts were resampled with SMOTE-ENN. As a result, 824 and 538 unique patient records were included in the training cohort for 3-year and 5-year overall survival prediction, respectively.

The discriminative performance and other performance metrics of the models are presented in Table III. In the training phase, XGBoost demonstrated the highest accuracy (0.89) for predicting 3-year overall survival. XGBoost and Random Forest both showed the highest AUC (0.95) for this outcome (Table III and Figure 1). The models were then applied to the test cohort, where XGBoost achieved an accuracy of 0.61 and all models achieved an AUC of only 0.61. Similarly, XGBoost was the most robust model during the training phase for 5-year overall survival prediction, with an accuracy of 0.88 and an AUC of 0.94. Overall, most models for 5-year overall survival prediction had an accuracy of 0.62, and all models achieved an AUC of at least 0.62 in the testing cohort (Table III).

Table III.

Performance measures of machine learning models for prediction of 3-year and 5-year overall survival.

Figure 1.

Area under the receiver operating characteristic curve based on machine learning models predicting 3-year and 5-year overall survival.

All models predicting 3-year overall survival achieved an F1 score of at least 0.54. The highest score, 0.58, was achieved by the Voting Classifier. The F1 scores for 5-year overall survival were higher, with the Voting Classifier achieving 0.64. Based on the F1 score, the hard Voting Classifier achieved the best overall performance for both 3-year and 5-year overall survival (Table III).

SHAP summary plot. Figure 2a and 2b show summary plots of the SHAP values of the five predictive features for the Voting Classifier, based on the test set. The x-axis denotes the SHAP values, while the predictive variables are listed along the y-axis in order of importance. Each dot on the summary plot represents a single patient, and dots pile up vertically to show the density of patients with similar SHAP values. The position of a dot on the x-axis shows the impact of that feature on the model's prediction for that patient. The spread of the dots, including the tails, reflects the size and distribution of each feature's effect. In both the 3-year and 5-year overall survival plots, red indicates a higher value of a variable and blue a lower value.
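The vertical order of features in a SHAP summary plot reflects global importance: the magnitude of each feature's SHAP values aggregated over patients. By default, the SHAP library orders features by their mean absolute SHAP value. The sketch below computes that ranking from a patients-by-features matrix of SHAP values. The matrix and feature names are made up for illustration and are not the study's results.

```python
from typing import List, Sequence, Tuple


def rank_features(shap_matrix: Sequence[Sequence[float]],
                  names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| over patients, largest first (the summary-plot y-axis order)."""
    n_patients = len(shap_matrix)
    importance = [sum(abs(row[j]) for row in shap_matrix) / n_patients
                  for j in range(len(names))]
    return sorted(zip(names, importance), key=lambda item: item[1], reverse=True)


# Made-up SHAP values: three patients (rows) by three hypothetical features (columns).
toy_shap = [[0.30, -0.05, 0.10],
            [-0.20, 0.02, -0.15],
            [0.25, 0.01, 0.05]]
ranking = rank_features(toy_shap, ["feature_a", "feature_b", "feature_c"])
```

Taking absolute values discards the sign, so a feature ranks highly whether it tends to raise or lower the prediction. The per-patient direction is what the colour-coded dots convey.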

Figure 2.

Summary plots for SHAP values based on Voting Classifier predicting 3-year (a) and 5-year (b) overall survival.

Overall, age at diagnosis was the most important feature in the Voting Classifier for predicting both 3-year and 5-year overall survival. Increasing age was associated with poorer prognosis, whereas younger patients had better overall survival. For 3-year overall survival, the top three features were age at diagnosis, LGA at diagnosis and tumour differentiation. For 5-year overall survival, the top three predictive features were age, tumour site and LGA at diagnosis. Sex was the least important variable contributing to the outcome for both 3- and 5-year survival in the Voting Classifier.

Discussion

Oral cancer is an aggressive disease that affects speaking, eating and swallowing. More often than not, patients want to be informed of their diagnosis and prognosis, particularly how long they can expect to live. Robust predictive tools can therefore provide healthcare providers with the information needed to guide treatment and discussions with patients (20). This study used retrospective data from patients diagnosed with OSCC in Queensland, Australia over the past 36 years to build machine learning algorithms predicting 3-year and 5-year overall survival. XGBoost, Random Forest, Logistic Regression, Gaussian Naïve Bayes and Voting Classifier were compared using AUC and F1 score as metrics. As the Voting Classifier combines the predictions of three other machine learning models, its predictions were also interpreted with SHAP summary plots.

We have observed a growing number of machine learning models for diagnostic and prognostic clinical prediction, particularly in oral cancer (9). Before machine learning can be implemented for prognosis prediction in actual clinical settings, these models must be interpretable and understandable. Moreover, oncology care is undergoing a paradigm shift towards personalisation and precision. Using myriad data types and then interpreting an individual patient's prognosis requires significant time and expertise. SHAP values and summary plots can therefore be used to interpret these predictions and demonstrate variable importance more time-efficiently. They also present this information in a form that personnel with little training in machine learning can understand. SHAP summary plots provide a concise figure that visually demonstrates the range and distribution of each feature's importance for the machine learning model's output. Individualised explanations based on a patient's routine assessment can also be generated using SHAP force plots (Figure 3). Healthcare providers and patients can review this information directly alongside the input data and better plan further monitoring or treatment.
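A force plot relies on the additivity of SHAP values, which can be written compactly. The notation below is the standard notation of Lundberg and Lee (13) and is not defined elsewhere in this article. M is the number of input features (M = 5 here), F is the full feature set, φ₀ is the base value (the expected model output) and φᵢ is the SHAP value of feature i for patient x. The term f_S denotes the model evaluated with only the features in S known.

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i ,
\qquad
\phi_i = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,(M - |S| - 1)!}{M!}
  \left[ f_{S \cup \{i\}}\!\left(x_{S \cup \{i\}}\right) - f_S\!\left(x_S\right) \right]
```

In a force plot, features with φᵢ > 0 push the prediction above the base value and features with φᵢ < 0 push it below. The arrows therefore add up to the patient's predicted output.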

Figure 3.

SHAP force plot displaying predictive features for an individual patient.

Once healthcare providers understand how predictive features or variables affect outcomes in OSCC patients, they can place more emphasis and resources on those at higher risk of more aggressive cancers or death. The most important predictive features for 3-year overall survival in OSCC patients were age at diagnosis, LGA and tumour differentiation. In contrast, age at diagnosis, tumour site and LGA were more important for predicting 5-year overall survival in this Queensland dataset. Patients could be selected for appropriate and timely treatment based on their SHAP values together with the expertise of healthcare providers. This may improve treatment efficacy and resource allocation, and thereby overall prognosis. This is particularly useful where resources are limited, such as in rural areas. Moreover, since Queensland is spatially organised into LGAs, local governments may also refer to the SHAP values when allocating medical resources and healthcare providers across LGAs. LGA at diagnosis was heavily weighted as a predictive feature in the summary plots for both 3-year and 5-year overall survival (Figure 2a and b, respectively). In short, the summary plots may serve as a guide for budget and resource allocation for oral cancer patients in high-risk areas of Queensland.

Several limitations should be taken into account when interpreting the results of this study. The performance of the Voting Classifier has yet to be externally validated. Because Queensland LGAs are used as a predictive feature to stratify the risk of oral cancer patients geographically, prospective data can be collected to further validate the classifier and assess its real-world performance. Moreover, this study was limited to demographic data (age, sex and LGA) and clinical data (tumour site and differentiation). These are the most easily obtainable data types and are available at the time of diagnosis. However, oral cancer is an aggressive and heterogeneous cancer, and many factors at the clinicopathological, histological and molecular levels can affect patients' overall survival (21-27). Treatment information would also contribute to the predictions of the machine learning model (28). Retrieving more data from these patients could potentially improve accuracy, precision and recall in predicting overall survival among patients with OSCC in Queensland. The models could then be modified to incorporate multiple factors beyond tumour differentiation and staging. Lifestyle factors, such as smoking and drinking, should also be included as predictive features, as these aetiological differences may affect patients' overall survival. Reports have also suggested that overall survival of OSCC patients might be affected by existing comorbidities that lead to other systemic diseases (29, 30).

Whilst LGA data are of considerable interest for identifying geographic regions with high disease incidence, they are inevitably influenced by large population numbers in urban regions. They are also influenced by patient referrals to tertiary head and neck centres in city hospitals, such as those in Brisbane and the Gold Coast. Nonetheless, previous studies suggest that OSCC incidence and mortality are likely to be higher in regional and remote areas (31). In these areas, low socio-economic status, increased risk-factor behaviour, limited access to healthcare services and large Indigenous populations influence disease status.

Future machine learning modelling for oral cancer prognostic prediction could focus on time-to-event algorithms. Such algorithms would predict outcomes and stratify oral cancer patients over time using demographic, clinical, pathological and treatment information. Even with advances in treatment and therapeutic strategies, the risk of recurrence remains high and survival probability remains poor in OSCC patients. The machine learning algorithms presented here are mostly static and do not take into account the dynamic and heterogeneous nature of the cancer and the patient. Clinical prognostic prediction algorithms that can handle time-to-event data may be better suited to providing risk estimates at appropriate timeframes for oncological monitoring and timely treatment. When available, such models may help healthcare providers select patients for timely introduction of multimodality interventions.
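A fixed-horizon binary label discards follow-up that time-to-event methods can use. As a minimal illustration, the sketch below implements the classical Kaplan-Meier estimator. It is a non-parametric survival estimator rather than one of the machine learning algorithms the authors envisage. The input format (follow-up times with a death/censored flag) and the data are assumptions.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps; events[i] = 1 for death, 0 for censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group all records with the same time
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at time t still count as at risk at t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in years; event = 1 for death, 0 for censored (alive at last contact).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients stay in the risk set until their last follow-up instead of being excluded. This is how such methods can use patients who lack the full 5 years of follow-up required for the binary models.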

Conclusion

This study used binary classification algorithms to model and predict 3- and 5-year overall survival of oral cancer patients diagnosed across LGAs in Queensland. The hard Voting Classifier demonstrated the best overall performance in classifying both 3- and 5-year overall survival. SHAP values indicated the importance of the respective features in the Voting Classifier. Age at diagnosis, LGA and tumour differentiation were important predictive features for 3-year overall survival in OSCC patients. For the prediction of 5-year overall survival, age at diagnosis, tumour site and LGA were more important predictive features. This study calls for further inclusion of clinicopathological information to improve the discriminative performance of the Voting Classifier before actual implementation in clinical settings in Queensland.

Footnotes

  • Authors’ Contribution

    Jia Yan Tan – Wrote the manuscript and conducted the machine learning analyses. John Adeoye – Advised on the manuscript and on the machine learning methodology. Peter Thomson – Read and critiqued the manuscript. Dileep Sharma – Acquired data from the public database. Poornima Ramamurthy – Acquired data from the public database. Siu-Wai Choi – Critiqued the manuscript for intellectual content and rewrote passages for readability.

  • Conflicts of Interest

    The Authors report no conflicts of interest.

  • Received August 15, 2022.
  • Revision received August 30, 2022.
  • Accepted October 5, 2022.
  • Copyright © 2022 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A and Bray F: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71(3): 209-249, 2021. PMID: 33538338. DOI: 10.3322/caac.21660
  2. Adeoye J and Thomson PJ: Strategies to improve diagnosis and risk assessment for oral cancer patients. Fac Dent J 11: 122-127, 2020. DOI: 10.1308/rcsfdj.2020.97
  3. Huang S, Yang J, Fong S and Zhao Q: Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges. Cancer Lett 471: 61-71, 2020. PMID: 31830558. DOI: 10.1016/j.canlet.2019.12.007
  4. Baykul T, Yilmaz HH, Aydin U, Aydin MA, Aksoy M and Yildirim D: Early diagnosis of oral cancer. J Int Med Res 38(3): 737-749, 2010. PMID: 20819411. DOI: 10.1177/147323001003800302
  5. Awan KH: Oral cancer: early detection is crucial. J Int Oral Health 6(5): i-ii, 2014. PMID: 25395811.
  6. Thomson P: Oral cancer: From prevention to intervention. Cambridge, UK, Cambridge Scholars Publishing, 2018.
  7. Adeoye J, Choi SW and Thomson P: Bayesian disease mapping and the ‘High-Risk’ oral cancer population in Hong Kong. J Oral Pathol Med 49(9): 907-913, 2020. PMID: 32450000. DOI: 10.1111/jop.13045
  8. Alabi RO, Youssef O, Pirinen M, Elmusrati M, Mäkitie AA, Leivo I and Almangush A: Machine learning in oral squamous cell carcinoma: Current status, clinical concerns and prospects for future-A systematic review. Artif Intell Med 115: 102060, 2021. PMID: 34001326. DOI: 10.1016/j.artmed.2021.102060
  9. Adeoye J, Tan JY, Choi SW and Thomson P: Prediction models applying machine learning to oral cavity cancer outcomes: A systematic review. Int J Med Inform 154: 104557, 2021. PMID: 34455119. DOI: 10.1016/j.ijmedinf.2021.104557
  10. Heinrichs B and Eickhoff SB: Your evidence? Machine learning algorithms for medical diagnosis and prediction. Hum Brain Mapp 41(6): 1435-1444, 2020. PMID: 31804003. DOI: 10.1002/hbm.24886
  11. Holzinger A, Biemann C, Pattichis CS and Kell DB: What do we need to build explainable AI systems for the medical domain? arXiv, 2017. DOI: 10.48550/arXiv.1712.09923
  12. Ahmad MA, Teredesai A and Eckert C: Interpretable machine learning in healthcare. 2018 IEEE ICHI, p. 447, 2018. DOI: 10.1109/ICHI.2018.00095
  13. Lundberg SM and Lee S-I: A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768-4777, 2017.
  14. Biau G: Analysis of a random forests model. J Mach Learn Res 13(1): 1063-1095, 2012. DOI: 10.48550/arXiv.1005.0208
  15. Chen T and Guestrin C: XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. DOI: 10.1145/2939672.2939785
  16. Al-Aidaroo K, Bakar A and Othman Z: Medical data classification with Naive Bayes approach. Information Technology Journal 11(9): 1166-1174, 2012. DOI: 10.3923/itj.2012.1166.1174
  17. Batista G, Prati R and Monard M: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6(1): 20-29, 2004. DOI: 10.1145/1007730.1007735
  18. Steyerberg EW and Vergouwe Y: Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 35(29): 1925-1931, 2014. PMID: 24898551. DOI: 10.1093/eurheartj/ehu207
  19. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M and Duchesnay É: Scikit-learn: Machine learning in Python. J Mach Learn Res 12: 2825-2830, 2011.
  20. McNair AGK, MacKichan F, Donovan JL, Brookes ST, Avery KNL, Griffin SM, Crosby T and Blazeby JM: What surgeons tell patients and what patients want to know before major cancer surgery: a qualitative study. BMC Cancer 16: 258, 2016. PMID: 27036216. DOI: 10.1186/s12885-016-2292-3
  21. Adeoye J, Thomson P and Choi SW: Prognostic significance of multi-positive invasive histopathology in oral cancer. J Oral Pathol Med 49(10): 1004-1010, 2020. PMID: 32740985. DOI: 10.1111/jop.13086
    Chang SW, Abdul-Kareem S, Merican AF and Zain RB: Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods. BMC Bioinformatics 14: 170, 2013. PMID: 23725313. DOI: 10.1186/1471-2105-14-170
    Karadaghy OA, Shew M, New J and Bur AM: Development and assessment of a machine learning model to help predict survival among patients with oral squamous cell carcinoma. JAMA Otolaryngol Head Neck Surg 145(12): 1115-1120, 2019. PMID: 31045212. DOI: 10.1001/jamaoto.2019.0981
    Cao R, Wu Q, Li Q, Yao M and Zhou H: A 3-mRNA-based prognostic signature of survival in oral squamous cell carcinoma. PeerJ 7: e7360, 2019. PMID: 31396442. DOI: 10.7717/peerj.7360
    Alkhadar H, Macluskey M, White S and Ellis I: Perineural invasion in oral squamous cell carcinoma: Incidence, prognostic impact and molecular insight. J Oral Pathol Med 49(10): 994-1003, 2020. PMID: 32533593. DOI: 10.1111/jop.13069
    Rosado P, Lequerica-Fernández P, Villallaín L, Peña I, Sanchez-Lasheras F and de Vicente JC: Survival model in oral squamous cell carcinoma based on clinicopathological parameters, molecular markers and support vector machines. Expert Systems with Applications 40(12): 4770-4776, 2013. DOI: 10.1016/j.eswa.2013.02.032
  22. Zhang X, Jang MI, Zheng Z, Gao A, Lin Z and Kim KY: Prediction of chemosensitivity in multiple primary cancer patients using machine learning. Anticancer Res 41(5): 2419-2429, 2021. PMID: 33952467. DOI: 10.21873/anticanres.15017
  23. Alabi RO, Mäkitie AA, Pirinen M, Elmusrati M, Leivo I and Almangush A: Comparison of nomogram with machine learning techniques for prediction of overall survival in patients with tongue cancer. Int J Med Inform 145: 104313, 2021. PMID: 33142259. DOI: 10.1016/j.ijmedinf.2020.104313
  24. Yang Y and Warnakulasuriya S: Effect of comorbidities on the management and prognosis in patients with oral cancer. Translational Research in Oral Oncology 1: 2057178X16669961, 2016. DOI: 10.1177/2057178x16669961
  25. Jariod-Ferrer ÚM, Arbones-Mainar JM, Gavin-Clavero MA, Simón-Sanz MV, Moral-Saez I, Cisneros-Gimeno AI and Martinez-Trufero J: Are comorbidities associated with overall survival in patients with oral squamous cell carcinoma? J Oral Maxillofac Surg 77(9): 1906-1914, 2019. PMID: 30980811. DOI: 10.1016/j.joms.2019.03.007
  26. Ramamurthy P, Sharma D and Thomson P: Oral cancer in Australia: regional and remote perspectives. Faculty Dental Journal 13(1): 41-45, 2022. DOI: 10.1308/rcsfdj.2022.9
Keywords

  • Oral cavity cancer
  • machine learning
  • interpretability
  • SHapley values
  • Prognosis