Prediction of Chemosensitivity in Multiple Primary Cancer Patients Using Machine Learning

Abstract

Background/Aim: Many cancer patients face multiple primary cancers, and it is challenging to find an anticancer therapy that covers both cancer types in such patients. In personalized medicine, drug response is predicted from genomic information, which makes it possible to choose the most effective therapy for these patients. The aim of this study was to identify chemosensitive gene sets and to compare how accurately the response of cancer cell lines to drug treatment can be predicted from the genomic features of the cell lines and from cancer type. Materials and Methods: We identified a gene set associated with sensitivity to a specific therapeutic drug and used machine learning (ML) to compare the performance of predictive models based on the identified genes with that of models based on cancer type. To this end, publicly available gene expression and drug sensitivity datasets of gastric and pancreatic cancers were used. Five ML algorithms were implemented: linear discriminant analysis, classification and regression tree, k-nearest neighbors, support vector machine and random forest. Results: The predictive accuracy of the cancer type models was 0.729 to 0.763 on the training dataset and 0.731 to 0.765 on the testing dataset. The predictive accuracy of the genomic prediction models was 0.818 to 1.0 on the training dataset and 0.759 to 0.896 on the testing dataset. Conclusion: The specific gene models performed much better than the cancer type models across the ML methods. Therefore, the most effective therapeutic drug can be chosen based on the expression of specific genes in patients with multiple primary cancers, regardless of cancer type.

Key Words:

Multiple primary cancers
chemosensitivity prediction
gene expression
cancer type
machine learning

Multiple primary tumors may be present when more than one tumor is found in the same or different organs of a single patient (1-3). According to epidemiological studies, the frequency of multiple primary tumors ranges from 2% to 17% (1, 4-7). When two active malignancies are diagnosed concurrently in the same patient, it is challenging to find an anticancer therapy that covers both cancer types (a) without increased toxicity or relevant pharmacological interactions and (b) without a negative impact on the overall outcome (1). Zhai et al. (2018) reported that digestive-digestive tumor pairs were the most common combination among multiple primary malignant tumors (8). Gastric cancer and pancreatic cancer both belong to the WHO classification of digestive system tumors, so this pair is likely to be involved in cases of multiple primary cancers.

Gastric cancer is the fifth most commonly diagnosed cancer and the third leading cause of cancer-related death worldwide (9-11). Its incidence is about two times higher in men than in women (9, 12). Around 1 million new cases of gastric cancer and an estimated 783,000 deaths were recorded globally in 2018, which corresponds to roughly one in every twelve cancer deaths. According to data from the World Health Organization, Asia ranks first in the world in the incidence, mortality and 5-year prevalence of gastric cancer.

Pancreatic cancer is a lethal disease that is difficult to diagnose early, has few therapeutic options and accounts for a large number of cancer-related deaths. Although survival has improved for most cancer types over the last decade, pancreatic cancer has lagged behind because of limited progress in diagnostic methods and effective targeted therapies. Various clinical trials have shown that combination therapy is more efficacious than monotherapy in advanced pancreatic cancer; however, the side effects of combination therapy are usually much more severe, and many patients cannot tolerate combination chemotherapy regimens such as FOLFIRINOX.

Most cancer-predisposing mutations confer susceptibility to cancer at multiple sites. Chemotherapy targets all rapidly dividing cells, not only cancer cells, and is thus often associated with unpleasant side effects, which may be more severe in patients with multiple primary cancers. Therefore, assessing chemosensitivity on the basis of genotype is needed to reduce the incidence and severity of side effects. A key goal of precision medicine is to predict the best drug therapy for a specific patient using genomic information. In oncology, cancers that appear pathologically similar can vary greatly in how they respond to the same drug.

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is regarded as a subset of artificial intelligence, and drug discovery is one of its important application areas; the number of papers applying ML techniques to drug discovery has been increasing every year.

The aim of this study was to compare how accurately the response of cancer cell lines to drug treatment can be predicted from the genomic features of the cell lines and from cancer type. To this end, chemosensitivity prediction models were implemented using publicly available drug sensitivity and gene expression datasets of gastric and pancreatic cancers.

Materials and Methods

Data preparation. Three publicly available datasets were used in this study: two gene expression datasets and one drug sensitivity dataset. The two expression datasets are available from the Gene Expression Omnibus (GEO) microarray database (GSE64604 and GSE77850) and comprise 16 gastric cancer cell lines and 29 pancreatic cancer cell lines. Both datasets include 41,000 probes, which were summarized into 19,566 gene symbols for this study.
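The rule used to summarize probes into gene symbols is not stated. As a minimal sketch, assuming that all probes mapped to the same gene symbol are averaged (one common choice; the study may have used another rule, such as the maximum or median), the collapse step could look like this. The function name `collapse_probes` and the toy probe IDs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average probe-level expression into gene-level expression (hypothetical helper).

    probe_values:  dict probe_id -> list of expression values, one per cell line
    probe_to_gene: dict probe_id -> gene symbol; unannotated probes are skipped
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # drop probes without a gene symbol
            grouped[gene].append(values)
    # For each gene, take the column-wise (per cell line) mean across its probes
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


# Hypothetical toy input: 3 probes measured in 2 cell lines
probes = {"p1": [2.0, 4.0], "p2": [4.0, 6.0], "p3": [1.0, 1.0]}
annotation = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
print(collapse_probes(probes, annotation))
# {'GENE_A': [3.0, 5.0], 'GENE_B': [1.0, 1.0]}
```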

The chemosensitivity dataset covers 144 cell lines and four chemosensitivity measures for 22 compounds, including the IC₅₀ (half-maximal inhibitory concentration; the drug concentration required for 50% growth inhibition in vitro). The remaining three measures are: (a) EC₅₀ (half-maximal effective concentration), the concentration at which a drug produces half of its maximal effect; (b) ActArea (activity area), the area between the dose-response curve and a fixed reference level; and (c) Amax, the maximum activity value. The 144 cell lines represent seven cancer types. The gastric and pancreatic cancer cell lines were selected for analysis because they form a digestive-digestive tumor pair.
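To make the four measures concrete, the sketch below assumes a four-parameter logistic (Hill) dose-response curve expressed as percent growth inhibition. It computes the IC₅₀ from the fitted parameters, takes Amax as the largest response over the tested doses, and approximates ActArea by the trapezoidal rule over log10 concentration with 0% inhibition as the reference. The curve model, parameter values and dose series are hypothetical; the source dataset may define and fit these measures differently.

```python
import math


def response(conc, bottom, top, ec50, slope):
    """Four-parameter logistic (Hill) curve: % growth inhibition at concentration conc."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** slope)


def ic50(bottom, top, ec50, slope, level=50.0):
    """Concentration giving `level` % inhibition, or None if the curve never reaches it."""
    if not (bottom < level < top):
        return None
    return ec50 / ((top - bottom) / (level - bottom) - 1.0) ** (1.0 / slope)


def act_area(concs, responses):
    """Trapezoidal area between the curve and 0 % inhibition, over log10(concentration)."""
    xs = [math.log10(c) for c in concs]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], responses, responses[1:]))


# Hypothetical curve: 0-80 % inhibition, EC50 = 1 uM, Hill slope 1
params = dict(bottom=0.0, top=80.0, ec50=1.0, slope=1.0)
doses = [0.01, 0.1, 1.0, 10.0]  # uM, hypothetical dose series
resp = [response(c, **params) for c in doses]
print(round(response(1.0, **params), 2))  # 40.0: half of the maximal effect at the EC50
print(round(ic50(**params), 3))           # 1.667: where the curve crosses 50 % inhibition
print(round(max(resp), 2))                # 72.73: Amax over the tested doses
print(round(act_area(doses, resp), 2))    # 84.03
```

Because this curve tops out at 80% inhibition, the EC₅₀ (1 uM) and IC₅₀ (about 1.67 uM) differ, which is why the two measures are reported separately.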

Machine learning methods. In recent decades, rapid advances in computational algorithms and the increasing availability of big data have enabled artificial intelligence (AI) to substantially improve the predictive performance of models in many research areas. Machine learning (ML), a major branch of AI, has been widely applied to drug discovery and development in order to (a) predict treatment effects, (b) identify target genes and functional pathways, and (c) select potential biomarkers. The ML algorithms used in this study are described below.

Linear discriminant analysis. Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination can be used as a linear classifier or for dimensionality reduction before later classification. LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also express one dependent variable as a linear combination of other features or measurements; in LDA, however, the dependent variable is a class label. LDA is commonly applied to supervised classification problems to model differences between groups and to project features from a high-dimensional space onto a lower-dimensional space.
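For the two-class case used here (sensitive vs. resistant), Fisher's discriminant direction is w = S_w⁻¹(m₁ − m₀), where m₀ and m₁ are the class means and S_w is the pooled within-class scatter matrix. The following sketch implements this for two features, assuming equal class priors so that the threshold lies halfway between the projected class means. The toy expression values are hypothetical, and the study itself ran its models in R.

```python
def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def fisher_lda(class0, class1):
    """Two-class Fisher discriminant for 2-D features.

    Returns (w, threshold); predict class 1 when w . x > threshold.
    """
    m0, m1 = mean_vec(class0), mean_vec(class1)
    # Pooled within-class scatter matrix S_w (2 x 2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Fisher direction: w = S_w^{-1} (m1 - m0)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means (equal priors assumed)
    threshold = (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1])) / 2.0
    return w, threshold


def lda_predict(w, threshold, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Hypothetical expression of two genes: class 0 = resistant, class 1 = sensitive
resistant = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
sensitive = [[3.0, 4.0], [3.5, 4.2], [4.0, 3.8], [3.2, 4.5]]
w, t = fisher_lda(resistant, sensitive)
print([lda_predict(w, t, x) for x in resistant + sensitive])
```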

Classification and regression tree (CART). A classification and regression tree (CART) is a predictive model that explains how an outcome variable can be predicted from other variables. A CART output is a decision tree in which each fork is a split on a predictor variable and each end node contains a prediction for the outcome variable. CART is a non-parametric decision tree learning technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively.
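At each node, CART searches for the split that most reduces impurity; for classification trees the Gini impurity is the usual criterion. The sketch below finds the best single threshold on one feature, which is the step a full tree repeats recursively on each child node. The function names and the toy expression values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Threshold on one feature that minimizes the weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive sorted unique values.
    Returns (threshold, weighted_impurity); threshold is None if no split helps.
    """
    pts = sorted(set(values))
    n = len(labels)
    best = (None, gini(labels))
    for lo, hi in zip(pts, pts[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


expr = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]   # hypothetical expression of one gene
group = ["R", "R", "R", "S", "S", "S"]  # resistant / sensitive labels
print(best_split(expr, group))  # (2.25, 0.0): a perfect split at 2.25
```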

k-Nearest neighbors (KNN) method. The k-nearest neighbors algorithm is one of the simplest techniques used in ML and can be used for both classification and regression. It is popular because it is easy to use and requires no explicit training phase. When implementing KNN, data points are first represented as feature vectors, and the algorithm then computes the distances between these vectors. The most common distance is the Euclidean distance:

$$ d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} $$

where p and q are two arbitrary points in n-dimensional space, and the subscripts represent dimensions.

For each test point, KNN computes this distance to every training point, selects the k nearest training points, and assigns the test point the class held by the majority of these neighbors (for regression, the average of their values).
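A minimal KNN classifier following the description above: compute the Euclidean distance to every training point, keep the k nearest, and take a majority vote. The data and labels are hypothetical, and k=3 is an arbitrary choice for illustration (an odd k avoids ties between two classes).

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical expression of two genes in six cell lines
train_x = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 4.1], [2.9, 3.8]]
train_y = ["resistant"] * 3 + ["sensitive"] * 3
print(euclidean([0, 0], [3, 4]))                      # 5.0
print(knn_predict(train_x, train_y, [3.1, 3.9], k=3))  # sensitive
```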

Support vector machine (SVM). A support vector machine (SVM) is a supervised learning model that can be applied to both classification and regression problems. It was developed at AT&T Bell Laboratories by Vapnik and colleagues. The objective of the SVM algorithm is to find a hyperplane in N-dimensional space (where N is the number of features) that separates the data points of the different classes. Support vectors are the data points closest to the hyperplane, and they determine its position and orientation; the classifier is chosen to maximize the margin between the hyperplane and these support vectors. Removing the support vectors would change the position of the hyperplane.
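The margin-maximization idea can be illustrated with a linear soft-margin SVM trained by subgradient descent on the regularized hinge loss. This is a didactic sketch, not the solver or kernel used in the study (neither is stated); the learning rate, the regularization strength `lam`, the number of epochs and the toy data are assumptions.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM via full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))) with y in {-1, +1}.
    """
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Side of the hyperplane w . x + b = 0 on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-feature data: +1 = sensitive, -1 = resistant
xs = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(xs, ys)
print([svm_predict(w, b, x) for x in xs])
```

After training, the points whose margin sits near 1 play the role of the support vectors described above.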

Random forest. Random forest (RF) is an ensemble learning method for classification, regression and other tasks that constructs many decision trees at training time and outputs the mode of the classes predicted by the individual trees (classification) or their mean prediction (regression). Because it handles both classification and regression problems, RF is a versatile and widely used model.

All machine learning models were implemented using the R programming language. All statistical analyses, including principal component analysis (PCA), were conducted using R (version 4.0.1), with p<0.05 considered statistically significant.
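The random forest idea (bootstrap samples, random feature subsets, majority vote) can be sketched with a small forest of depth-one trees (decision stumps). Real random forests grow deeper trees and resample features at every split; the stump simplification, `n_trees`, the square-root rule for the number of candidate features and the toy data are assumptions made to keep the example short.

```python
import math
import random
from collections import Counter


def fit_stump(xs, ys, feature):
    """Best one-threshold split on `feature`, scored by misclassification count.

    Returns (errors, (feature, threshold, left_label, right_label)).
    """
    values = sorted(set(x[feature] for x in xs))
    if len(values) < 2:  # no split possible: predict the majority label everywhere
        lab = Counter(ys).most_common(1)[0][0]
        return sum(y != lab for y in ys), (feature, math.inf, lab, lab)
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x[feature] <= t]
        right = [y for x, y in zip(xs, ys) if x[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, (feature, t, l_lab, r_lab))
    return best


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Bagged stumps: each tree sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, dim = len(xs), len(xs[0])
    mtry = max(1, int(math.sqrt(dim)))  # candidate features per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        feats = rng.sample(range(dim), mtry)
        forest.append(min((fit_stump(bx, by, f) for f in feats), key=lambda r: r[0])[1])
    return forest


def forest_predict(forest, x):
    """Majority vote of the stumps."""
    votes = Counter(l_lab if x[f] <= t else r_lab for f, t, l_lab, r_lab in forest)
    return votes.most_common(1)[0][0]


# Hypothetical expression of two genes in eight cell lines
xs = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1],
      [3.0, 4.0], [3.2, 4.1], [2.9, 3.8], [3.1, 3.9]]
ys = ["resistant"] * 4 + ["sensitive"] * 4
forest = fit_forest(xs, ys)
print([forest_predict(forest, x) for x in xs])
```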

In this study, we compared the predictive accuracies of the ML algorithms for drug sensitivity using either cancer type or the expression of specific genes as predictors. The study workflow is shown in Figure 1.

Figure 1.

Study process. Drug-sensitive gene sets are identified from the combined drug sensitivity and gene expression data.

Results

Chemosensitivity profiling in cancer cell lines. The chemosensitivity dataset contains 4 chemosensitivity measures, including IC₅₀, for 22 compounds in 144 cell lines representing 7 cancer types. Data on the gastric and pancreatic cancer cell lines were used in this study. The dataset is summarized in Figure 2.

Figure 2.

Drug sensitivity according to 4 sensitivity measures in cell lines from 7 types of cancer for 22 drugs. The 4 sensitivity measures are: (A) IC₅₀, (B) EC₅₀, (C) ActArea and (D) Amax.

As shown in Figure 2, smaller values of IC₅₀, EC₅₀ and Amax indicate greater sensitivity (Figure 2A, B and D), whereas larger values of ActArea indicate greater sensitivity (Figure 2C). As shown in Figures 3 and 4, the cell lines from the 7 cancer types were more sensitive to four drugs (irinotecan, paclitaxel, panobinostat and topotecan) than to any of the other drugs. The responses to all 22 drugs showed similar sensitivity patterns across the cell lines from the 7 cancer types, which indicates that a drug that is effective in one cancer type may also be effective in other cancer types (Figure 2).

Figure 3.

Combined topotecan sensitivities in gastric and pancreatic cancer.

Figure 4.

Expression patterns of the 24 identified topotecan sensitivity genes in cancer cell lines and correlation coefficients of gene expression. The dashed box in (A) represents the sensitive cell lines. (A) Black and white colors represent down-regulation and up-regulation, respectively. (B) Black and white colors represent strong negative and strong positive correlation, respectively.

To compare drug sensitivity between the two types of models, we used two types of cancer cell lines, gastric and pancreatic, and a single drug, topotecan, which was chosen from the four drugs with the highest sensitivity because it had the fewest missing entries.

The publicly archived gene expression datasets contain data on 16 gastric cancer cell lines and 29 pancreatic cancer cell lines. The 41,000 probes were summarized into 19,566 unique gene symbols in this study. The cell lines in the two datasets are summarized in Table I.

Table I.

Pancreatic and gastric cancer cell lines used in this study.

Combination of the 4 drug sensitivity measures. The drug sensitivity dataset includes 4 measures of drug sensitivity. In this study, PCA was used to derive a single combined sensitivity metric from all 4 measures. PCA is a simple nonparametric statistical method that reduces data dimensionality by transforming the data into uncorrelated variables called principal components, each of which is a linear combination of the original variables. Therefore, a combined chemosensitivity value for each cell line, which we call the combined chemosensitivity measure, can be calculated with the following formula:

Combined chemosensitivity measure = w₁S₁ + w₂S₂ + w₃S₃ + w₄S₄

where S₁, S₂, S₃ and S₄ are the values of the 4 sensitivity measures and w₁, w₂, w₃ and w₄ are their weights. The correlation coefficients between the combined chemosensitivity measure and each of the 4 measures in the two types of cancer cell lines are shown in Table II.
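The text does not state which principal component supplies the weights; the sketch below assumes the first component. Under that assumption, the measures are standardized (z-scored) before weighting, and the first-component loadings are found by power iteration on their correlation matrix. The sign of a principal component is arbitrary, and because smaller IC₅₀, EC₅₀ and Amax but larger ActArea indicate greater sensitivity, real loadings will generally not all share one sign. The toy columns are hypothetical.

```python
import math


def standardize(col):
    """z-scores using the sample standard deviation."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    return [(v - m) / sd for v in col]


def first_pc_weights(columns, iters=500):
    """Loadings (w_1..w_p) of the first principal component of standardized columns.

    Uses power iteration on the correlation matrix; returns (weights, standardized columns).
    """
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    v = [1.0] * p  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        v = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v, z


def combined_measure(columns):
    """Score per cell line: w_1*S_1 + ... + w_p*S_p on the standardized measures."""
    w, z = first_pc_weights(columns)
    return [sum(wk * zk[i] for wk, zk in zip(w, z)) for i in range(len(z[0]))]


# Hypothetical, perfectly correlated measures for 5 cell lines
s1 = [1.0, 2.0, 3.0, 4.0, 5.0]
cols = [s1, [2 * v for v in s1], [v + 1 for v in s1], [3 * v for v in s1]]
weights, _ = first_pc_weights(cols)
print(weights)  # each weight is approximately 0.5
print(combined_measure(cols))
```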

Table II.

Correlation coefficients of the relationship between the 4 sensitivity measures and the combined chemosensitivity measure.

The 4 individual measures were not strongly correlated with each other; therefore, the combined measure is meaningful as it reflects the characteristics of all 4 measures.

Classification of cell lines according to chemosensitivity. The gene expression datasets include 16 gastric cancer cell lines and 29 pancreatic cancer cell lines. Of these 45 cell lines, 27 were treated with topotecan in the drug sensitivity dataset, so combining the gene expression and chemosensitivity datasets reduced the number of cell lines from 45 to 27. The combined topotecan sensitivities in the two types of cancer are shown in Figure 3.

As shown in Figure 3, the combined sensitivities were widely distributed even within the same cancer type. This may imply that chemosensitivity depends on the characteristics of individual cell lines rather than on the type of cancer.

The combined sensitivity to topotecan was used to divide the 27 cell lines into two groups, sensitive and resistant, by k-means clustering (Table III). The cell lines were initially clustered into 3 groups, but the second and third groups were merged into a single resistant group because there were few resistant cell lines.
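A sketch of this grouping step, assuming one-dimensional k-means (Lloyd's algorithm) with k=3 on the combined measure, after which every cluster except the one with the highest centroid is merged into the resistant group. Treating the highest centroid as the sensitive cluster is an assumption: which end is sensitive depends on the sign of the combined measure. The initialization rule (evenly spaced order statistics) and the toy values are also hypothetical.

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on 1-D data (k >= 2); centroids start at evenly spaced order statistics."""
    s = sorted(values)
    centroids = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster becomes empty
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
    return centroids, labels


def sensitive_resistant(values):
    """Cluster into 3 groups, then merge all but the top cluster into 'resistant'."""
    centroids, labels = kmeans_1d(values, k=3)
    top = max(range(3), key=lambda c: centroids[c])  # assumed: highest = most sensitive
    return ["sensitive" if lab == top else "resistant" for lab in labels]


# Hypothetical combined sensitivity values for 9 cell lines
vals = [5.0, 5.2, 4.8, 0.1, 0.0, -0.2, -3.0, -3.1, -2.9]
print(sensitive_resistant(vals))
```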

Table III.

Cell lines in the topotecan sensitive and resistant groups.

Identification of topotecan sensitivity genes. Using the gene expression dataset, we identified 24 genes associated with topotecan sensitivity, based on Mann-Whitney U-tests comparing expression between the sensitive and resistant groups (Table IV).
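For reference, the Mann-Whitney U statistic for a gene compares the ranks of its expression values in the sensitive and resistant groups. The sketch below assigns average ranks to ties and uses a normal approximation for the two-sided p-value, without tie or continuity correction, so it is only approximate for small groups; an exact test is preferable there (R's `wilcox.test`, for instance, computes an exact p-value by default when there are fewer than 50 values and no ties). The example values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test with a normal approximation.

    Ties receive average ranks; the variance is not tie-corrected and no continuity
    correction is applied. Returns (U for sample x, approximate p-value).
    """
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0  # the first n1 entries belong to x
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical expression of one gene in resistant vs. sensitive cell lines
print(mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```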

Table IV.

Summary of the 24 identified topotecan sensitivity genes.

Figure 4 shows the expression patterns of the identified genes and the correlation coefficients between their expression levels. The expression patterns are mixed in both the sensitive and resistant groups (Figure 4A).

The expression patterns on the right side of the heatmap show that OBP2B and PIGP are up-regulated in sensitive cell lines, while ZNF227, WDPCP and TTLL5 are down-regulated in sensitive cell lines (Figure 4A). Figure 4B shows that the expression levels of PIGP and OBP2B are negatively correlated with the expression of TTLL5, WDPCP, ZNF227 and CD300LD. The distribution of gene expression in the sensitive and resistant cell line groups is shown in Figure 5.

Figure 5.

Expression patterns of the identified topotecan sensitivity genes. S and R represent the sensitive and resistant groups, respectively.

Figure 5 shows the gene expression distributions in the topotecan-sensitive and -resistant groups. Although the expression of most genes partially overlaps between the two groups, distinct expression patterns can be discerned.

Comparison of the performance of two types of chemosensitivity models. We compared the performance of two types of models using ML algorithms: one set predicted chemosensitivity from cancer type, and the other from the identified genes. For each experiment, the dataset was randomly split into training (70%) and testing (30%) sets. This random split was repeated 100 times, and the performance of each model was summarized as the mean and standard deviation of accuracy over the 100 repetitions. The results are summarized in Table V.
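The evaluation protocol (100 random 70/30 splits, reporting the mean and standard deviation of accuracy) is sketched below. A nearest-centroid classifier stands in for the five ML algorithms purely to keep the example self-contained; it is not one of the models used in the study, and the toy data and random seed are hypothetical.

```python
import random
from statistics import mean, stdev


def nearest_centroid_fit(xs, ys):
    """Stand-in classifier for illustration: one mean vector per class."""
    cents = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def nearest_centroid_predict(cents, x):
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(cents[lab], x)))


def repeated_holdout(xs, ys, n_repeats=100, train_frac=0.7, seed=1):
    """Mean and SD of training and testing accuracy over repeated random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    train_acc, test_acc = [], []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        tr, te = idx[:cut], idx[cut:]
        model = nearest_centroid_fit([xs[i] for i in tr], [ys[i] for i in tr])

        def accuracy(ids):
            return sum(nearest_centroid_predict(model, xs[i]) == ys[i] for i in ids) / len(ids)

        train_acc.append(accuracy(tr))
        test_acc.append(accuracy(te))
    return (mean(train_acc), stdev(train_acc)), (mean(test_acc), stdev(test_acc))


# Hypothetical expression of two genes in ten cell lines (R = resistant, S = sensitive)
xs = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1], [0.9, 1.9],
      [3.0, 4.0], [3.2, 4.1], [2.9, 3.8], [3.1, 3.9], [3.3, 4.2]]
ys = ["R"] * 5 + ["S"] * 5
print(repeated_holdout(xs, ys))
```

Repeating the split in this way reduces the dependence of the reported accuracy on any single random partition, which matters with only a few dozen cell lines.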

Table V.

Comparison of the predictive accuracies on the testing and training datasets. The values shown represent the mean values and standard deviations, which were calculated from 100 repeated processes.

Table V shows the prediction accuracies on the training and testing datasets as means and standard deviations over the 100 random splits. The models using the identified genes performed better than the models using cancer type (Table V). For the cancer type models, the predictive accuracies were 0.729 to 0.763 on the training dataset and 0.731 to 0.765 on the testing dataset. Among the models using gene expression, RF performed best on the training dataset (accuracy=1.0), and SVM performed best on the testing dataset (accuracy=0.896).

SVM showed the best performance on the testing dataset, and performance appeared to differ depending on the predictive gene. For every gene, the specific gene models performed better than the cancer type models. This suggests that the therapeutic drug should be selected on the basis of the expression of specific genes rather than cancer type. It may also mean that patients with multiple primary cancers can be treated according to their specific gene expression profiles rather than their cancer types.

Discussion

Some environmental factors, such as smoking, alcohol consumption, hepatitis C virus infection and human papillomavirus infection, are thought to be involved in the pathogenesis of multiple primary cancers. Failure of the immune surveillance system, including decreased T-cell numbers and decreased expression of human leukocyte antigen class I and the CD3 zeta chain, may also contribute to the development of multiple primary cancers. However, most cases of multiple primary cancers cannot be fully explained by these immune and environmental factors. Some genetic factors, including single-nucleotide polymorphisms, microsatellite instability, chromosomal instability and epigenetic alterations, are considered risk factors for multiple primary cancers. Examples include hereditary breast and ovarian cancer syndrome, caused by germline mutations in BRCA1 and BRCA2, and Li-Fraumeni syndrome, caused by germline TP53 mutations, which is characterized by a high frequency of various malignancies such as soft tissue sarcomas, leukemia and adrenocortical carcinomas.

A fundamental goal of precision medicine is to match drugs to the genomic profiles of individual patients in order to maximize the effectiveness of treatment. Cancer patients face the possibility of multiple primary cancers, and subsequent cancers are becoming increasingly frequent in cancer patients. In particular, when the first cancer site is the pancreas, the standardized incidence ratio (SIR) of stomach cancer is 1.41, indicating a significant association between the two. Both cancers also occur in Peutz–Jeghers syndrome, which is caused by STK11 mutations.

The treatment of patients with multiple primary cancers is challenging and often presents therapeutic dilemmas. In cases of advanced disease, selecting antitumor therapy is often difficult and is generally not supported by evidence from the literature or clinical trials. In these patients, the side effects of combination therapy are usually much more severe than those of monotherapy. Identification of potential biomarkers in specific patient groups may aid the selection of appropriate therapies and improve survival.

ML is a subset of artificial intelligence that is seeing more applications in drug discovery every year. In supervised learning, models are trained on examples whose outcomes (labels) are already known; LDA, CART, KNN, SVM and RF are all supervised algorithms.

From an exploration of chemosensitivity patterns in cell lines from 7 types of cancer, this study showed that chemosensitivity depends on the drug rather than on the cancer type (Figure 3). In addition, sensitivity was widely distributed even among cell lines of the same cancer type (Figure 5). This can be interpreted to mean that chemosensitivity depends on the characteristics of the cell lines, regardless of the type of cancer. Based on this insight, chemosensitivity models using the expression patterns of specific genes were implemented for patients with multiple primary cancers. By applying five ML models to the combined drug sensitivity and gene expression data, we confirmed that using the expression patterns of specific genes can improve the accuracy of drug sensitivity prediction. These findings imply that molecular markers are essential for personalized medicine in patients with multiple primary cancers, such as the gastric and pancreatic cancers investigated in this study.

Among the five ML models using the six identified genes, RF showed the best performance on the training dataset, but its performance on the testing dataset was lower than that of the other models. Overall, the ML models based on gene expression were far superior to those based on cancer type. This indicates that, especially in patients with multiple primary cancers, drugs should be chosen on the basis of the expression of specific genes rather than the type of cancer.

While machine learning can readily identify trends and patterns in data, it requires large datasets for training. The first limitation of this study is its small sample size, so the dataset does not represent the entire population of patients with multiple primary cancers. A model trained on a small sample may generalize poorly and perform poorly outside that sample; larger training and test sets generally yield more accurate and reliable predictions. The second limitation is that we constructed the ML models using the expression levels of single genes. When features are single genes, the models tend to show poor stability and performance on independent datasets. Therefore, in future work, we will implement predictive ML models using combined gene expression data from a larger dataset.

Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2019R1A2C1003028).

Footnotes

* These Authors contributed equally to the study.
Authors’ Contributions
Ki-Yeol Kim designed this study, analyzed the data, prepared the figures and wrote the original draft. Xianglan Zhang and Mi Jang oversaw the study and revised the article. All Authors reviewed the article.
Conflicts of Interest
Conflicts of interest relevant to this article were not reported.

Received February 16, 2021.
Revision received March 11, 2021.
Accepted March 24, 2021.

References

1. Vogt A, Schmid S, Heinimann K, Frick H, Herrmann C, Cerny T and Omlin A: Multiple primary tumours: Challenges and approaches, a review. ESMO Open 2(2): e000172, 2017. DOI: 10.1136/esmoopen-2017-000172
1. Cybulski C,
2. Nazarali S and
3. Narod SA
: Multiple primary cancers as a guide to heritability. Int J Cancer 135(8): 1756-1763, 2014. PMID: 24945890. DOI: 10.1002/ijc.28988
1. Kirova YM,
2. Chargari C and
3. Mazeron JJ
: [Multiple brain metastases after breast cancer and their radiotherapy management: what is the optimal treatment?]. Bull Cancer 98(4): 409-415, 2011. PMID: 21540140. DOI: 10.1684/bdc.2011.1335
1. Coyte A,
2. Morrison DS and
3. McLoone P
: Second primary cancer risk - the impact of applying different definitions of multiple primaries: Results from a retrospective population-based cancer registry study. BMC Cancer 14: 272, 2014. PMID: 24742063. DOI: 10.1186/1471-2407-14-272
1. Buiatti E,
2. Crocetti E,
3. Acciai S,
4. Gafà L,
5. Falcini F,
6. Milandri C and
7. La Rosa M
: Incidence of second primary cancers in three Italian population-based cancer registries. Eur J Cancer 33(11): 1829-1834, 1997. PMID: 9470841. DOI: 10.1016/s0959-8049(97)00173-1
1. Weir HK,
2. Johnson CJ and
3. Thompson TD
: The effect of multiple primary rules on population-based cancer survival. Cancer Causes Control 24(6): 1231-1242, 2013. PMID: 23558444. DOI: 10.1007/s10552-013-0203-3
1. Rosso S,
2. De Angelis R,
3. Ciccolallo L,
4. Carrani E,
5. Soerjomataram I,
6. Grande E,
7. Zigon G,
8. Brenner H and Eurocare Working Group
: Multiple tumours in survival estimates. Eur J Cancer 45(6): 1080-1094, 2009. PMID: 19121933. DOI: 10.1016/j.ejca.2008.11.030
1. Zhai C,
2. Cai Y,
3. Lou F,
4. Liu Z,
5. Xie J,
6. Zhou X,
7. Wang Z,
8. Fang Y,
9. Pan H and
10. Han W
: Multiple primary malignant tumors – A clinical analysis of 15,321 patients with malignancies at a single center in China. J Cancer 9(16): 2795-2801, 2018. PMID: 30123347. DOI: 10.7150/jca.25482
1. Bray F,
2. Ferlay J,
3. Soerjomataram I,
4. Siegel RL,
5. Torre LA and
6. Jemal A
: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68(6): 394-424, 2018. PMID: 30207593. DOI: 10.3322/caac.21492
1. Marqués-Lespier JM,
2. González-Pons M and
3. Cruz-Correa M
: Current perspectives on gastric cancer. Gastroenterol Clin North Am 45(3): 413-428, 2016. PMID: 27546840. DOI: 10.1016/j.gtc.2016.04.002
1. Venerito M,
2. Vasapolli R,
3. Rokkas T and
4. Malfertheiner P
: Gastric cancer: Epidemiology, prevention, and therapy. Helicobacter 23(Suppl 1): e12518, 2018. PMID: 30203589. DOI: 10.1111/hel.12518
1. Chen Y,
2. Zhou Q,
3. Wang H,
4. Zhuo W,
5. Ding Y,
6. Lu J,
7. Wu G,
8. Xu N and
9. Teng L
: Predicting peritoneal dissemination of gastric cancer in the era of precision medicine: Molecular characterization and biomarkers. Cancers (Basel) 12(8):2236, 2020. PMID: 32785164. DOI: 10.3390/cancers12082236
1. Bai M,
2. Wang P,
3. Yang J,
4. Zuo M and
5. Ba Y
: Identification of miR-135b as a novel regulator of TGFβ pathway in gastric cancer. J Physiol Biochem 76(4): 549-560, 2020. PMID: 32737704. DOI: 10.1007/s13105-020-00759-9
1. Wang S,
2. You L,
3. Dai M and
4. Zhao Y
: Mucins in pancreatic cancer: A well-established but promising family for diagnosis, prognosis and therapy. J Cell Mol Med 24(18): 10279-10289, 2020. PMID: 32745356. DOI: 10.1111/jcmm.15684
1. Moutinho-Ribeiro P,
2. Macedo G and
3. Melo SA
: Pancreatic cancer diagnosis and management: Has the time come to prick the bubble? Front Endocrinol (Lausanne) 9: 779, 2019. PMID: 30671023. DOI: 10.3389/fendo.2018.00779
1. Liu GF,
2. Li GJ and
3. Zhao H
: Efficacy and toxicity of different chemotherapy regimens in the treatment of advanced or metastatic pancreatic cancer: A network meta-analysis. J Cell Biochem 119(1): 511-523, 2018. PMID: 28608558. DOI: 10.1002/jcb.26210
1. Kamisawa T,
2. Wood LD,
3. Itoi T and
4. Takaori K
: Pancreatic cancer. Lancet 388(10039): 73-85, 2016. PMID: 26830752. DOI: 10.1016/S0140-6736(16)00141-0
1. Zhang TN,
2. Yin RH and
3. Wang LW
: The prognostic and predictive value of the albumin-bilirubin score in advanced pancreatic cancer. Medicine (Baltimore) 99(28): e20654, 2020. PMID: 32664063. DOI: 10.1097/MD.0000000000020654
1. Zhang X,
2. Cha IH and
3. Kim KY
: Use of a combined gene expression profile in implementing a drug sensitivity predictive model for breast cancer. Cancer Res Treat 49(1): 116-128, 2017. PMID: 27188202. DOI: 10.4143/crt.2016.085
1. Lind AP and
2. Anderson PC
: Predicting drug activity against cancer cells by random forest models based on minimal genomic information and chemical properties. PLoS One 14(7): e0219774, 2019. PMID: 31295321. DOI: 10.1371/journal.pone.0219774
1. Mitchell T
: Machine learning. New York, McGraw Hill, 1997.
1. Stephenson N,
2. Shane E,
3. Chase J,
4. Rowland J,
5. Ries D,
6. Justice N,
7. Zhang J,
8. Chan L and
9. Cao R
: Survey of machine learning techniques in drug discovery. Curr Drug Metab 20(3): 185-193, 2019. PMID: 30124147. DOI: 10.2174/1389200219666180820112457
1. Kim DW,
2. Kim H,
3. Nam W,
4. Kim HJ and
5. Cha IH
: Machine learning to predict the occurrence of bisphosphonate-related osteonecrosis of the jaw associated with dental extraction: A preliminary report. Bone 116: 207-214, 2018. PMID: 29698784. DOI: 10.1016/j.bone.2018.04.020
1. Zukotynski K,
2. Gaudet V,
3. Kuo PH,
4. Adamo S,
5. Goubran M,
6. Scott C,
7. Bocti C,
8. Borrie M,
9. Chertkow H,
10. Frayne R,
11. Hsiung R,
12. Laforce R Jr.,
13. Noseworthy MD,
14. Prato FS,
15. Sahlas DJ,
16. Smith EE,
17. Sossi V,
18. Thiel A,
19. Soucy JP,
20. Tardif JC and
21. Black SE
: The use of random forests to classify amyloid brain PET. Clin Nucl Med 44(10): 784-788, 2019. PMID: 31348088. DOI: 10.1097/RLU.0000000000002747
1. van Niftrik CHB,
2. van der Wouden F,
3. Staartjes VE,
4. Fierstra J,
5. Stienen MN,
6. Akeret K,
7. Sebök M,
8. Fedele T,
9. Sarnthein J,
10. Bozinov O,
11. Krayenbühl N,
12. Regli L and
13. Serra C
: machine learning algorithm identifies patients at high risk for early complications after intracranial tumor surgery: Registry-based cohort study. Neurosurgery 85(4): E756-E764, 2019. PMID: 31149726. DOI: 10.1093/neuros/nyz145
1. Haibe-Kains B,
2. El-Hachem N,
3. Birkbak NJ,
4. Jin AC,
5. Beck AH,
6. Aerts HJ and
7. Quackenbush J
: Inconsistency in large pharmacogenomic studies. Nature 504(7480): 389-393, 2013. PMID: 24284626. DOI: 10.1038/nature12831
1. Bouhaddou M,
2. DiStefano MS,
3. Riesel EA,
4. Carrasco E,
5. Holzapfel HY,
6. Jones DC,
7. Smith GR,
8. Stern AD,
9. Somani SS,
10. Thompson TV and
11. Birtwistle MR
: Drug response consistency in CCLE and CGP. Nature 540(7631): E9-E10, 2016. PMID: 27905419. DOI: 10.1038/nature20580
Chen Z, Bertin R and Froldi G: EC50 estimation of antioxidant activity in DPPH· assay using several statistical programs. Food Chem 138(1): 414-420, 2013. PMID: 23265506. DOI: 10.1016/j.foodchem.2012.11.001
Wang Z, Li H, Carpenter C and Guan Y: Challenge-enabled machine learning to drug-response prediction. AAPS J 22(5): 106, 2020. PMID: 32778984. DOI: 10.1208/s12248-020-00494-5
Adam G, Rampášek L, Safikhani Z, Smirnov P, Haibe-Kains B and Goldenberg A: Machine learning approaches to drug response prediction: Challenges and recent progress. NPJ Precis Oncol 4: 19, 2020. PMID: 32566759. DOI: 10.1038/s41698-020-0122-1
Vougas K, Sakellaropoulos T, Kotsinas A, Foukas GP, Ntargaras A, Koinis F, Polyzos A, Myrianthopoulos V, Zhou H, Narang S, Georgoulias V, Alexopoulos L, Aifantis I, Townsend PA, Sfikakis P, Fitzgerald R, Thanos D, Bartek J, Petty R, Tsirigos A and Gorgoulis VG: Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining. Pharmacol Ther 203: 107395, 2019. PMID: 31374225. DOI: 10.1016/j.pharmthera.2019.107395
Tan M, Özgül OF, Bardak B, Ekşioğlu I and Sabuncuoğlu S: Drug response prediction by ensemble learning and drug-induced gene expression signatures. Genomics 111(5): 1078-1088, 2019. PMID: 31533900. DOI: 10.1016/j.ygeno.2018.07.002
Turki T and Wang JTL: Clinical intelligence: New machine learning techniques for predicting clinical drug response. Comput Biol Med 107: 302-322, 2019. PMID: 30771879. DOI: 10.1016/j.compbiomed.2018.12.017
Fukunaga K: Introduction to statistical pattern recognition. Boston, Academic Press, 1990.
Breiman L, Friedman J, Stone CJ and Olshen RA: Classification and regression trees. New York, Chapman and Hall, 1984.
Altman N: An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3): 175-185, 1992. DOI: 10.2307/2685209
Vapnik VN: The nature of statistical learning theory. New York, Springer, 1995.
Cortes C and Vapnik V: Support-vector networks. Mach Learn 20(3): 273-297, 1995. DOI: 10.1007/BF00994018
Ho TK: Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, pp. 278-282, 1995.
Ho TK: The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8): 832-844, 1998. DOI: 10.1109/34.709601
Breiman L: Random forests. Mach Learn 45(1): 5-32, 2001. DOI: 10.1023/A:1010933404324
Do KA, Johnson MM, Lee JJ, Wu XF, Dong Q, Hong WK, Khuri FR and Spitz MR: Longitudinal study of smoking patterns in relation to the development of smoking-related secondary primary tumors in patients with upper aerodigestive tract malignancies. Cancer 101(12): 2837-2842, 2004. PMID: 15536619. DOI: 10.1002/cncr.20714
León X, Del Prado Venegas M, Orús C, Kolañczak K, García J and Quer M: Metachronous second primary tumours in the aerodigestive tract in patients with early stage head and neck squamous cell carcinomas. Eur Arch Otorhinolaryngol 262(11): 905-909, 2005. PMID: 15891925. DOI: 10.1007/s00405-005-0922-5
Nagao Y and Sata M: High incidence of multiple primary carcinomas in HCV-infected patients with oral squamous cell carcinoma. Med Sci Monit 15(9): CR453-CR459, 2009. PMID: 19721396.
Xu CC, Biron VL, Puttagunta L and Seikaly H: HPV status and second primary tumours in oropharyngeal squamous cell carcinoma. J Otolaryngol Head Neck Surg 42: 36, 2013. PMID: 23718873. DOI: 10.1186/1916-0216-42-36
Kuss I, Hathaway B, Ferris RL, Gooding W and Whiteside TL: Decreased absolute counts of T lymphocyte subsets and their relation to disease in squamous cell carcinoma of the head and neck. Clin Cancer Res 10(11): 3755-3762, 2004. PMID: 15173082. DOI: 10.1158/1078-0432.CCR-04-0054
Atienza JA and Dasanu CA: Incidence of second primary malignancies in patients with treated head and neck cancer: A comprehensive review of literature. Curr Med Res Opin 28(12): 1899-1909, 2012. PMID: 23121148. DOI: 10.1185/03007995.2012.746218
Kuss I, Saito T, Johnson JT and Whiteside TL: Clinical significance of decreased zeta chain expression in peripheral blood lymphocytes of patients with head and neck cancer. Clin Cancer Res 5(2): 329-334, 1999. PMID: 10037182.
Foulkes WD, Brunet JS, Sieh W, Black MJ, Shenouda G and Narod SA: Familial risks of squamous cell carcinoma of the head and neck: Retrospective case-control study. BMJ 313(7059): 716-721, 1996. PMID: 8819440. DOI: 10.1136/bmj.313.7059.716
Collins FS and Varmus H: A new initiative on precision medicine. N Engl J Med 372(9): 793-795, 2015. PMID: 25635347. DOI: 10.1056/NEJMp1500523
Mirnezami R, Nicholson J and Darzi A: Preparing for precision medicine. N Engl J Med 366(6): 489-491, 2012. PMID: 22256780. DOI: 10.1056/NEJMp1114866
Friedman AA, Letai A, Fisher DE and Flaherty KT: Precision medicine for cancer with next-generation functional diagnostics. Nat Rev Cancer 15(12): 747-756, 2015. PMID: 26536825. DOI: 10.1038/nrc4015
Jena A, Patnayak R, Lakshmi AY, Manilal B and Reddy MK: Multiple primary cancers: An enigma. South Asian J Cancer 5(1): 29-32, 2016. PMID: 27169120. DOI: 10.4103/2278-330X.179698
Utada M, Ohno Y, Hori M and Soda M: Incidence of multiple primary cancers and interval between first and second primary cancers. Cancer Sci 105(7): 890-896, 2014. PMID: 24814518. DOI: 10.1111/cas.12433
Tabuchi T, Ito Y, Ioka A, Miyashiro I and Tsukuma H: Incidence of metachronous second primary cancers in Osaka, Japan: Update of analyses using population-based cancer registry data. Cancer Sci 103(6): 1111-1120, 2012. PMID: 22364479. DOI: 10.1111/j.1349-7006.2012.02254.x
Youlden DR and Baade PD: The relative risk of second primary cancers in Queensland, Australia: A retrospective cohort study. BMC Cancer 11: 83, 2011. PMID: 21342533. DOI: 10.1186/1471-2407-11-83
Litman-Zawadzka A, Łukaszewicz-Zając M and Mroczko B: Novel potential biomarkers for pancreatic cancer - A systematic review. Adv Med Sci 64(2): 252-257, 2019. PMID: 30844662. DOI: 10.1016/j.advms.2019.02.004
Akazawa M and Hashimoto K: Artificial intelligence in ovarian cancer diagnosis. Anticancer Res 40(8): 4795-4800, 2020. PMID: 32727807. DOI: 10.21873/anticanres.14482
Alcaraz N, List M, Batra R, Vandin F, Ditzel HJ and Baumbach J: De novo pathway-based biomarker identification. Nucleic Acids Res 45(16): e151, 2017. PMID: 28934488. DOI: 10.1093/nar/gkx642