1 Introduction

Saliva is an important biological fluid that serves various functions, including lubrication for speech, digestion of food, and protection from microorganisms. It is produced by multiple salivary glands, particularly the three major salivary glands (parotid, submandibular and sublingual) and several minor glands. Saliva comprises 99% water together with minerals, mucus, electrolytes, nucleic acids and proteins such as enzymes, enzyme inhibitors, growth factors, cytokines, immunoglobulins, and other glycoproteins (de Almeida Pdel et al. 2008). Saliva is a filtrate of blood and reflects the physiological conditions of the body; thus it could be used to monitor clinical status and predict systemic diseases. Compared with blood, saliva offers distinct advantages for diagnostic or research purposes: its collection is cost-effective, safe, easy and non-invasive. Indeed, many of the characteristics of bodily fluids such as blood and urine also apply to saliva, including diurnal variation and the presence of diverse diagnostic analytes.

Cancer is a leading cause of death and oral cancer annually affects more than 400,000 individuals worldwide. Despite advances in treatment, the overall 5-year survival rate of patients with oral cancer is approximately 50% and has not improved over the past 30 years (Epstein et al. 2002; Mao et al. 2004). The mortality rate associated with oral cancer is particularly high because it is routinely discovered late, commonly after metastasis to the lymph nodes or neck has already occurred. Worldwide, more than 200,000 patients with pancreatic cancer are registered annually, and 98% of the patients die of the disease (Parkin et al. 2005). The high mortality rate from this cancer is thought to be due to a lack of adequate systemic therapies and the high rate of metastasis at the time of diagnosis. Therefore, novel diagnostic tests are urgently needed to detect these cancers at the premalignant stage.

Studies using molecular-based biomarkers in blood or urine to detect the progress of malignant tumors have mainly focused on altered DNA methylation or mutation, or on changes in the RNA or protein levels (Sidransky 2002). In addition, several molecular biomarker candidates have been identified by analyzing the transcriptome or proteome of saliva (Hu et al. 2007, 2008; Zimmermann and Wong 2008). However, sufficiently sensitive and reproducible saliva-based diagnostic methods are not yet available. In addition, conventional tumor markers, such as serum cancer antigen 19-9, which is widely used in the diagnosis of pancreatic cancer, are known to have low specificity for particular lesions. With the exception of breast cancer (Streckfus et al. 2008), few studies have used saliva to detect tumors remote from the oral cavity. Clinical applications require tumor markers that are sensitive and can discriminate individual cancer-specific differences.

Metabolomics, the measurement of all intracellular metabolites, has become a powerful new tool to gain insight into cellular function. So far, several metabolomic approaches have been reported (Aharoni et al. 2002; Fiehn et al. 2000; Plumb et al. 2003; Reo 2002). In capillary electrophoresis-mass spectrometry (CE-MS), a marriage of two methodologies, CE offers rapid analysis and efficient resolution, and MS provides excellent selectivity and sensitivity (Soga et al. 2006). A number of clinical applications of CE-MS exploring urinary or serum proteomic biomarkers were developed to detect and identify the charged peptide content, which demonstrates the potential of the technique to assess the profiles of small molecules in biofluids (Fliser et al. 2005; Kolch et al. 2005; Metzger et al. 2009; Schiffer et al. 2006, 2008; Zurbig and Mischak 2008). Although diverse saliva analyses with CE have been proposed (Lloyd 2008), salivary metabolomic analysis to determine cancer-specific profiles for early cancer detection has not yet been conducted. In this study, we obtained and compared, for the first time, comprehensive salivary metabolic profiles of patients with oral, breast or pancreatic cancer, or periodontal disease, and healthy controls. We then identified individual cancer-specific markers with high discriminative ability, demonstrating the potential use of salivary metabolomics in cancer diagnosis.

2 Materials and methods

2.1 Patient selection

This study was approved by the UCLA Institutional Review Board. Patients with oral, breast or pancreatic cancer or periodontal disease and the healthy controls were recruited at the UCLA Medical Center. All patients had recently been diagnosed with primary disease and were without metastasis; none had received any prior treatment in the form of chemotherapy, radiotherapy, surgery or alternative therapy. No subjects had a history of prior malignancy, immunodeficiency, autoimmune disorders, hepatitis or HIV infection. Written, informed consent was obtained from all patients and from volunteers who agreed to serve as saliva donors.

2.2 Sample collection and sample preparation

The subjects were asked to refrain from eating, drinking, smoking or using oral hygiene products for at least 1 h prior to saliva collection. The subjects rinsed their mouth with water and, 5 min later, were instructed to spit into 50-cc Falcon tubes, which were placed in a Styrofoam cup filled with crushed ice. The subjects were reminded not to cough up mucus. Five milliliters of unstimulated saliva could usually be collected in 5–10 min. Saliva collection was performed in a private room. The saliva samples were centrifuged at 2600×g for 15 min at 4°C and, if separation was incomplete, spun for another 20 min. Equal amounts of supernatant were transferred to two fresh tubes and the samples were processed and frozen within 30 min. The protocols used for sample collection are described in more detail elsewhere (Li et al. 2004).

Saliva fluid samples were obtained from patients with oral (n = 69), breast (n = 30) and pancreatic cancer (n = 18), patients with periodontal diseases (n = 11) and healthy controls (n = 87). The race, ethnicity, sex and age of the subjects are summarized in Table 1. Except for age, clinical parameters were not collected for the non-oral cancer groups.

Table 1 Subject characteristics

Frozen saliva was thawed and dissolved at room temperature, and 27 μl of each sample (69 patients with oral cancer and 70 healthy control samples) was added to a 1.5-ml Eppendorf tube, to which 3 μl of water containing 2 mM methionine sulfone and 2 mM 3-aminopyrrolidine as internal standards was added and mixed well. Similarly, individual thawed saliva samples (24 μl) from patients with breast or pancreatic cancer, patients with periodontal disease and 17 healthy controls were mixed with 6 μl of water containing internal standards (1 mM each of methionine sulfone and 3-aminopyrrolidine). These internal standards were selected because they are not among the endogenous human metabolites. Furthermore, they migrated to the center of the metabolite distribution, which was used to confirm the quality of the alignment results. Although a unified dilution was preferred for the preparation of all samples, a greater dilution ratio was required for the control, breast cancer, pancreatic cancer, and periodontal disease samples because of their high electrolyte content, which decreases the electrical current during the measurement.

2.3 Metabolite standards, instrumentation, and CE-TOF-MS conditions

The metabolite standards, instrumentation and CE-TOF-MS conditions used in this study were as previously described (Soga et al. 2006), with slight modifications in the lock mass system setting. All chemical standards were of analytical or reagent grade and were obtained from commercial sources. They were dissolved in Milli-Q water (Millipore, Bedford, MA, USA), 0.1 mol/l HCl or 0.1 mol/l NaOH to obtain 1, 10 or 100 mmol/l stock solutions. The working solution was prepared prior to use by diluting with Milli-Q water to the appropriate concentration.

All CE-MS experiments were performed using an Agilent CE capillary electrophoresis system (Agilent Technologies, Waldbronn, Germany), an Agilent G3250AA LC/MSD TOF system (Agilent Technologies, Palo Alto, CA, USA), an Agilent 1100 series binary HPLC pump, and the G1603A Agilent CE-MS adapter and G1607A Agilent CE-ESI-MS sprayer kit. System control and data acquisition were done with G2201AA Agilent Chemstation software for CE and Analyst QS software for TOF-MS (ver. 1.1).

All samples were measured in single mode (see below); separation was done in fused-silica capillaries (50 μm i.d. × 100 cm total length) filled with 1 M formic acid as the background electrolyte. Sample solutions were injected at 50 mbar for 3 s and a voltage of 30 kV was applied. The capillary temperature was maintained at 20°C and the temperature of the sample tray was kept below 5°C using an external thermostatic cooler. The sheath liquid, comprising methanol/water (50% v/v) and 0.5 μM reserpine, was delivered at 10 μl/min. ESI-TOF-MS was conducted in the positive ion mode. The capillary voltage was set at 4 kV; the flow rate of nitrogen gas (heater temperature 300°C) was set at 10 psig. In TOF-MS, the fragmentor, skimmer and OCT RFV voltage were set at 75, 50 and 125 V, respectively. In the present study, we used a methanol dimer adduct ion ([2MeOH + H]+, m/z 65.059706) and hexakis phosphazene ([M + H]+, m/z 622.028963) to provide the lock mass for exact mass measurements. Exact mass data were acquired at the rate of 1.5 cycles/s over a 50–1000 m/z range.

2.4 Processing of CE-TOF-MS data

Raw data were analyzed with our proprietary software called MasterHands, which has already been used in several CE-TOF-MS-based profiling studies (Hirayama et al. 2009; Minami et al. 2009; Saito et al. 2009). The data analysis workflow starting with the raw data included noise-filtering, baseline correction, peak detection and integration of the peak area from sliced electropherograms (the width of each electropherogram was 0.02 m/z). Such functions are commonly used by data processing software such as MassHunter from Agilent Technologies, or XCMS (Smith et al. 2006) for liquid chromatography-MS or gas chromatography-MS data. The accurate m/z value for each peak detected within the time domain was calculated with Gaussian curve-fitting to the mass spectrum on the m/z domain peak. The alignment of peaks in multiple measurements was done by dynamic programming (DP)-based techniques (Baran et al. 2006; Soga et al. 2006) with slight modifications. The method picked up a few representative peaks using the Douglas-Peucker algorithm (Wallace et al. 2004) from unit m/z electropherograms, found corresponding peaks across multiple samples by DP, and optimized the numerical parameters of the normalization function for CE-migration (Reijenga et al. 2002). Instead of representative peaks, we used the detected peaks with accurate m/z values and regarded the peaks whose m/z difference was less than 20 ppm as ones that were derived from the same electropherograms.
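The 20 ppm criterion described above can be illustrated with a minimal sketch. It is not the MasterHands implementation: the function names, the greedy grouping strategy and the example m/z values are assumptions made for illustration, and the dynamic-programming alignment of migration times is not reproduced here.

```python
def ppm_difference(mz_a: float, mz_b: float) -> float:
    """Return the absolute m/z difference in parts per million, relative to mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6


def group_by_mz(peaks, tolerance_ppm=20.0):
    """Group (sample_id, mz) peaks by m/z.

    Peaks are visited in ascending m/z order; a peak joins the current group
    when it lies within tolerance_ppm of the first peak of that group,
    otherwise it starts a new group. (Greedy strategy, assumed for illustration.)
    """
    groups = []
    for sample_id, mz in sorted(peaks, key=lambda p: p[1]):
        if groups and ppm_difference(groups[-1][0][1], mz) < tolerance_ppm:
            groups[-1].append((sample_id, mz))
        else:
            groups.append([(sample_id, mz)])
    return groups


# Hypothetical peaks from three samples: (sample id, accurate m/z).
peaks = [("s1", 104.1070), ("s2", 104.1075), ("s3", 104.1120), ("s1", 118.0863)]
groups = group_by_mz(peaks)
for group in groups:
    print(group)
```

In this toy input the first two peaks differ by about 5 ppm and are grouped together, whereas the third differs by about 48 ppm from the first and forms its own group.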

All peak areas were divided by the area of the internal standard (relative area) to normalize the signal intensities and to avoid injection-volume bias and mass-spectrometry detector sensitivity bias among multiple measurements. Peaks below the threshold signal-to-noise ratio of 2 were treated as undetected and given a peak area of 0. The relative areas of the 17 healthy control samples and of the pancreatic and breast cancer, and the periodontal disease samples were multiplied by 1.25/1.1 to standardize the sample concentration.
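The normalization steps in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function name, its parameters and the example peak areas are hypothetical, and the factor 1.25/1.1 is taken from the text as stated.

```python
def relative_area(peak_area, is_area, signal_to_noise, sn_threshold=2.0, scale=1.0):
    """Normalize one peak area by the internal-standard area.

    Peaks below the signal-to-noise threshold are treated as undetected (0).
    `scale` corrects for a different sample dilution.
    """
    if signal_to_noise < sn_threshold:
        return 0.0
    return peak_area / is_area * scale


# Scale factor stated in the text for the more diluted batch (24 ul + 6 ul).
DILUTION_SCALE = 1.25 / 1.1

# Hypothetical peak areas, chosen only to show the effect of each step.
undiluted_batch = relative_area(5000.0, 10000.0, signal_to_noise=8.0)
diluted_batch = relative_area(4400.0, 10000.0, signal_to_noise=8.0, scale=DILUTION_SCALE)
noise_peak = relative_area(300.0, 10000.0, signal_to_noise=1.5)
print(undiluted_batch, diluted_batch, noise_peak)
```

Dividing by the internal-standard area first means that the dilution correction is applied to a quantity that is already independent of the injection volume.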

The peaks derived from salt and neutral molecules were found in the first and the last few minutes, respectively. Then, isotopic compounds, ringing, spikes and fragment and adduct ions were eliminated and the peak data sets were compared across the sample profiles and aligned according to m/z and migration time. Although all of the metabolites were quantified separately, the sum of the quantified values of leucine and isoleucine was counted as a single marker owing to the low separation of these peaks. Peaks showing P < 0.05 in the non-parametric, multiple comparison Steel–Dwass test between the controls and at least one disease cohort were selected as candidate markers.

2.5 Metabolite identification

The peaks were identified based on the matched m/z values and normalized migration times of the corresponding standard compounds, if available. Of the peaks that did not match any standard compound, the concomitant peaks, such as isotopic peaks and fragment peaks, were removed based on the difference in m/z values and the normalized migration time of the two peaks, with error tolerances of 20 ppm and 0.01 min, to yield only the peaks, referred to as components, that might be derived from metabolites (Brown et al. 2009). Although CE-TOF-MS provides accurate molecular mass at the milli m/z level, the m/z value alone is seldom sufficient to identify a metabolite (Kind and Fiehn 2006, 2007). Therefore, we used the m/z values and the migration times predicted by artificial neural networks (ANNs) (Sugimoto et al. 2005) to identify the metabolites. Briefly, the ANN model was first trained using the measured migration times of standard compounds and molecular descriptors with the net charge calculated from the pKa values. The trained ANN model then predicted the migration times of the candidate metabolites. Here, we used compounds available from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Goto et al. 2002) and the Human Metabolome Database (HMDB) (Wishart et al. 2007) as candidates. The composition formulae obtained from the measured mass spectra and the matched candidates were confirmed by their isotope distribution patterns.
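As an illustration of the concomitant-peak criterion, the sketch below flags a peak as the 13C isotopic partner of another peak when the two agree within 20 ppm in m/z and 0.01 min in normalized migration time. It covers only singly charged 13C isotope peaks, not fragment or adduct peaks, and the function names and example peaks are hypothetical rather than the procedure actually used.

```python
C13_C12_MASS_DIFF = 1.0033548  # mass difference between 13C and 12C, in Da


def is_isotope_pair(mono, candidate, charge=1, tol_ppm=20.0, tol_min=0.01):
    """Return True if `candidate` looks like the 13C isotopic peak of `mono`.

    Each peak is a (mz, normalized_migration_time_min) tuple.
    """
    mono_mz, mono_mt = mono
    cand_mz, cand_mt = candidate
    expected_mz = mono_mz + C13_C12_MASS_DIFF / charge
    ppm_error = abs(cand_mz - expected_mz) / expected_mz * 1e6
    return ppm_error <= tol_ppm and abs(cand_mt - mono_mt) <= tol_min


def drop_isotope_peaks(peaks):
    """Keep only peaks that are not the 13C partner of another peak in the list."""
    return [p for p in peaks
            if not any(is_isotope_pair(q, p) for q in peaks if q is not p)]


# Hypothetical peaks: (m/z, normalized migration time in min).
peaks = [(104.1070, 8.120), (105.1104, 8.123), (105.1104, 9.500), (118.0863, 8.900)]
components = drop_isotope_peaks(peaks)
print(components)
```

The second peak is removed because it co-migrates with the first and lies one 13C mass unit above it; the third peak has the same m/z but a different migration time and is therefore kept.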

2.6 Statistical analysis

To evaluate the ability of the detected peaks to discriminate diseases, we applied an unsupervised method, principal component analysis (PCA). The same analyses were also conducted, for the control and oral cancer samples only, to discriminate between males and females and between race and ethnic groups. The analyses were not performed for the other patient groups due to the unavailability of clinical parameters. Supervised classification techniques, such as partial least squares-discriminant analysis (Jonsson et al. 2005; Michell et al. 2008; Woo et al. 2009), support vector machine (SVM) (Mahadevan et al. 2008) and multiple logistic regression (MLR), are commonly used to separate subjects and to identify important features for the separation. Here, we developed independent MLR models to discriminate healthy individuals and each disease cohort using a stepwise variable selection method (backward procedure to eliminate non-predictive peaks with a threshold of P > 0.10) to construct the predictive models. The models were trained with the complete dataset and we evaluated their generalizability by tenfold cross-validation (CV). The data were randomly separated into training sets and the remaining data, and this process was repeated ten times for all of the values selected in the training set. The non-parametric Mann–Whitney test was used to compare two groups, e.g. comparison of metabolites in males and females.
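The two evaluation ingredients mentioned here, a tenfold split and the area under the ROC curve (AUC), can be sketched in plain Python. This is not the WEKA or JMP procedure used in the study: the function names, the toy scores and the random seed are assumptions, and the model-fitting step (MLR with backward selection) is deliberately omitted.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k disjoint held-out folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Toy predicted probabilities: 1 = patient, 0 = control (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc(scores, labels)

# Ten folds for, e.g., 69 oral cancer samples plus 87 controls.
folds = kfold_indices(n_samples=156, k=10)
print(round(toy_auc, 3), [len(f) for f in folds])
```

In a cross-validation run, each fold would be held out in turn, a model fitted on the remaining samples, and the held-out predictions scored with `auc`.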

Statistical analyses using the Steel–Dwass test were performed in R with the Design, Hmisc, and Lexis libraries (available at http://lib.stat.cmu.edu/R/CRAN/). The Mann–Whitney tests and the heat maps were produced with TM4 software (Saeed et al. 2003). The CV data were generated using WEKA (Witten and Frank 2005). The PCA and MLR models were developed using JMP Version 7 (SAS Institute Inc., Cary, NC, USA, 1989–2007; http://www.jmp.com/software/jmp.shtml).

3 Results and discussion

3.1 Statistical results of discriminative metabolites

On average, CE-TOF-MS detected 3041 peaks (minimum 1585, maximum 8400, standard deviation (SD) 1137) in each saliva sample. After removing the concomitantly observed peaks such as the isotopic and fragment peaks, and noise peaks including spike and ringing peaks, an average of 90 peaks were derived from the metabolites (minimum 48, maximum 128, SD 15). The standard deviation of the relative peak areas of the metabolite-derived peaks was 1.14 (no unit), and the SDs of the migration times before and after the time normalization procedure were 1.75 min and 3.02 × 10⁻³ min, respectively. Of the remaining peaks, we identified 57 metabolites that were significantly different between the patients and healthy controls (P < 0.05; Steel–Dwass test).

The marker pool used to discriminate between individuals with oral cancer and healthy controls revealed 28 metabolites, namely pyrroline hydroxycarboxylic acid, leucine plus isoleucine, choline, tryptophan, valine, threonine, histidine, pipecolic acid, glutamic acid, carnitine, alanine, piperidine, taurine, and two other metabolites with a significance of P < 0.001 (Steel–Dwass test); piperidine, alpha-aminobutyric acid, phenylalanine and an additional metabolite with a significance of P < 0.01 (Steel–Dwass test); and betaine, serine, tyrosine, glutamine, beta-alanine, cadaverine, and two other metabolites with a significance of P < 0.05 (Steel–Dwass test). The overlaid electropherograms of these CE-TOF-MS peaks with a 2-dimensional map (migration time and m/z) visualizing the difference in intensity between the averaged control and oral cancer samples are shown in Fig. 1. The vertical smear lines in the first few minutes (5–7 min) and those at a later time (at 19 min) were derived from salt ions and neutral molecules, respectively, and most of the peaks derived from charged metabolites were distributed between these times. Using a similar strategy, we identified 28 metabolites for breast cancer, 48 for pancreatic cancer and 27 for periodontal disease (P < 0.05; Steel–Dwass test) as biomarker candidates. The detected markers and the statistical results are listed in Table 2; dot plots of the quantified peak areas are shown in Fig. 2 and Supplementary Fig. S1. Although several metabolites in the dot plots achieved a statistically significant difference, individual metabolites could not separate any two groups with high sensitivity and specificity. The score plots of the PCA analyses for all individuals are shown in Fig. 3 and in Supplementary Fig. S2.
Although the PCA developed using the metabolite profiles of all subjects showed no unequivocal group-specific clusters, PCAs developed individually for the control and each disease group showed partial discriminative separation of the subjects, which might be attributed to the reduced complexity of the given datasets, or to the elimination of the overlap between the distributions of the score plots for all disease groups.

Fig. 1

A summary of the different metabolome profiles of cations obtained from CE-TOF-MS analyses of salivary metabolites from control (n = 87) and oral cancer samples (n = 69). The X and Y axes represent the migration time and the m/z value, respectively. The color density reflects the difference in intensity between the averaged control and oral cancer samples. Black circles indicate peaks that are significantly different between healthy control and oral cancer samples (P < 0.05; Steel–Dwass test). The small linked figures include overlaid electropherograms of control (blue) and oral cancer samples (red)

Table 2 Salivary metabolite marker candidates (P < 0.05 Steel–Dwass test) and the ratio of the relative area of oral, breast and pancreatic cancers, and periodontal diseases to controls (n = 215)
Fig. 2

Representative dot plots for the relative area of detected metabolites in samples from all groups. The colored dots denote healthy controls (blue), oral (red), breast (pink), pancreatic cancer (green), and periodontal disease (purple). The Y- and X-axes denote the relative peak area (no units) and the group name, respectively. The horizontal, center long bars and the short top/bottom bars indicate the means and standard deviations, respectively. The stars indicate * P < 0.05, ** P < 0.01, and *** P < 0.001 (Steel–Dwass test). Only metabolites showing a significant difference between oral cancer and controls at P < 0.001 and matched with the standard library are displayed. The dot plots of the other metabolites are shown in Supplementary Fig. S1

Fig. 3

Score plots of principal components (PC) analyses. The subjects in all groups are shown in 3-dimensional (a) and 2-dimensional (b) plots without outliers. The cumulative proportions of the first, second and third PCs (PC1, PC2, and PC3) were 44.8, 57.6 and 67.0%. The same analyses presented for all datasets are shown in Supplementary Fig. S2

The MLR model developed for oral cancer yielded a high AUC (0.865), and the trained models also showed high separation ability in the CV (AUC = 0.810). The receiver operating characteristic (ROC) curves and selected parameters of the MLR models for each disease are shown in Fig. 4 and Supplementary Table S1, respectively. The MLR models for pancreatic cancer and periodontal disease yielded high AUCs in the CV test (0.944 and 0.954, respectively) using only five and two metabolic markers, respectively, whereas the models for oral and breast cancers used 9 and 14 metabolites, respectively, and yielded lower AUCs (0.810 and 0.881, respectively). On the metabolite heat map (Fig. 5), the levels in the control group and the periodontal disease group were relatively low and those in the pancreatic cancer group tended to be uniformly high, while the oral and breast cancers exhibited more diverse profiles compared with the other groups. This suggests that our MLR models for oral and breast cancer require additional parameters for accurate classification. The heterogeneous nature of oral cancers, including oral squamous cell carcinoma (OSCC), oropharyngeal, tongue and neck cancer, may produce different profiles; this diminishes the discriminative capability of a single classification model. The diverse profiles associated with breast cancer may result in a similar situation because breast cancer comprises structurally differing types according to the expression of hormone receptors such as estrogen and progesterone, and is affected by clinical parameters, such as the patient’s age or menopause status. Three metabolites, taurine, piperidine, and a peak at 120.0801 m/z, were oral cancer-specific markers (different from all of the other groups at P < 0.05; Steel–Dwass test) and eight metabolites (leucine with isoleucine, tryptophan, valine, glutamic acid, phenylalanine, glutamine, and aspartic acid) were pancreatic cancer-specific markers.
Although several metabolites yielded a statistically significant difference between breast cancer patients and healthy controls, including taurine and lysine (P < 0.001 for both; Steel–Dwass test), there were no differences in metabolites between breast cancer and the other cancers, and thus none was unique to breast cancer.

Fig. 4

ROC curve analysis of the ability of salivary metabolites to discriminate between samples from patients with a oral (n = 69), b breast (n = 30) or c pancreatic cancer (n = 18), and d samples from patients with periodontal diseases (n = 11) and the controls (n = 87). The solid (red) and dotted (blue) ROC curves were obtained using the complete data as a training set and with a tenfold cross-validation, respectively. Using a cut-off probability of 50%, the calculated areas under the ROC curves were 0.865 (0.810) for oral, 0.973 (0.881) for breast and 0.993 (0.944) for pancreatic cancer, and 0.969 (0.954) for periodontal diseases. The values outside parentheses were obtained with the full training data and those in parentheses by tenfold cross-validation

Fig. 5

Heat map of 57 peaks showing significantly different levels (P < 0.05; Steel–Dwass test) between control samples (n = 87) and samples from patients with at least one disease (n = 128). Each row shows data for a specific metabolite and each column shows an individual. The colors correspond to the relative metabolite areas that were converted to Z-scores

3.2 Comparison of the obtained metabolites with previous studies

Of the metabolite profiles obtained, the annotated metabolites included carnitines (betaine, choline, carnitine, glycerophosphocholine), polyamines (cadaverine and putrescine), a purine (hypoxanthine), amino alcohols (ethanolamine), aliphatic and aromatic amine (trimethylamine), and amino acids (the others), in accordance with the defined chemical class category in HMDB. Because each MLR model developed to discriminate between control and patient groups reached high accuracy by incorporating multiple quantified metabolites, the quantitative associations between the multiple metabolites and the individual markers are important. Changes in the individual metabolites were generally consistent with those of earlier studies. For example, polyamines are correlated with cell growth and proliferation (Casero and Marton 2007; Gerner and Meyskens 2004; Tabor and Tabor 1984), and with tumor growth in oral cancer (Dimery et al. 1987), while putrescine is used to monitor the effect of chemotherapy on oral cancer cells (Okamura et al. 2007). The serum concentrations of putrescine and cadaverine are decreased in cancer patients undergoing radiotherapy but remain higher than those in healthy individuals (Khuhawar et al. 1999). There were no significant differences in urinary polyamine levels between the healthy individuals and breast cancer patients; however, the levels of putrescine, spermine and other metabolites were significantly higher in patients with breast cancer (Byun et al. 2008). Oral polyamine levels are also affected by periodontitis and gum healing (Silwood et al. 2002). We found that the levels of ornithine and putrescine were higher in patients with breast or pancreatic cancer, and were markedly higher in patients with oral cancer, than in our healthy controls, while there was no significant difference between patients with periodontal disease and the controls.
Although the quantitative level of polyamines is associated with regulation of tumor growth and with periodontitis, our results indicate that salivary polyamines are affected by the cancer type and by periodontitis, and that their levels were markedly higher in patients with oral cancer.

In addition to polyamines, the level of tryptophan (Carlin et al. 1989), which is increased in oral and pancreatic cancer, is a direct marker for tumor development. In terms of an indirect connection between the detected metabolites and human cancer, the repeat peptide Pro-Pro-Gly, which is expressed at high levels in breast cancer, is an inhibitor of matrix metalloproteinase-2 (MMP-2, gelatinase A), which plays an important role in tumor invasion and metastasis (Jani et al. 2005). The expression levels of the amino acid transporters ASCT2 and LAT1 are elevated in primary human cancers, and cancer cells optimize their metabolic pathways by activating the extra- to intracellular exchange of amino acids. Peptides and acids are derived from various sources, such as fragmented proteins, and the saliva metabolome profiles comprising these compounds may reflect the integrated results.

A significantly decreased level of arginine was observed in plasma samples from patients with several cancers, including breast, colonic and pancreatic cancer, which might be due to increased uptake of arginine by tumor tissues with high arginase activity (Vissers et al. 2005). However, salivary arginine was hardly changed, and there were no differences among the groups (Supplementary Fig. S3 and Table S2). A trend for decreasing levels of amino acids, including leucine, isoleucine, valine and alanine, has been reported in pancreatic cancer samples (Fang et al. 2007). The levels of amino acids in breast cancer tissue samples showed similar patterns, with low levels of isoleucine, leucine, lysine and valine (Yang et al. 2007). The decreased amino acid levels appear to be the result of enhanced energy metabolism or upregulation of the appropriate biosynthetic pathways required for cell proliferation in cancer tissues. However, the salivary amino acid levels that differed significantly in the cancer groups (Table 2) were higher than in the controls. This discrepancy may arise from the heterogeneous systems that transport amino acids from blood to saliva via the salivary gland, which differ in their kinetics and in their dependence on or independence of small ions such as potassium and sodium (Mann and Yudilevich 1987), or from altered concentrations of these ions caused by water movement through the paracellular route (Melvin 1999) or channels (Ishikawa and Ishida 2000). Metabolism in the salivary gland itself might also make a major contribution to the differences in profiles between saliva and blood. Further validation of these findings by comparing saliva profiles with blood and tissue profiles is needed to understand the reason for the different saliva amino acid profiles.

Choline, a quaternary amine, is an essential nutrient that is predominantly supplied by the diet, and choline-containing metabolites are important constituents of phospholipid metabolism of cell membranes and are associated with malignant transformation in cancers such as breast, brain and prostate cancers (Ackerstaff et al. 2003). Magnetic resonance spectroscopy (MRS) is routinely used to quantify choline-based metabolism in malignancies such as head and neck cancer and breast cancer (Bolan et al. 2003). Choline is highly metabolized in tumors to phosphocholine and is also highly oxidized to betaine; hence, a low concentration of choline and high concentrations of phosphocholine and betaine were observed (Katz-Brull et al. 2002). Furthermore, the levels of choline metabolites were higher in tumors than in benign lesions or normal tissues (reviewed in Haddadin et al. 2009). In tumor cells, an excessive increase in plasma choline levels in patients with breast cancer was also shown (Katz-Brull et al. 2001). Aberrant choline metabolism can be explained as a result of enhanced membrane synthesis and degradation, which represent excessive proliferation of cancer cells. Pancreatic cancer tissue had a unique profile showing decreased levels of phosphocholine and glycerophosphocholine, but not choline (Fang et al. 2007). We found that the levels of phosphocholine (Supplementary Fig. S3 and Table S2) and glycerophosphocholine (Table 2) were increased in the saliva samples from oral cancer patients and were decreased in the other groups.

Creatine phosphate acts as a store for high-energy phosphates. Therefore, its concentration might be altered in energy-demanding tissues (Maheshwari et al. 2000). Previous studies showed an increase in the choline-creatinine ratio in tumor tissues and in the serum of patients with OSCC (Maheshwari et al. 2000; Tiziani et al. 2009). Creatine is converted to creatine phosphate by creatine kinase. Increased creatine phosphate levels were also found in other tumors, such as breast and gastrointestinal tract tumors. In our study, the salivary choline level was significantly higher in subjects with oral and pancreatic cancers (P = 2.30 × 10−5 and P = 1.91 × 10−4, respectively; Steel–Dwass test), but not in the other groups. Therefore, the salivary choline–creatinine ratio showed oral cancer-specific elevation (Supplementary Table S2 and Fig. S3). However, this finding needs to be interpreted with care because choline is a nutrient present in most foods.

Compared with oral cancer, breast and pancreatic tumors are physically remote from the oral cavity. Therefore, it can be questioned why salivary metabolite profiles reflect the aberrant localized tumor metabolism. Systemic biofluids, such as blood and lymph fluid, are one of the routes that readily pass both these tumors and the salivary gland, which blends saliva with contaminating blood. Several metabolites in tumor tissues, such as lactate, which is derived from tumor exposed to hypoxia, were altered both with and without metastasis (Hirayama et al. 2009; Walenta et al. 2000). Although abnormal arginine levels in breast cancer without metastasis were observed, the same metabolic changes were shown in a pooled group of patients with colonic and pancreatic cancer with/without metastasis (Vissers et al. 2005). In OSCC patients without metastasis from the primary tumor, cancer-specific changes in serum and salivary mRNA levels (Li et al. 2006; Pickering et al. 2007) and blood metabolome levels (Tiziani et al. 2009; Zhou et al. 2009) were shown. Although this does not constitute direct proof that the aberration in salivary metabolites is attributable to a remote tumor, evidence has accumulated that salivary metabolite profiles reflect the systemic and localized tumor status, or its response to chemotherapies, in cancers such as breast and lung cancer (Emekli-Alturfan et al. 2008; Gao et al. 2009; Harrison et al. 1998; Streckfus et al. 2006, 2008). Although previous studies have demonstrated an increase in choline metabolites in blood in various cancers, the increase in choline metabolites in oral cancer patients in this study indicates that the transportation of these metabolites from the blood to the saliva through the salivary gland is low, even though their levels in blood are elevated. Alternatively, these metabolites may have diffused from the oral malignancy to the salivary gland via a route other than the blood vessel.
We acknowledge that the current study merely mined the data and showed that the changes in salivary metabolites had cancer-specific features. Further biological studies to compare the metabolite profiles obtained concurrently from saliva, blood and cancer tissue is needed to provide rational evidence for the systemic metabolite links.

3.3 Bias derived from clinical parameters

We evaluated the metabolite bias introduced by relevant clinical parameters (age, gender, race and ethnicity). The PCA score plots showed poor separation between male and female subjects among healthy controls and patients with oral cancer (Supplementary Fig. S4). Statistical comparisons of the relative area are presented in Supplementary Table S3. Takeda et al. (2009) measured the gender-specific differences in salivary metabolites and found that formate, lactate, propionate and taurine were significantly higher in males. Of these metabolites, only taurine was observed under our measurement conditions, and its level showed little gender-specific difference in either the control or the oral cancer group. By contrast, in the control group, tyrosine and a metabolite at 214.4440 m/z were significantly higher in females than in males (P = 0.0492 and P = 0.0261, respectively; Mann–Whitney test). In the oral cancer group, threonine and serine were significantly higher in males and piperidine was higher in females (P = 0.0340, P = 0.0462, and P = 0.0221, respectively; Mann–Whitney test). Takeda et al. (2009) suggested that these gender-specific differences might be attributed to dental care, hormones such as estrogen, and carriers of oral pathogenesis such as microflora. Indeed, infection of the oral environment with viruses such as human papillomavirus or with micro-organisms is known to be a risk factor for the development of oral cancer (Meurman and Uittamo 2008). Although we found that the gender-specific differences in metabolic profiles differed between the tumor types, the number of metabolites showing significant differences was low, which implies that the disease-specific variation is predominantly embedded in the 57 metabolites identified here.
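
The gender comparisons above rely on the Mann–Whitney test applied to the relative peak area of each metabolite. The following is a minimal sketch of such a test, assuming a two-sided normal approximation with tie correction; the function name `mann_whitney_u` and the example values are hypothetical and are not taken from this study. In practice an established implementation such as `scipy.stats.mannwhitneyu` would normally be preferred, particularly for small groups, where an exact test is more appropriate.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# Illustrative (made-up) relative areas of one metabolite in two groups.
males = [0.82, 1.10, 0.95, 1.30, 0.76]
females = [1.45, 1.20, 1.62, 1.05, 1.38]
u_stat, p_value = mann_whitney_u(males, females)
print(f"U = {u_stat}, P = {p_value:.4f}")
```

Repeating the test for each of the 57 candidate metabolites and counting those with P < 0.05 gives the kind of tally discussed in this section; no multiple-testing adjustment is included in this sketch.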

In the control and oral cancer groups, the PCA results based on race and ethnicity were visualized using score plots (Supplementary Fig. S5) and the results of the statistical analyses are presented in Supplementary Table S4. In the control group, there were no significant differences between African-Americans and Caucasians, or between African-Americans and Hispanics. Meanwhile, 11 and 12 significantly different (P < 0.05; Steel–Dwass test) metabolites were observed between African-Americans and Asians, and between Asians and Caucasians, respectively. Similarly, the profiles between Asians and Hispanics, and between Caucasians and Hispanics, revealed three and seven significantly different metabolites, respectively (P < 0.05; Steel–Dwass test). Of particular note, levels of putrescine, proline, glycine and unannotated metabolites at 118.0864 m/z and 10.05 min were low in Asians, while the level of burimamide was high in African-Americans. A country-dependent bias in human urinary metabolite profiles has also been reported elsewhere (Holmes et al. 2008). In their study, positively charged metabolites, such as alanine-related metabolites, showed discriminative characteristics and were correlated with several dietary factors such as energy intake, dietary cholesterol and alcohol intake. However, in our study, there were no differences in alanine levels in either the control or the oral cancer subjects. In the control group, there were no differences in 34 out of 57 marker candidates among the race or ethnic groups. In subjects with oral cancer, only a metabolite at 211.4440 m/z showed a significant difference (P = 0.0386; Steel–Dwass test). Although biases based on race or ethnicity were found in the 57 metabolic profiles, the number of metabolites showing significant differences was smaller than the number of peaks showing significant differences in cancer-specific profiles, which implies that this bias might be more moderate than disease-specific differences.

Age-related differences have been reported in a transcriptome study of the salivary gland (Srivastava et al. 2008). The coefficients of regression lines for age and relative area for all 57 metabolite markers are presented in Supplementary Table S5. It has been reported that other commonly used methods for standardization of metabolites in biofluids yield different statistical results (Schnackenberg et al. 2007); therefore, consistent decreases or increases in levels of metabolites among subjects with correlated clinical parameters should be accounted for. In the control subjects and patients with pancreatic cancer, there was a positive correlation between metabolites and age, whereas the opposite was true for patients with oral or breast cancer or periodontal diseases. Because the direction of the correlation differed among the groups, it is unlikely that age is consistently correlated with the concentrations of salivary metabolites.
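
The age analysis above compares the sign of the regression coefficient of relative area on age across subject groups. A minimal sketch of that comparison is given below, assuming an ordinary least-squares line fitted separately within each group; the function name `regression_slope`, the group labels and all numerical values are hypothetical and serve only to illustrate the calculation.

```python
def regression_slope(ages, areas):
    """Least-squares slope of relative area regressed on age."""
    n = len(ages)
    if n != len(areas) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_age = sum(ages) / n
    mean_area = sum(areas) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("all ages are identical; slope is undefined")
    sxy = sum((a - mean_age) * (b - mean_area) for a, b in zip(ages, areas))
    return sxy / sxx


# Illustrative (made-up) ages and relative areas for one metabolite.
groups = {
    "group A": ([34, 45, 52, 61, 70], [0.8, 0.9, 1.1, 1.2, 1.4]),
    "group B": ([38, 47, 55, 63, 72], [1.5, 1.4, 1.1, 1.0, 0.8]),
}
for name, (ages, areas) in groups.items():
    slope = regression_slope(ages, areas)
    direction = "positive" if slope > 0 else "negative or zero"
    print(f"{name}: slope = {slope:.4f} ({direction})")
```

If the slopes for the same metabolite have opposite signs in different groups, as in this toy example, age alone does not explain the direction of change.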

Several limitations in this study need to be acknowledged. First, the metabolite profiles in saliva might fluctuate to similar or greater levels compared with other omics profiles, such as the proteome and transcriptome, in response to systemic conditions such as stress, and oral conditions including gingival crevicular fluid and oral microbiota (reviewed in Fabian et al. 2008). Therefore, the reproducibility of the sample collection protocol used in this study should be rigorously verified under various conditions. Circadian rhythms in salivary flow rate and components have been reported (Dawes 1972). Levels of putrescine and cadaverine, which correlate with oral malodor, were markedly altered during waking time, even in healthy donors (Cooke et al. 2003). Although the samples were collected within a limited period of time in the morning, levels of these metabolites were generally higher in patients with most types of cancer in the present study. The variance in the concentrations of these metabolites should be validated in future studies. Another external factor that alters saliva contents is food intake: the time-course of fluoride concentration has been tracked, and the changes in concentration continued for 30 min after eating food (Hedman et al. 2006). Therefore, the 1-h period before sample collection should be evaluated in terms of food intake. Smoking is also known to affect salivary metabolites such as citrate, lactate, pyruvate and sucrose (Takeda et al. 2009). The metabolites identified in this study could not be compared with these metabolites because they were not positively charged in our measurement condition. Therefore, the profiles of negatively charged metabolites should be explored in further analyses.

Second, the sample sizes, particularly the number of patients with breast or pancreatic cancer or periodontal diseases, were relatively small. A larger cohort, including samples from an independent institute, would allow for statistical comparisons with greater power and a more rigorous validation. In addition, samples from patients with systemic diseases showing similar symptoms, such as oral leukoplakia and oral cancer (Zhou et al. 2009), or chronic pancreatitis and pancreatic cancer (Fang et al. 2007; Kojima et al. 2008), should be compared to evaluate the sensitivity and specificity of the detected metabolites. In this study, the patients’ age was collected for all samples and only a few additional parameters, namely sex and race or ethnic group, were collected for the control and oral cancer groups. Analyses and validation studies taking into account the complete clinical and pathological parameters, including menopausal status, estrogen and progesterone receptors for breast cancer, and risk factors including smoking and alcohol drinking for oral cancers, are essential before actual diagnostic application of the classification model obtained in this study. Although we used stepwise feature selection and an MLR model to identify classifiers, other feature selection and classification methods are also applicable, such as regression tree models (Li et al. 2004, 2006) and concurrent use of ANN with SVM (Ayers et al. 2004). Instead of developing a classification model based only on the salivary metabolome profiles of matched subjects, a marker model incorporating related clinical features or risk factors together with biomarkers can be constructed to visualize the probability of a specific disease status; for example, nomograms are a commonly used strategy (Brennan et al. 2004; Gross et al. 2008; Katz et al. 2008).
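
To make the idea of a combined marker model concrete, the sketch below shows how a logistic model could convert metabolite levels and clinical features into a probability of disease status, which is the quantity a nomogram visualizes. It assumes that MLR denotes multiple logistic regression; the function name `disease_probability`, the intercept, the coefficients and the feature values are all hypothetical and do not correspond to the model fitted in this study.

```python
import math


def disease_probability(intercept, coefficients, features):
    """Probability from a logistic model: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    if len(coefficients) != len(features):
        raise ValueError("coefficients and features must have the same length")
    score = intercept + sum(c * f for c, f in zip(coefficients, features))
    # Numerically stable evaluation of the logistic function.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


# Hypothetical model: two metabolite relative areas plus two clinical features.
intercept = -3.0
coefficients = [1.2, 0.8, 0.02, 0.5]  # metabolite 1, metabolite 2, age, smoker
subject = [1.4, 0.9, 58, 1]           # made-up measurements for one subject
print(f"Estimated probability: {disease_probability(intercept, coefficients, subject):.3f}")
```

In a real application the coefficients would be estimated from training data and validated on an independent cohort, as discussed above.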

A metabolomic study using serum samples from patients with oral cancer showed stage-specific profiles (Tiziani et al. 2009). The profiles obtained in this study were simply categorized by type of cancer. Therefore, future studies are needed that integrate histological and clinical features. Simultaneous analyses of the metabolic profiles in blood and tissue collected from the same patients are also needed to track the biological sources of the disease-specific signatures in salivary metabolite profiles. Although there are still several limitations to be addressed, the methodology used in this study to detect salivary metabolite profiles is not limited to early diagnosis but offers the potential to aid the characterization of malignant neoplasms or tumors by integrating histological or clinical features, such as staging.

4 Concluding remarks

This is the first study to comprehensively analyze salivary metabolites and to identify metabolic profiles specific to oral, breast and pancreatic cancers. A larger number of patient samples, particularly those from different institutes, and additional clinical variables are needed for further validation and future clinical application of our method. In addition, integrating the knowledge obtained from other omics studies may help us to understand the biological basis of these disease-specific metabolic profiles.

In conclusion, our study has demonstrated that CE-TOF-MS can readily and effectively be applied to salivary metabolomics. We have proposed an alternative use of salivary diagnosis, namely the detection of oral, breast and pancreatic cancers.