Abstract
Background/Aim: Bladder cancer (BCa) is associated with high recurrence rates, emphasizing the importance of early and accurate detection. This study aimed to develop a lightweight and fast deep learning model, Light-Bladder-Net (LBN), for non-invasive BCa detection using conventional urine data.
Materials and Methods: We improved LBN’s generalization by applying data transformations, adding uniform noise, and employing feature selection methods (mRMR, PCA, SVD, t-SNE) to extract key vectors from its fully connected layer. These vectors were integrated into the original dataset, and multiple machine learning models were trained to enhance classification accuracy. Lastly, weighted voting was used to assign importance across these models.
Results: Our approach achieved an accuracy of 0.83, a sensitivity of 0.85, a specificity of 0.80, and a precision of 0.81, indicating robust performance in detecting BCa from urine data.
Conclusion: This non-invasive diagnostic method offers rapid, cost-effective predictions. A free online tool is available for clinicians and patients to conveniently detect BCa using standard urine samples at http://merlin.nchu.edu.tw/LBN/
- Conventional urine examination
- data image transformation
- weighted voting
- deep feature extraction
- non-invasive bladder cancer prediction
Introduction
Bladder cancer (BCa) is among the top ten most common cancers worldwide, with increasing incidence in developed countries (1). Symptoms include frequent urination, painless hematuria, and painful urination, often leading to misdiagnosis and delayed treatment (2). According to a 1996 study, 50% of patients experience recurrence due to micrometastases (3). BCa has a high recurrence rate, making it the fifth most expensive cancer to treat (4). High-risk groups are those over 55, with men three times more likely to develop BCa than women (5). Non-invasive detection methods, such as urine analysis, are less burdensome and cost-effective (6).
Machine learning (ML) is widely used in medicine for data analysis, helping doctors make clinical decisions regarding BCa (7, 8). Deep learning (DL) techniques are also used for medical image recognition, aiding in interpreting pathological images (9). For instance, Yin et al. present a machine learning system that accurately distinguishes between early-stage bladder cancers, Ta and T1, using feature-driven models (10). Although studies suggest urine analysis cannot accurately predict urinary tract infections (11, 12), Dwyer et al. used urine protein-to-creatinine ratio to predict significant proteinuria in pregnancy, finding it effective (13). Huttanus et al. successfully screened BCa using Raman chemometric urinalysis, making it the best study solely utilizing urine data for BCa prediction (14). Similarly, Barak et al. present a study demonstrating that combining ultrasonography and urine cytology significantly improves the sensitivity and specificity of non-invasive bladder cancer detection (15).
In this study, we used ML and DL techniques with conventional urine data to predict BCa, leveraging biomarkers such as urine protein and leukocytes. Our Light-Bladder-Net (LBN) model, designed for efficiency and low resource requirements, outperformed traditional ML methods. Enhancements included data augmentation and background noise addition. We employed feature selection methods to extract vectors from LBN’s fully connected layers, improving classification accuracy. A weighted voting ensemble technique integrated outputs from multiple models, yielding an accuracy (ACC) of 0.83, sensitivity (SN) of 0.85, specificity (SP) of 0.80, and precision of 0.81, surpassing Huttanus et al.’s results in all metrics.
Materials and Methods
As shown in Figure 1, our methodology started with data preprocessing to handle missing values. The cleaned data was categorized into numerical and binary types. The Fengyuan dataset was split into 80% for training and 20% for validation, while the MacKay dataset served as an independent dataset (16). We applied image transformations like flipping, duplication, and noise addition to augment the data. Two DL models, InceptionV3 (17) and LBN, were used for training. Vectors from the Fully Connected (FC) layers of these models were merged with the original data using techniques such as Minimum Redundancy Maximum Relevance (mRMR) (18), Principal Component Analysis (PCA) (19), Singular Value Decomposition (SVD) (20), and t-distributed Stochastic Neighbor Embedding (t-SNE) (21). We then used a weighted voting approach to evaluate and compare the models.
Figure 1. Methodological flowchart employed in this study. The process commences with filtering missing values, followed by data type conversion and image transformation. The subsequent analysis entails training through machine learning (ML) and deep learning (DL) techniques. Features extracted from the DL models undergo a second round of training after feature screening. A weighted voting technique is implemented, assigning distinct weights to various models. The performance outcomes of these models are then compared.
Data collection. We used two datasets for our study. The first, from Fengyuan Hospital in Taichung City, Taiwan, was collected between January 2016 and August 2019 with necessary approvals. Bladder cancer incidence is quite low in young adults (22), so we focused on individuals 30 and older, with 783 samples: 394 without BCa history and 389 diagnosed with BCa.
The second dataset, approved by the MacKay Memorial Hospital Institutional Review Board (code 20MMHIS200e), came from a 2022 publication by Tsai et al. (16). From 1337 original samples, we excluded entries with missing data, retaining 196 samples: 59 from non-BCa patients (cervical or prostate cancer) and 137 from BCa patients (16). The Fengyuan dataset was used for model training and validation, while the MacKay dataset served for independent testing.
Data preprocessing. Missing values were first removed, followed by data type conversion and image transformation. The collected data comprised various parameters, including the patient’s age, gender, glucose, protein, bilirubin, urobilinogen, pH, occult blood, ketone body, nitrites, leukocytes, specific gravity (SG), bacteria, and squamous epithelial cells. The data was numerically transformed, drawing on information from the Medical Laboratory Department of E-Da Medical Foundation and Taichung Tzu Chi Hospital. The Supplementary Data Table SI provides comprehensive conversion details for all parameters.
Model. In this study, Random Forest (RF) (23) and Support Vector Machine (SVM) (24) were employed for preliminary ML analysis. In addition, we developed a fast, shallow DL model that recognizes images of routine urine data and determines from these images whether a patient has BCa.
Light-Bladder-Net. LBN stands out for its efficient image recognition and low computational demands. The model is designed for rapid computation, avoiding deep architecture. While InceptionV3, a deeper model, is known for medical image recognition and has demonstrated excellent recognition capabilities in related studies on bladder biomarkers (25), LBN extracts features more efficiently. For this study, InceptionV3 serves as a comparison benchmark against LBN.
The LBN’s architecture, illustrated in Figure 2, comprises seven layers. It integrates two convolution layers with 3×3 kernels and rectified linear unit (ReLU) activations and two max-pooling layers with 2×2 kernels. A normalization layer follows, and a flattening layer converts the feature map into a one-dimensional vector for the fully connected (FC) layer, which finalizes the classification.
Figure 2. Architecture of the Light-Bladder-Net (LBN) model. The diagram showcases the structure of the LBN model designed for this research. It comprises two convolution layers with 3×3 kernels and ReLU activation, followed by two 2×2 maximum pooling layers. Subsequent layers include a normalization layer and a flattening layer, leading to the final FC layer for classification. This model configuration optimizes swift computation and minimal memory usage, ensuring efficient feature extraction from images.
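The seven-layer structure can be followed by tracing the feature-map shape through each layer. The sketch below is illustrative only and rests on several assumptions not stated in the text: convolution and pooling layers alternate, convolutions use "same" padding, pooling uses a stride of 2, and the function name `trace_lbn_shapes` and its default filter counts are hypothetical.

```python
def trace_lbn_shapes(height=4, width=4, channels=1,
                     conv_filters=(5, 10), fc_units=20):
    """Return (layer name, output shape) pairs for a seven-layer LBN-like stack.

    Assumptions (not confirmed by the paper): conv and pooling layers
    alternate, convolutions use "same" padding, and pooling uses stride 2.
    """
    shapes = []
    h, w, c = height, width, channels
    for idx, n_filters in enumerate(conv_filters, start=1):
        # 3x3 convolution with ReLU: "same" padding keeps h and w unchanged,
        # and the channel count becomes the number of filters.
        c = n_filters
        shapes.append((f"conv{idx}_3x3_relu", (h, w, c)))
        # 2x2 max pooling with stride 2 halves each spatial dimension.
        h, w = h // 2, w // 2
        shapes.append((f"maxpool{idx}_2x2", (h, w, c)))
    # Normalization rescales activations without changing the shape.
    shapes.append(("normalization", (h, w, c)))
    # Flattening converts the feature map into a one-dimensional vector.
    shapes.append(("flatten", (h * w * c,)))
    # The fully connected layer maps the flat vector to fc_units outputs.
    shapes.append(("fully_connected", (fc_units,)))
    return shapes


for name, shape in trace_lbn_shapes():
    print(f"{name:18s} {shape}")
```

Under these assumptions a 4×4 input shrinks to a 1×1 feature map after the second pooling layer, which illustrates why a shallow architecture suits such small images.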
A unique LBN feature is its incremental filter approach, enhancing BCa identification. Filters increase proportionally (i×1,i×2,i×4) across layers. The initial layer features filters 2, 4, 8, 16, 32, 64, 128, and 256. Considering the image data size in this study, we performed additional tests using 1, 3, 5, 6, and 7, resulting in a total of 13 combinations, allowing varied DL feature vectors. For example, with i as 5, the filter set (i×1,i×2,i×4) becomes (5,10,20), extracting 20 features. This approach broadens feature combinations, capturing data patterns effectively.
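The filter progression can be enumerated directly. In this minimal sketch the names `BASE_FILTERS` and `filter_schedule` are hypothetical; the values themselves are the 13 initial filter counts listed above.

```python
# The eight power-of-two values plus the five additional values tested.
BASE_FILTERS = sorted([2, 4, 8, 16, 32, 64, 128, 256] + [1, 3, 5, 6, 7])


def filter_schedule(i):
    """Return the proportional filter set (i×1, i×2, i×4) for initial value i."""
    return (i * 1, i * 2, i * 4)


schedules = {i: filter_schedule(i) for i in BASE_FILTERS}

for i, schedule in schedules.items():
    print(i, schedule)
```

For i = 5 this yields (5, 10, 20), matching the example in the text.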
Model evaluation. The efficacy of the computational models is assessed by quantifying the following metrics: accuracy (ACC) (26), area under the receiver operating characteristic curve (AUC-ROC, hereafter referred to as AUC) (27), sensitivity (SN), specificity (SP) (28), and the harmonic mean of precision and sensitivity, also known as the F1-score (29).
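For reference, the threshold-based metrics can be computed from the confusion-matrix counts using their standard definitions. The sketch below assumes binary labels with 1 for BCa and 0 for non-BCa; the function names are hypothetical and AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = BCa)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Return ACC, SN, SP, precision, and F1-score as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sn = safe_ratio(tp, tp + fn)         # sensitivity (recall)
    sp = safe_ratio(tn, tn + fp)         # specificity
    precision = safe_ratio(tp, tp + fp)
    return {
        "ACC": safe_ratio(tp + tn, tp + tn + fp + fn),
        "SN": sn,
        "SP": sp,
        "precision": precision,
        # F1 is the harmonic mean of precision and sensitivity.
        "F1": safe_ratio(2 * precision * sn, precision + sn),
    }
```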
Image transformation and augmentation. Data was scaled and normalized using Minimum and Maximum Scaler (MinMaxScaler) from the sklearn preprocessing module in Python, ensuring values ranged between 0 and 255 (30). The data was reshaped into 4×4 images for DL model compatibility and further enhanced with white and black backgrounds, then upscaled to 32×32 and 256×256.
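A minimal sketch of this transformation, written with NumPy only, is given below. The scaling is intended to mirror `MinMaxScaler(feature_range=(0, 255))`. Everything else is an assumption for illustration: the function names are hypothetical, the 14 urine parameters are placed row by row with the two unused pixels filled by the background value, and upscaling is done by nearest-neighbour pixel repetition, which the text does not specify.

```python
import numpy as np


def scale_to_pixels(features, low=0.0, high=255.0):
    """Column-wise min-max scaling of a (samples, features) array to [low, high]."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    # Constant columns would divide by zero; map them to `low` instead.
    col_range[col_range == 0] = 1.0
    return low + (features - col_min) / col_range * (high - low)


def to_images(pixels, side=4, background=0.0):
    """Place each sample's features row by row into a side×side image.

    Unused pixels keep the background value (0 = black, 255 = white).
    """
    n_samples, n_features = pixels.shape
    if n_features > side * side:
        raise ValueError("too many features for the requested image size")
    images = np.full((n_samples, side * side), background, dtype=float)
    images[:, :n_features] = pixels
    return images.reshape(n_samples, side, side)


def upscale(images, factor):
    """Nearest-neighbour upscaling: repeat every pixel `factor` times per axis."""
    return images.repeat(factor, axis=1).repeat(factor, axis=2)


# Two hypothetical samples with 14 features each.
raw = np.arange(28, dtype=float).reshape(2, 14)
small = to_images(scale_to_pixels(raw))   # shape (2, 4, 4)
large = upscale(small, 8)                 # shape (2, 32, 32)
```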
To mitigate undue emphasis on the image background, various noise types, including Gaussian noise (31), salt-and-pepper noise (32), and uniform noise (33), were introduced. We used mRMR (34), PCA (19), SVD (20), and t-SNE (21) to merge FC layer vector features with the original data. This fusion reduces reliance on routine urine data during training, enhancing classification accuracy. All features were extracted and compared using these four methods, with t-SNE showing superior classification performance.
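The background-noise step can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: noise is drawn uniformly from [0, max_value], it is applied only to background pixels identified by a boolean mask, and the function name `add_uniform_background_noise` and the mask argument are hypothetical.

```python
import numpy as np


def add_uniform_background_noise(image, data_mask, max_value, rng=None):
    """Add uniform noise in [0, max_value] to background pixels only.

    `data_mask` is True where a pixel holds urine data and False for
    background; data pixels are returned unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.asarray(image, dtype=float).copy()
    noise = rng.uniform(0.0, max_value, size=noisy.shape)
    background = ~np.asarray(data_mask, dtype=bool)
    # Keep the result inside the valid 0-255 pixel range.
    noisy[background] = np.clip(noisy[background] + noise[background], 0.0, 255.0)
    return noisy


# A hypothetical 4×4 image: 14 data pixels followed by 2 black background pixels.
image = np.zeros((4, 4))
image.flat[:14] = 100.0
mask = np.zeros((4, 4), dtype=bool)
mask.flat[:14] = True
noisy = add_uniform_background_noise(image, mask, max_value=10,
                                     rng=np.random.default_rng(0))
```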
Weighted voting. This study combines the predictions of the best-performing DL models to improve the recognition of BCa. Initially, we obtained the accuracy (ACC_i), AUC (AUC_i), and prediction probabilities f_i(x) for eight models: seven uniform noise models and the original 4×4 black background LBN model. Here, i indexes the individual models. ACC and AUC were used as weights for model ensembling. We multiplied each f_i(x) by its corresponding weight and then summed the weighted scores across models (35). This ensemble method leverages the strengths of multiple models, resulting in an improved BCa recognition rate. This process is outlined in detail in Eq. 6 and Eq. 7.
Through the weighted voting method, we can integrate the predictive capabilities and weights of multiple models, enabling us to combine their prediction results and achieve more accurate BCa classification predictions.
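The weighted voting idea can be sketched in a few lines. The exact form of Eq. 6 and Eq. 7 is not reproduced here, so the details below are assumptions: each model's weight is taken as the mean of its ACC and AUC, the weighted probabilities are normalized by the sum of the weights, and a threshold of 0.5 converts the score into a class label. The function names are hypothetical.

```python
def model_weight(acc, auc):
    """Combine a model's ACC and AUC into one weight (assumed: their mean)."""
    return (acc + auc) / 2.0


def weighted_vote(probabilities, weights, threshold=0.5):
    """Return the weighted ensemble score and the resulting class label.

    `probabilities` holds each model's predicted probability of BCa for one
    sample; `weights` holds the corresponding model weights.
    """
    if not weights or len(probabilities) != len(weights):
        raise ValueError("need one weight per model probability")
    total = sum(weights)
    score = sum(w * p for w, p in zip(weights, probabilities)) / total
    return score, int(score >= threshold)


# Two hypothetical models: the stronger model's vote counts for more.
weights = [model_weight(0.8, 0.8), model_weight(0.6, 0.6)]
score, label = weighted_vote([0.9, 0.2], weights)
```

In this example the better-performing model predicts BCa with high confidence, so the ensemble label is positive even though the weaker model disagrees.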
Web prediction tool. To facilitate the real-world application of our model, we developed a web-based prediction tool, as shown in Figure S1. The tool, accessible at http://merlin.nchu.edu.tw/LBN/, offers a user-friendly interface where users can input the detection values of routine urine. This prediction tool bridges the gap between our research and its practical implementation. It allows us to reduce unnecessary medical expenses in the early stages of treatment and minimize the discomfort associated with invasive testing, such as painful urination or hematuria (36).
Results
Model performance comparison. We evaluated routine urine data for BCa classification using ML and DL methods. As shown in Table I, RF outperformed SVM in ML, achieving a peak AUC of 0.83 with numerical data. In DL, using a non-fixed initial weight approach averaged over 30 iterations, InceptionV3 achieved a validation ACC of 0.72 and an AUC of 0.78. The LBN model surpassed InceptionV3 with a validation ACC of 0.73 and an AUC of 0.81. The optimal DL model, LBN with a black background and numerical data, achieved a validation AUC of 0.81 and an independent AUC of 0.65, outperforming the foundational RF. We further investigated the stability of the model under different parameters, based on the validation and independent AUC, and present them in Figure S2 and Figure S3, respectively.
Table I. Comparisons of the performance of machine learning (ML) methods [random forest (RF) and support vector machine (SVM)] and deep learning methods [InceptionV3 and Light-Bladder-Net (LBN)] on routine urine data.
Since most images have a black background with central urine data, we explored its influence on the model’s learning. We used flipping and copying techniques to increase the proportion of urine values, enlarge image size, and enhance relationships between adjacent images, improving the model’s discernment capability.
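Flipping and copying can be illustrated with simple tiling. The sketch below shows one way to enlarge a 4×4 image to 8×8, assuming that duplication repeats the image in a 2×2 grid and that flipping mirrors it across the shared edges; the text does not specify the exact layout, and the function names are hypothetical.

```python
import numpy as np


def duplicate_tile(image, reps=2):
    """Enlarge an image by repeating it in a reps×reps grid."""
    return np.tile(image, (reps, reps))


def flip_tile(image):
    """Enlarge an image by mirroring it horizontally, then vertically."""
    top = np.hstack([image, np.fliplr(image)])
    return np.vstack([top, np.flipud(top)])


image = np.arange(16).reshape(4, 4)
duplicated = duplicate_tile(image)   # shape (8, 8), four identical copies
mirrored = flip_tile(image)          # shape (8, 8), copies mirrored at the seams
```

Mirroring places identical pixel values next to each other along the seams, which is one way to strengthen relationships between adjacent image regions.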
Image expansion analysis. We increased the image size to a maximum of 32×32 for LBN model training, as summarized in Table II. Although neither method showed superior performance without flipping and duplication, the Simply Duplicate method with 8×8 image size provided the best results, with a validation ACC and AUC of 0.71 and 0.80, respectively. We examined model stability with flipping and duplication, which showed limited impact. The stability under image expansion analysis is illustrated in Figures S4-S6. Given the limited impact of flipping and copying, we chose to work with the original 4×4 images.
Table II. Performance comparison of models using different image augmentation techniques.
Noise model selection. We added various noises to the black background during convolution to enhance the model’s capacity. We tested Gaussian noise, salt-and-pepper noise, and uniform noise. Adjusting Gaussian noise’s standard deviation (0.1, 0.01, 0.001) did not yield better results than the original 4×4 image training (Table SII). Salt-and-pepper noise, adjusted by white point probability (0.1 to 0.9), also failed to improve the results, likely due to interference with urine data.
Uniform noise with maximum values of 5, 10, 20, 40, 80, 160, and 255 was tested. Most settings improved the validation AUC, with a maximum value of 10 achieving the best validation AUC of 0.85. SN, SP, and precision remained consistent, while the F1-score increased to 0.76 (Table III).
Table III. Performance metrics of models with different uniform noise levels.
Model stability was assessed for different uniform noise values, as shown in Figure 3. The model was most stable at a uniform noise maximum value of 80. However, prioritizing accuracy and generalization, the model with a maximum value of 10 was chosen. The stability of other uniform noise models is presented in Figures S7-S13.
Figure 3. Comparison of stability across uniform noise models. Among the uniform noise levels, the model exhibits the highest overall stability when the maximum value is set to 80. However, this study selects the model with a maximum value of 10 for its greater ACC and improved generalization ability. Val_AUC (Validation AUC): the area under the receiver operating characteristic (ROC) curve measured on the validation dataset. indep_auc (Independent AUC): the area under the ROC curve measured on an independent (external) test dataset.
Impact of feature quantity. Incorporating uniform noise enhanced our model’s recognition. Consequently, we extracted the FC layer vector from this model and conducted feature screening using various methods: extracting full features, mRMR, PCA, SVD, and t-SNE. Utilizing t-SNE, we retained 64 features, which led to the best performance metrics. The t-SNE method consistently improved SN and F1-score, maintaining a validation AUC of 0.83.
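The screening-and-fusion step can be sketched as reducing the FC-layer vectors and appending the result to the original urine features. Note that this example swaps in a PCA-style reduction computed with NumPy's SVD, because it needs no extra dependencies; it is not the t-SNE procedure that performed best in the study. The function names and the random stand-in data are hypothetical.

```python
import numpy as np


def reduce_with_svd(fc_vectors, n_components):
    """Project centered FC-layer vectors onto their top principal directions."""
    fc_vectors = np.asarray(fc_vectors, dtype=float)
    centered = fc_vectors - fc_vectors.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fuse_features(original, fc_vectors, n_components):
    """Append the reduced FC-layer features to the original feature columns."""
    reduced = reduce_with_svd(fc_vectors, n_components)
    return np.hstack([np.asarray(original, dtype=float), reduced])


# Random stand-ins: 10 samples, 14 urine features, 20-dimensional FC vectors.
rng = np.random.default_rng(0)
urine = rng.normal(size=(10, 14))
fc = rng.normal(size=(10, 20))
fused = fuse_features(urine, fc, n_components=3)   # shape (10, 17)
```

The fused matrix can then be passed to a conventional classifier, such as the ML models used for the second round of training.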
Impact of model integration. Models with uniform noise generally performed better. We selected the top seven models based on the highest validation AUC and included the original 4×4 image model for weighted voting (Table IV). Since AUC could not be calculated for weighted voting, we used validation ACC for comparison. All models exceeded a validation ACC of 0.8, outperforming preliminary results and the 0.76 validation ACC of the uniform noise model. Metrics such as SN, SP, precision, and F1-score improved, with SN rising significantly to 0.85 from 0.74 in the original 4×4 image model.
Table IV. Performance metrics of the combined best models.
Compared to the traditional RF model, feature screening and weighted voting significantly enhanced performance. The uniform noise model excelled in validation AUC, while the t-SNE model showed notable gains in SN and F1-score. However, weighted voting consistently ranked above other methods in validation ACC.
Discussion
Among the prior studies listed in Table V, Van Nostrand et al. (12), Dwyer et al. (13), and Reardon et al. (37) pointed out the limitations of urinalysis as a dependable predictive measure. When urine data becomes contaminated during collection (37), the enhancements in SN and SP diminish, making both metrics likely to forecast the same category. In our research, we used techniques like image conversion, uniform noise, feature extraction, and weighted voting to reduce the influence of lifestyle habits on urine data and boost its predictive capability. We achieved an SN of 0.85, SP of 0.80, precision of 0.81, and an ACC of 0.83, which stands out among urine data studies. Sanghvi et al. (38) attained an AUC of 0.88, but with a costly urine cytopathology AI algorithm. In Taiwan, using their approach costs patients an additional 50 USD. Furthermore, their results are processed using six Convolutional Neural Network models, which is time-consuming and expensive. Conversely, routine urine analysis can be easily accessed through health check-ups or a request from a medical professional. Our approach converted urine data into a 32×32 image size for training, streamlining the data processing step.
Table V. Performance comparison of this study with other urine-related studies.
Our study introduces significant advancements. The LBN model, a resource-efficient tool, surpasses InceptionV3 in computational speed and outperforms traditional ML methods like RF. We improved accuracy using image conversion, uniform noise integration, and weighted voting. Additionally, we developed a web-based prediction tool for preliminary BCa analysis, reducing early-stage testing costs and providing initial diagnostic data to streamline clinicians’ workflow. These developments highlight our contributions to BCa prediction.
Despite these advancements, some limitations should be acknowledged. First, our model relied solely on routine urine data and did not incorporate additional clinical risk factors such as smoking history, occupational exposure, or genetic predisposition. Previous studies have demonstrated that integrating such variables enhances BCa prediction accuracy (39-41). Tsai et al. effectively employed a light gradient-boosting machine model to differentiate cystitis and BCa patients by considering smoking habits. Future studies should explore the inclusion of comprehensive patient data to improve predictive performance. Additionally, investigating the correlation between image background and urine data could be beneficial. Our findings suggest that LBN performs optimally with a black background and specific uniform noise, indicating that slight disturbances may enhance model recognition. Further fine-tuning of kernel size, stride, padding, and layer configurations may refine BCa classification.
Furthermore, the independent test dataset from MacKay Memorial Hospital (n=196) is considerably smaller than the Fengyuan Hospital training dataset (n=783). This imbalance may affect the model’s generalizability, as it is predominantly trained on data from a single institution. A larger and more diverse external dataset is necessary to further validate the model’s stability.
Lastly, clinical differences exist between the two datasets. The MacKay dataset shows notable discrepancies in urine biomarkers, including glucose levels, protein concentration, and squamous epithelial cell counts (Table SIV). These variations may stem from differences in patient populations, urine sample collection protocols, or laboratory measurement techniques. Such factors could influence model performance when applied to new datasets, emphasizing the need for standardization across institutions. Future work should address these limitations by incorporating multi-center datasets, expanding the range of predictive features, and refining model interpretability to facilitate practical implementation in diverse healthcare settings.
Conclusion
In this study, we successfully developed an LBN model for identifying BCa. The model uses routine urine test data, converts them into images, and adds uniform noise to the image background. Results indicate that the LBN model outperforms traditional RF methods and improves efficiency. We further enhanced the model’s performance by applying weighted voting to integrate the uniform noise models, improving evaluation metrics, including ACC, SN, SP, and F1-score. These advancements enable the LBN model to support doctors in preliminary BCa diagnosis effectively. Importantly, BCa is most commonly found in middle-aged and elderly individuals. The advantage of the accessible collection of urine data and the absence of any side effects during the diagnostic process make it the optimal diagnostic method for BCa.
Footnotes
Authors’ Contributions
Author contributions to this study were outlined using the CRediT taxonomy. Conceptualization was performed by Chi-Hua Tung, Shih-Huan Lin, and Kai-Po Chang. Data curation and investigation were carried out by Min-Ling Chuang and Ya-Wen Xu. Formal analysis, methodology, and software development were led by Yen-Wei Chu. Project administration and funding acquisition were coordinated by Kai-Po Chang and Yen-Wei Chu. All Authors participated in writing – original draft and writing – review & editing, with final supervision by Yen-Wei Chu.
Supplementary Material
All supplementary figures and tables can be accessed at: http://merlin.nchu.edu.tw/LBN/supplementary_data.pdf
Conflicts of Interest
All Authors confirm that they have no conflicts of interest related to this work.
Funding
This research was supported by the National Science and Technology Council, Taiwan, under grant numbers 111-2221-E-005-073-MY3, 113-2321-B-006-014, 112-2634-F-005-002, and 111-2423-H-006-002-MY3, and by grant NCHU-CCH-11307 from National Chung Hsing University and Changhua Christian Hospital.
Artificial Intelligence (A.I.) Disclosure
During the production of this article, a large language model (ChatGPT 4o) was used in some paragraphs solely for language improvement purposes. None of the generation, analysis, or interpretation of research data was performed by generative AI. The figures were not modified by machine learning tools.
- Received January 5, 2025.
- Revision received February 12, 2025.
- Accepted March 4, 2025.
- Copyright © 2025 International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.









