Tuberculosis prediction: performance analysis of machine learning models for early diagnosis and screening using symptom severity level data
https://doi.org/10.14419/parmkr90
Received date: April 18, 2025
Accepted date: May 21, 2025
Published date: May 26, 2025
Tuberculosis Prediction; Machine Learning; Symptom Severity; Artificial Neural Network; Medical Diagnosis
Abstract
Tuberculosis (TB) remains a major global public health problem, and fast, accurate diagnosis is essential for good patient outcomes. This study followed a structured machine learning (ML) pipeline of data preprocessing, feature selection, encoding, and model training. Six models were compared: an Artificial Neural Network (ANN), a Support Vector Machine (SVM), Decision Tree, Random Forest, XGBoost, and Logistic Regression. Each was evaluated on accuracy, precision, recall, F1-score, and AUC-ROC. Class imbalance was addressed with the Synthetic Minority Over-sampling Technique (SMOTE), and model performance was assessed with 5-fold cross-validation to check the consistency of the results.
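As a rough illustration of how SMOTE and 5-fold cross-validation can be combined, the following is a minimal pure-Python sketch, not the authors' implementation. The function names (`smote_oversample`, `stratified_folds`), the toy data, and the choice to oversample only the training part of each fold are assumptions made for this example; the abstract does not state how the two steps were ordered.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def stratified_folds(labels, n_splits=5, seed=0):
    """Return n_splits index lists with roughly equal class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds

# Hypothetical toy data: 20 majority (0) and 10 minority (1) samples.
X = [[float(i), float(i % 4)] for i in range(30)]
y = [0] * 20 + [1] * 10

for fold_no, test_idx in enumerate(stratified_folds(y)):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    minority = [X[i] for i in train_idx if y[i] == 1]
    majority = [X[i] for i in train_idx if y[i] == 0]
    # Assumption: oversample the training fold only, so the held-out
    # fold contains no synthetic samples.
    extra = smote_oversample(minority, len(majority) - len(minority))
    X_train = majority + minority + extra
    y_train = [0] * len(majority) + [1] * (len(minority) + len(extra))
    print(fold_no, len(test_idx), y_train.count(0), y_train.count(1))
```

Oversampling inside each fold, rather than before splitting, is one common way to keep synthetic points derived from test samples out of the training data; whether the study did this is not stated in the abstract.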
The ANN performed best for tuberculosis prediction, with an accuracy of 99.55%, a recall of 100%, and an F1-score of 99.54%. The Random Forest and SVM models also showed strong predictive performance, with high accuracy and AUC-ROC scores. Logistic Regression performed worst, suggesting that linear models may be poorly suited to this dataset. The results illustrate the potential of machine learning for TB diagnosis and underline the value of symptom analysis and data-driven decision-making in healthcare.
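For readers less familiar with the reported measures, the sketch below shows how accuracy, precision, recall, F1-score, and AUC-ROC are conventionally computed for a binary classifier. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its figures.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def auc_roc(labels, scores):
    """AUC-ROC as the share of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts: no false negatives, one false positive.
m = classification_metrics(tp=50, fp=1, fn=0, tn=49)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Hypothetical labels and predicted scores for the AUC example.
print("auc:", auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A recall of 100% corresponds to zero false negatives, which matters in screening because no TB case in the evaluated data is missed; precision and F1 then show how many false alarms were raised to achieve that.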
How to Cite
S, S., & S, D. (2025). Tuberculosis prediction: performance analysis of machine learning models for early diagnosis and screening using symptom severity level data. International Journal of Basic and Applied Sciences, 14(1), 435-444. https://doi.org/10.14419/parmkr90
