Heart Disease Prediction Model Using Naïve Bayes Algorithm and Machine Learning Techniques

  • Abstract

    These days, heart disease has become one of the major health problems affecting people's lives across the whole world, and deaths due to heart disease are increasing day by day. Heart disease prediction systems therefore play an important role in the prevention of heart problems, assisting doctors in making the right decisions to diagnose heart disease easily. Existing prediction systems suffer from the high dimensionality of the selected features: many redundant or irrelevant features increase the prediction time and decrease the accuracy of the prediction. This paper therefore aims to provide a solution to the dimensionality problem by proposing a new mixed model for heart disease prediction based on the Naïve Bayes method combined with machine learning classifiers.

    In this study, we propose a new heart disease prediction model (NB-SKDR) based on the Naïve Bayes (NB) algorithm and several machine learning techniques: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). The prediction model consists of three main phases: preprocessing, feature selection, and classification. Its main objectives are to improve the performance of the prediction system and to find the best subset of features. The proposed approach uses the Naïve Bayes technique, based on Bayes' theorem, to select the best subset of features for the subsequent classification phase; it handles the high-dimensionality problem by discarding unnecessary features and keeping only the important ones, in an attempt to improve the efficiency and accuracy of the classifiers. This method reduces the number of features from 13 to 6 (age, gender, blood pressure, fasting blood sugar, cholesterol, and exercise-induced angina) by determining the dependency between a set of attributes. Dependent attributes are those in which one attribute depends on another in deciding the value of the class attribute. The dependency between attributes is measured by conditional probability, which can easily be computed by Bayes' theorem. In the classification phase, the proposed system uses different classification algorithms (DT, RF, SVM, and KNN) as classifiers to predict whether or not a patient has heart disease. The model is trained and evaluated using the Cleveland Heart Disease database, which contains 13 features and 303 samples.
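    The dependency-based selection step described above can be sketched in plain NumPy. The sketch below is an illustrative proxy, not the authors' exact formulation: for each discrete feature it obtains the class posterior via Bayes' theorem and scores the feature by how far that posterior shifts away from the class prior, then keeps the top-k features. The function names and the scoring formula are assumptions for illustration.

    ```python
    import numpy as np

    def dependency_score(x, y):
        """Score how strongly a discrete feature x depends on the class y.

        For each feature value v, the posterior P(y=c | x=v) is obtained via
        Bayes' theorem and compared with the prior P(y=c); the absolute shifts
        are averaged over feature values, weighted by P(x=v). Higher score
        means the feature is more informative about the class.
        """
        classes = np.unique(y)
        priors = np.array([(y == c).mean() for c in classes])
        score = 0.0
        for v in np.unique(x):
            mask = x == v
            p_v = mask.mean()
            # Bayes' theorem: P(y=c | x=v) = P(x=v | y=c) * P(y=c) / P(x=v),
            # computed here directly as the joint P(x=v, y=c) divided by P(x=v).
            posteriors = np.array([(mask & (y == c)).mean() / p_v for c in classes])
            score += p_v * np.abs(posteriors - priors).sum()
        return score

    def select_features(X, y, k):
        """Return the indices of the k features with the highest dependency score."""
        scores = [dependency_score(X[:, j], y) for j in range(X.shape[1])]
        return sorted(range(X.shape[1]), key=lambda j: scores[j], reverse=True)[:k]
    ```

    On the Cleveland data, a step like this would rank the 13 attributes and retain the 6 with the strongest class dependency.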

    Different algorithms use different rules and produce different representations of knowledge, so the algorithms used to build our model were selected on the basis of their performance. In this work, we applied and compared several classification algorithms (DT, SVM, RF, and KNN) to identify the one best suited to achieving high accuracy in heart disease prediction. After combining the Naïve Bayes method with each of these classifiers, the performance of the combined algorithms was evaluated with several metrics: specificity, sensitivity, and accuracy. The experimental results show that, of these four classification models, the combination of the Naïve Bayes feature selection approach with the SVM RBF classifier predicts heart disease with the highest accuracy, 98%. Finally, the proposed approach is compared with two other systems whose feature selection steps are based on different techniques: the first on the Genetic Algorithm (GA) and the second on Principal Component Analysis (PCA). The comparison shows that the Naïve Bayes selection approach of the proposed system outperforms both the GA and PCA approaches in terms of prediction accuracy.
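    The three evaluation metrics used in the comparison can all be computed from the binary confusion matrix. The following minimal NumPy sketch (assuming label 1 means "has heart disease") shows how sensitivity and specificity relate to the four confusion-matrix counts; the function name is an assumption for illustration.

    ```python
    import numpy as np

    def evaluation_metrics(y_true, y_pred):
        """Accuracy, sensitivity, and specificity for a binary prediction task.

        Sensitivity is the recall on the positive (diseased) class and
        specificity the recall on the negative (healthy) class; label 1 is
        assumed to mean "has heart disease".
        """
        tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
        tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
        fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
        fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
        accuracy = (tp + tn) / len(y_true)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        return accuracy, sensitivity, specificity
    ```

    Applied to the predictions of each Naïve Bayes + classifier combination, metrics like these support the kind of comparison reported above.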




  • Keywords

    Heart Disease; Naïve Bayes; Bayes Theorem; Feature Selection; Prediction; Accuracy.





Article ID: 31310
DOI: 10.14419/ijet.v10i1.31310

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.