Heart Disease Prediction Model Using Naïve Bayes Algorithm and Machine Learning Techniques
Keywords: Heart Disease, Naïve Bayes, Bayes Theorem, Feature Selection, Prediction, Accuracy.
Heart disease has become one of the major health problems affecting people worldwide, and the number of deaths it causes continues to rise. Heart disease prediction systems therefore play an important role in prevention by helping doctors reach the correct diagnosis more easily. Existing prediction systems, however, suffer from the high dimensionality of their selected features: many redundant or irrelevant features increase prediction time and reduce prediction accuracy. This paper addresses the dimensionality problem by proposing a new hybrid model for heart disease prediction that combines the Naïve Bayes method with machine learning classifiers.
In this study, we propose a new heart disease prediction model (NB-SKDR) based on the Naïve Bayes (NB) algorithm and several machine learning techniques: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). The model consists of three main phases: preprocessing, feature selection, and classification. Its main objective is to improve the performance of the prediction system by finding the best subset of features. The proposed approach uses the Naïve Bayes technique, which is based on Bayes' theorem, to select the best subset of features for the subsequent classification phase. It handles the high dimensionality problem by discarding unnecessary features and keeping only the important ones, with the aim of improving the efficiency and accuracy of the classifiers. The method reduces the number of features from 13 to 6 (age, gender, blood pressure, fasting blood sugar, cholesterol, and exercise-induced angina) by determining the dependency between attributes. Dependent attributes are those in which one attribute depends on another in deciding the value of the class attribute. The dependency between attributes is measured by conditional probability, which can be easily computed with Bayes' theorem. In the classification phase, the proposed system uses the DT, RF, SVM, and KNN algorithms as classifiers to predict whether a patient has heart disease. The model is trained and evaluated on the Cleveland Heart Disease database, which contains 13 features and 303 samples.
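As an illustration of the feature selection idea described above, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state the exact selection criterion, so the `dependency_score` function, the toy data, and the Laplace smoothing constant `alpha` are all assumptions introduced here. The sketch estimates the conditional probability P(class | feature value) from counts and scores a feature by how far that probability moves away from the class prior; a feature whose score is near zero carries little information about the class.

```python
from collections import Counter


def class_posteriors(values, labels, alpha=1.0):
    """Estimate P(class = c | feature = v) from counts with Laplace smoothing.

    `alpha` is a hypothetical smoothing constant, not a value from the paper.
    """
    classes = sorted(set(labels))
    joint = Counter(zip(values, labels))  # count(v, c)
    value_totals = Counter(values)        # count(v)
    posteriors = {}
    for v, total in value_totals.items():
        denom = total + alpha * len(classes)
        posteriors[v] = {c: (joint[(v, c)] + alpha) / denom for c in classes}
    return posteriors


def dependency_score(values, labels, alpha=1.0):
    """Illustrative score (an assumption): the average absolute shift of
    P(class | value) away from the prior P(class), weighted by P(value)."""
    n = len(labels)
    prior = {c: k / n for c, k in Counter(labels).items()}
    posteriors = class_posteriors(values, labels, alpha)
    score = 0.0
    for v, count in Counter(values).items():
        shift = sum(abs(posteriors[v][c] - prior[c]) for c in prior)
        score += (count / n) * shift
    return score


# Toy, made-up data: 1 = heart disease, 0 = no heart disease.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "exang": [1, 1, 1, 0, 0, 0, 0, 1],  # mostly follows the class
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the class
}

scores = {name: dependency_score(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['exang', 'noise']
```

In a fuller pipeline, the top-ranked features would be passed to the classifiers of the next phase; how many to keep (6 in the paper) is a design choice this sketch does not attempt to reproduce.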
Different algorithms use different rules and produce different representations of knowledge, so the algorithms for our model are selected on the basis of their performance. In this work, we applied and compared four classification algorithms (DT, SVM, RF, and KNN) to identify the one best suited to achieving high accuracy in heart disease prediction. The Naïve Bayes method was combined with each of these classifiers, and the performance of each combined algorithm was evaluated using specificity, sensitivity, and accuracy. The experimental results show that, of the four classification models, the combination of the Naïve Bayes feature selection approach with the SVM RBF classifier predicts heart disease with the highest accuracy, 98%. Finally, the proposed approach is compared with two other systems that use different feature selection approaches: the first is based on the Genetic Algorithm (GA) technique, and the second uses Principal Component Analysis (PCA). The comparison shows that the Naïve Bayes selection approach of the proposed system outperforms the GA and PCA approaches in terms of prediction accuracy.
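For readers unfamiliar with the three evaluation metrics named above, the sketch below shows how sensitivity, specificity, and accuracy follow from the confusion matrix of a binary classifier. The function names and the example labels are hypothetical and are not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, and accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        # Share of actual patients that the classifier detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of healthy subjects correctly classified as healthy.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Made-up labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)  # sensitivity 0.75, specificity 5/6 (about 0.833), accuracy 0.8
```

Reporting sensitivity and specificity alongside accuracy matters in a diagnostic setting, because a classifier can reach high accuracy on an imbalanced dataset while still missing many actual patients.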