Prediction of Breast Cancer Using Big Data Analytics
Keywords: Big data, healthcare, breast cancer, KNN, Wisconsin dataset.
Big data refers to collections of data that are vast in size and still growing exponentially with time. It covers structured, unstructured, and semi-structured data. Nowadays, big data is widely used in healthcare for the prediction of diseases. Breast cancer is one of the most common cancers in women. It is the second leading cause of death among women in the United States and in Asian countries. If the disease is identified in its early stages, there is a better chance of curing it. In this experiment, we used the K-nearest neighbor (KNN) algorithm to measure classification accuracy, implemented in R. We used the Wisconsin breast cancer (original) dataset taken from the UCI Machine Learning Repository.
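The KNN classification step described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R implementation. The function names `knn_predict` and `accuracy`, the choice of Euclidean distance, the value `k=3`, and the toy feature vectors are all assumptions; the toy points are made up and are not rows from the Wisconsin dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample (assumed metric).
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest samples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Toy, made-up feature vectors (hypothetical; NOT taken from the Wisconsin dataset).
train_X = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
test_X = [(2, 2), (8, 8)]
test_y = ["benign", "malignant"]

if __name__ == "__main__":
    print(accuracy(train_X, train_y, test_X, test_y, k=3))
```

On real data, the features would typically be scaled before computing distances, and `k` would be chosen by validation rather than fixed in advance.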