Prediction of Breast Cancer Using Big Data Analytics

 
 
 
  • Abstract


    Big data refers to collections of data that are vast in size and still growing exponentially over time. It covers structured, unstructured, and semi-structured data. Big data is now widely used in healthcare for disease prediction. Breast cancer is one of the most common cancers in women. It is the second leading cause of death among women in the United States and in Asian countries. Identifying the disease at an early stage improves the chance of a cure. In this experiment, we use the k-nearest neighbor (KNN) algorithm to classify cases and measure classification accuracy, implemented in R. We use the Wisconsin Breast Cancer (Original) dataset from the UCI Machine Learning Repository.
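
The KNN classification step described above can be sketched as follows. This is a minimal illustration, not the study's code: the study used R, while this sketch uses only the Python standard library. The function names `knn_predict` and `accuracy`, the choice of k = 3, and the sample rows are all assumptions. The rows are invented toy values that mimic the dataset's format (nine features scored 1 to 10, class 2 for benign and 4 for malignant); they are not actual records.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    ]
    # Keep the k closest rows, ordering by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted class matches the true class."""
    correct = sum(
        knn_predict(train_X, train_y, row, k) == label
        for row, label in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Invented toy rows (NOT real records): nine features on a 1-10 scale.
train_X = [
    (1, 1, 1, 1, 2, 1, 2, 1, 1),
    (2, 1, 1, 1, 2, 1, 3, 1, 1),
    (3, 1, 2, 1, 2, 1, 2, 1, 1),
    (8, 7, 8, 6, 5, 9, 7, 8, 2),
    (10, 9, 8, 7, 6, 10, 8, 9, 3),
    (7, 8, 7, 5, 6, 8, 6, 7, 1),
]
train_y = [2, 2, 2, 4, 4, 4]  # 2 = benign, 4 = malignant

test_X = [
    (1, 2, 1, 1, 2, 1, 2, 1, 1),
    (9, 8, 9, 6, 5, 10, 7, 8, 2),
]
test_y = [2, 4]

print(accuracy(train_X, train_y, test_X, test_y, k=3))
```

An odd k avoids tied votes when there are only two classes. On real data, k would normally be chosen by validation rather than fixed in advance.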

     

     


  • Keywords


    Big data; Healthcare; Breast cancer; KNN; Wisconsin dataset.



 


Article ID: 20480
 
DOI: 10.14419/ijet.v7i4.6.20480




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.