Prediction of Breast Cancer Using Big Data Analytics

  • Authors

    • K. Shailaja
    • B. Seetharamulu
    • M. A. Jabbar
    https://doi.org/10.14419/ijet.v7i4.6.20480

    Received date: September 29, 2018

    Accepted date: September 29, 2018

    Published date: September 25, 2018

  • Keywords

    Big data, Healthcare, Breast cancer, KNN, Wisconsin dataset.
  • Abstract

    Big data refers to collections of data that are vast in size and still growing exponentially over time. It covers structured, unstructured, and semi-structured data. Nowadays, big data is widely used in healthcare for disease prediction. Breast cancer is one of the most common cancers in women. It is the second leading cause of death among women in the United States and in Asian countries. If the disease is identified at an early stage, there is a better chance of curing it. In this experiment, we use the K-nearest neighbor (KNN) algorithm, implemented in R, and measure its classification accuracy. We use the Wisconsin breast cancer (original) dataset from the UCI machine learning repository.
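
    The KNN classification step described above can be illustrated with a minimal sketch. It is written in Python rather than the R tool the authors used, and it is not their implementation: the function names (`euclidean`, `knn_predict`, `accuracy`), the choice of k = 3, the Euclidean distance, and the toy rows are all assumptions made for illustration. The rows are invented and are not taken from the Wisconsin dataset.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training rows."""
    # Sort training rows by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda row: euclidean(row[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Invented toy rows (NOT from the Wisconsin dataset): three features on a 1-10 scale.
train_X = [(1, 1, 2), (2, 1, 1), (1, 2, 1), (8, 9, 7), (9, 8, 8), (7, 7, 9)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
test_X = [(2, 2, 1), (8, 8, 8)]
test_y = ["benign", "malignant"]

print(knn_predict(train_X, train_y, (2, 2, 1)))
print(accuracy(train_X, train_y, test_X, test_y))
```

    On real data, the same two functions would be applied to a held-out split of the dataset, and the value of k would normally be tuned rather than fixed.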

  • How to Cite

    Shailaja, K., Seetharamulu, B., & Jabbar, M. A. (2018). Prediction of Breast Cancer Using Big Data Analytics. International Journal of Engineering and Technology, 7(4.6), 223-226. https://doi.org/10.14419/ijet.v7i4.6.20480