Predicting Breast Cancer Using Logistic Regression and Multi-Class Classifiers

  • Abstract


    Early identification and prediction of cancer type has become essential in cancer research, as it supports the treatment and monitoring of patients. The importance of classifying cancer patients into high- and low-risk groups has led many research teams, from both the biomedical and bioinformatics fields, to study the application of machine learning (ML) methods. In this work, logistic regression and multi-class classifiers are applied to predict breast cancer and to produce reliable predictions on new breast cancer data. The paper explores the data mining classification approaches that can be applied to breast cancer data and identifies the best-performing model by evaluating the dataset on a range of classifiers. The breast cancer dataset, collected from the UCI machine learning repository, contains 569 instances with 31 attributes. The data are first pre-processed and then fed to several classifiers: Simple Logistic regression, IBk, K*, Multi-Layer Perceptron (MLP), Random Forest, Decision Table, Decision Trees (DT), PART, Multi-Class Classifier and REP Tree. Models are trained and tested under 10-fold cross-validation, and the results are evaluated on several parameters: accuracy, RMSE, sensitivity, specificity, F-measure, ROC curve area, Kappa statistic and the time taken to build the model. The analysis reveals that, among all the classifiers, Simple Logistic regression yields the best model with the highest and most accurate results, followed by IBk (a nearest-neighbour classifier), K* (an instance-based classifier) and MLP (a neural network). The remaining methods obtained lower accuracy than the logistic regression method.
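    The evaluation protocol described above can be sketched in a few lines of scikit-learn; this is a minimal illustration, not the authors' WEKA setup. It assumes scikit-learn's bundled breast cancer dataset, which is the same UCI Wisconsin Diagnostic (WDBC) data (569 instances, 30 numeric features plus the class label), and maps a few of the WEKA classifiers onto rough scikit-learn analogues (e.g. IBk → k-nearest neighbours).

    ```python
    # Sketch of the paper's protocol: 10-fold cross-validation of several
    # classifiers on the UCI WDBC breast cancer data, comparing mean accuracy.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    # scikit-learn's copy of the UCI WDBC dataset: 569 instances, 30 features.
    X, y = load_breast_cancer(return_X_y=True)

    # Rough analogues of a subset of the WEKA classifiers used in the paper.
    models = {
        "Logistic Regression": LogisticRegression(max_iter=5000),
        "k-NN (IBk analogue)": KNeighborsClassifier(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(random_state=0),
    }

    results = {}
    for name, clf in models.items():
        # Standardise features, then score with 10-fold cross-validation.
        pipe = make_pipeline(StandardScaler(), clf)
        scores = cross_val_score(pipe, X, y, cv=10, scoring="accuracy")
        results[name] = scores.mean()
        print(f"{name}: mean accuracy {scores.mean():.3f}")
    ```

    Other metrics reported in the paper (RMSE, F-measure, ROC area, Kappa) can be obtained the same way by changing the `scoring` argument of `cross_val_score`.
    
    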



  • Keywords


    Breast Cancer Data, Classification, Decision Trees (DT), Logistic Regression, Multi-Layer Perceptron (MLP), Prediction.


 

Article ID: 22115
 
DOI: 10.14419/ijet.v7i4.20.22115




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.