Predicting Breast Cancer Using Logistic Regression and Multi-Class Classifiers
Keywords: Breast Cancer Data, Classification, Decision Trees (DT), Logistic Regression, Multi-Layer Perceptron (MLP), Prediction.
Early identification and prediction of cancer type have become a necessity in cancer research, because they help in assisting and monitoring patients. The importance of classifying cancer patients into high-risk or low-risk groups has led many research teams in biomedicine and bioinformatics to study the application of machine learning (ML) approaches. In this paper, a logistic regression method and multi-class classifiers are proposed to predict breast cancer and to produce deep predictions on breast cancer data in a new environment. The paper explores different classification-based data mining approaches that can be applied to breast cancer data to build such predictions, and it identifies the best-performing model by evaluating the dataset on various classifiers. The breast cancer dataset, collected from the UCI Machine Learning Repository, has 569 instances with 31 attributes. The dataset is first pre-processed and then fed to various classifiers: Simple Logistic Regression, IBK, K-Star, Multi-Layer Perceptron (MLP), Random Forest, Decision Table, Decision Trees (DT), PART, Multi-Class Classifier and REP Tree. 10-fold cross-validation is applied, so that models are trained and tested on each fold. The results are evaluated on several measures: accuracy, root mean squared error (RMSE), sensitivity, specificity, F-measure, ROC curve area, Kappa statistic and the time taken to build the model. Analysis of the results reveals that, among all the classifiers, Simple Logistic Regression yields the best model, with high and accurate results, followed by IBK (a nearest-neighbor classifier), K-Star (an instance-based classifier) and MLP (a neural network). The other methods obtained lower accuracy than logistic regression.
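The evaluation protocol described in the abstract (10-fold cross-validation, then measures such as accuracy, sensitivity, specificity, F-measure and the Kappa statistic) can be illustrated with a small standard-library Python sketch. This is not the authors' implementation, which relies on WEKA classifiers. The function names `k_fold_indices` and `binary_metrics` and the confusion-matrix counts in the usage note are hypothetical, chosen only for illustration. The fold split here is a plain contiguous split, whereas a real experiment would normally shuffle or stratify the data first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds.

    Simplified sketch: no shuffling or stratification is performed.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = (list(range(0, start))
                     + list(range(start + size, n_samples)))
        yield train_idx, test_idx
        start += size


def binary_metrics(tp, fn, fp, tn):
    """Compute evaluation measures from the counts of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # 569 instances, as in the dataset described above.
    folds = list(k_fold_indices(569, k=10))
    print([len(test) for _, test in folds])

    # Hypothetical counts, NOT results from the paper.
    print(binary_metrics(tp=90, fn=10, fp=5, tn=95))
```

With 569 instances, the split gives nine test folds of 57 instances and one of 56. In a full experiment, each classifier would be trained on `train_idx` and scored on `test_idx` for every fold, and the confusion-matrix counts would be accumulated across folds before calling `binary_metrics`.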
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).