LDT-MRF: Log decision tree and map reduce framework to clinical big data classification

 
 
 
  • Abstract


    Data volumes are growing rapidly with the development of information technology, which makes data classification costly in both processing time and information extraction. Big data classification is further complicated by the uncertainty associated with unbalanced datasets. To address these issues, this paper introduces a novel method of big data classification, the Log Decision Tree and Map Reduce Framework (LDT-MRF), which combines the Log Decision Tree (LDT) with the Map Reduce Framework (MRF) to perform data classification in parallel. A novel parameter, termed Log-entropy, is used to select the best feature attribute, and the LDT then performs the classification. Experiments are carried out on three datasets: the Cleveland, Switzerland, and Breast Cancer datasets. A comparative analysis using sensitivity, specificity, and accuracy as performance metrics is carried out to demonstrate the effectiveness of the proposed method. The proposed method achieves a sensitivity of 84.7596%, a specificity of 74.633%, and an accuracy of 80.9088%, which are higher than those of the existing big data classification methods it is compared with.
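The three performance metrics named in the abstract have standard definitions in terms of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `classification_metrics`, the choice of `1` as the positive label, and the toy label lists are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels.

    Hypothetical helper: the paper does not publish its evaluation code,
    so this only encodes the textbook definitions of the three metrics.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Sensitivity: fraction of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all samples that were predicted correctly.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy


# Toy labels (made up for illustration, not taken from the paper's datasets).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
sensitivity, specificity, accuracy = classification_metrics(y_true, y_pred)
print(f"sensitivity={sensitivity:.4f} specificity={specificity:.4f} accuracy={accuracy:.4f}")
```

On an unbalanced dataset, accuracy alone can look high even when the minority class is mostly misclassified, which is one reason to report sensitivity and specificity alongside it.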


  • Keywords


    Big data classification, Map Reduce, Log-entropy, Log Decision Tree, Accuracy.



 


Article ID: 9129
 
DOI: 10.14419/ijet.v7i1.5.9129




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.