Evaluating the Performance of Supervised Classification Models: Decision Tree and Naïve Bayes Using KNIME

  • Authors

    • Syed Muzamil Basha
    • Dharmendra Singh Rajput
    • Ravi Kumar Poluru
    • S. Bharath Bhushan
    • Shaik Abdul Khalandar Basha
    2018-09-22
    https://doi.org/10.14419/ijet.v7i4.5.20079
  • Keywords: Classification Accuracy, Decision Tree, Error Rate, F-measure, KNIME Analytics platform, Naïve Bayes, Precision, Recall.
  • The classification task is to predict the value of a target variable from the values of the input variables. If the target is provided as part of the dataset, classification is a supervised task. It is important to analyze the performance of supervised classification models before using them in a classification task. In this research we propose a novel way to evaluate the performance of supervised classification models such as Decision Tree and Naïve Bayes using the KNIME Analytics platform. Experiments are conducted on a multivariate dataset of 58000 instances and 9 columns intended for classification, collected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/statlog+(shuttle)). We compare the performance of both models in terms of Classification Accuracy (CA) and Error Rate, and then validate both models using the metrics precision, recall, and F-measure. We found that Decision Tree achieves a CA of 99.465%, whereas Naïve Bayes attains a CA of 90.358%. The F-measure of Decision Tree is 0.984, whereas that of Naïve Bayes is 0.7045.
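
    The evaluation metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' KNIME workflow: the function name `evaluate`, the toy labels, and the choice of macro-averaging across classes are all assumptions, since the abstract does not state how per-class scores are aggregated.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Compute accuracy, error rate, and macro-averaged precision,
    recall, and F-measure for a multi-class classifier.

    Hypothetical helper for illustration; macro-averaging is an assumption.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Tally per-class true positives, false positives, and false negatives.
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1
            fn[actual] += 1

    accuracy = sum(tp.values()) / len(y_true)

    precisions, recalls, f_measures = [], [], []
    for label in labels:
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Toy labels (made up for this sketch, not taken from the Shuttle dataset).
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
scores = evaluate(y_true, y_pred)
```

    Error rate is the complement of classification accuracy, so the reported CA values of 99.465% and 90.358% correspond to error rates of 0.535% and 9.642%, respectively.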

     

     

  • How to Cite

    Muzamil Basha, S., Singh Rajput, D., Kumar Poluru, R., Bharath Bhushan, S., & Abdul Khalandar Basha, S. (2018). Evaluating the Performance of Supervised Classification Models: Decision Tree and Naïve Bayes Using KNIME. International Journal of Engineering & Technology, 7(4.5), 248-253. https://doi.org/10.14419/ijet.v7i4.5.20079