A novel approach to ensemble learning in distributed data mining

  • Abstract

    Several data mining techniques have been proposed to extract hidden information from databases. Data mining and knowledge extraction become challenging when the data are massive, distributed, and heterogeneous. Classification is a widely applied data mining task used for prediction, and a large number of machine learning techniques have been developed for it. Ensemble learning combines multiple base classifiers to improve on the performance of individual classification algorithms, and it plays a particularly significant role in distributed data mining. Studying ensemble learning is therefore crucial for applying it to real-world data mining problems. We propose a technique for constructing an ensemble of classifiers and study its performance using popular learning techniques on a range of publicly available datasets from the biomedical domain.
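To make the idea of combining base classifiers concrete, the following is a minimal sketch, not the technique proposed in the paper. It assumes a simple setting in which each data partition (standing in for one distributed site) trains its own nearest-centroid classifier, and the ensemble prediction is a plain majority vote. The function names `train_centroid` and `majority_vote` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def train_centroid(rows):
    """Fit a nearest-centroid classifier on one partition of (features, label) rows."""
    sums, counts = {}, Counter()
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    # Mean feature vector per class label.
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def majority_vote(classifiers, x):
    """Combine base classifiers by returning the most frequent predicted label."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three partitions, each imagined to live at a different site.
partitions = [
    [((1.0, 1.0), "neg"), ((5.0, 5.0), "pos")],
    [((0.0, 2.0), "neg"), ((6.0, 4.0), "pos")],
    [((2.0, 0.0), "neg"), ((4.0, 6.0), "pos")],
]

# One base classifier per partition; only the trained models need to be shared.
ensemble = [train_centroid(part) for part in partitions]
prediction = majority_vote(ensemble, (4.5, 5.5))
```

In this sketch only the trained models are combined, so no partition has to share its raw records with the others, which is one common motivation for ensembles in distributed settings. Other combiners, such as weighted voting or stacking with a meta-classifier, could be substituted for `majority_vote`.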



  • Keywords

    Ensemble Learning; Meta-Learning; Classifier Ensemble; Ensemble Method; Classification Performance; Meta-Classifier.


Article ID: 14159
DOI: 10.14419/ijet.v7i2.33.14159

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.