A review on data stream classification approaches

  • Abstract

    Stream data is usually vast in volume, dynamically changing, possibly infinite, and multi-dimensional. Attention to data stream mining is increasing because of its presence in a wide range of real-world applications, such as e-commerce, banking, sensor data, and telecommunication records. Like data mining, data stream mining includes techniques such as classification, clustering, and frequent pattern mining; this paper focuses on classification methods designed to handle data streams. Early methods of data stream classification required all instances to be labeled in order to build classifier models, but some methods (semi-supervised learning and active learning) employ unlabeled data as well as labeled data. This paper reviews some state-of-the-art research, focusing on ensemble methods, semi-supervised learning, and active learning.

  • Keywords

    Data Stream; Data Stream Classification; Ensemble; Semi-Supervised Learning; Active Learning.


Article ID: 5225
DOI: 10.14419/jacst.v5i1.5225

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.