Modern Very Fast Decision Tree Model for Mining High-Speed Time-Series Data Stream

  • Authors

    • A. Vanitha Katherine
    • T. Kamalavalli
    • S. Vinothini
    • M. Jagannath
    • V. E. Jayanthi
  • 2018-12-13
  • DOI: https://doi.org/10.14419/ijet.v7i4.39.24361
  • Keywords: Data Mining, Data Streams, Very Fast Decision Tree, Tree Mechanism.
  • Data mining is a rapidly growing research field within data analysis. Data about a person, object, element, or label is generated continuously over time, at scales of days, months, and years. Although many algorithms exist for high-speed data streams, they fail to scale efficiently when the data volume is large. This paper proposes an algorithm that performs clustering on high-speed data streams using a Modern Very Fast Decision Tree (MVFDT) model. MVFDT replaces the old decision tree model with clustering to improve accuracy: it groups the records in the database into clusters and compares each cluster with other clusters of records to detect any relationships among them. MVFDT builds a clustering model whose accuracy is similar to that of the Very Fast Decision Tree (VFDT). In VFDT, new samples arrive continuously over a moving window, but its results are unsatisfactory in terms of scalability, i.e., when the data volume is large. MVFDT incorporates three functionalities: dynamic tree formation, window-based clustering, and classification for calculating Frequent Patterns (FP) and processing queries. Experiments on large sets of time-series and time-changing data streams compare the clustering and mining efficiency of MVFDT with that of VFDT. The experimental results suggest that MVFDT achieves higher mining efficiency than VFDT.

  • How to Cite

    Vanitha Katherine, A., Kamalavalli, T., Vinothini, S., Jagannath, M., & Jayanthi, V. E. (2018). Modern Very Fast Decision Tree Model for Mining High-Speed Time-Series Data Stream. International Journal of Engineering & Technology, 7(4.39), 492-496. https://doi.org/10.14419/ijet.v7i4.39.24361
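
The abstract uses VFDT as its baseline. For readers unfamiliar with it, the sketch below illustrates the split test at the core of standard VFDT as described by Domingos and Hulten (2000). It is not MVFDT, whose internals the abstract does not specify. A leaf keeps only class counts, not the examples themselves, and splits on its best attribute once the Hoeffding bound eps = sqrt(R^2 ln(1/delta) / (2n)) indicates, with probability 1 - delta, that this attribute truly beats the runner-up. This is a minimal sketch under stated assumptions: categorical attributes only, information gain as the split criterion, R = log2 of the number of classes observed at the leaf. The names Leaf and toy_stream are hypothetical, and the defaults delta = 1e-7 and tau = 0.05 are illustrative choices.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy, in bits, of a Counter of class labels."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class Leaf:
    """Hypothetical sufficient statistics for one VFDT leaf (categorical attributes only)."""

    def __init__(self, n_attributes):
        self.n = 0
        self.class_counts = Counter()
        # stats[i][value] is a Counter of class labels seen with attribute i == value.
        self.stats = [defaultdict(Counter) for _ in range(n_attributes)]

    def update(self, x, y):
        """Absorb one labelled example; the example itself is not stored."""
        self.n += 1
        self.class_counts[y] += 1
        for i, value in enumerate(x):
            self.stats[i][value][y] += 1

    def info_gain(self, i):
        """Information gain of splitting this leaf on attribute i."""
        children = self.stats[i].values()
        weighted = sum(sum(c.values()) / self.n * entropy(c) for c in children)
        return entropy(self.class_counts) - weighted

    def best_split(self, delta=1e-7, tau=0.05):
        """Attribute index to split on, or None if the evidence is not yet sufficient."""
        if len(self.class_counts) < 2 or not self.stats:
            return None  # empty or pure leaf, or no attributes: nothing to split on
        gains = sorted(((self.info_gain(i), i) for i in range(len(self.stats))), reverse=True)
        best_gain, best_attr = gains[0]
        second_gain = gains[1][0] if len(gains) > 1 else 0.0
        # Assumption: R = log2(number of classes observed at this leaf) bounds the gain.
        eps = hoeffding_bound(math.log2(len(self.class_counts)), delta, self.n)
        # Split when the best attribute beats the runner-up by more than eps, or when
        # eps < tau, i.e. enough examples have been seen that a near-tie no longer matters.
        if best_gain - second_gain > eps or eps < tau:
            return best_attr
        return None


def toy_stream(n):
    """Hypothetical stream: attribute 0 decides the class, attribute 1 is weakly related."""
    for k in range(n):
        x = (k % 2, (k // 3) % 2)
        yield x, ("a" if x[0] == 0 else "b")


if __name__ == "__main__":
    for n in (4, 2000):
        leaf = Leaf(n_attributes=2)
        for x, y in toy_stream(n):
            leaf.update(x, y)
        print(n, leaf.best_split())  # prints "4 None", then "2000 0"
```

After 4 examples the bound is still wider than the gap between the two attributes, so the leaf waits. After 2000 examples it confidently splits on attribute 0. Practical VFDT implementations also re-evaluate splits only every n_min examples (a grace period) and handle numeric attributes; both are omitted here for brevity.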