Data Mining Models of High Dimensional Data Streams, and Contemporary Concept Drift Detection Methods: a Comprehensive Review

  • Authors

    • M Sankara Prasanna Kumar
    • A P. Siva Kumar
    • K Prasanna
    2018-07-04
    https://doi.org/10.14419/ijet.v7i3.6.14959
  • CUSUM, streaming ensemble algorithm, concept drift detection, dimensional data streams, change-detection tests, Hotelling’s t-squared test, Bayesian Online Change Point Detection.
  • Concept drift is a change over time in the distribution of the data arriving across multiple data streams. It becomes visible only when the type of data collected changes after a stable period. Concept drift in data streams increases misclassification and degrades performance. To obtain accurate results, such drifts must be identified. This paper reviews the issues involved in identifying changes in multivariate, high-dimensional data streams. Its main contribution is an examination of the inherent difficulties that contemporary change-detection methods encounter as data dimensionality scales.
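To make the idea of detecting a change after a stable period concrete, the following is a minimal sketch of a one-sided CUSUM test, one of the methods named in the keywords. It is not taken from the paper: the function name `cusum_drift_index`, the parameters `target_mean`, `slack`, and `threshold`, and the example stream are all assumptions chosen for illustration.

```python
from typing import Iterable, Optional


def cusum_drift_index(
    values: Iterable[float],
    target_mean: float,
    slack: float = 0.05,
    threshold: float = 1.0,
) -> Optional[int]:
    """Return the index at which a one-sided CUSUM statistic first exceeds
    `threshold`, or None if no upward shift is flagged.

    All parameter names and defaults are illustrative assumptions.
    """
    cumulative = 0.0
    for index, value in enumerate(values):
        # Accumulate deviations above the expected mean, less a slack term,
        # and reset to zero whenever the running sum would go negative.
        cumulative = max(0.0, cumulative + (value - target_mean - slack))
        if cumulative > threshold:
            return index
    return None


# Hypothetical stream of 0/1 misclassification indicators: a stable period
# with about 10% errors, followed by a period where most predictions fail.
stable = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] * 3
drifted = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
drift_at = cusum_drift_index(stable + drifted, target_mean=0.1)
```

On this made-up stream the statistic stays below the threshold throughout the stable period and crosses it two observations into the drifted period. This sketch monitors a single univariate error signal; the difficulties the paper reviews arise when the monitored data are multivariate and high-dimensional, which this example does not address.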


  • How to Cite

    Sankara Prasanna Kumar, M., P. Siva Kumar, A., & Prasanna, K. (2018). Data Mining Models of High Dimensional Data Streams, and Contemporary Concept Drift Detection Methods: a Comprehensive Review. International Journal of Engineering & Technology, 7(3.6), 148-153. https://doi.org/10.14419/ijet.v7i3.6.14959