Extensive analysis of techniques in data streams

Authors

  • Ramesh Balasubramaniam, Bharathiar University, Coimbatore
  • K. Nandhini, Bharathiar University, Coimbatore

DOI:

https://doi.org/10.14419/ijet.v7i4.14224

Published:

2019-04-12

Keywords:

Hashing, Sampling, Sketching, Stream Data Model, Streaming Techniques.

Abstract

Applications generate huge volumes of data at high velocity from a variety of sources such as images, text, audio, and video. Big data streams are produced by many of today’s applications, including IoT devices, online purchases, internet traffic, social media, and stock exchanges. The nature of the data source determines whether batch or stream processing is appropriate. Recording all incoming data is both infeasible and unnecessary, hence the need for data reduction techniques in data streaming. These techniques (sampling, sketching, hashing, dimension reduction, and more) narrow the big data down to the relevant data. The sampled, filtered, hashed, or otherwise reduced data then serves as input from which data analysts derive meaningful information. A well-designed Data Stream Management System strikes a balance between processing the right data and the cost of that processing. This paper highlights the different techniques used on streaming data, related work in the area, and their uses in today’s world.
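
As an illustration of the sampling family of data reduction techniques named above, the following is a minimal sketch of reservoir sampling, which keeps a uniform random sample of fixed size from a stream of unknown length without storing the stream. It is not taken from the paper; the function name `reservoir_sample` and the parameters `k` and `seed` are assumptions chosen for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream.

    Hypothetical helper for illustration: only k items are held in memory,
    regardless of how many items the stream yields.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from 0..i (inclusive); the new item
            # replaces a stored one with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: draw 5 items from a simulated stream of one million readings.
sample = reservoir_sample(range(1_000_000), k=5, seed=42)
```

The design choice here is memory bounded by `k` rather than by stream length, which is the trade-off between processing cost and data retained that the abstract describes.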

 

 

