Extensive analysis of techniques in data streams

Ramesh Balasubramaniam; K. Nandhini

doi:10.14419/ijet.v7i4.14224

Authors

Ramesh Balasubramaniam
Bharathiar University, Coimbatore
K. Nandhini
Bharathiar University, Coimbatore

Received date: June 19, 2018

Accepted date: January 28, 2019

Published date: April 12, 2019

Keywords:

Hashing, Sampling, Sketching, Stream Data Model, Streaming Techniques.

Abstract

Applications are generating data in huge quantities (volume), at high speed (velocity), and in many forms such as images, text, audio, and video (variety). Many applications in today's world produce big data streams, including IoT devices, online purchases, internet traffic, social media, and stock exchanges. The nature of the data source determines whether the data should be processed in batches or as a stream. Recording all incoming data is both impossible and unnecessary, hence the need for data reduction techniques in data streaming. These techniques (sampling, sketching, hashing, dimension reduction, and others) narrow big data down to the relevant data. The sampled, filtered, hashed, or otherwise reduced data then serves as the input from which data analysts derive meaningful information. A well-designed Data Stream Management System strikes a balance between processing the right data and the cost of that processing. This paper surveys the techniques used on streaming data, related work in the area, and their applications today.
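The abstract names sampling, sketching, and hashing as data reduction techniques. As an illustration only (not the authors' code), the following minimal Python sketch shows two textbook instances. The first is reservoir sampling (Algorithm R), which keeps a uniform random sample of k items from a stream of unknown length. The second is a Count-Min sketch, which uses hashing to estimate per-item frequencies in fixed memory. The names reservoir_sample and CountMinSketch, the use of salted BLAKE2b as the hash family, and the width/depth values in the usage lines are assumptions made for this example.

```python
import hashlib
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown
    length (Algorithm R), using O(k) memory. Illustrative sketch."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # uniform over [0, i], both ends inclusive
            if j < k:
                reservoir[j] = item     # item i is kept with probability k/(i+1)
    return reservoir


class CountMinSketch:
    """Approximate frequency counts in fixed memory (hypothetical class name).
    Estimates never undercount; hash collisions can only inflate them."""

    def __init__(self, width, depth):
        self.width = width              # counters per row
        self.depth = depth              # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: derive one hash function per row by salting BLAKE2b
        # with the row number (BLAKE2b accepts salts of up to 16 bytes).
        h = hashlib.blake2b(str(item).encode(), digest_size=8,
                            salt=row.to_bytes(16, "little"))
        return int.from_bytes(h.digest(), "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


if __name__ == "__main__":
    sample = reservoir_sample(range(1000), k=10, seed=42)
    cms = CountMinSketch(width=64, depth=4)
    for event in ["a"] * 50 + ["b"] * 30 + ["c"] * 5:
        cms.add(event)
    print(len(sample), cms.estimate("a") >= 50)  # prints: 10 True
```

Both structures use memory that does not grow with the length of the stream. This is the trade-off the abstract describes: the sample and the frequency estimates are approximate, but the full stream never has to be stored.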

