Preprocessing data in big data analytics (a survey)

  • Authors

    • J S. T. M. Poovarasi
    • Sujatha Srinivasan
    2018-06-08
    https://doi.org/10.14419/ijet.v7i2.33.15484
  • Big Data, Big Data Analysis, Big Data Preprocessing, Preprocessing Technique
  • Big Data refers to large volumes of data of varied types. Big data analytics (BDA) aims to uncover hidden patterns and unknown correlations in data, and to track market trends, customer needs, and other information that helps organizations make sound business decisions. With today's technology, big data is analyzed and solutions are generated immediately, yet the effort often fails to yield effective business solutions: the process is slow, and the resulting solutions are less efficient and do not match what the business market needs. Existing BDA techniques and tools cannot process data quickly and efficiently because they are not capable of handling very large volumes of data, and the knowledge and patterns they produce are sometimes inaccurate. The data therefore requires preprocessing, such as aggregation, sampling, and normalization, so that it can be held in a computer's memory and analyzed. This study gives a short review of preprocessing techniques for Big Data.
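
The three preprocessing steps named in the abstract (aggregation, sampling, normalization) can be illustrated with a minimal sketch. This is not code from the surveyed paper: the function names, the record layout, and the choice of reservoir sampling and min-max scaling are assumptions made here purely for illustration, using only the Python standard library.

```python
import random
from collections import defaultdict


def aggregate_sum(records, key, value):
    """Aggregation: collapse many rows into one total per group."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row[value]
    return dict(totals)


def reservoir_sample(stream, k, seed=0):
    """Sampling: keep k items chosen uniformly from a stream of unknown
    length, holding only k items in memory at any time."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Hypothetical records, invented for this example.
    records = [
        {"region": "a", "sales": 1.0},
        {"region": "b", "sales": 2.0},
        {"region": "a", "sales": 3.0},
    ]
    print(aggregate_sum(records, "region", "sales"))
    print(reservoir_sample(range(1_000_000), 5))
    print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

Each step shrinks or rescales the data before analysis: aggregation reduces the row count, sampling bounds memory use regardless of input size, and normalization puts features on a common scale. At real big-data scale these steps would typically run on a distributed engine rather than in a single process.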

  • How to Cite

    S. T. M. Poovarasi, J., & Srinivasan, S. (2018). Preprocessing data in big data analytics (a survey). International Journal of Engineering & Technology, 7(2.33), 726-729. https://doi.org/10.14419/ijet.v7i2.33.15484