Study With Comparing Big-Data Handling Techniques using Apache Hadoop Map Reduce VS Apache Spark

  • Abstract

    The digital world today struggles with massive volumes of information, which creates demand for newer and more advanced software frameworks that can process large data efficiently. Because the volume of digital information is doubling rapidly, existing and traditional tools for Big Data (BD) are becoming insufficient as the processing of enormous data moves toward distributed, parallel, and batch approaches. Before evaluating tools and technologies, it is essential to understand what they are being evaluated for. Even as the number of options grows, choosing Big Data tools for the digital world remains difficult. Existing tools have merits, disadvantages, and limitations, and many overlap in their uses. This survey focuses on BD, mainly on the area of analytics tools. In the current digital world (DW), almost every computation is performed online as interactive processing. The survey therefore also introduces Apache Spark, a freely available open-source Apache tool, as a way to overcome the restrictions and issues in Hadoop.


  • Keywords

    Big Data; Hadoop; Map Reduce; Spark.


Article ID: 15997
DOI: 10.14419/ijet.v7i4.1.15997

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.