A vibrant data placement approach for map reduce in diverse environments

  • Authors

    • J Sujatha
    • K Meena
    2018-03-10
    https://doi.org/10.14419/ijet.v7i2.4.10034
  • Map Reduce, HDFS, Dynamic Data Placement (DDP), File Systems, Data Nodes.
  • MapReduce assumes that every node in a cluster has the same computing capacity. In a homogeneous environment each node is assigned the same load, so the cluster's resources are fully used. In practice, however, a cluster is likely to contain PCs or servers with differing specifications, so the capabilities of the nodes differ. If such a heterogeneous environment still uses the original Hadoop strategy, which distributes data blocks equally across the nodes and spreads the load evenly, the overall performance of Hadoop may be reduced. The main reason is that differing computing capacities cause task execution times to differ: faster nodes finish processing their local data blocks sooner than slower nodes do. Once a faster node has exhausted its local blocks, the data it needs must be transferred from another node over the network. Waiting for this data transmission increases task execution time, which in turn prolongs the execution time of the entire job.
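    The effect described above can be illustrated with a small sketch. The abstract does not specify the placement algorithm, so the code below is not the authors' method; it only assumes that each node's computing capacity can be summarized as a single number (blocks processed per unit time) and compares equal placement with capacity-proportional placement. The names `place_blocks` and `finish_time`, the node labels, and the capacity values are all hypothetical, and replication and network cost are ignored.

    ```python
    def place_blocks(total_blocks, capacities):
        """Split total_blocks across nodes in proportion to capacity.

        capacities maps a node name to an assumed processing rate
        (blocks per unit time). Fractional shares are rounded with the
        largest-remainder method so the counts sum to total_blocks.
        """
        total_capacity = sum(capacities.values())
        shares = {n: total_blocks * c / total_capacity for n, c in capacities.items()}
        alloc = {n: int(s) for n, s in shares.items()}
        leftover = total_blocks - sum(alloc.values())
        # Hand the remaining blocks to the nodes with the largest fractional part.
        by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
        for n in by_remainder[:leftover]:
            alloc[n] += 1
        return alloc


    def finish_time(alloc, capacities):
        """Time until the slowest node has processed all of its local blocks."""
        return max(alloc[n] / capacities[n] for n in alloc)


    # Hypothetical three-node cluster: A is three times as fast as C.
    capacities = {"A": 3.0, "B": 2.0, "C": 1.0}

    equal = {"A": 4, "B": 4, "C": 4}              # default equal placement
    proportional = place_blocks(12, capacities)   # capacity-aware placement

    print(equal, finish_time(equal, capacities))
    print(proportional, finish_time(proportional, capacities))
    ```

    With equal placement, node A finishes its local blocks well before node C and would then either sit idle or fetch blocks from C over the network. Placing blocks in proportion to capacity lets all three nodes finish their local work at about the same time, which is the intuition behind capacity-aware data placement.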

  • How to Cite

    Sujatha, J., & Meena, K. (2018). A vibrant data placement approach for map reduce in diverse environments. International Journal of Engineering & Technology, 7(2.4), 20-22. https://doi.org/10.14419/ijet.v7i2.4.10034