A vibrant data placement approach for map reduce in diverse environments

J Sujatha; K Meena

doi:10.14419/ijet.v7i2.4.10034

Authors

J Sujatha
K Meena

Received date: March 10, 2018

Accepted date: March 10, 2018

Published date: March 10, 2018

DOI:

https://doi.org/10.14419/ijet.v7i2.4.10034

Keywords:

Map Reduce, HDFS, Dynamic Data Placement (DDP), File Systems, Data Nodes.

Abstract

MapReduce assumes that every node in a cluster has the same computing capacity. In a homogeneous environment each node is assigned the same load, so the resources of the cluster are fully used. In practice, however, a cluster is likely to contain PCs or servers with various specifications, which causes the capabilities of the nodes to differ. If such a heterogeneous environment still uses the original Hadoop strategy, which distributes data blocks equally to each node so that the load is also evenly distributed, the overall performance of Hadoop may be reduced. The major reason is that the different computing capacities of the nodes cause task execution times to differ, so faster nodes finish processing their local data blocks sooner than slower nodes do. When a faster node then processes blocks that are not stored locally, the required data must be transferred from another node through the network. Because waiting for the data transmission increases the task execution time, the entire job execution time is prolonged.
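The effect described above can be illustrated with a minimal sketch. It is not the authors' DDP algorithm, only a toy model under stated assumptions: each node has a hypothetical relative capacity (blocks processed per unit time), every block costs the same to process, and each node works only on its local blocks. The function names and the capacity values are invented for illustration.

```python
def equal_split(num_blocks, num_nodes):
    """Hadoop-style default: give every node (almost) the same number of blocks."""
    base, extra = divmod(num_blocks, num_nodes)
    return [base + (1 if i < extra else 0) for i in range(num_nodes)]


def proportional_split(num_blocks, capacities):
    """Assumed alternative: give each node blocks in proportion to its capacity.

    Fractional shares are rounded with the largest-remainder method so the
    allocation always sums to num_blocks.
    """
    total = sum(capacities)
    shares = [num_blocks * c / total for c in capacities]
    alloc = [int(s) for s in shares]
    leftover = num_blocks - sum(alloc)
    # Hand the remaining blocks to the nodes with the largest fractional part.
    order = sorted(range(len(capacities)),
                   key=lambda i: shares[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def local_finish_time(alloc, capacities):
    """Time until the slowest node has processed all of its local blocks."""
    return max(blocks / cap for blocks, cap in zip(alloc, capacities))


if __name__ == "__main__":
    capacities = [1, 2, 3]   # hypothetical relative speeds of three nodes
    blocks = 12

    even = equal_split(blocks, len(capacities))
    weighted = proportional_split(blocks, capacities)

    print("equal placement:       ", even, local_finish_time(even, capacities))
    print("proportional placement:", weighted, local_finish_time(weighted, capacities))
```

With equal placement the slowest node holds as many blocks as the fastest one, so either the job waits for it or faster nodes must fetch its blocks over the network. Placing blocks in proportion to capacity lets all nodes finish their local work at about the same time in this simplified model.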

