Simplified Mapreduce Mechanism for Large Scale Data Processing

Md Tahsir Ahmed Munna; Shaikh Muhammad Allayear; Mirza Mohtashim Alam; Sheikh Shah Mohammad Motiur Rahman; Md Samadur Rahman; M Mesbahuddin Sarker

doi:10.14419/ijet.v7i3.8.15211

About this article

Received: 06-07-2018
Accepted: 06-07-2018
Published: 07-07-2018

Keywords: MapReduce, Large Scale Data, Hadoop, Simplified Algorithm, Performance Analysis

Abstract

MapReduce has become a popular programming model for processing large-scale data sets in parallel across a distributed cluster. Hadoop MapReduce is particularly well suited to big data workloads. In this paper, we modify the Hadoop MapReduce algorithm and implement the modified version with the aim of reducing processing time.
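
For readers unfamiliar with the programming model the abstract refers to, the following is a minimal single-process sketch of the generic map, shuffle, and reduce phases, using word counting as the example task. It is not the simplified algorithm proposed in this paper and it does not use Hadoop; the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values emitted for one key into a total."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process.

    A real cluster would run many mappers and reducers in parallel on
    different nodes; this sketch only illustrates the data flow.
    """
    # Shuffle: group every value produced by the mappers under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    # Reduce: call the reducer once per key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    lines = ["big data needs big clusters", "data moves to clusters"]
    print(run_mapreduce(lines, map_words, reduce_counts))
```

In this sketch the shuffle step is an in-memory dictionary. On an actual cluster, moving grouped data between the map and reduce stages is typically a significant share of job time, which is one reason changes to the MapReduce mechanism can affect processing time.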

How to Cite

Tahsir Ahmed Munna, M., Muhammad Allayear, S., Mohtashim Alam, M., Shah Mohammad Motiur Rahman, S., Samadur Rahman, M., & Mesbahuddin Sarker, M. (2018). Simplified Mapreduce Mechanism for Large Scale Data Processing. International Journal of Engineering and Technology, 7(3.8), 16-21. https://doi.org/10.14419/ijet.v7i3.8.15211
