Simplified MapReduce Mechanism for Large-Scale Data Processing


  • Md Tahsir Ahmed Munna
  • Shaikh Muhammad Allayear
  • Mirza Mohtashim Alam
  • Sheikh Shah Mohammad Motiur Rahman
  • Md Samadur Rahman
  • M Mesbahuddin Sarker





MapReduce, Large Scale Data, Hadoop, Simplified Algorithm, Performance Analysis


MapReduce has become a popular programming model for processing large-scale data sets in parallel across a distributed cluster. Hadoop MapReduce is particularly well suited to large-scale workloads such as big data processing. In this paper, we modify the Hadoop MapReduce algorithm and implement the modified version with the aim of reducing processing time.
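For readers unfamiliar with the programming model, the following is a minimal single-process sketch of the standard map, shuffle, and reduce phases, using word counting as the example task. It illustrates the general MapReduce model only; it is not the modified algorithm proposed in this paper, and it does not use the Hadoop API. All function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process.

    On a real cluster the map and reduce calls would run in parallel
    on different nodes; here they run one after another.
    """
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["big data", "big cluster"])` counts `big` twice and the other two words once each. In a distributed setting, each document (or input split) would be mapped on a separate node, and the shuffle step would move intermediate pairs across the network so that all values for a key reach the same reducer.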




