An Efficient Makespan Model for Hybrid Dual Parallel Computing Framework

  • Abstract

    MapReduce (MR) is the most widely adopted computing platform for processing complex scientific and data-intensive applications. Hadoop MapReduce (HMR) is a widely used MR framework across organizations because it is open source. Cloud service providers (CSPs) such as Azure HDInsight offer computing resources to users, who pay only for what they use. Existing MapReduce frameworks are inefficient because the Map and Reduce phases are computed sequentially; as a result, they incur higher computing cost and underutilize cloud resources. Minimizing the cost of execution on such platforms is therefore highly desirable. To address these challenges, this work first presents the Hybrid Dual Parallel Computing (HDPC) framework, which computes the Map and Reduce phases in parallel. To further improve resource utilization, map and reduce operations are executed in parallel on the multi-core environments available on virtual computing workers. Lastly, this work presents a job makespan (execution time) model and the working structure of the HDPC framework. Experiments are conducted on the Microsoft Azure HDInsight cloud platform with stream and non-stream applications to evaluate the performance of the HDPC framework against existing computing models. The results show a significant improvement in execution time. Overall, good correlation between the practical and theoretical execution times indicates that the proposed HDPC framework is robust, scalable, and cost-efficient, and that it supports dynamic analysis in cloud computing environments.
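
    To illustrate why overlapping the Map and Reduce phases can shorten the makespan, the following is a minimal toy sketch and not the makespan model proposed in the paper. It assumes the input is split into equal chunks, that each chunk needs a fixed map time and a fixed reduce time, that reduce work can start on a chunk as soon as its map output is ready, and that the two stages run on separate cores. The function names and parameters (`sequential_makespan`, `pipelined_makespan`, `chunks`, `t_map`, `t_reduce`) are hypothetical.

```python
def sequential_makespan(chunks: int, t_map: float, t_reduce: float) -> float:
    """Toy model: all map work finishes before any reduce work starts."""
    return chunks * t_map + chunks * t_reduce


def pipelined_makespan(chunks: int, t_map: float, t_reduce: float) -> float:
    """Toy model: a two-stage pipeline in which the reduce step of one chunk
    overlaps with the map step of the next chunk (assumed, not from the paper)."""
    if chunks == 0:
        return 0.0
    # The first chunk pays both stage times; every later chunk adds only
    # the time of the slower stage, because the faster stage is hidden.
    return t_map + t_reduce + (chunks - 1) * max(t_map, t_reduce)


if __name__ == "__main__":
    seq = sequential_makespan(chunks=8, t_map=3.0, t_reduce=2.0)
    par = pipelined_makespan(chunks=8, t_map=3.0, t_reduce=2.0)
    print(f"sequential: {seq}, pipelined: {par}")
```

    Under these assumptions the gain comes entirely from hiding the faster stage behind the slower one, so a single chunk gives no improvement and the benefit grows with the number of chunks.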




  • Keywords

    Big data, Bioinformatics, Cloud computing, GPU, Hadoop, Linear regression, MapReduce, Multi-core, Parallel computing.





Article ID: 27963
DOI: 10.14419/ijet.v7i4.19.27963

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.