Performance Evaluation and Resource Optimization of Parallel Hadoop Clusters with an Intelligent Scheduler


  • Manishankar sankar, Bharathiar University
  • Sathayanarayana S, Bharathiar University





Data generated by real-time information systems grows incrementally and arrives in varied representations across today's industries. Processing data at this scale requires a parallel processing system such as a Hadoop cluster. A major challenge in a cluster-based system is evaluating its performance and optimizing its resources. This research proposes a model for a Hadoop cluster with a super node that manages the cluster and a mediation manager that monitors performance. The super node is equipped with an intelligent scheduler that schedules each job with optimal resources. The intelligent scheduler applies the cross mutation principle of the genetic algorithm to find the best-matching resource. The mediation node deploys the Ganglia monitor to collect and record the performance parameters of the Hadoop cluster. Overall, the system schedules different jobs with optimal use of resources, achieving better efficiency than the native Hadoop scheduler.
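
To make the scheduling idea concrete, the following is a minimal sketch of a genetic search that assigns jobs to nodes using single-point crossover and random mutation. It is not the authors' scheduler: the function names (fitness, crossover, mutate, schedule), the single scalar "demand" per job and "capacity" per node, and the fitness measure (the highest load-to-capacity ratio on any node) are all assumptions chosen for illustration.

```python
import random


def fitness(assignment, job_demand, node_capacity):
    """Cost of an assignment (lower is better): the highest
    load-to-capacity ratio across all nodes. Hypothetical measure."""
    load = [0.0] * len(node_capacity)
    for job, node in enumerate(assignment):
        load[node] += job_demand[job]
    return max(l / c for l, c in zip(load, node_capacity))


def crossover(a, b, rng):
    """Single-point crossover: head of parent a, tail of parent b."""
    point = rng.randrange(1, len(a))  # requires at least two jobs
    return a[:point] + b[point:]


def mutate(assignment, n_nodes, rate, rng):
    """Reassign each job to a random node with probability `rate`."""
    return [rng.randrange(n_nodes) if rng.random() < rate else node
            for node in assignment]


def schedule(job_demand, node_capacity, pop_size=30, generations=100,
             rate=0.1, seed=0):
    """Return a job-to-node assignment found by a simple genetic search."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n_jobs, n_nodes = len(job_demand), len(node_capacity)
    population = [[rng.randrange(n_nodes) for _ in range(n_jobs)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, then refill with offspring.
        population.sort(key=lambda a: fitness(a, job_demand, node_capacity))
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), n_nodes, rate, rng))
        population = parents + children
    return min(population, key=lambda a: fitness(a, job_demand, node_capacity))


if __name__ == "__main__":
    jobs = [4, 2, 6, 3, 5]    # hypothetical resource demand per job
    nodes = [10, 5, 8]        # hypothetical capacity per node
    best = schedule(jobs, nodes)
    print(best, fitness(best, jobs, nodes))
```

In this sketch, best[i] is the index of the node chosen for job i. A real scheduler would replace the scalar demand and capacity with measured parameters, for example the CPU, memory, and network figures that a monitor such as Ganglia records.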



