Performance Evaluation and Resource Optimization of Parallel Hadoop Clusters with an Intelligent Scheduler
Data generated by real-time information systems grows incrementally and takes varied representations across today's industries. Processing data at this scale requires a parallel processing system such as a Hadoop cluster. A major challenge in a cluster-based system is evaluating system performance and optimizing resources. This research proposes a model for a Hadoop cluster with a super node that manages the cluster and a mediation manager that monitors performance. The super node is equipped with an intelligent scheduler that schedules each job with optimal resources. The intelligent scheduler applies the crossover and mutation principle of a genetic algorithm to find the best-matching resource. The mediation node deploys the Ganglia monitor to collect and record the performance parameters of the Hadoop cluster. Overall, the system schedules different jobs with optimal use of resources, achieving better efficiency than the native Hadoop scheduler.
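The abstract does not specify how the genetic algorithm encodes a schedule or scores it, so the following is only a minimal sketch of the general idea, not the authors' scheduler. It assumes a chromosome is a list mapping each job to a node index, that fitness is the makespan (finish time of the busiest node), and that one-point crossover with random-reassignment mutation is used. The names `makespan`, `crossover`, `mutate`, `ga_schedule`, `job_costs`, and `node_speeds` are all hypothetical.

```python
import random


def makespan(assignment, job_costs, node_speeds):
    """Finish time of the busiest node for a job -> node assignment (assumed fitness)."""
    load = [0.0] * len(node_speeds)
    for job, node in enumerate(assignment):
        load[node] += job_costs[job] / node_speeds[node]
    return max(load)


def crossover(a, b, rng):
    """One-point crossover: prefix of parent a, suffix of parent b (needs >= 2 jobs)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(child, n_nodes, rate, rng):
    """Reassign each job to a random node with probability `rate`."""
    return [rng.randrange(n_nodes) if rng.random() < rate else gene for gene in child]


def ga_schedule(job_costs, node_speeds, pop_size=30, generations=60, rate=0.1, seed=0):
    """Return (best assignment found, best cost seen at each generation)."""
    rng = random.Random(seed)
    n_jobs, n_nodes = len(job_costs), len(node_speeds)

    def cost(individual):
        return makespan(individual, job_costs, node_speeds)

    # Start from random job -> node assignments.
    population = [[rng.randrange(n_nodes) for _ in range(n_jobs)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[: pop_size // 2]  # keep the fitter half as parents
        children = [population[0]]             # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), n_nodes, rate, rng))
        population = children
    best = min(population, key=cost)
    history.append(cost(best))
    return best, history


if __name__ == "__main__":
    # Illustrative numbers only: six jobs, three nodes of differing speed.
    best, history = ga_schedule([8, 3, 5, 9, 2, 7], [1.0, 2.0, 1.5])
    print("assignment:", best, "makespan:", history[-1])
```

Because the best individual is carried into each new generation unchanged, the best makespan never gets worse from one generation to the next. A real scheduler would likely score several resource dimensions (CPU, memory, data locality) rather than a single speed factor.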