Optimizing the performance of hadoop clusters through efficient cluster management techniques

  • Authors

    • K S. Shraddha Bollamma
    • S Manishankar
    • M V. Vishnu
    2018-05-29
    https://doi.org/10.14419/ijet.v7i2.31.13389
  • Big data, Hadoop, heterogeneous clusters, MapReduce, YARN, ZooKeeper.
  • Processing very large volumes of data has become a critical task in the Internet age. Although data processing has advanced considerably, processing and information extraction still pose many open problems, and as data size grows, retrieving useful information within a given time span becomes a herculean task. The most widely adopted solution is a distributed computing environment that supports data processing through a suitable model architecture with a large, complex structure. Although processing has improved considerably, efficiency, energy utilization and accuracy have been compromised. This research proposes an efficient environment for data processing with optimized energy utilization and increased performance. Hadoop, a common and popular big data processing platform, is chosen as the base for enhancement. A multi-node Hadoop cluster architecture is created, on top of which an efficient cluster monitor is set up and an algorithm to manage the efficiency of the cluster is formulated. The cluster monitor incorporates ZooKeeper and YARN (node manager and resource manager). ZooKeeper monitors the cluster nodes of the distributed system and identifies critical performance problems. YARN manages resources efficiently and controls the nodes with the help of a hybrid scheduler algorithm. This integrated platform thus helps in monitoring the distributed cluster and in improving the overall performance of big data processing.
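
The division of labour described in the abstract (a monitor that tracks node health, and a resource manager that places work on suitable nodes) can be illustrated with a minimal sketch. This is not the authors' hybrid scheduler, whose details the abstract does not give, and it does not call any ZooKeeper or YARN API. The `Node` class, the heartbeat timeout, and the "most free memory" placement rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a cluster node as seen by a monitor."""
    name: str
    free_memory_mb: int
    last_heartbeat: float  # seconds, on the same clock as `now`


def healthy_nodes(nodes, now, timeout=10.0):
    """Monitoring step: keep nodes whose heartbeat is at most `timeout` seconds old."""
    return [n for n in nodes if now - n.last_heartbeat <= timeout]


def place_task(nodes, now, required_mb, timeout=10.0):
    """Placement step (assumed rule): choose the healthy node with the most
    free memory that can fit the task, and reserve the memory on it.
    Returns the node name, or None if no healthy node has enough memory."""
    candidates = [
        n for n in healthy_nodes(nodes, now, timeout)
        if n.free_memory_mb >= required_mb
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.free_memory_mb)
    best.free_memory_mb -= required_mb
    return best.name


# A small heterogeneous cluster; node-b has the most memory but a stale heartbeat.
nodes = [
    Node("node-a", 4096, 100.0),
    Node("node-b", 8192, 85.0),
    Node("node-c", 2048, 99.0),
]
print(place_task(nodes, now=101.0, required_mb=1024))  # node-a
```

In this sketch the largest node is skipped because its heartbeat is stale, which shows why health monitoring and resource-aware placement are useful together on heterogeneous clusters.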

  • How to Cite

    S. Shraddha Bollamma, K., Manishankar, S., & V. Vishnu, M. (2018). Optimizing the performance of hadoop clusters through efficient cluster management techniques. International Journal of Engineering & Technology, 7(2.31), 19-22. https://doi.org/10.14419/ijet.v7i2.31.13389