Optimizing the performance of hadoop clusters through efficient cluster management techniques

  • Authors

    • K S. Shraddha Bollamma
    • S Manishankar
    • M V. Vishnu
    https://doi.org/10.14419/ijet.v7i2.31.13389

    Received date: May 28, 2018

    Accepted date: May 28, 2018

    Published date: May 29, 2018

  • Keywords

    Big data, Hadoop, heterogeneous clusters, MapReduce, YARN, ZooKeeper.
  • Abstract

    Processing very large volumes of data has become a critical task in the age of the Internet. Although data processing has advanced considerably, data processing and information extraction still pose many unsolved problems. As data sizes grow, retrieving useful information within a given span of time becomes a herculean task. The most widely adopted solution is a distributed computing environment that supports data processing through a suitable model architecture with a large, complex structure. Although processing has improved considerably, efficiency, energy utilization, and accuracy have been compromised. This research proposes an efficient environment for data processing with optimized energy utilization and increased performance. Hadoop, a common and popular big data processing platform, is chosen as the base for enhancement. A multi-node Hadoop cluster architecture is created, on top of which an efficient cluster monitor is set up, and an algorithm to manage the efficiency of the cluster is formulated. The cluster monitor incorporates ZooKeeper and YARN (node manager and resource manager). ZooKeeper monitors the cluster nodes of the distributed system and identifies critical performance problems. YARN plays a vital role in managing resources efficiently and controlling the nodes with the help of a hybrid scheduler algorithm. This integrated platform thus helps in monitoring the distributed cluster and in improving the overall performance of big data processing.

  • How to Cite

    S. Shraddha Bollamma, K., Manishankar, S., & V. Vishnu, M. (2018). Optimizing the performance of hadoop clusters through efficient cluster management techniques. International Journal of Engineering and Technology, 7(2.31), 19-22. https://doi.org/10.14419/ijet.v7i2.31.13389