A Framework For Effective Processing Of Jobs In Hadoop

  • Abstract

    The main challenges in Oozie-based scheduling are heavy computation, high CPU usage, and resource-intensive execution. These lead to resource contention in production because the load is not balanced optimally.

    The objective of the proposed New SQL Server built Java based (NSSJ) scheduling is to overcome some of the challenges in the existing Oozie-based scheduling. It stores all inventory information in a SQL Server environment. SQL Server is preferred over HBase because, at any given point in time, multiple threads hit the same inventory table, so transaction-level processing must be ensured. Any number of daemons or jobs can be run, killed, or put on hold at any point in time, which gives the end user complete flexibility to load balance based on the number of jobs. It also has an auto-restart feature for when a task or job fails: it attempts one re-run, and if the job fails a second time, it puts the job in an abandoned state.

    Thus, the proposed NSSJ scheduling load balances resources optimally during production.
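
    The auto-restart rule described above (one re-run, then an abandoned state) can be illustrated with a minimal sketch. This is not the NSSJ implementation: the class, the state names, and the method `onAttemptFinished` are hypothetical, and the SQL Server inventory table is replaced here by an in-memory object purely for illustration.

```java
/** Illustrative sketch only; all names are assumptions, not taken from NSSJ. */
public class RetryPolicySketch {

    /** Hypothetical job states; the abstract names only "hold" and "abandoned". */
    enum JobState { PENDING, RUNNING, ON_HOLD, KILLED, SUCCEEDED, ABANDONED }

    /** The abstract describes a single automatic re-run after a failure. */
    static final int MAX_RERUNS = 1;

    /** In-memory stand-in for one row of the inventory table. */
    static final class Job {
        final String name;
        JobState state = JobState.PENDING;
        int failures = 0;

        Job(String name) {
            this.name = name;
        }
    }

    /** Records the outcome of one execution attempt and returns the new state. */
    static JobState onAttemptFinished(Job job, boolean succeeded) {
        if (succeeded) {
            job.state = JobState.SUCCEEDED;
        } else {
            job.failures++;
            // First failure: queue the job for one re-run.
            // Second failure: stop retrying and mark it abandoned.
            job.state = (job.failures <= MAX_RERUNS)
                    ? JobState.PENDING
                    : JobState.ABANDONED;
        }
        return job.state;
    }

    public static void main(String[] args) {
        Job job = new Job("daily_load");
        System.out.println(onAttemptFinished(job, false)); // PENDING (re-run queued)
        System.out.println(onAttemptFinished(job, false)); // ABANDONED
    }
}
```

    In a real deployment the state change would presumably be written to the inventory table inside a database transaction, so that concurrent scheduler threads never act on the same job twice; that part is omitted here.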



  • Keywords

    Hadoop, Daemons, Oozie, CPU, Cluster, Capacity





Article ID: 23776
DOI: 10.14419/ijet.v7i4.36.23776

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.