A Framework For Effective Processing Of Jobs In Hadoop

  • Authors

    • Amarendra Mohanty
    • Dr. P.Ranjana
    2018-12-09
    https://doi.org/10.14419/ijet.v7i4.36.23776
  • Hadoop, Daemons, Oozie, CPU, Cluster, Capacity
  •  The main challenges in Oozie-based scheduling are heavy computation, high CPU usage, and resource-intensive operation. These lead to resource contention in production because the load is not balanced optimally.

    The objective of the proposed New SQL Server built, Java-based (NSSJ) scheduling is to overcome some of the current challenges in the existing Oozie-based scheduling. It stores all inventory information in a SQL Server environment. SQL Server is preferred over HBase because, at any given time, multiple threads hit the same inventory table, and transaction-level processing must be ensured. Any number of daemons or jobs can be run, killed, or put on hold at any time, which gives the end user complete flexibility to balance load based on the number of jobs. The scheduler also has an auto-restart feature: when a task or job fails, it attempts one re-run, and if the job fails a second time, it is put in an abandoned state.

    Thus, the proposed NSSJ scheduling balances resource load optimally during production.
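
    The auto-restart rule described above (one re-run, then the abandoned state) can be illustrated with a minimal Java sketch. This is not the authors' implementation: the class name NssjRetrySketch, the JobState enum, and the runWithOneRetry method are hypothetical names chosen for illustration, and the SQL Server inventory table is left out entirely, so a job is modelled only as a function that reports success or failure.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the retry rule described in the abstract:
// a failed job gets exactly one re-run; a second failure marks it ABANDONED.
class NssjRetrySketch {

    // Hypothetical job states; the abstract names only the "abandoned" state.
    enum JobState { SUCCEEDED, ABANDONED }

    // The first run plus one re-run.
    static final int MAX_ATTEMPTS = 2;

    // Runs the job until it succeeds or the attempts are used up.
    // The supplier returns true on success and false on failure.
    static JobState runWithOneRetry(BooleanSupplier job) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (job.getAsBoolean()) {
                return JobState.SUCCEEDED;
            }
        }
        return JobState.ABANDONED;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails on the first attempt and succeeds on the re-run.
        JobState flaky = runWithOneRetry(() -> ++calls[0] >= 2);
        // Fails on every attempt.
        JobState broken = runWithOneRetry(() -> false);
        System.out.println(flaky + " " + broken);
    }
}
```

    In a real scheduler, the resulting state would presumably be written back to the inventory table inside a transaction so that concurrent threads see a consistent job status; that step is an assumption and is omitted here.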

  • How to Cite

    Mohanty, A., & Ranjana, P. (2018). A Framework For Effective Processing Of Jobs In Hadoop. International Journal of Engineering & Technology, 7(4.36), 200-203. https://doi.org/10.14419/ijet.v7i4.36.23776