A Framework For Effective Processing Of Jobs In Hadoop

Amarendra Mohanty; Dr. P.Ranjana

doi:10.14419/ijet.v7i4.36.23776

Authors

Amarendra Mohanty
Dr. P.Ranjana

Received date: December 12, 2018

Accepted date: December 12, 2018

Published date: December 9, 2018

DOI:

https://doi.org/10.14419/ijet.v7i4.36.23776

Keywords:

Hadoop, Daemons, Oozie, CPU, Cluster, Capacity

Abstract

The main challenges in Oozie-based scheduling are heavy computation, high CPU usage, and resource-intensive execution. These lead to resource contention in production because the load is not balanced optimally.
The objective of the proposed New SQL Server-built, Java-based (NSSJ) scheduling is to overcome some of the current challenges in existing Oozie-based scheduling. It stores all inventory information in a SQL Server environment. SQL Server is preferred over HBase because, at any given point in time, multiple threads hit the same inventory table, and transaction-level processing must be ensured. Any number of daemons or jobs can be run, killed, or put on hold at any time, which gives the end user complete flexibility to balance load based on the number of jobs. It has an auto-restart feature for when a task or job fails: it attempts one re-run, and if the job fails a second time, it puts the job in an abandoned state.
Thus, the proposed NSSJ scheduling balances resource load optimally during production.
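The auto-restart rule described above (one automatic re-run, then an abandoned state) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Job`, `JobState`, `run_with_restart`, `hold`, `release`, and `kill` are hypothetical, every state name other than "abandoned" is an assumption, and a plain in-memory object stands in for the SQL Server inventory table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class JobState(Enum):
    # Only "abandoned" is named in the abstract; the other states are assumed.
    PENDING = "pending"
    ON_HOLD = "on_hold"
    KILLED = "killed"
    SUCCEEDED = "succeeded"
    ABANDONED = "abandoned"


@dataclass
class Job:
    name: str
    task: Callable[[], None]  # hypothetical: raises an exception on failure
    state: JobState = JobState.PENDING
    attempts: int = 0


MAX_ATTEMPTS = 2  # the first run plus the single automatic re-run


def run_with_restart(job: Job) -> JobState:
    """Run a pending job; re-run it once on failure, then abandon it."""
    if job.state is not JobState.PENDING:
        return job.state  # held, killed, or finished jobs are not started
    while job.attempts < MAX_ATTEMPTS:
        job.attempts += 1
        try:
            job.task()
        except Exception:
            continue  # failed attempt; the loop retries if one attempt remains
        job.state = JobState.SUCCEEDED
        return job.state
    job.state = JobState.ABANDONED  # second failure: no further re-runs
    return job.state


def hold(job: Job) -> None:
    """Put a pending job on hold so the scheduler skips it."""
    if job.state is JobState.PENDING:
        job.state = JobState.ON_HOLD


def release(job: Job) -> None:
    """Return a held job to the pending state."""
    if job.state is JobState.ON_HOLD:
        job.state = JobState.PENDING


def kill(job: Job) -> None:
    """Kill a job that has not yet finished."""
    if job.state in (JobState.PENDING, JobState.ON_HOLD):
        job.state = JobState.KILLED
```

Under these assumptions, a job whose task fails once and then succeeds ends in the succeeded state after two attempts, while a job that fails twice is marked abandoned and is not retried again. In a multi-threaded deployment, each state change would presumably be a transactional update on the shared inventory table, which is the reason the abstract gives for choosing SQL Server.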

