A Review: Map Reduce Framework for Cloud Computing

 
 
 
  • Abstract


    In the current generation of the Internet, information and data are growing continuously. Through various Internet services and applications, the amount of information is increasing rapidly: hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information and, at the same time, makes it more difficult to discover useful knowledge within it. Cloud computing can provide the infrastructure for such large data. It has two significant characteristics of distributed computing, namely scalability and high availability. Scalability means that cloud computing can seamlessly extend to large-scale clusters; availability means that it can tolerate node errors, so node failures do not prevent a program from running correctly. Combined with data mining, cloud computing carries out large-scale data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and have become an effective solution for distributed storage and efficient computing in data mining.
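    The abstract's two claims, that processing scales out across a cluster and that node failures do not affect correctness, can be illustrated with a minimal single-process sketch of the MapReduce pattern named in the keywords. This is not Hadoop's API. The function names (`map_phase`, `reduce_phase`, `run_map_task`, `mapreduce`, `frequent_itemsets`), the round-robin input splits standing in for HDFS blocks, the random failure injection, and the choice of counting 1- and 2-item itemsets as the mining task are all hypothetical assumptions made for illustration. A failed map task is simply re-executed, loosely mirroring how a MapReduce runtime reschedules work from a failed node.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical in-process simulation of MapReduce; not the Hadoop API.


def map_phase(transaction):
    """Map: emit (itemset, 1) for every 1- and 2-item subset of one transaction."""
    items = sorted(set(transaction))
    for size in (1, 2):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_phase(itemset, counts):
    """Reduce: sum the partial counts emitted for one itemset."""
    return itemset, sum(counts)


def run_map_task(split, fail_prob, rng):
    """Run one map task over an input split; randomly raises to simulate a node failure."""
    if rng.random() < fail_prob:
        raise RuntimeError("simulated node failure")
    return [pair for transaction in split for pair in map_phase(transaction)]


def mapreduce(transactions, num_splits=3, fail_prob=0.3, max_attempts=5, seed=0):
    """Count itemsets with a map -> shuffle -> reduce pipeline, retrying failed map tasks."""
    rng = random.Random(seed)
    # Round-robin input splits stand in for HDFS blocks (an assumption).
    splits = [transactions[i::num_splits] for i in range(num_splits)]

    intermediate = []
    for split in splits:
        for _ in range(max_attempts):
            try:
                # A failed attempt produces no output, so a retry never double-counts.
                intermediate.extend(run_map_task(split, fail_prob, rng))
                break
            except RuntimeError:
                continue  # re-execute the task, as if on another node
        else:
            raise RuntimeError(f"map task failed {max_attempts} times")

    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: one call per distinct key.
    return dict(reduce_phase(key, values) for key, values in groups.items())


def frequent_itemsets(counts, min_support):
    """Keep itemsets whose total count reaches min_support."""
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    baskets = [["milk", "bread"], ["milk", "eggs"], ["bread", "milk", "butter"], ["eggs"]]
    counts = mapreduce(baskets)
    print(frequent_itemsets(counts, min_support=2))
```

    Because a failed attempt raises before returning any output, retries cannot double-count, so the totals are the same whether or not failures are injected. Changing `num_splits` only changes how the work is divided among map tasks, which is the sense in which the computation scales out.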

     



  • Keywords


    Data Mining, Cloud, Map Reduce Framework, HDFS (Hadoop Distributed File System), Parallel Programming, Distributed Databases


 


Article ID: 20224
 
DOI: 10.14419/ijet.v7i4.6.20224




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.