Technical challenges and perspectives in batch and stream big data machine learning

 
 
 
  • Abstract


    Machine learning plays a prominent role across many domains. However, traditional machine learning algorithms are becoming unsuitable for the majority of applications as data acquires new characteristics. Sensors, devices, servers, the Internet, social networking, smartphones, and the Internet of Things are the major sources of this data. Hence, the advent of Big Data has brought a paradigm shift in machine learning. Research is evolving to deal with both batch Big Data and real-time stream data. In this paper, we highlight several research works that have contributed to Big Data machine learning.

  • Keywords


    Big Data, Knowledge Discovery, Machine Learning, Batch, Stream



 


Article ID: 9225
 
DOI: 10.14419/ijet.v7i1.3.9225




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.