Short Text Mining: Machine Learning and Statistical Modelling Approaches Compared

  • Authors

    • Omar H. Al-Barahamtoshy
    2019-03-01
    https://doi.org/10.14419/ijet.v8i1.11.28086
  • Tweets, Topic Categorization, Short Text Mining, Natural Language Processing, Online Learning, Knowledge Transfer
  • With the growth of technology, social media has gained popularity and now plays a key role in modern day-to-day communication. Given this trend, social media has gained such influence on our society that saying I am going to “Tweet” about some thought has become part of our language. As with any community-driven content, people find complex ways to interact with each other. Twitter lets people tag their tweets with hashtags to specify the topic of a tweet’s content. However, as with any community-driven convention, there exist many tweets that do not have hashtags. In this paper, we explore the methods in the literature that can categorize tweets without hashtags. We have evaluated one method, which proved to be very promising due to its flexibility and extensibility to many applications. We also discuss future enhancement and extension possibilities, and provide a critique of the current method’s drawbacks.

     


  • How to Cite

    Al-Barahamtoshy, O. H. (2019). Short Text Mining: Machine Learning and Statistical Modelling Approaches Compared. International Journal of Engineering & Technology, 8(1.11), 33-39. https://doi.org/10.14419/ijet.v8i1.11.28086