Combining rule-based and bag-of-words for phase-level sentiment analysis of blog comments

  • Authors

    • Frederick F. Patacsil Pangasinan State University-Urdaneta City Campus
    2018-09-05
    https://doi.org/10.14419/ijet.v7i2.27.13552
  • Blogs, Sentiment Analysis, Machine Learning, Satisfaction, Automatic Polarity Classifier.
  • Blogs are a platform for expressing personal opinions; they are used to create awareness and to establish trust among customers about products and services or about a specific topic. A new classification model was tested to improve the sentiment classification of blog comments. The technique combines Rule-Based (RB) and Bag-of-Words (BoW) models to address major weaknesses of the BoW model in Sentiment Analysis (SA). The proposed technique was applied to estimate Philippine Internet customers’ satisfaction with the quality of service provided by ISPs in the Philippines. In addition, automatic word seeding, construction of a sentiment dictionary from an online dictionary, n-grams, tokenization, stemming, and other SA techniques were applied to extract useful information from the blog comment dataset. The results showed that the configuration combining BoW-RB, SVM, bi-grams, and the Porter stemmer achieved a classification accuracy of 88%.

    Capturing the contextual meaning of the words neighboring a given sentiment word significantly improves the classification performance of the proposed classifier.
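
    The idea of combining bag-of-words counts with a rule that looks at a sentiment word's neighbors can be illustrated with a minimal sketch. This is not the authors' implementation: the seed lexicon, the negator list, the single negation rule, and all function names are assumptions made for illustration, and stemming and the SVM step are omitted.

```python
import re
from collections import Counter

# Hypothetical seed lexicon and negators (assumptions; the paper's
# actual sentiment dictionary and rules are not given in the abstract).
LEXICON = {"fast": 1, "reliable": 1, "good": 1, "slow": -1, "poor": -1, "bad": -1}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bow_features(tokens):
    """Bag-of-words features: unigram counts plus bi-gram counts."""
    feats = Counter(tokens)
    feats.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return feats


def rule_features(tokens):
    """Rule-based pass: flip a sentiment word's polarity when the
    immediately preceding token is a negator."""
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        polarity = LEXICON[tok]
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        feats["RB_POS" if polarity > 0 else "RB_NEG"] += 1
    return feats


def combined_features(text):
    """Merge BoW counts and rule-derived counts into one feature map."""
    tokens = tokenize(text)
    feats = bow_features(tokens)
    feats.update(rule_features(tokens))
    return feats


features = combined_features("The connection is not fast")
```

    A plain unigram count would record "fast" as a positive cue here; the rule-derived feature marks the comment as negative because of the preceding "not". A feature map like this could then be vectorized and passed to a classifier such as an SVM.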

     

     

  • How to Cite

    Patacsil, F. F. (2018). Combining rule-based and bag-of-words for phase-level sentiment analysis of blog comments. International Journal of Engineering & Technology, 7(2.27), 311-318. https://doi.org/10.14419/ijet.v7i2.27.13552