VBS Stemmer: A vocabulary-based stemmer

  • Authors

    • Hamed Zakeri Rad, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
    • Sabrina Tiun, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
    • Saidah Saad, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
    2018-04-13
    https://doi.org/10.14419/ijet.v7i2.9192
  • Keywords: English Suffix Removal, Information Retrieval, Stemming Algorithm, Suffix Removal, Vocabulary Based Stemmer
  • Abstract: Stemming is the procedure of reducing all words that appear in different morphological variants to a common form. It is widely used in information retrieval and computational linguistics. In this paper, we introduce the Vocabulary Based Stemmer (VBS) as an alternative solution to the stemming problem for applications that rely on semantic relations between words or on dictionaries, and that therefore need the stems to be valid words. The vocabulary part of the VBS stemmer is generated from WordNet. To validate the VBS stemmer, part of the "Cranfield 1400" test collection is used, and the results show significant improvements over previous stemmers.
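The abstract does not spell out the VBS algorithm itself. The sketch below only illustrates the general idea it names, that a vocabulary-based stemmer should return valid words: a candidate stem produced by suffix stripping is accepted only if it appears in a vocabulary. The small `VOCABULARY` set is an assumption standing in for a WordNet-derived word list, and `SUFFIX_RULES` and `vocab_stem` are hypothetical illustrations, not the rules or code from the paper.

```python
from typing import AbstractSet, Sequence, Tuple

# ASSUMPTION: a toy word list standing in for a vocabulary generated from WordNet.
VOCABULARY: AbstractSet[str] = frozenset(
    {"connect", "run", "happy", "study", "relate", "generous"}
)

# HYPOTHETICAL rules (suffix to remove, text to append), tried in order.
# They are illustrative only and are not the rule set used by VBS.
SUFFIX_RULES: Sequence[Tuple[str, str]] = (
    ("ational", "ate"),  # relational -> relate
    ("iness", "y"),      # happiness  -> happy
    ("ions", ""),        # connections -> connect
    ("ion", ""),         # connection -> connect
    ("ies", "y"),        # studies -> study
    ("ing", ""),         # connecting -> connect
    ("ed", ""),          # connected -> connect
    ("ly", ""),          # generously -> generous
    ("s", ""),           # runs -> run
)


def vocab_stem(
    word: str,
    vocabulary: AbstractSet[str] = VOCABULARY,
    rules: Sequence[Tuple[str, str]] = SUFFIX_RULES,
) -> str:
    """Return the first rule-produced candidate that is a vocabulary word.

    If no rule yields a word found in `vocabulary`, the lower-cased input is
    returned unchanged, so the output is never a truncated non-word.
    """
    word = word.lower()
    for suffix, replacement in rules:
        # Only strip when something remains in front of the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in vocabulary:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["connections", "relational", "happiness", "sing"]:
        print(w, "->", vocab_stem(w))
```

Because a candidate is kept only when it is a vocabulary word, an input such as "sing" is left unchanged instead of being cut down to "s"; the quality of such a stemmer therefore depends heavily on the coverage of its vocabulary.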

  • How to Cite

    Zakeri Rad, H., Tiun, S., & Saad, S. (2018). VBS Stemmer: A vocabulary-based stemmer. International Journal of Engineering & Technology, 7(2), 551-554. https://doi.org/10.14419/ijet.v7i2.9192