VBS Stemmer: A vocabulary-based stemmer

Hamed Zakeri Rad; Sabrina Tiun; Saidah Saad

doi:10.14419/ijet.v7i2.9192

Authors

Hamed Zakeri Rad
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
Sabrina Tiun
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
Saidah Saad
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia

Received date: January 17, 2018

Accepted date: April 6, 2018

Published date: April 13, 2018

DOI:

https://doi.org/10.14419/ijet.v7i2.9192

Keywords:

English Suffix Removal, Information Retrieval, Stemming Algorithm, Suffix Removal, Vocabulary Based Stemmer

Abstract

Stemming is the procedure of reducing the different morphological variants of a word to a common form. It is considered a useful technique in various areas of information retrieval and computational linguistics. In this paper, we introduce the Vocabulary Based Stemmer (VBS) as an alternative solution to the stemming problem for applications that are dictionary based or rely on semantic relations between words, and that therefore need valid words as output. The vocabulary component of the VBS stemmer is generated from WordNet. To validate the VBS stemmer, part of the "Cranfield 1400" test collection was used, and the results show significant improvements over previous stemmers.
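The general idea of constraining a stemmer to emit only valid words can be illustrated with a minimal sketch. This is not the VBS algorithm itself, which the abstract does not specify. The toy `VOCABULARY` set stands in for a WordNet-derived word list, and the `SUFFIX_RULES` table and the `vocabulary_stem` function are hypothetical names introduced only for this illustration.

```python
# Toy vocabulary standing in for a WordNet-derived word list (assumption).
VOCABULARY = {"connect", "relate", "study"}

# Hypothetical suffix rules as (suffix, replacement) pairs, tried in order.
# These are illustrative and are not taken from the paper.
SUFFIX_RULES = [
    ("ations", "ate"),
    ("ation", "ate"),
    ("ions", ""),
    ("ion", ""),
    ("ies", "y"),
    ("ing", ""),
    ("ing", "e"),
    ("ed", ""),
    ("ed", "e"),
    ("s", ""),
]


def vocabulary_stem(word, vocabulary=VOCABULARY):
    """Return a stem only if it is a valid vocabulary word.

    If no rule produces a word found in the vocabulary, the input is
    returned unchanged (lowercased), so the output is never a truncated
    non-word created by suffix stripping.
    """
    word = word.lower()
    if word in vocabulary:
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in vocabulary:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["connections", "relations", "studies", "related", "university"]:
        print(w, "->", vocabulary_stem(w))
```

In this sketch, a word such as "university" is left as it is because no candidate stem appears in the vocabulary, whereas a purely rule-based suffix stripper may truncate it to a form that is not a dictionary word.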

