A semantic search engine based on morphology processing to improve Arabic search

  • Authors

    • Aziz Barbar, Lebanese University
    • Anis Ismail, AUST
    2019-06-30
    https://doi.org/10.14419/ijet.v7i4.28688
  • This paper tackles the problem of improving retrieval from Arabic text. A number of techniques, such as truncation, stemming, and morphological analysis, have been introduced to improve retrieval performance in search engines. In Arabic search engines, three methods are mainly used: word, stem, and root. The word method is based only on term matching, while the other two methods use morphological analysis at different levels. Each of these methods has its limitations. For example, the word and stem methods may miss relevant records that contain morphological variations of the targeted word. The root method, on the other hand, will always retrieve irrelevant records because it extracts the root from the word and then searches for all possible morphological variations of that word. These limitations motivated this research to investigate a new method for Arabic search engines, called Semantic Search based on Morphological Processing, which relies on semantic links between morphological forms. The method is intended to improve the effectiveness of the word and stem methods by retrieving more relevant records, and to improve the root method by rejecting the irrelevant records that it may retrieve. A morphological analysis algorithm was designed to provide the needed stem and pattern extraction with high precision. The proposed algorithm targets modern Arabic text that is not diacritized and may contain some spelling errors. The algorithm is rule-based and can solve all Arabic morphological variations.
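
The trade-off between the word, stem, and root methods described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the lexicon, the Latin transliterations, the document ids, and the `search` function are all hypothetical, and a real system would obtain stems and roots from a morphological analyzer rather than from a hand-written table.

```python
# Toy lexicon (assumption, for illustration only): surface form -> (stem, root).
# Forms are rough Latin transliterations of Arabic words.
LEXICON = {
    "kitab":   ("kitab", "ktb"),    # book
    "alkitab": ("kitab", "ktb"),    # the book
    "kitaban": ("kitab", "ktb"),    # two books
    "katib":   ("katib", "ktb"),    # writer
    "maktaba": ("maktaba", "ktb"),  # library
    "madrasa": ("madrasa", "drs"),  # school
}

# Hypothetical one-term "documents", keyed by document id.
DOCS = {1: "kitab", 2: "alkitab", 3: "kitaban", 4: "katib", 5: "maktaba", 6: "madrasa"}


def search(query, method):
    """Return sorted ids of documents matching the query under the given method."""
    position = {"word": None, "stem": 0, "root": 1}[method]

    def key(term):
        # Word method compares surface forms; stem/root methods compare analyses.
        return term if position is None else LEXICON[term][position]

    target = key(query)
    return sorted(doc_id for doc_id, term in DOCS.items() if key(term) == target)


for method in ("word", "stem", "root"):
    print(method, search("kitab", method))
```

In this toy data, the word method finds only the exact form, the stem method also finds the inflected forms of "kitab", and the root method additionally returns "katib" (writer) and "maktaba" (library), which share the root but may not be relevant to a query about books.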

     

     



  • How to Cite

    Barbar, A., & Ismail, A. (2019). A semantic search engine based on morphology processing to improve Arabic search. International Journal of Engineering & Technology, 7(4), 6547-6557. https://doi.org/10.14419/ijet.v7i4.28688