A new approach for finding semantic similar scientific articles

  • Abstract

    Calculating the similarity between articles lets users find related articles and documents in a collection. Identifying similar documents is useful for text applications such as document-to-document similarity search, plagiarism checking, detection of repeated text, and text filtering. This paper proposes a new method for calculating the semantic similarity of articles. WordNet is used to find semantic associations between words. The proposed technique first compares the parts of two articles pairwise and computes a similarity score for each pair of parts. The final similarity is then calculated as a weighted mean of these per-part scores. The results are compared with human similarity scores using Pearson’s correlation coefficient. The proposed system achieves a correlation coefficient above 87 percent, indicating that it identifies similarities accurately.
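
    The combination and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the part names, the weights, and all scores below are hypothetical values chosen for illustration, and the per-part similarity scores are assumed to come from a WordNet-based measure that is not shown here.

```python
from math import sqrt


def weighted_mean(scores, weights):
    """Combine per-part similarity scores into one article-level score."""
    total_weight = sum(weights[part] for part in scores)
    return sum(scores[part] * weights[part] for part in scores) / total_weight


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical part weights (assumption: the abstract does not list them).
PART_WEIGHTS = {"title": 0.3, "abstract": 0.5, "keywords": 0.2}

# Hypothetical per-part similarity scores for one pair of articles.
part_scores = {"title": 0.8, "abstract": 0.6, "keywords": 1.0}
article_similarity = weighted_mean(part_scores, PART_WEIGHTS)

# Hypothetical system scores and human scores for four article pairs.
system_scores = [0.74, 0.20, 0.55, 0.90]
human_scores = [0.70, 0.25, 0.50, 0.95]
r = pearson(system_scores, human_scores)
```

    A value of r close to 1 means the system's scores rise and fall together with the human scores; the abstract reports a coefficient above 87 percent for the proposed system.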

  • Keywords

    Similarities; Semantic Similarities; Text Preprocessing; WordNet.

Article ID: 4012
DOI: 10.14419/jacst.v4i1.4012

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.