A web text similarity learning and classification approach for efficient information extraction

  • Authors

    • Sunil Kumar Thota Gitam University, Vishakapatnam, Andhra Pradesh, India
    • Dr. Tummala Sita Mahalakshmi Gitam University, Vishakapatnam
    2019-02-15
    https://doi.org/10.14419/ijet.v7i4.18075
  • Web Mining, Text Similarity, Classification, Information Extraction.
  • Over the last few years, the rapid growth of the World Wide Web has given users access to an ever-increasing amount of information, and search engines have become a necessary tool for finding what they need in this huge space. As a result, the task of organizing this rich information becomes more difficult every day. Search engines play an important role in retrieving information, but many of the returned results are unrelated to the user's needs, because they are ranked according to string matches with the user's query. This leads to semantic differences between the meaning of the keywords in the retrieved documents and the terms used in the user's query. The problem of categorizing large sources of information into groups of similar topics has not yet been resolved. This paper proposes a web-text similarity learning (WTSL) method with classification based on a support vector machine (SVM). The proposal aims to automate the estimation of semantic similarity between words or articles in order to enhance information extraction. The experimental results suggest an improvement in retrieval accuracy, with more relevant documents returned.
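The general idea of combining a text similarity measure with an SVM classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the WTSL method itself, which the abstract does not specify: cosine similarity over term-frequency vectors stands in for the learned similarity (it is purely lexical, not semantic), the two topic seed texts and the training sentences are made up for the example, and the linear SVM is trained with plain subgradient descent on the hinge loss.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical topic seeds: each document is described by its similarity
# to every seed, giving a small feature vector for the classifier.
TOPIC_SEEDS = [
    "football match goal team player league",
    "stock market shares price investor bank",
]


def featurize(doc):
    """Map a document to its similarities against the topic seeds."""
    return [cosine_similarity(doc, seed) for seed in TOPIC_SEEDS]


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit (w, b) by subgradient descent on the L2-regularized hinge loss.

    samples is a list of feature lists; labels are +1 or -1.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: step along the hinge-loss subgradient.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with margin: only the regularizer shrinks w.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up training data: +1 = sports, -1 = finance.
train_docs = [
    "the team scored a late goal in the league match",
    "the player joined a new football team",
    "the bank raised its price forecast for shares",
    "investor confidence lifted the stock market",
]
train_labels = [1, 1, -1, -1]

w, b = train_linear_svm([featurize(d) for d in train_docs], train_labels)

for doc in ["the football player scored a goal", "the stock market price fell"]:
    print(doc, "->", predict(w, b, featurize(doc)))
```

Because the features here depend only on shared words, a query and a document that use different vocabulary for the same topic would score zero, which is the string-match limitation the abstract describes; a semantic similarity measure would replace `cosine_similarity` in such a pipeline.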

     

     

  • How to Cite

    Kumar Thota, S., & Tummala Sita Mahalakshmi, D. (2019). A web text similarity learning and classification approach for efficient information extraction. International Journal of Engineering & Technology, 7(4), 4856-4861. https://doi.org/10.14419/ijet.v7i4.18075