An approach towards Semantic Web Document Clustering using Background Knowledge

  • Authors

    • Sujata R. Kolhe
    • Dr. S. D. Sawarkar
    https://doi.org/10.14419/ijet.v7i3.24.24569
  • Web document clustering, k-means, cuckoo search algorithm, WordNet
  • Extensive use of information technology in diverse applications has led to massive online databases, which are retrieved for many different purposes. To support effective browsing and efficient searching, the documents must be organized systematically. Clustering text documents is a vital step in organizing, managing and indexing the huge volume of text data on the Web. Traditional approaches rely on keyword matching, which fails when the user does not obtain relevant results. It therefore becomes essential to represent documents semantically before they are processed. Background knowledge such as WordNet can be used for this purpose. In this paper, semantic weights are used to represent documents semantically, and a nature-inspired metaheuristic algorithm, Cuckoo Search, is then applied to cluster web search results automatically. WordNet concepts are identified in the text by considering the ontology of each word. A novel semantic similarity method is presented for representing documents as features, and these enhanced features are fed to the clustering algorithm. The Cuckoo Search algorithm addresses the problem of automatically determining the number of clusters, and a divide-and-conquer technique is used to keep the algorithm from converging too quickly. The algorithm was tested on the AMBIENT dataset and showed good results.

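The abstract does not spell out the novel similarity measure or the exact form of the semantic weight. As a minimal sketch under stated assumptions, the code below scores concept pairs with the Wu-Palmer measure (Wu and Palmer, 1994), using a tiny hypothetical taxonomy in place of WordNet's hypernym hierarchy, and computes a hypothetical semantic weight in which a term's frequency is boosted by the similarity-weighted frequencies of the other concepts in the same document. `PARENT`, `path_to_root`, `wu_palmer` and `semantic_weights` are illustrative names, not the authors' code. The toy taxonomy is a tree, so depth is simply the length of the path to the root; real WordNet synsets can have several hypernyms, and NLTK's WordNet interface provides `Synset.wup_similarity` for that case.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy taxonomy (child -> parent) standing in for WordNet hypernym links.
PARENT: Dict[str, str] = {
    "animal": "entity", "vehicle": "entity",
    "dog": "animal", "cat": "animal",
    "car": "vehicle", "bike": "vehicle",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def path_to_root(concept: str) -> List[str]:
    """Return [concept, parent, ..., root]; valid because the toy taxonomy is a tree."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def wu_palmer(c1: str, c2: str) -> float:
    """Wu-Palmer similarity 2 * depth(lcs) / (depth(c1) + depth(c2)), root depth = 1."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    ancestors = set(p1)
    # Lowest common subsumer: the first ancestor of c2 that is also an ancestor of c1.
    lcs = next(a for a in p2 if a in ancestors)
    return 2 * len(path_to_root(lcs)) / (len(p1) + len(p2))


def semantic_weights(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical semantic weight: a term's frequency plus the frequencies of the
    other concepts in the document, each scaled by its Wu-Palmer similarity."""
    tf = Counter(t for t in tokens if t in CONCEPTS)  # words outside the taxonomy are ignored
    return {t: tf[t] + sum(wu_palmer(t, u) * tf[u] for u in tf if u != t) for t in tf}


if __name__ == "__main__":
    print(semantic_weights(["dog", "dog", "cat", "car", "the"]))
```

For the sample tokens, `dog` receives (mathematically) 2 + 2/3 + 1/3 = 3: `cat` shares the parent `animal` (similarity 2/3), while `car` shares only the root (similarity 1/3). The word `the` is not in the taxonomy and is ignored.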

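The abstract states that Cuckoo Search determines the number of clusters automatically but gives no encoding, fitness function or parameter values. The sketch below is one possible reading under stated assumptions: each nest is a variable-length list of centroids in document-feature space, a Lévy flight (Mantegna's algorithm) perturbs the centroids, a few k-means (Lloyd) iterations polish each candidate and drop centroids that attract no documents, and the worst fraction `pa` of nests is abandoned every generation. The mean silhouette coefficient is used as the fitness purely as an illustrative stand-in; Cobos et al. (2014) used a Balanced Bayesian Information Criterion in their cuckoo-search clusterer, and the fitness in this paper may differ. The divide-and-conquer step mentioned in the abstract is not modeled. All names and defaults (`cuckoo_cluster`, `k_min`, `k_max`, `n_nests`, `iters`, `pa`, `alpha`) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]  # hypothetical: one document's semantic feature vector


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Point], cents: List[Point]) -> List[int]:
    # Index of the nearest centroid for every point.
    return [min(range(len(cents)), key=lambda j: dist(p, cents[j])) for p in points]


def refine(points: List[Point], cents: List[Point], iters: int = 3) -> List[Point]:
    """A few k-means (Lloyd) iterations; centroids left without points are dropped,
    which is how the number of clusters can shrink."""
    for _ in range(iters):
        labels = assign(points, cents)
        groups = [[p for p, l in zip(points, labels) if l == j] for j in range(len(cents))]
        cents = [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups if g]
    used = set(assign(points, cents))
    return [c for j, c in enumerate(cents) if j in used]


def silhouette(points: List[Point], cents: List[Point]) -> float:
    """Mean silhouette coefficient (higher is better); -1 if fewer than 2 clusters."""
    labels = assign(points, cents)
    ks = set(labels)
    if len(ks) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        by_k = {k: [] for k in ks}
        for j, q in enumerate(points):
            if j != i:
                by_k[labels[j]].append(dist(p, q))
        own = by_k[labels[i]]
        if not own:  # singleton cluster contributes 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for k, d in by_k.items() if k != labels[i])
        total += (b - a) / max(a, b)
    return total / len(points)


def levy(rng: random.Random, beta: float = 1.5) -> float:
    """One Levy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0, sigma) / max(abs(rng.gauss(0, 1)), 1e-12) ** (1 / beta)


def random_nest(points: List[Point], rng: random.Random, k_min: int, k_max: int) -> List[Point]:
    # A nest is a variable-length list of centroids seeded from random documents.
    return refine(points, rng.sample(points, rng.randint(k_min, k_max)))


def cuckoo_cluster(points: List[Point], k_min: int = 2, k_max: int = 6, n_nests: int = 15,
                   iters: int = 30, pa: float = 0.25, alpha: float = 0.1,
                   seed: int = 0) -> Tuple[List[Point], float]:
    rng = random.Random(seed)
    spread = max(dist(p, q) for p in points for q in points)  # scale of Levy steps
    nests = [random_nest(points, rng, k_min, k_max) for _ in range(n_nests)]
    fit = [silhouette(points, n) for n in nests]
    for _ in range(iters):
        for i in range(n_nests):
            # Levy flight around nest i, polished by k-means, compared with a random nest j.
            moved = [tuple(x + alpha * spread * levy(rng) for x in c) for c in nests[i]]
            cand = refine(points, moved)
            f, j = silhouette(points, cand), rng.randrange(n_nests)
            if f > fit[j]:
                nests[j], fit[j] = cand, f
        # Abandon the worst fraction pa of nests and rebuild them at random.
        for i in sorted(range(n_nests), key=fit.__getitem__)[: int(pa * n_nests)]:
            nests[i] = random_nest(points, rng, k_min, k_max)
            fit[i] = silhouette(points, nests[i])
    best = max(range(n_nests), key=fit.__getitem__)
    return nests[best], fit[best]


if __name__ == "__main__":
    data_rng = random.Random(42)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(10)]
    cents, score = cuckoo_cluster(data)
    print(len(cents), round(score, 3))
```

The demo clusters three synthetic 2-D blobs; in practice the points would be the semantic feature vectors of the search-result snippets. The number of clusters varies in two ways here: rebuilt nests draw a fresh random `k`, and `refine` drops centroids that a Lévy jump has moved away from every document.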
  • How to Cite

    Kolhe, S. R., & Sawarkar, S. D. (2018). An approach towards Semantic Web Document Clustering using Background Knowledge. International Journal of Engineering & Technology, 7(3.24), 733-737. https://doi.org/10.14419/ijet.v7i3.24.24569