An Efficient Modified K-Means and Artificial Bee Colony Algorithm for Mining Search Result from Web Database

  • Authors

    • V Sabitha
    • Dr S.K. Srivatsa
    https://doi.org/10.14419/ijet.v7i2.20.17377
  • Keywords: Web data clusters, modified k-means clustering, artificial bee colony algorithm, annotation generation, similarity measures.
  • Abstract: Data volumes keep growing over time, which makes it challenging both to store data and to retrieve it in an organized fashion. The main goals of a web data clustering algorithm are to produce appropriate clusters for the end user, to assign the available data to the most relevant cluster, and to respond to the end user instantly. In this paper, we propose a new algorithm, ‘An Efficient Modified K-Means and Artificial Bee Colony Algorithm’, to cluster web data. The proposed algorithm combines the k-means and Artificial Bee Colony (ABC) algorithms. K-means is incorporated for its simplicity and efficiency [9]. Initially, the ABC algorithm is employed to achieve clustering [10], and this is followed by the application of the k-means algorithm; the initial cluster centres are fixed by the ABC algorithm. Experimental analysis shows that the proposed algorithm performs better than the compared algorithms in terms of precision and recall. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. From the annotated search results, frequently used websites are identified using the Apriori algorithm, which involves pattern mining. The advantages of this technique are fast operation on datasets containing items and the avoidance of unnecessary scans of the database.
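
The two-stage idea in the abstract (ABC chooses the initial cluster centres, then k-means refines them) can be illustrated with a minimal sketch. The abstract does not give the fitness function, neighbourhood rule, or parameter values, so everything here is an assumption: the sum of squared errors is used as the cost, the ABC loop is a simplified textbook variant, and the names `abc_seed`, `kmeans`, `n_sources`, `cycles`, and `limit` are hypothetical. It should not be read as the authors' implementation.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centres):
    """Clustering cost: each point's squared distance to its nearest centre."""
    return sum(min(dist2(pt, c) for c in centres) for pt in points)

def abc_seed(points, k, n_sources=10, cycles=30, limit=5, seed=0):
    """Simplified ABC search; each food source is a candidate set of k centres."""
    rng = random.Random(seed)
    dim = len(points[0])
    sources = [rng.sample(points, k) for _ in range(n_sources)]
    costs = [sse(points, s) for s in sources]
    trials = [0] * n_sources

    def explore(i):
        # Perturb one coordinate of one centre relative to another source,
        # and keep the candidate only if it lowers the cost (greedy selection).
        j = rng.choice([x for x in range(n_sources) if x != i])
        c, d = rng.randrange(k), rng.randrange(dim)
        cand = [list(ctr) for ctr in sources[i]]
        cand[c][d] += rng.uniform(-1, 1) * (cand[c][d] - sources[j][c][d])
        cand = [tuple(ctr) for ctr in cand]
        cost = sse(points, cand)
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    best, best_cost = min(zip(sources, costs), key=lambda sc: sc[1])
    for _ in range(cycles):
        for i in range(n_sources):                    # employed bees
            explore(i)
        weights = [1.0 / (1.0 + c) for c in costs]    # onlookers favour low cost
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            explore(i)
        # Remember the best source before scouts can discard it.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = sources[i_best], costs[i_best]
        for i in range(n_sources):                    # scout bees
            if trials[i] > limit:
                sources[i] = rng.sample(points, k)
                costs[i], trials[i] = sse(points, sources[i]), 0
    return best

def kmeans(points, centres, max_iter=100):
    """Lloyd's k-means, started from the supplied centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        labels = [min(range(len(centres)), key=lambda i: dist2(pt, centres[i]))
                  for pt in points]
        new = []
        for i, ctr in enumerate(centres):
            members = [pt for pt, lab in zip(points, labels) if lab == i]
            # An empty cluster keeps its old centre.
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else ctr)
        if new == centres:
            break
        centres = new
    return centres, labels

# Toy data (made up for illustration): two well-separated groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centres, labels = kmeans(points, abc_seed(points, k=2))
```

Seeding k-means this way is meant to reduce its sensitivity to the starting centres: the ABC stage searches over whole sets of centres, and k-means then only has to polish the best set found.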

  • References

      [1] Das S & Konar A, “Automatic image pixel clustering with an improved differential evolution”, Applied Soft Computing, Vol.9, (2009), pp.226–236.

      [2] Das S, Abraham A & Konar A, “Automatic clustering using an improved differential evolution algorithm”, IEEE Transactions on Systems, Man, and Cybernetics–Part A: Systems and Humans, Vol.38, (2008), pp.218–237.

      [3] Das S, Abraham A & Konar A, “Automatic clustering with a multi-elitist particle swarm optimization algorithm”, Pattern Recognition Letters, Vol.29, (2008), pp.688–699.

      [4] Han J & Kamber M, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann, San Francisco, (2006).

      [5] Baeza-Yates R & Ribeiro-Neto B, Modern Information Retrieval, Addison Wesley, ACM Press, New York, (1999).

      [6] Hammouda KM & Kamel MS, “Efficient phrase-based document indexing for web document clustering”, IEEE Transactions on Knowledge and Data Engineering, Vol.16, (2004), pp.1279–1296.

      [7] Kalashnikov DV, Chen ZS, Mehrotra S & Nuray-Turan R, “Web people search via connection analysis”, IEEE Transactions on Knowledge and Data Engineering, Vol.20, (2008), pp.1550–1565.

      [8] Aggarwal CC & Reddy CK, Data Clustering: Algorithms and Applications, CRC Press, (2014).

      [9] MacQueen J, “Some methods for classification and analysis of multivariate observations”, Proc. 5th Berkeley Symp. Math. Stat. Probability, (1967).

      [10] Karaboga D, “An idea based on honey bee swarm for numerical optimization”, Technical Report-TR06, Erciyes University, Engineering Faculty, Computer Engineering Department, (2005).

      [11] Karaboga D, Gorkemli B, Ozturk C & Karaboga N, “A comprehensive survey: artificial bee colony (ABC) algorithm and applications”, Artificial Intelligence Review, Vol.42, No.1, (2014), pp.21–57.

      [12] Karaboga D & Ozturk C, “A novel clustering approach: Artificial Bee Colony (ABC) algorithm”, Applied Soft Computing, Vol.11, No.1, (2011), pp.652–657.

      [13] Lee S & Lee W, “Evaluation of time complexity based on max average distance for K-means clustering”, Int. J. Security Appl., (2012), pp.449–454.

      [14] Manning C, Raghavan P & Schütze H, Introduction to Information Retrieval, Cambridge University Press, Cambridge, England, (2008).

      [15] Reddy D & Jana PK, “Initialization for K-means clustering using voronoi diagram”, Procedia Technol., Vol.4, (2012), pp.395–400.

      [16] Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan G, Ng A, Liu B, Yu P, Zhou ZH, Steinbach M, Hand D & Steinberg D, “Top 10 algorithms in data mining”, Knowl. Inf. Syst., Vol.14, (2008), pp.1–37.

      [17] He H, Meng W, Yu C & Wu Z, “Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages”, VLDB J., Vol.13, No.3, (2012), pp.256–273.

      [18] Liu W, Meng X & Meng W, “Vide: A Vision-Based Approach for Deep Web Data Extraction”, IEEE Trans. Knowledge and Data Eng., Vol.22, No.3, (2010), pp.447–460.

  • How to Cite

    Sabitha, V., & Srivatsa, S. K. (2018). An Efficient Modified K-Means and Artificial Bee Colony Algorithm for Mining Search Result from Web Database. International Journal of Engineering & Technology, 7(2.20), 396-400. https://doi.org/10.14419/ijet.v7i2.20.17377