A Cluster Analysis for Binary Data Using Genetic Algorithms

  • Abstract

    This research was motivated by the lack of clustering algorithms designed for binary data. Its main subject is Genetic Clustering for Unknown K (GCUK), a promising technique for analyzing this type of data. GCUK was applied to four binary data sets, one of which is imbalanced. The results show that GCUK is an efficient and effective clustering algorithm compared to K-means. A further contribution is the demonstration that GCUK can cluster imbalanced data, to which standard clustering algorithms cannot simply be applied because they can produce misclassifications.


  • Keywords

    Binary Data; Clustering; Genetic Algorithms.

Article ID: 28174
DOI: 10.14419/ijet.v7i4.30.28174

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.