Feature Selection using Genetic Algorithm for Clustering High Dimensional Data

Kahkashan Kouser; Amrita Priyam

doi:10.14419/ijet.v7i2.11.11001

Received date: April 3, 2018

Accepted date: April 3, 2018

Published date: April 3, 2018
Keywords: feature selection, clustering, high dimensional data, genetic algorithm.
Abstract

Clustering high-dimensional data is one of the open problems of modern data mining. To address it, this paper proposes a new technique, GA-HDClustering, which works in two steps. First, a GA-based feature selection algorithm determines the optimal feature subset, that is, the subset made up of the important features of the entire data set. Next, the K-means algorithm is applied to the optimal feature subset to find the clusters. For comparison, the traditional K-means algorithm is applied to the full-dimensional feature space. Finally, the results of GA-HDClustering are compared with those of the traditional clustering algorithm using several cluster validity metrics: sum of squared error (SSE), within-group average distance (WGAD), between-group distance (BGD) and the Davies-Bouldin index (DBI). GA-HDClustering uses a genetic algorithm to search for an effective feature subspace within the large feature space formed by all dimensions of the data set. Experiments performed on a standard data set showed that GA-HDClustering is superior to the traditional clustering algorithm.
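The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, the fitness function or the GA operators. The sketch therefore assumes a binary mask with one bit per feature, uses the Davies-Bouldin index (DBI) of a K-means run on the selected features as the fitness to minimise, and uses elitism, binary tournament selection, one-point crossover and bit-flip mutation. The helper names (`project`, `kmeans`, `sse`, `davies_bouldin`, `ga_select`), the parameter values and the toy data set are all hypothetical. Only the Python standard library is used.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def project(data: List[Vector], mask: Sequence[int]) -> List[Vector]:
    """Keep only the features whose bit is 1 in the chromosome (feature mask)."""
    return [[x for x, bit in zip(row, mask) if bit] for row in data]


def kmeans(data: List[Vector], k: int, iters: int = 100, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Lloyd's K-means with randomly chosen initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    cents = [list(p) for p in rng.sample(data, k)]
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist(p, cents[c])) for p in data]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return cents, labels


def sse(data: List[Vector], cents: List[Vector], labels: List[int]) -> float:
    """Sum of squared errors: squared distance of every point to its own centroid."""
    return sum(dist(p, cents[lab]) ** 2 for p, lab in zip(data, labels))


def davies_bouldin(data: List[Vector], cents: List[Vector], labels: List[int]) -> float:
    """Davies-Bouldin index (lower means compact, well-separated clusters); needs k >= 2."""
    k = len(cents)
    scatter = []  # S_i: mean distance of cluster i's members to its centroid
    for c in range(k):
        d = [dist(p, cents[c]) for p, lab in zip(data, labels) if lab == c]
        scatter.append(sum(d) / len(d) if d else 0.0)
    worst = [max((scatter[i] + scatter[j]) / max(dist(cents[i], cents[j]), 1e-12)
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def ga_select(data: List[Vector], k: int, pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.1, seed: int = 1) -> List[int]:
    """GA over binary feature masks. Assumption: fitness = DBI of K-means on the subset."""
    rng = random.Random(seed)
    n = len(data[0])
    cache: Dict[Tuple[int, ...], float] = {}

    def fitness(mask: List[int]) -> float:
        key = tuple(mask)
        if key not in cache:
            if not any(mask):
                cache[key] = float("inf")  # an empty subset is not a valid solution
            else:
                sub = project(data, mask)
                cache[key] = davies_bouldin(sub, *kmeans(sub, k))
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [list(m) for m in pop[:2]]  # elitism: keep the two best masks
        while len(nxt) < pop_size:
            # binary tournament selection picks each parent
            a, b = (min(rng.sample(pop, 2), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]  # one-point crossover
            nxt.append([bit ^ (rng.random() < p_mut) for bit in child])  # bit-flip mutation
        pop = nxt
    return min(pop, key=fitness)


# Hypothetical toy data: features 0-1 separate two groups, features 2-5 are uniform noise.
rng = random.Random(42)
data = [[rng.gauss(0.0 if i < 30 else 10.0, 0.5) for _ in range(2)]
        + [rng.uniform(0, 10) for _ in range(4)] for i in range(60)]

best = ga_select(data, k=2)
# Step 2 and the comparison: K-means on the full space vs. on the GA-selected subset.
for name, mask in (("full space", [1] * 6), ("GA subset", best)):
    sub = project(data, mask)
    cents, labels = kmeans(sub, 2)
    print(name, mask, "SSE=%.2f" % sse(sub, cents, labels),
          "DBI=%.3f" % davies_bouldin(sub, cents, labels))
```

WGAD and BGD are left out because the abstract does not define them. Because each mask is scored in its own subspace, DBI values for subsets of different sizes are not strictly comparable. A fuller treatment would need to account for that, for example by penalising very small subsets.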
How to Cite
Kouser, K., & Priyam, A. (2018). Feature Selection using Genetic Algorithm for Clustering high Dimensional Data. International Journal of Engineering and Technology, 7(2.11), 27-30. https://doi.org/10.14419/ijet.v7i2.11.11001