A new generic interpretation of enhanced subspace clustering in high dimensional data

  • Abstract

    Finding clusters of data points in high-dimensional data is a prominent and challenging task in data mining, and subspace clustering methods are particularly well suited to the high-dimensional mining process. However, traditional subspace clustering techniques fail to find significant, high-quality clusters in the identified subspaces as the number of dimensions grows in large datasets. Moreover, most conventional clustering algorithms use a bottom-up search and require multiple database scans, which leads to inefficiency. This paper presents a new enhanced subspace clustering scheme, ENSUBCLU, which overcomes the inefficiency of traditional subspace clustering techniques. The ENSUBCLU model first finds the dense units in each one-dimensional projection of a given dataset. It then applies a subspace-steering scheme that identifies promising subspaces and their combinations from the points shared among the one-dimensional subspaces. The model finds all interesting combinational dense core regions from the lower-dimensional dense units; this reduces the number of subspaces processed, yields high-quality subspace clusters, and eliminates redundant subspace clusters using a hashing technique. The model also scales well with an increasing number of attributes. An empirical study on various synthetic and real-world datasets shows that ENSUBCLU finds maximal subspace clusters better than existing algorithms. It can serve many application areas that maintain high-dimensional data, such as social networking, computer vision, bioinformatics, and financial and sales analysis.
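The abstract names two concrete steps: finding dense units in each one-dimensional projection, and eliminating redundant subspace clusters with hashing. The sketch below is a generic illustration of those two ideas, not the authors' ENSUBCLU implementation; the grid width `xi`, density threshold `tau`, and all function names are illustrative assumptions.

```python
# Generic sketch of two steps the abstract describes (NOT the paper's
# ENSUBCLU code): (1) grid-based dense units in a 1-D projection, and
# (2) hash-based removal of redundant subspace clusters.
# `xi` (number of grid cells) and `tau` (density threshold) are
# illustrative parameters, not taken from the paper.

from collections import defaultdict

def dense_units_1d(points, dim, xi=10, tau=3):
    """Return the grid cells of dimension `dim` that hold >= tau points."""
    values = [p[dim] for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / xi or 1.0          # avoid zero width if all equal
    counts = defaultdict(int)
    for v in values:
        cell = min(int((v - lo) / width), xi - 1)  # clamp max into last cell
        counts[cell] += 1
    return {cell for cell, c in counts.items() if c >= tau}

def dedupe_clusters(clusters):
    """Hash each cluster's point set; keep one cluster per hash bucket.

    `clusters` is a list of (subspace, member_ids) pairs. Clusters found
    in different subspaces but covering the same points are treated as
    redundant; the first one seen is kept.
    """
    seen, unique = set(), []
    for subspace, members in clusters:
        key = hash(frozenset(members))     # same point set => same key
        if key not in seen:
            seen.add(key)
            unique.append((subspace, members))
    return unique
```

A fuller redundancy-removal step would typically prefer the highest-dimensional subspace among the duplicates rather than the first one encountered; the sketch keeps the logic minimal.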



  • Keywords

    Enhanced Subspace Clustering; Dense Core Region; High Dimensional Data Mining; Hashing; Subspace Clustering Algorithms.

  • References

      [1] Zhang T, Ramakrishnan R, Livny M, "BIRCH: an efficient data clustering method for very large databases", Proc. ACM SIGMOD International Conference on Management of Data, ACM Press, USA, 1996, pp. 103-114. https://doi.org/10.1145/233269.233324.

      [2] Ester M, Kriegel HP, Sander J, Xu X, "A density-based algorithm for discovering clusters in large spatial databases with noise", Proc. 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996, pp. 226-231.

      [3] Bellman RE, "Adaptive Control Processes: A Guided Tour", Princeton University Press, New Jersey, 1961. https://doi.org/10.1515/9781400874668.

      [4] Beyer K, Goldstein J, Ramakrishnan R, Shaft U, "When is 'nearest neighbor' meaningful?", Proc. 7th International Conference on Database Theory (ICDT'99), Lecture Notes in Computer Science, vol. 1540, Springer, Berlin Heidelberg, 1999, pp. 217-235.

      [5] Jolliffe IT, "Principal Component Analysis", 2nd edn, Springer, New York, 2002.

      [6] Tan PN, Steinbach M, Kumar V, "Introduction to Data Mining", Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2005.

      [7] Sim K, Gopalkrishnan V, Zimek A, Cong G, "A survey on enhanced subspace clustering", Data Mining and Knowledge Discovery, vol. 26, no. 2, 2013, pp. 332-397. https://doi.org/10.1007/s10618-012-0258-x.

      [8] Rama Devi J, Venkateswara Rao M, "An era of enhanced subspace clustering in high-dimensional data", i-manager's Journal on Computer Science, vol. 4, no. 3, September-November 2016, pp. 29-36.

      [9] Kailing K, Kriegel HP, Kröger P, "Density-connected subspace clustering for high-dimensional data", Proc. SIAM International Conference on Data Mining, 2004, pp. 246-256.

      [10] Parsons L, Haque E, Liu H, "Subspace clustering for high dimensional data: a review", ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, 2004, pp. 90-105. https://doi.org/10.1145/1007730.1007731.

      [11] Agrawal R, Gehrke J, Gunopulos D, Raghavan P, "Automatic subspace clustering of high dimensional data for data mining applications", Proc. ACM SIGMOD International Conference on Management of Data, 1998, pp. 94-105. https://doi.org/10.1145/276304.276314.

      [12] Yadav J, Kumar D, "Subspace Clustering using CLIQUE: An Exploratory Study", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 3, no. 2, February 2014.

      [13] Cheng CH, Fu AC, Zhang Y, "Entropy-Based Subspace Clustering for Mining Numerical Data", Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 1999. https://doi.org/10.1145/312129.312199.

      [14] Goil S, Nagesh H, Choudhary A, "MAFIA: Efficient and Scalable Subspace Clustering for Very Large Data Sets", Technical Report CPDC-TR-9906-010, Center for Parallel and Distributed Computing, Dept. of Electrical and Computer Engineering, Northwestern University, 1999.

      [15] Sequeira K, Zaki M, "SCHISM: A New Approach for Interesting Subspace Mining", Proc. 4th IEEE International Conference on Data Mining, 2004, pp. 186-193.

      [16] Kriegel HP, Kröger P, Renz M, Wurst S, "A generic framework for efficient subspace clustering of high-dimensional data", Proc. 5th IEEE International Conference on Data Mining, 2005.

      [17] Chu YH, Huang JW, Chuang KT, Yang DN, "Density Conscious Subspace Clustering for High-Dimensional Data", IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 1, January 2010, pp. 16-30. https://doi.org/10.1109/TKDE.2008.224.

      [18] Assent I, Krieger R, Müller E, Seidl T, "INSCY: Indexing subspace clusters with in-process-removal of redundancy", Proc. 8th IEEE International Conference on Data Mining, 2008, pp. 719-724. https://doi.org/10.1109/ICDM.2008.46.

      [19] Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS, "Fast algorithms for projected clustering", Proc. ACM SIGMOD International Conference on Management of Data, New York, USA, 1999, pp. 61-72. https://doi.org/10.1145/304182.304188.

      [20] Aggarwal CC, Yu PS, "Finding generalized projected clusters in high dimensional spaces", Proc. ACM SIGMOD International Conference on Management of Data, New York, USA, 2000, pp. 70-81.

      [21] Woo KY, Lee JH, Kim MH, Lee YJ, "FINDIT: a fast and intelligent subspace clustering algorithm using dimension voting", Information and Software Technology, vol. 46, no. 4, 2004, pp. 255-271.

      [22] Jing L, Ng MK, Xu J, Huang JZ, "Subspace clustering of text documents with feature weighting k-means algorithm", Proc. 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Springer, Berlin Heidelberg, 2005, pp. 802-812. https://doi.org/10.1007/11430919_94.

      [23] Gan G, Wu J, "A convergence theorem for the fuzzy subspace clustering (FSC) algorithm", Pattern Recognition, vol. 41, no. 6, 2008, pp. 1939-1947. https://doi.org/10.1016/j.patcog.2007.11.011.

      [24] Chan EY, Ching WK, Ng MK, Huang JZ, "An optimization algorithm for clustering using weighted dissimilarity measures", Pattern Recognition, vol. 37, no. 5, 2004, pp. 943-952. https://doi.org/10.1016/j.patcog.2003.11.003.

      [25] Dash M, Choi K, Scheuermann P, Liu H, "Feature selection for clustering - a filter solution", Proc. 2nd IEEE International Conference on Data Mining (ICDM), 2002, pp. 115-122.

      [26] Günnemann S, Müller E, Färber I, Seidl T, "Detection of orthogonal concepts in subspaces of high dimensional data", Proc. 18th ACM Conference on Information and Knowledge Management (CIKM), 2009, pp. 1317-1326. https://doi.org/10.1145/1645953.1646120.

      [27] Fromont É, Prado A, Robardet C, "Constraint-based subspace clustering", Proc. 9th SIAM International Conference on Data Mining (SDM), 2009, pp. 26-37. https://doi.org/10.1137/1.9781611972795.3.

      [28] Lakshmi BJ, et al., "A rough set based subspace clustering technique for high dimensional data", Journal of King Saud University - Computer and Information Sciences, 2017. https://doi.org/10.1016/j.jksuci.2017.09.003.

      [29] Suguna M, Palaniammal S, "An Efficient Density Conscious Subspace Clustering Method using Top-down and Bottom-up Strategies", International Journal of Computer Science and Information Technologies, vol. 5, no. 3, 2014, pp. 3839-3842.

      [30] Rama Devi J, Venkateswara Rao M, "Design a New Model for Enhanced Subspace Clustering of High-Dimensional Data", International Journal of Control Theory and Applications, ISSN: 0974-5572, vol. 9, no. 42, 2016, pp. 201-208.

      [31] Erdős P, Lehner J, "The distribution of the number of summands in the partitions of a positive integer", Duke Mathematical Journal, vol. 8, no. 2, 1941, pp. 335-345. https://doi.org/10.1215/S0012-7094-41-00826-8.

      [32] Müller E, Günnemann S, Assent I, Seidl T, Färber I, 2009.

      [33] Lichman, UCI Machine Learning Repository, 2007. Available at: http://archive.ics.uci.edu/ml/.




Article ID: 20398
DOI: 10.14419/ijet.v7i4.20398

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.