A brief survey of unsupervised agglomerative hierarchical clustering schemes

  • Abstract
  • Keywords
  • References
  • PDF
  • Abstract

    Unsupervised hierarchical clustering process is a mathematical model or exploratory tool aims to provide the easiest way to categorize the distinct groups over the large volume of real time observations or dataset in tree form based on nature of similarity measures without prior knowledge. Dataset is an important aspect in the hierarchical clustering process that denotes the behavior of living species depicts the properties of a natural phenomenon and result of a scientific experiment and observation of a running machinery system without label identification. The hierarchical clustering scheme consists of Agglomerative and Divisive that is applicable to employ into various scientific research areas like machine learning, pattern recognition, big data analysis, image pixel classification, information retrieval, and bioinformatics for distinct patterns identification. This paper discovered a brief survey of agglomerative hierarchical clustering schemes with its clustering procedures, linkage metrics, complexity analysis, key issues and development of AHC scheme.


  • Keywords

    Agglomerative Hierarchical Clustering; Clustering Process; Distance Metric; Divisive Hierarchical Clustering; Similarity Measure; Linkage Method.

  • References

      [1] Antti Honkela, Jeremias Seppa & Esa Alhoniemi, “Agglomerative independent variable group analysis”, Neurocomputing, vol. 71, 2008, pp. 1311-1320. https://doi.org/10.1016/j.neucom.2007.11.024.

      [2] Athman Bauguettaya, Qi Yu, Xumin Liu, Xiangmin Zhou & Andy Song, “Efficient Agglomerative Hierarchical Clustering”, Expert Systems with Applications, vol. 42, no. 5, 2015, pp. 2785-2797. https://doi.org/10.1016/j.eswa.2014.09.054.

      [3] Caiming Zhong, Duoqian Miao & Pasi Franti, “Minimum Spanning tree based split-and-merge: A hierarchical clustering method”, Information Sciences, vol. 181, 2011, pp. 3397-3410.

      [4] Chih-Tang Chang, Lai Jim, ZC & Jeng, MD, “Fast agglomerative clustering using information of k-nearest neighbors”, Pattern Recognition, vol. 43, 2010, pp. 3958-3968. https://doi.org/10.1016/j.patcog.2010.06.021.

      [5] Chung-Chain Hsu, Chin-Long Chen & Yu-Wei Su, “Hierarchical clustering of mixed data based on distance hierarchy”, Information Sciences, vol. 177, no. 20, 2007, pp. 4474-4492. https://doi.org/10.1016/j.ins.2007.05.003.

      [6] Davies, DL & Bouldin, DW, “Cluster Separation Measure”, IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 1,
      no. 2, 1979, pp. 95-105. https://doi.org/10.1109/TPAMI.1979.4766909.

      [7] De Amorim, RC, “Feature Relevance in Ward”s Hierarchical Clustering Using the Lp Norm”, Journal of Classification, vol. 32,
      no. 1, 2015, pp. 46–62. https://doi.org/10.1007/s00357-015-9167-1.

      [8] Defays, D, “An efficient algorithm for a complete link method“, Computer Journal British Computer Society, vol. 20, no. 4,
      1977, pp. 364–366. https://doi.org/10.1093/comjnl/20.4.364.

      [9] Duda, R, Hart, PE & Stork, DG, “Pattern classification”, 2nd edition, New York, NY: John Wiley & Sons, 2001.

      [10] Dutta, M, Kakoti Mahanta, A & Arun Pujari, K, “QROCK: A quick version of the algorithm for clustering of categorical data”, Pattern Recognition Letter, vol.26, 2005, pp. 2364-2373. https://doi.org/10.1016/j.patrec.2005.04.008.

      [11] Fionn Murtagh & Legendre P, “Wards hierarchical agglomerative clustering method: which algorithms implement wards criterion”, Journal of Classification, vol. 31, 2014, pp. 274-295. https://doi.org/10.1007/s00357-014-9161-z.

      [12] Franti, P, Kaukoranta, T, Sen, DF & Chang, KS, “Fast and memory efficient implementation of the exact PNN”, IEEE Transaction on Image Processing, vol. 9, 2000, pp. 773-777. https://doi.org/10.1109/83.841516.

      [13] George Karypis, Eui-Hong Han & Vipin Kumar, “Chameleon: Hierarchical Clustering Using Dynamic Modeling, Computers”, vol. 32, no. 8, 1999, pp. 68-75. https://doi.org/10.1109/2.781637.

      [14] George Kollios, Dimitrios Gunopulos, Nick Koudas & Stefan Berchtold, “Efficient Biased Sampling for Approximate Clustering and Outlier Detection in Large Data Sets”, IEEE Transaction on Knowledge and Data Engineering, vol. 15, no. 5, 2003, pp. 1170-1187. https://doi.org/10.1109/TKDE.2003.1232271.

      [15] Gower J, “A comparison of some methods of cluster analysis”, Biometrics, vol. 23, no. 4, 1967, pp. 623-628. https://doi.org/10.2307/2528417.

      [16] Gower, JC & Ross, GJS, “Minimum spanning trees and single linkage cluster analysis”, Journal of the Royal Statistical Society, Series C, vol. 18, no. 1, 1969, pp. 54–64. https://doi.org/10.2307/2346439.

      [17] Ham, J & Kamber, M, “Data Mining: Concepts and Techniques”, 2nd Edition. Kaufmann, 2006.

      [18] Hisashi Koga, Tetsuo Ishibashi & Toshinori Watanabe, “Fast agglomerative hierarchical cluatering algorithm using locality-sensitive hashing”, Knowledge and Information Systems , vol. 12, 2007, pp. 25-53. https://doi.org/10.1007/s10115-006-0027-5.

      [19] Jain, A & Dubes, R, “Algorithm for clustering data”, Englewood Cliffs, NJ: Prentice Hall, 1988.

      [20] Jain, AK, Murty, MN & Flynn, PJ, “Data Clustering: A Review”, ACM Computer Surveys, vol. 31, no. 3, 1988, pp. 264-323. https://doi.org/10.1145/331499.331504.

      [21] Johnson, S, “Hierarchical clustering schemes”, Psychometrika, vol. 32, no. 3, 1967, pp. 241-254. https://doi.org/10.1007/BF02289588.

      [22] Lai Jim, ZC, Juan Eric, YT & Lai Franklin, JC, “Rough clustering using generalized fuzzy clustering algorithm”, Pattern Recognition, vol. 46, 2013, pp. 2538-2547. https://doi.org/10.1016/j.patcog.2013.02.003.

      [23] Lai Jim, ZC & Tsung-Jen Huang, “An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list”, Information Sciences, vol. 181, 2011, pp. 1722-1734. https://doi.org/10.1016/j.ins.2011.01.011.

      [24] Lance, G & Williams, W, “A general theory of classification sorting strategies: Hierarchical system”, Computer Journal, vol. 9, 1967, pp. 373-380. https://doi.org/10.1093/comjnl/9.4.373.

      [25] Lin, CR & Chen, MS, “Combining partitional and hierarchical algorithms for robust and efficient data clustering with chesion self-merging”, IEEE Transaction on Knowledge and Data Engineering, vol. 17, 2005, pp. 145-159. https://doi.org/10.1109/TKDE.2005.21.

      [26] Manoranjan Dash, Huan Liu, Peter Scheuermann & Kian Lee Tan, “Fast Hierarchical Clustering and its validation”, Data and Knowledge Engineering, vol. 44, 2003, pp. 109-138. https://doi.org/10.1016/S0169-023X(02)00138-6.

      [27] Mark Junjie Li, Michael K Ng, Yiu-ming Cheung & Joshua Zhexue Huang, “Agglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clusters”, IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, 2008, pp. 1519-1534. https://doi.org/10.1109/TKDE.2008.88.

      [28] Martin Ester, Hans-peter Kriegel, Jorgs & Xiaowei Xu, “A denisty-based algorithm for discovering cluster in large spatial data base with noise”, 2nd International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226- 231.

      [29] Mcquitty, L, “Similarity analysis by reciprocal pairs for discrete and continuous data”, Educational and Psychological Measurement, vol. 27, 1966, pp. 21-46. https://doi.org/10.1177/001316446702700103.

      [30] Murtagh, F, “Survey of recent advances in hierarchical clustering algorithms”, The Computer Journal, vol. 26, no. 4, 1983, pp. 354-359. https://doi.org/10.1093/comjnl/26.4.354.

      [31] Murtagh, F 1984, “Complexities of Hierarchic Clustering Algorithms: the state of the art”, Computational Statistics Quarterly, vol. 1, 1984, pp. 101–113.

      [32] Ng, RT & Jiawei Han, “CLARANS: A Method for clustering objects for spatial data mining”, IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 1, 2002, pp. 1003-1016. https://doi.org/10.1109/TKDE.2002.1033770.

      [33] Pasi Franti, Olli Virmajoki & Ville Hautamaki, “Fast Agglomerative Clustering using a k-Nearest Neighbor Graph”, IEEE Transaction on Pattern Analysis and Machine Intelligence, vol.28, 2006, pp. 1875-1881. https://doi.org/10.1109/TPAMI.2006.227.

      [34] Rui, XU & Donald Wunsch, C, “Clustering”, IEEE Press, 2009.

      [35] Sibson, R, “SLINK: an optimally efficient algorithm for the single-link cluster method, The Computer Journal (British Computer Society), vol. 16, no. 1, 1973, pp. 30–34. https://doi.org/10.1093/comjnl/16.1.30.

      [36] Sneath, P, “The application of computers to taxonomy”, Journal of general microbiology, vol. 17, 1957, pp. 201-226.

      [37] Sokal, R & Michener, C, “A statistical method for evaluating systematic relationships”, University of Kansas Science Bulletin, vol. 38, 1958, pp. 1409-1438.

      [38] Sudipto Guha, Rajeev Rastogi & Kyuseok Shim, “CURE: An Effective Clustering Algorithm for Large Database”, Information Systems, vol. 26, 2001, pp. 35-58. https://doi.org/10.1016/S0306-4379(01)00008-4.

      [39] Sudipto Guha, Rajeev Rastogi & Kyuseok Shim, “ROCK: A Robust Clustering Algorithm for Categorical Attributes”, Information System, vol. 25, no. 5, 2000, pp. 512-521. https://doi.org/10.1016/S0306-4379(00)00022-3.

      [40] Szekely Gabor, J & Rizzo Maria, L, “Hierarchical clustering via joint between-within distance: extending wards minimum variance method”, Journal of Classification, vol. 22, 2005, pp. 151-183. https://doi.org/10.1007/s00357-005-0012-9.

      [41] Vijaya, PA, Narasimha Murty, M & Subramanian, DK, “Leader-Subleaders: An Efficient Hierarchical Clustering Algorithm for Large datasets”, Pattern Recognition Letter, vol. 25, 2004, pp. 505-513. https://doi.org/10.1016/j.patrec.2003.12.013.

      [42] Ward, J, “Hierarchical grouping to optimize an objective function”, Journal of the American statistical association, vol. 58, 1963, pp. 236-244. https://doi.org/10.1080/01621459.1963.10500845.

      [43] Wei Zhang, Gongxuan Zhang, Yongli Wang, Zhaomeng Zhu & Tao Li, “NNB: An Efficient Nearest Neighbor Search Method for Hierarchical Clustering on Large Datasets”, IEEE International Conference on Semantic Computing (ICSC), 2015, pp. 405-412. https://doi.org/10.1109/ICOSC.2015.7050840.

      [44] Zhang, T, Ramakrishnan, R & Livny, M, “BIRCH: an efficient data clustering method for very large databases”, International Journal of Windom (Ed.) ACM SIGMOD International Conference on Management of data, 1996, pp. 103-114. https://doi.org/10.1145/233269.233324.

      [45] Krishnamoorthy, K & Sreedhar kumar, S, “An Improved Agglomerative Clustering Algorithm for Outlier Detection”, Applied Mathematics and Information Sciences, vol.10, no. 3, 2016, pp.1125-1138, doi.org/10.18576/amis/100332, https://doi.org/10.18576/amis/100332.

      [46] Madheswaran M & Sreedhar kumar, S, “An Improved Frequency Based Agglomerative Clustering Algorithm for Detecting Distinct Clusters on Two Dimensional Dataset.”, Applied Mathematics and Information Sciences, vol. 9, no. 4, 2017, pp. 30-41, 10.5897/JETR2017.0628, 2017.

      [47] https://en.wikipedia.org/wiki/Hierarchical_clustering.

      [48] https://en.wikipedia.org/wiki/Euclidean_distance.

      [49] https://en.wikipedia.org/wiki/Single-Linkage_clustering.

      [50] https://en.wikipedia.org/wiki/Complete-linkage_clustering.

      [51] https://en.wikipedia.org/wiki/UPGMA.

      [52] Sreedhar Kumar S, Madheswaran M, “An improved Partitioned Clustering Technique for Identifying Optimum Number of Dissimilar Groups in Multimedia Dataset,” European Journal of Scientific Research, vol. 151, no. 1, pp. 5-21, 2018.




Article ID: 13971
DOI: 10.14419/ijet.v8i1.13971

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.