Identifying homogeneous rainfall catchments for non-stationary time series using TOPSIS algorithm and bootstrap k-sample Anderson-Darling test
Estimates of extreme hydro-meteorological events, such as extreme rainfall, may be unreliable when historical rainfall records are limited. This limitation can be overcome by transferring information from gauged to ungauged rainfall catchments, which requires knowledge of the homogeneity among those catchments. This study introduces a new regionalization algorithm that identifies the most suitable agglomerative hierarchical clustering (AHC) algorithm and the optimum number of homogeneous rainfall catchments for non-stationary rainfall time series. The new algorithm is based on the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). The study also proposes the Bootstrap K-sample Anderson-Darling (BKAD) test for validating the regionalized homogeneous rainfall catchments. The Cophenetic Correlation Coefficients (CCC) from ten similarity measures serve as attributes for the TOPSIS algorithm to identify the most suitable AHC algorithm among the seven considered. The C-index (δCI), Davies-Bouldin index (δDB), Dunn index (δDI) and Gamma index (δGI) then serve as attributes for the TOPSIS algorithm to determine the optimum number of homogeneous rainfall catchments. The results show that the most suitable AHC algorithm clusters twenty rainfall catchments in the Kuantan River Basin, Malaysia, into two significantly homogeneous clusters, the optimum number. The results also indicate that the BKAD test is invariant to the number of bootstrap samples used in validating homogeneous rainfall catchments.
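To make the TOPSIS ranking step concrete, the following is a minimal sketch of the classic TOPSIS procedure applied to the four cluster validity indices. It is not the study's implementation: the function name `topsis`, the equal attribute weights, the vector normalisation, and the index values for each candidate number of clusters are all illustrative assumptions. The sketch treats the C-index and Davies-Bouldin index as attributes to minimise and the Dunn and Gamma indices as attributes to maximise, which is their conventional interpretation.

```python
import math


def topsis(matrix, benefit, weights=None):
    """Return TOPSIS closeness scores for alternatives (rows) over attributes (columns).

    benefit[j] is True when larger values of attribute j are better.
    Equal weights are an assumption made for this sketch.
    """
    n_attr = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_attr] * n_attr
    # Step 1: vector-normalise each column and apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_attr)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_attr)] for row in matrix]
    # Step 2: build the ideal and anti-ideal solutions column by column.
    columns = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(columns, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(columns, benefit)]
    # Step 3: closeness = distance to anti-ideal / (sum of both distances).
    scores = []
    for row in weighted:
        d_pos = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, ideal)))
        d_neg = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical index values (NOT from the study) for k = 2, 3, 4 clusters.
# Columns: C-index, Davies-Bouldin, Dunn, Gamma.
candidate_k = [2, 3, 4]
indices = [
    [0.10, 0.60, 0.50, 0.90],
    [0.20, 0.80, 0.35, 0.75],
    [0.30, 1.00, 0.30, 0.60],
]
benefit = [False, False, True, True]  # minimise CI and DB, maximise DI and GI

scores = topsis(indices, benefit)
best_k = candidate_k[scores.index(max(scores))]
print(dict(zip(candidate_k, scores)), "best k:", best_k)
```

The alternative with the highest closeness score is taken as the preferred one. The same function could rank AHC algorithms if each row held the CCC values of one algorithm across the similarity measures, with every attribute marked as a benefit; the weighting scheme actually used in the study may differ from the equal weights assumed here.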