Effect of Dimensionality Reductions Technique in Modelling and Forecasting River Flow


  • Shuhaida Ismail
  • Ani Shabri
  • Aida Mustapha
  • Siraj Mohammed Pandhiani






Dimensionality Reduction, Forecasting, River Flow, Least Square Support Vector Machine, Principal Component Analysis


The ability to obtain accurate information on future river flow is fundamental to water resources planning and management. Traditionally, single models have been used to predict future river flow values. This paper investigates Principal Component Analysis (PCA) as a dimensionality reduction technique combined with a single Support Vector Machine (SVM) and a single Least Square Support Vector Machine (LSSVM); the resulting models are referred to as PCA-SVM and PCA-LSSVM. The study also compares the proposed models with the single SVM and LSSVM models. The models are ranked on four statistical measures: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Correlation Coefficient, and Coefficient of Efficiency (CE). The results show that PCA combined with LSSVM performs better than the other models. The best-ranked models are then assessed using the Mean of Forecasting Error (MFE) to determine their forecast rate. PCA-LSSVM proves to be the better model here as well: it under-predicts the observed river flow by only 0.89% for the Tualang River and over-predicts it by 2.08% for the Bernam River. The study concludes by recommending PCA as a dimension reduction approach combined with LSSVM for river flow forecasting, because it gives better prediction results and greater stability than the single models.
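The evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions based on common hydrological practice: CE is taken as the Nash-Sutcliffe form, the correlation coefficient as Pearson's r, and MFE as the mean of observed minus forecast values, so that a positive MFE indicates under-prediction. The function name `forecast_metrics` and the sample data are hypothetical, and the percentage form of MFE reported in the abstract is not reproduced here.

```python
import math


def forecast_metrics(observed, predicted):
    """Return MAE, RMSE, r, CE and MFE for two equal-length sequences.

    Assumed definitions (not taken from the paper):
      CE  = 1 - sum((o - p)^2) / sum((o - mean(o))^2)   (Nash-Sutcliffe form)
      MFE = mean(o - p), positive when the model under-predicts
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    # Forecast errors, observed minus predicted
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Sums of squares about the means; a constant series makes these zero
    # and the ratios below undefined, which this sketch does not handle.
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    r = cov / math.sqrt(ss_o * ss_p)
    ce = 1.0 - sum(e * e for e in errors) / ss_o
    mfe = sum(errors) / n

    return {"MAE": mae, "RMSE": rmse, "r": r, "CE": ce, "MFE": mfe}


# Hypothetical flows: every forecast is one unit below the observation
metrics = forecast_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 5.0, 7.0])
```

With these definitions, a model that is consistently one unit low still has a perfect correlation, which is one reason a bias measure such as MFE is useful alongside r when ranking models.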




