Identifying the Ideal Number Components of the Bayesian Principal Component Analysis Model for Missing Daily Precipitation Data Treatment


  • Zun Liang Chuan
  • Azlyna Senawi
  • Wan Nur Syahidah Wan Yusoff
  • Noriszura Ismail
  • Tan Lit Ken
  • Mu Wen Chuan





Bayesian principal component analysis model, Data treatment, TOPSIS, Variational Bayes.


Missing precipitation data arise mainly from instrument malfunctions, recording errors and meteorological extremes. Consequently, an effective imputation algorithm is needed to provide high-quality, complete time series for assessing the risk of extreme precipitation disasters. To address this issue, this study investigates the effectiveness of the Bayesian Principal Component Analysis model with various numbers of components Q, combined with the Variational Bayes algorithm (BPCAQ-VB), for treating missing daily precipitation data. The ideal number of components Q is identified using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). The effectiveness of the BPCAQ-VB algorithm is evaluated on four distinct precipitation time series, including those from two monitoring stations located in the inland and coastal regions of Kuantan district, respectively. The results show that BPCA5-VB (i.e., Q = 5) outperforms the single imputation algorithms proposed in previous studies for the coastal region time series. Conversely, for the inland region time series, the single imputation algorithm outperforms the BPCAQ-VB algorithm.
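The abstract uses TOPSIS to select the ideal Q but does not spell out the procedure. Below is a minimal Python sketch of standard TOPSIS ranking a set of candidate Q values. It covers vector normalisation, weighting, distances to the ideal and negative-ideal solutions, and relative closeness. The evaluation criteria (RMSE, MAE, correlation R), the equal weights and all numbers are hypothetical assumptions chosen for illustration. They are not the criteria, weights or results of the study.

```python
import math
from typing import Sequence


def topsis(matrix: Sequence[Sequence[float]],
           weights: Sequence[float],
           benefit: Sequence[bool]) -> list[float]:
    """Return the TOPSIS relative-closeness score of each alternative (row).

    matrix  : rows are alternatives (here: candidate Q values), columns are criteria
    weights : one weight per criterion (rescaled to sum to 1 below)
    benefit : True if larger is better for that criterion, False if smaller is better
    """
    n_crit = len(weights)
    total_w = sum(weights)
    w = [wj / total_w for wj in weights]

    # Step 1: vector normalisation of each criterion column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    # Step 2: weighted normalised decision matrix (guard against all-zero columns).
    v = [[w[j] * row[j] / norms[j] if norms[j] > 0 else 0.0 for j in range(n_crit)]
         for row in matrix]

    # Step 3: ideal (best) and negative-ideal (worst) value for each criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    scores = []
    for row in v:
        # Step 4: Euclidean distances to the ideal and negative-ideal solutions.
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        # Step 5: relative closeness in [0, 1]; 1.0 means the row is the ideal.
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom > 0 else 0.0)
    return scores


# HYPOTHETICAL criteria per candidate Q: (RMSE, MAE, correlation R).
# These numbers are made up for illustration; they are NOT results from the study.
candidates = {
    1: (9.0, 5.0, 0.60),
    2: (8.0, 4.5, 0.65),
    3: (7.0, 4.0, 0.70),
    4: (7.5, 4.2, 0.68),
    5: (8.5, 4.8, 0.62),
}
qs = list(candidates)
scores = topsis([candidates[q] for q in qs],
                weights=[1, 1, 1],
                benefit=[False, False, True])  # RMSE, MAE: lower is better; R: higher
ranking = sorted(zip(qs, scores), key=lambda t: t[1], reverse=True)
best_q = ranking[0][0]
print(best_q)  # 3
```

In this toy input Q = 3 is best on every criterion, so its closeness score is exactly 1.0. Q = 1 is worst on every criterion and scores 0.0. With real data the criteria usually conflict, and the closeness score trades them off according to the weights.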


