OUTLIER DETECTION IN RAINFALL DATA USING EXTREME VALUE THEORY
DOI:
https://doi.org/10.11113/jurnalteknologi.v87.22617
Keywords:
Extreme value theory, outlier detection, rainfall series, univariate data, synthetic outlier
Abstract
Extreme rainfall modelling has gained increased attention in recent decades due to its importance for spatial analysis and risk assessment. As with any statistical analysis, stochastic modelling of extreme data is susceptible to errors caused by the presence of outliers. However, the precise definition of outliers and extreme events remains vague despite extensive research on the topic. Current outlier detection methods often assume that the sample data follow a normal distribution, which is implausible for rainfall data because of its positively skewed, heavy-tailed character. In this study, we focus on removing outliers from daily rainfall series while preserving observed extreme events through the application of Extreme Value Theory. The contribution of this study is twofold. First, the Peaks-Over-Threshold (POT) algorithm is demonstrated for outlier detection in univariate rainfall data. Second, the study introduces an algorithm for generating synthetic outliers using the Gamma distribution. The algorithm's performance was tested under various settings using simulated rainfall data to evaluate its effectiveness and reliability before it was applied to real data. The results indicate that the algorithm successfully identified outliers without affecting the extreme daily precipitation values in the sample dataset. This finding supports future research by improving data quality management, thereby enabling more precise analysis of extreme rainfall events.
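The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the Gamma parameters, the 95% initial threshold, the risk level, the scheme that places synthetic outliers above the sample maximum, and the function names `simulate_rainfall`, `inject_outliers` and `pot_flags` are all assumptions made for illustration. The detection step fits a generalized Pareto distribution (GPD) to the excesses over an initial threshold and flags values above the standard POT tail-quantile estimate.

```python
import numpy as np
from scipy.stats import genpareto


def simulate_rainfall(n, shape=0.8, scale=12.0, seed=0):
    """Draw n wet-day rainfall amounts (mm) from a Gamma distribution.

    The shape and scale values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    return rng.gamma(shape, scale, size=n)


def inject_outliers(series, n_outliers, shape=2.0, scale=10.0, seed=1):
    """Overwrite randomly chosen positions with synthetic outliers.

    Hypothetical scheme: each outlier is the sample maximum plus a
    Gamma-distributed increment, so it always exceeds every clean value.
    """
    rng = np.random.default_rng(seed)
    out = series.copy()
    idx = np.sort(rng.choice(series.size, size=n_outliers, replace=False))
    out[idx] = series.max() + rng.gamma(shape, scale, size=n_outliers)
    return out, idx


def pot_flags(series, init_quantile=0.95, risk=1e-4):
    """Flag values above a POT threshold z_q with P(X > z_q) close to risk."""
    t = np.quantile(series, init_quantile)        # initial threshold
    excesses = series[series > t] - t             # peaks over the threshold
    # Fit a GPD to the excesses with the location fixed at zero.
    xi, _, sigma = genpareto.fit(excesses, floc=0)
    n, n_t = series.size, excesses.size
    # Tail-quantile estimate: z_q = t + GPD quantile at 1 - risk * n / n_t.
    z_q = t + genpareto.ppf(1.0 - risk * n / n_t, xi, loc=0, scale=sigma)
    return series > z_q, z_q, t


# Example run on simulated data (all settings are assumptions).
rain = simulate_rainfall(5000)
contaminated, outlier_idx = inject_outliers(rain, n_outliers=5)
flags, z_q, t = pot_flags(contaminated)
print(f"initial threshold t = {t:.1f} mm, POT threshold z_q = {z_q:.1f} mm")
print(f"flagged {flags.sum()} values; injected outliers at {outlier_idx}")
```

Because the GPD is fitted to data that already contain the injected values, the resulting threshold depends on how strongly they contaminate the tail; comparing `flags` against `outlier_idx` across different `risk` and `init_quantile` settings is one way to explore that sensitivity on simulated data.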
License
Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.













