Article title

Investigating the effects of local weather, streamflow lag, and global climate information on 1 month ahead streamflow forecasting by using XGBoost and SHAP: two case studies involving the contiguous USA

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The use of machine learning (ML) models for streamflow forecasting has recently proved highly successful. However, ML is typically criticized for a lack of interpretability. Here, we develop an interpretable ML model for 1-month-ahead streamflow forecasting using extreme gradient boosting (XGBoost) and Shapley additive explanations (SHAP). In addition to a performance evaluation of XGBoost compared to regression tree and random forest approaches, the effects of input variables, including local weather, streamflow lag, and global climate, on streamflow were interpreted in terms of SHAP total effect values, main effect values, interaction values, and loss values. The experimental results at two catchments in the contiguous USA are significant in four ways. First, XGBoost was superior to the other two models in terms of Nash–Sutcliffe efficiency, mean absolute error, root mean square error, and correlation coefficient. Second, by aggregating SHAP values, we found that the contributions of these variables to streamflow differed according to the investigated local perspectives, including streamflow at different months, low streamflow, medium streamflow, high streamflow, and peak streamflow. Third, the SHAP main effect and interaction values revealed that nonmonotonic relationships may occur between the input variables and streamflow, and the strength of variable interaction effects might be related to the variable values rather than their correlations. Fourth, variable drifts in the testing set were deduced from SHAP loss values. These findings exhibit positive significance for understanding ML for monthly streamflow forecasting.
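The four scores named in the abstract (Nash–Sutcliffe efficiency, mean absolute error, root mean square error, and correlation coefficient) have standard textbook definitions. The sketch below shows one way they might be computed for a pair of observed and simulated series. It is not the authors' code: the function names and the sample values are invented for illustration, and the correlation coefficient is assumed to be Pearson's r.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over the spread of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pearson_r(obs, sim):
    """Pearson correlation coefficient between the two series."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    spread_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim))
    return cov / (spread_obs * spread_sim)


# Hypothetical monthly streamflow values, made up for this example.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 37.0]

scores = {
    "NSE": nse(observed, simulated),
    "MAE": mae(observed, simulated),
    "RMSE": rmse(observed, simulated),
    "r": pearson_r(observed, simulated),
}
```

An NSE of 1 indicates a perfect match, while a value at or below 0 means the simulation does no better than predicting the observed mean. MAE and RMSE are in the units of the streamflow series, with lower values being better.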
Journal
Year
Pages
905–925
Physical description
Bibliography: 55 items.
Authors
author
  • Key Laboratory of the Pearl River Estuary Regulation and Protection of Ministry of Water Resources, Guangzhou 510611, China
  • Pearl River Water Resources Research Institute, Guangzhou 510611, China
author
  • Key Laboratory of the Pearl River Estuary Regulation and Protection of Ministry of Water Resources, Guangzhou 510611, China
  • North China University of Water Resources and Electric Power, Zhengzhou 450046, China
  • School of Civil Engineering and Architecture, Wuhan University of Technology, Wuhan 430070, China
author
  • North China University of Water Resources and Electric Power, Zhengzhou 450046, China
author
  • North China University of Water Resources and Electric Power, Zhengzhou 450046, China
author
  • Henan Province Tobacco Company Luoyang Company, Luoyang 471012, China
Notes
Record prepared with funding from MNiSW, agreement no. SONP/SP/546092/2022, under the "Social Responsibility of Science" programme, module: Popularisation of Science and Promotion of Sport (2024).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-6766ae01-63fd-4c40-8212-efc54d077a81