

Article title

Assessing the efficiency of a random forest regression model for estimating water quality indicators

Publication language
EN
Abstract (EN)
This work evaluates the efficiency of Random Forest (RF) regression for predicting water quality indicators and investigates factors affecting water quality in 11 watersheds in Virginia, the District of Columbia, and Maryland. Ten years of daily water quality data, along with hydro-meteorological information (such as precipitation) and watershed physiology and characteristics (e.g., size, soil type, land use), are used to predict dissolved oxygen (DO), specific conductivity (K), and turbidity (Tu) across the selected watersheds. The RF regression model is developed for six scenarios, with more predictors introduced in each successive scenario. Scenario 1 contains the least information (only the water quality indicators DO, K and Tu), while scenario 6 contains all available variables. The RF model is evaluated with three statistical metrics: the relative root mean square error, the correlation coefficient, and the percentage of variance explained. In addition, each predictor's importance score is used to rank the predictors within each scenario. The model performs excellently when DO is the predicted variable, and the model predicting K slightly outperforms the one predicting Tu. Scenario 4 (built on water quality indicators, hydro-meteorological data, watershed physiology and land cover information) provides the best tradeoff between performance and efficiency, where efficiency is quantified as the amount of information needed to develop the model. The RF results indicate that land cover plays a significant role in predicting water quality indicators. In addition, the developed RF regression model is adaptable to watersheds in this region over a range of climates.
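The three evaluation metrics named in the abstract can be computed as in the minimal Python sketch below. It is not the authors' code. The function name `evaluate` is hypothetical, and the sketch assumes two definitions that the abstract does not state: relative RMSE is taken as the RMSE divided by the mean of the observations, and the percentage of variance explained is taken as 100 × (1 − MSE / Var(observed)) using the population variance, a common convention for RF regression.

```python
import math
from statistics import mean, pvariance


def evaluate(observed, predicted):
    """Return (relative RMSE, Pearson r, % variance explained).

    Assumed definitions (not taken from the paper):
      relative RMSE  = RMSE / mean(observed)
      % var explained = 100 * (1 - MSE / population variance of observed)
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(observed)

    # Mean squared error and its square root.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    rel_rmse = math.sqrt(mse) / mean(observed)  # assumes mean(observed) != 0

    # Pearson correlation coefficient between observations and predictions.
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    r = cov / (so * sp)

    # Share of the observed variance captured by the predictions, in percent.
    pct_var = 100.0 * (1.0 - mse / pvariance(observed))
    return rel_rmse, r, pct_var


# Toy DO-like values (mg/L), invented purely for illustration.
rel_rmse, r, pct_var = evaluate([8.0, 9.0, 10.0, 11.0], [8.5, 8.5, 10.5, 10.5])
print(round(rel_rmse, 3), round(r, 3), round(pct_var, 1))
```

In practice the `predicted` series would come from an RF regressor (for example scikit-learn's `RandomForestRegressor`, or R's `randomForest`, both listed in the references) evaluated on held-out or out-of-bag data, once per scenario and target variable, so that the three metrics can be compared across scenarios.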
Authors (affiliations)
  • Department of Civil, Environmental, and Infrastructure Engineering, George Mason University
  • Department of Civil, Environmental, and Infrastructure Engineering, George Mason University
  • Department of Civil, Environmental, and Infrastructure Engineering, George Mason University
  • Eversource Energy Center, University of Connecticut, Storrs, CT
References
  • Akoto O., Abankwa E., 2014, Evaluation of Owabi Reservoir (Ghana) water quality using factor analysis, Lakes & Reservoirs: Science, Policy and Management for Sustainable Use, 19 (3), 174-182, DOI: 10.1111/lre.12066.
  • Al-Abadi A.M., Fryar A.E., Rasheed A.A., Pradhan B., 2021, Assessment of groundwater potential in terms of the availability and quality of the resource: a case study from Iraq, Environmental Earth Sciences, 80 (12), DOI: 10.1007/s12665-021-09725-0.
  • Amiri B.J., Nakane K., 2009, Comparative prediction of stream water total nitrogen from land cover using artificial neural network and multiple linear regression, Polish Journal of Environmental Studies, 18 (2), 151-160.
  • Boulesteix A.-L., Janitza S., Kruppa J., König I.R., 2012, Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics, WIREs Data Mining and Knowledge Discovery, 2 (6), 493-507, DOI: 10.1002/widm.1072.
  • Breiman L., 2001, Random forests, Machine Learning, 45 (1), 5-32, DOI: 10.1023/A:1010933404324.
  • Breiman L., Friedman J.H., Olshen R.A., Stone C.J., 1993, Classification and Regression Trees, Wadsworth Statistics/Probability Series, Chapman & Hall, New York, N.Y., 368 pp.
  • Burkholder J., Libra B., Weyer P., Heathcote S., Kolpin D., Thorne P.S., Wichman M., 2007, Impacts of waste from concentrated animal feeding operations on water quality, Environmental Health Perspectives, 115 (2), 308-312, DOI: 10.1289/ehp.8839.
  • Chen G., Long T., Xiong J., Bai Y., 2017, Multiple random forests modelling for urban water consumption forecasting, Water Resources Management, 31 (15), 4715-4729, DOI: 10.1007/s11269-017-1774-7.
  • Chen S., Fang G., Huang X., Zhang Y., 2018, Water quality prediction model of a water diversion project based on the improved artificial bee colony-backpropagation neural network, Water, 10 (6), DOI: 10.3390/w10060806.
  • Dalwadi N., Padole M., 2019, The Internet of Things based water quality monitoring and control, Smart Innovation, Systems and Technologies. Innovations in Computing, 141, 409-417, DOI: 10.1007/978-981-13-8406-6_39.
  • Devi G., 2019, Random forest advice for water quality prediction in the regions of Kadapa District, International Journal of Innovative Technology and Exploring Engineering, 8 (6S4), 1464-1466, DOI: 10.35940/ijitee.F1298.0486S419.
  • Díaz-Uriarte R., Alvarez de Andrés A., 2006, Gene selection and classification of microarray data using random forest, BMC Bioinformatics, 7 (1), DOI: 10.1186/1471-2105-7-3.
  • Dubois D., Prade H., 1992, Putting rough sets and fuzzy sets together, [in:] Intelligent Decision Support, R. Słowiński (ed.), Springer Netherlands, Dordrecht, 203-232, DOI: 10.1007/978-94-015-7975-9_14.
  • Dufour A., Bartram J., Bos R., 2012, Animal Waste, Water Quality and Human Health, IWA Publishing, London, 489 pp.
  • Fox E.W., Ver Hoef J.M., Olsen A.R., 2020, Comparing spatial regression to random forests for large environmental data sets, PLOS ONE, 15 (3), e0229509, DOI: 10.1371/journal.pone.0229509.
  • Galloway J.M., 2002, Simulation of Hydrodynamics, Temperature, and Dissolved Oxygen in Norfork Lake, Arkansas, 1994-1995, Water-Resources Investigations Report 02, Little Rock, Ark.: U.S. Dept. of the Interior, U.S. Geological Survey.
  • Golkarian A., Naghibi S.A., Kalantar B., Pradhan B., 2018, Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS, Environmental Monitoring and Assessment, 190 (3), 1-16, DOI: 10.1007/s10661-018-6507-8.
  • Han H., Guo X., Yu H., 2016, Variable selection using Mean Decrease Accuracy and Mean Decrease Gini based on Random Forest, [in:] 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), 219-224, DOI: 10.1109/ICSESS.2016.7883053.
  • Hulten G., 2018, Building Intelligent Systems: A Guide to Machine Learning Engineering, Apress, New York, 339 pp., DOI: 10.1007/978-1-4842-3432-7.
  • Imani M., Hasan M.M., Bittencourt L.F., McClymont K., Kapelan Z., 2021, A novel machine learning application: water quality resilience prediction model, Science of The Total Environment, 768, DOI: 10.1016/j.scitotenv.2020.144459.
  • Inserillo E.A., Green M.B., Shanley J.B., Boyer J.N., 2017, Comparing catchment hydrologic response to a regional storm using specific conductivity sensors, Hydrological Processes, 31 (5), 1074-1085, DOI: 10.1002/hyp.11091.
  • Jadhav M.S., Khare K.C., Warke A.S., 2015, Water quality prediction of Gangapur Reservoir (India) using LS-SVM and genetic programming, Lakes & Reservoirs: Science, Policy and Management for Sustainable Use, 20 (4), 275-284, DOI: 10.1111/lre.12113.
  • Jeong K.-S., Joo G.-J., Kim H.-W., Ha K., Recknagel F., 2001, Prediction and elucidation of phytoplankton dynamics in the Nakdong River (Korea) by means of a recurrent artificial neural network, Ecological Modelling, 146 (1-3), 115-129, DOI: 10.1016/S0304-3800(01)00300-3.
  • Karamizadeh S., Abdullah S.M., Manaf A.A., Zamani M., Hooman A., 2013, An overview of principal component analysis, Journal of Signal and Information Processing, 4 (3B), 173-175, DOI: 10.4236/jsip.2013.43B031.
  • Kelly V.J., 1997, Dissolved oxygen in the Tualatin River, Oregon, during winter flow conditions, 1991 and 1992, United States Geological Survey Water-Supply Paper, 2465, U.S. Geological Survey, 74 pp., DOI: 10.3133/ofr95451.
  • Kijewski T., Zbawicka M., Strand J., Kautsky H., Kotta J., Rätsep M., Wenne R., 2019, Random forest assessment of correlation between environmental factors and genetic differentiation of populations: case of marine mussels Mytilus, Oceanologia, 61 (1), 131-142, DOI: 10.1016/j.oceano.2018.08.002.
  • Kumar S., Moglen G.E., Godrej A.N., Grizzard T.J., Post H.E., 2018, Trends in water yield under climate change and urbanization in the US Mid-Atlantic region, Journal of Water Resources Planning and Management, 144 (8), DOI: 10.1061/(ASCE)WR.1943-5452.0000937.
  • Lagomarsino D., Tofani V., Segoni S., Catani F., Casagli N., 2017, A tool for classification and regression using random forest methodology: applications to landslide susceptibility mapping and soil thickness modeling, Environmental Modeling & Assessment, 22 (3), 201-214, DOI: 10.1007/s10666-016-9538-y.
  • Li M., Zhang Y., Wallace J., Campbell E., 2020, Estimating annual runoff in response to forest change: a statistical method based on random forest, Journal of Hydrology, 589, DOI: 10.1016/j.jhydrol.2020.125168.
  • Liaw A., Wiener M., 2002, Classification and regression by randomForest, R News, 2 (3), 18-22.
  • Long W.J., Griffith J.L., Selker H.P., D’Agostino R.B., 1993, A comparison of logistic regression to decision-tree induction in a medical domain, Computers and Biomedical Research, 26 (1), 74-97, DOI: 10.1006/cbmr.1993.1005.
  • Mezrich J.J., 1994, When is a tree a hedge?, Financial Analysts Journal, 50 (6), 75-81, DOI: 10.2469/faj.v50.n6.75.
  • Mitchell T.M., 2013, Machine Learning, McGraw-Hill Series in Computer Science, McGraw-Hill, New York.
  • Najah A.A., Othman F.B., Afan H.A., Ibrahim R.K., Fai C.M., Hossain M.S., Ehteram M., Elshafie A., 2019, Machine learning methods for better water quality prediction, Journal of Hydrology, 578, DOI: 10.1016/j.jhydrol.2019.124084.
  • Norouzi H., Moghaddam A.A., 2020, Groundwater quality assessment using random forest method based on groundwater quality indices (case study: Miandoab plain aquifer, NW of Iran), Arabian Journal of Geosciences, 13 (18), DOI: 10.1007/s12517-020-05904-8.
  • Papacharalampous G.A., Tyralis H., 2018, Evaluation of random forests and Prophet for daily streamflow forecasting, Advances in Geosciences, 45, 201-218, DOI: 10.5194/adgeo-45-201-2018.
  • Parkhurst D.F., Brenner K.P., Dufour A.P., Wymer L.J., 2005, Indicator bacteria at five swimming beaches - analysis using random forests, Water Research, 39 (7), 1354-1360, DOI: 10.1016/j.watres.2005.01.001.
  • Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay E., 2011, Scikit-learn: machine learning in Python, The Journal of Machine Learning Research, 12 (85), 2825-2830.
  • Rokach L., 2010, Ensemble-based classifiers, Artificial Intelligence Review, 33 (1-2), 1-39, DOI: 10.1007/s10462-009-9124-7.
  • Saadi M., Oudin L., Ribstein P., 2019, Random forest ability in regionalizing hourly hydrological model parameters, Water, 11 (8), DOI: 10.3390/w11081540.
  • Sameen M.I., Pradhan B., Lee S., 2019, Self-learning random forests model for mapping groundwater yield in data-scarce areas, Natural Resources Research, 28 (3), 757-775, DOI: 10.1007/s11053-018-9416-1.
  • Singh B., Sihag P., Singh K., 2017, Modelling of impact of water quality on infiltration rate of soil by random forest regression, Modeling Earth Systems and Environment, 3 (3), 999-1004, DOI: 10.1007/s40808-017-0347-3.
  • Smith D.E., Leffler M., Mackiernan G., 1992, Oxygen Dynamics in the Chesapeake Bay: A Synthesis of Recent Research, technical report, College Park, Md: Maryland Sea Grant College in cooperation with the Virginia Sea Grant College.
  • Solakian J., Maggioni V., Godrej A.N., 2020, On the performance of satellite-based precipitation products in simulating streamflow and water quality during hydrometeorological extremes, Frontiers in Environmental Science, 8, DOI: 10.3389/fenvs.2020.585451.
  • Tesoriero A.J., Gronberg J.A., Juckem P.F., Miller M.P., Austin B.P., 2017, Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification, Water Resources Research, 53 (8), 7316-7331, DOI: 10.1002/2016WR020197.
  • Tiyasha, Tung T.M., Yaseen Z.M., 2020, A survey on river water quality modelling using artificial intelligence models: 2000-2020, Journal of Hydrology, 585, DOI: 10.1016/j.jhydrol.2020.124670.
  • Tyralis H., Papacharalampous G., Langousis A., 2019, A brief review of random forests for water scientists and practitioners and their recent history in water resources, Water, 11 (5), 910, DOI: 10.3390/w11050910.
  • Wang F., Wang Y., Zhang K., Hu M., Wenig Q., Zhang H., 2021, Spatial heterogeneity modeling of water quality based on random forest regression and model interpretation, Environmental Research, 202, DOI: 10.1016/j.envres.2021.111660.
  • Wang X., Liu T., Zheng X., Peng H., Xin J., Zhang B., 2018, Short-term prediction of groundwater level using improved random forest regression with a combination of random features, Applied Water Science, 8 (5), DOI: 10.1007/s13201-018-0742-6.
  • Wang X., Zhang F., Ding J., 2017, Evaluation of water quality based on a machine learning algorithm and water quality index for the Ebinur Lake watershed, China, Scientific Reports, 7 (1), DOI: 10.1038/s41598-017-12853-y.
  • Wu D., Wang H., Seidu R., 2020, Smart data driven quality prediction for urban water source management, Future Generation Computer Systems, 107, 418-432, DOI: 10.1016/j.future.2020.02.022.
  • Yu X., Shen J., Du J., 2020, A machine-learning-based model for water quality in coastal waters, taking dissolved oxygen and hypoxia in Chesapeake Bay as an example, Water Resources Research, 56 (9), DOI: 10.1029/2020WR027227.
  • Zabihi M., Pourghasemi H.R., Pourtaghi Z.S., Behzadfar M., 2016, GIS-based multivariate adaptive regression spline and random forest models for groundwater potential mapping in Iran, Environmental Earth Sciences, 75 (8), DOI: 10.1007/s12665-016-5424-9.
  • Zavareh M., Maggioni V., 2018, Application of rough set theory to water quality analysis: a case study, Data, 3 (4), DOI: 10.3390/data3040050.
  • Zavareh M., Maggioni V., Sokolov V., 2021, Investigating water quality data using principal component analysis and granger causality, Water, 13 (3), DOI: 10.3390/w13030343.
  • Zhang Q., Murphy R.R., Tian R., Forsyth M.K., Trentacoste E.M., Keisman J., Tango P.J., 2018, Chesapeake Bay’s water quality condition has been recovering: insights from a multimetric indicator assessment of thirty years of tidal monitoring data, Science of the Total Environment, 637-638, 1617-1625, DOI: 10.1016/j.scitotenv.2018.05.025.
  • Zhao D., Wu Q., Cui F., Xu H., Zeng Y., Cao Y., Du Y., 2018, Using random forest for the risk assessment of coal-floor water inrush in Panjiayao coal mine, Northern China, Hydrogeology Journal, 26 (7), 2327-2340, DOI: 10.1007/s10040-018-1767-5.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-1ef5d96e-6d0c-4726-8eb9-aa15cda1a9cc