Evaluation of the impact of explanatory variables on the accuracy of prediction of daily inflow to the sewage treatment plant by selected models nonlinear

Szeląg, B.; Bartkiewicz, L.; Studziński, J.; Barbusiński, K.

doi:10.1515/aep-2017-0030

Article - details

Article title

Evaluation of the impact of explanatory variables on the accuracy of prediction of daily inflow to the sewage treatment plant by selected models nonlinear

Authors

Szeląg B. , Bartkiewicz L. , Studziński J. , Barbusiński K.

Identifiers

DOI

10.1515/aep-2017-0030

Title variants

Ocena wpływu zmiennych objaśniających na dokładność predykcji dobowego dopływu do oczyszczalni ścieków wybranymi modelami nieliniowymi

Publication languages

Abstracts

The aim of the study was to evaluate whether different data mining methods can be applied to model the inflow of sewage to a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbour (k-NN) and kernel regression (K). The data consisted of time series of daily rainfall, water levels in the receiving water body of the clarified sewage, and wastewater inflow to the treatment plant of the city of Rzeszów. The results indicate that, among the models with one input delayed by 1 day, the best were obtained with the k-NN method and the worst with the K method. For the models with two input variables and one explanatory variable, the smallest errors were obtained when the model inputs were sewage inflow and rainfall delayed by 1 day; the best fit was provided by the RF method and the worst by the K method. For the models with three inputs and two explanatory variables, the best results were obtained with the SVM method and the worst with the K method. In most of the modelling runs the smallest prediction errors were obtained with the SVM method and the largest with the K method. The exceptions were the simplest model with one input delayed by 1 day, for which the k-NN method gave the best results, and two modelling runs with two inputs, in which the RF method proved best.

The aim of the work is to assess the possibility of applying various data mining methods to model the inflow of sewage to a municipal sewage treatment plant. Measurement series of daily rainfall, water levels in the receiving water body, and inflows to the municipal sewage treatment plant in Rzeszów were used to develop statistical models with the support vector, random forest, k-nearest neighbour and kernel regression methods. Calculations performed with the SVM, RF, k-NN and K methods show that, for models with one explanatory variable delayed by one day relative to the inflow value, the best results were obtained with an autoregressive model based on the k-NN method and the worst with kernel regression. For models with two explanatory variables, the smallest errors were obtained for models that include sewage inflow and total rainfall depth with a one-day delay; the best results were obtained with the RF method and the worst with kernel regression. For models with two explanatory variables but three input signals, the smallest errors in sewage inflow to the treatment plant were obtained with the SVM method and the largest with kernel regression. The simulations showed that in most cases the smallest errors in sewage inflow to the treatment plant were obtained with the SVM method and the largest with the K method. For the simplest model, with one input signal delayed by 1 day, the best results were obtained with the k-NN method, and in two cases of models with 2 input signals the RF method proved best.
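The model structure described in the abstract (predicting daily inflow from inputs delayed by 1 day) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of a Gaussian kernel, the parameter values (`k`, `bandwidth`) and the toy numbers are all assumptions made for illustration, and the data are not from the Rzeszów study. Only k-NN and kernel regression are shown, in plain Python.

```python
import math

def make_lagged(inflow, rainfall, lag=1):
    """Pair inflow[t] (target) with inflow[t-lag] and rainfall[t-lag] (inputs)."""
    X = [(inflow[t - lag], rainfall[t - lag]) for t in range(lag, len(inflow))]
    y = [inflow[t] for t in range(lag, len(inflow))]
    return X, y

def knn_predict(X_train, y_train, x, k=3):
    """k-NN regression: mean target of the k nearest training points (Euclidean)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)

def kernel_predict(X_train, y_train, x, bandwidth=1.0):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (assumed here)."""
    weights = [math.exp(-0.5 * (math.dist(xi, x) / bandwidth) ** 2) for xi in X_train]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the plain mean of the targets.
        return sum(y_train) / len(y_train)
    return sum(w * yi for w, yi in zip(weights, y_train)) / total

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Made-up toy series (hypothetical units), NOT measurements from the study.
rainfall = [0, 5, 0, 0, 12, 3, 0, 0, 8, 0, 0, 1]
inflow = [40, 41, 46, 42, 40, 52, 45, 41, 40, 48, 42, 40]

X, y = make_lagged(inflow, rainfall, lag=1)

# Chronological split: fit on the first 8 days, evaluate on the remaining 3.
X_train, y_train = X[:8], y[:8]
X_test, y_test = X[8:], y[8:]

knn_pred = [knn_predict(X_train, y_train, x, k=3) for x in X_test]
kernel_pred = [kernel_predict(X_train, y_train, x, bandwidth=3.0) for x in X_test]

mae_knn = mae(y_test, knn_pred)
mae_kernel = mae(y_test, kernel_pred)
print("k-NN MAE:", mae_knn)
print("kernel regression MAE:", mae_kernel)
```

In practice the inputs would normally be scaled before computing distances, because inflow and rainfall are measured in different units; the sketch skips that step for brevity.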

Keywords

wastewater treatment plant; data mining; random forest; forecasting; inflow; k-nearest neighbour; kernel regression

sewage treatment plant; data mining; random forest; sewage inflow; modelling; k-nearest neighbour; kernel regression

Publisher

Institute of Environmental Engineering, Polish Academy of Sciences

Journal

Archives of Environmental Protection

Year

2017

Volume

Vol. 43, no. 3

Pages

74-81

Physical description

31 references, tables, charts

Contributors

author

Szeląg B.

bszelag@tu.kielce.pl

Kielce University of Technology, Poland

author

Bartkiewicz L.

Kielce University of Technology, Poland

author

Studziński J.

Systems Research Institute PAN, Poland

author

Barbusiński K.

Silesian University of Technology, Poland

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-664c0d73-f503-4d53-934a-edb4a35982de