Wyniki wyszukiwania - BazTech

1

Analysis of the Effectiveness of Selected Machine Learning Algorithms in the Classification of Satellite Image Content Depending on the Size of the Training Sample

Kupidura Przemysław, Niemyski Stanisław

Teledetekcja Środowiska

|

2024

|

T. 64

24--38

EN

The article presents an analysis of the accuracy of 3 popular machine learning (ML) methods: Maximum Likelihood Classifier (MLC), Support Vector Machine (SVM), and Random Forest (RF) depending on the size of the training sample. The analysis involved performing the classification of the content of a Landsat 8 satellite image (divided into 6 basic land cover classes) in 10 different variants of the number of training samples (from 2664 to 34711 pixels), estimating individual results, and a comparative analysis of the obtained results. For each classification variant, an error matrix was developed and on their basis, accuracy metrics were calculated: f1-score, precision and recall (for individual classes) as well as overall accuracy and kappa index of agreement (generally for the entire classification). The analysis showed a stimulating effect of the size of the training sample on the accuracy of the obtained classification results in all analyzed cases, with the most sensitive to this factor being MLC, showing the best effectiveness with the largest training sample and the smallest - with the smallest, and the least SVM, characterized by the highest accuracy with the smallest training sample, comparing to other algorithms.

PL

Artykuł przedstawia analizę dokładności 3 popularnych metod uczenia maszynowego: Maximum Likelihood Classifier (MLC), Support Vector Machine (SVM) oraz Random Forest (RF) w zależności od liczebności próbki treningowej. Analiza polegała na wykonaniu klasyfikacji treści zdjęcia satelitarnego Landsat 8 (w podziale na 6 podstawowych klas pokrycia terenu) w 10 różnych wariantach liczebności próbek uczących (od 2664 do 34711 pikseli), oszacowaniu poszczególnych wyników oraz analizie porównawczej uzyskanych wyników. Dla każdego wariantu klasyfikacji opracowano macierz błędów, a na ich podstawie obliczono metryki dokładności: F1-score, precision and recall (dla pojedynczych klas) oraz ogólną dokładność i wskaźnik zgodności Kappa (ogólnie dla całej klasyfikacji). Analiza wykazała stymulujący wpływ rozmiaru próbki uczącej na dokładność uzyskiwanych wyników klasyfikacji we wszystkich analizowanych przypadkach, przy czym najbardziej wrażliwym na ten czynnik był MLC, wykazujący się najlepszą skutecznością przy największej próbce treningowej i najmniejszą - przy najmniejszej, a najmniej SVM, cechujący się największą dokładnością przy najmniejszej próbce treningowej, w porównaniu do pozostałych algorytmów.

2

Comparison of Statistical and Machine-Learning Model for Analyzing Landslide Susceptibility in Sumedang Area, Indonesia

Fitriana Hana Listi, Ismanto Rido Dwi, Tulus Jessica Stephanie, Julzarika Atriyon, Nugroho Jalu Tejo, Manalu Johanes

Geomatics and Environmental Engineering

|

2024

|

Vol. 18, no. 2

73--95

EN

Landslides have produced several recurrent dangers, including losses of life and property, losses of agricultural land, erosion, population relocation, and others. Landslide mitigation is critical since population and economic expansion are rapidly followed by significant infrastructure development, increasing the risk of catastrophes. At an early stage in landslide-disaster mitigation, landslide-risk mapping must give critical information to help policies limit the potential for landslide damage. This study will utilize the comparative frequency ratio (FR) and random forest (RF) techniques; they will be utilized to properly investigate the distribution of flood vulnerability in the Sumedang area. This study has identified 12 criteria for developing a landslide-susceptibility model in the research region based on the features of past disasters in the research area. The FR and RF models scored 88 and 81% of the AUC value, respectively. Based on the McNemar test, the FR and RF models featured the same performance in determining the landslide-vulnerability level performances in Sumedang. They performed well in assessing landslides in the research region; therefore, they may be used as references in landslide prevention and references in future regional development plans by the stakeholders.

3

Diagnosis of inter-turn short circuit fault in IPMSMs based on the combined use of greedy tracking and random forest

Wang Jianping, Ma Jian, Meng Dean, Zhao Xuan, Zhang Kai, Liu Qiquan, Xu Kejie

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2024

|

Vol. 72, nr 2

art. no. e148943

EN

Inter-turn short circuit (ITSC) is a frequent fault of interior permanent magnet synchronous motors (IPMSM). If ITSC faults are not promptly monitored, it may result in secondary faults or even cause extensive damage to the entire motor. To enhance the reliability of IPMSMs, this paper introduces a fault diagnosis method specifically designed for identifying ITSC faults in IPMSMs. The sparse coefficients of phase current and torque are solved by clustering shrinkage stage orthogonal matching tracking (CcStOMP) in the greedy tracking algorithm.The CcStOMP algorithm can extract multiple target atoms at one time, which greatly improves the iterative efficiency. The multiple features are utilized as input parameters for constructing the random forest classifier. The constructed random forest model is used to diagnose ITSC faults with the results showing that the random forest model has a diagnostic accuracy of 98.61% using all features, and the diagnostic accuracy of selecting three of the most important features is still as high as 97.91%. The random forest classification model has excellent robustness that maintains high classification accuracy despite the reduction of feature vectors, which is a great advantage compared to other classification algorithms. The combination of greedy tracing and the random forest is not only a fast diagnostic model but also a model with good generalisation and anti-interference capability. This non-invasive method is applicable to monitoring and detecting failures in industrial PMSMs.

4

Prediction of flyrock distance induced by blasting using particle swarm optimization and multiple regression analysis: an engineering perspective

Chen Yong, Wang Minghua, Yin Heng, Zhang Tianbao

Acta Geophysica

|

2024

|

Vol. 72, no. 1

287--301

EN

Flyrock is one of the major safety hazards induced by blasting operations. However, few studies were for predicting blasting-induced flyrock distance from the perspective of engineers. The present paper attempts to provide an engineer-friendly equation predicting blasting-induced flyrock distance. Data used in the present study contains s seven blasting parameters including borehole diameter, blasthole length, powder factor, stemming length, maximum charge per delay, burden, and flyrock distance is obtained. Data is inputted into Random Forest for feature selection. The selected features are formulated as two candidate equations, including Multiple Linear Regression (MLR) equation and Multiple Nonlinear Regression (MNR) equation. Those two candidates are respectively referred by Particle Swarm Optimization for searching optimum values for the coefficients of selected features. It is proved that MLR equation has better accuracy. MLR equation is compared with two empirical equations and the MLR equation based on least squares method. It is found that the coefficient of correlation of the proposed MLR equation reaches 0.918, which is the highest compared with the scores of other three equations. The present study utilizes feature selection process to screen inputs, which effectively excludes irrelevant parameters from being considered. Plus the contribution of Particle Swarm Optimization, the accuracy of the obtained equation can be guaranteed.

5

Developing a data-driven soft sensor to predict silicate impurity in iron ore flotation concentrate

Pural Yusuf Enes

Physicochemical Problems of Mineral Processing

|

2023

|

Vol. 59, iss. 5

art. no. 169823

EN

Soft sensors are mathematical models that estimate the value of a process variable that is difficult or expensive to measure directly. They can be based on first principle models, data-based models, or a combination of both. These models are increasingly used in mineral processing to estimate and optimize important performance parameters such as mill load, mineral grades, and particle size. This study investigates the development of a data-driven soft sensor to predict the silicate content in iron ore reverse flotation concentrate, a crucial indicator of plant performance. The proposed soft sensor model employs a dataset obtained from Kaggle, which includes measurements of iron and silicate content in the feed to the plant, reagent dosages, weight and pH of pulp, as well as the amount of air and froth levels in the flotation units. To reduce the dimensionality of the dataset, Principal Component Analysis, an unsupervised machine learning method, was applied. The soft sensor model was developed using three machine learning algorithms, namely, Ridge Regression, Multi-Layer Perceptron, and Random Forest. The Random Forest model, created with non-reduced data, demonstrated superior performance, with an R-squared value of 96.5% and a mean absolute error of 0.089. The results suggest that the proposed soft sensor model can accurately predict the silicate content in the iron ore flotation concentrate using machine learning algorithms. Moreover, the study highlights the importance of selecting appropriate algorithms for soft sensor developments in mineral processing plants.

6

Predicting sea surface salinity in a tidal estuary with machine learning

Guillou Nicolas, Chapalain Georges, Petton Sébastien

Oceanologia

|

2023

|

Vol. 65 (2)

318--332

EN

As an indicator of exchanges between watersheds, rivers and coastal seas, salinity may provide valuable information about the exposure, ecological health and robustness of marine ecosystems, including especially estuaries. The temporal variations of salinity are traditionally approached with numerical models based on a physical description of hydrodynamic and hydrological processes. However, as these models require large computational resources, such an approach is, in practice, rarely considered for rapid turnaround predictions as requested by engineering and operational applications dealing with the ecological monitoring of estuaries. As an alternative efficient and rapid solution, we investigated here the potential of machine learning algorithms to mimic the non-linear complex relationships between salinity and a series of input parameters (such as tide-induced free-surface elevation, river discharges and wind velocity). Beyond regression methods, the attention was dedicated to popular machine learning approaches including MultiLayer Perceptron, Support Vector Regression and Random Forest. These algorithms were applied to six-year observations of sea surface salinity at the mouth of the Elorn estuary (bay of Brest, western Brittany, France) and compared to predictions from an advanced ecological numerical model. In spite of simple input data, machine learning algorithms reproduced the seasonal and semi-diurnal variations of sea surface salinity characterised by noticeable tide-induced modulations and low-salinity events during the winter period. Support Vector Regression provided the best estimations of surface salinity, improving especially predictions from the advanced numerical model during low-salinity events. This promotes the exploitation of machine learning algorithms as a complementary tool to process-based physical models.

7

Development of Flood-Hazard-Mapping Model Using Random Forest and Frequency Ratio in Sumedang Regency, West Java, Indonesia

Ismanto Rido Dwi, Fitriana Hana Listi, Manalu Johanes, Purboyo Alvian Aji, Prasasti Indah

Geomatics and Environmental Engineering

|

2023

|

Vol. 17, no. 6

129--157

EN

Flooding, often triggered by heavy rainfall, is a common natural disaster in Indonesia, and is the third most common type of disaster in Sumedang Regency. Hence, flood-susceptibility mapping is essential for flood management. The primary challenge in this lies in the complex, non-linear relationships between indices and risk levels. To address this, the application of random forest (RF) and frequency ratio (FR) methods has been explored. Ten flood-conditioning factors were determined from the references: the distance from a river, elevation, geology, geomorphology, lithology, land use/land cover, rainfall, slope, soil type, and topographic wetness index (TWI). The 35 flood locations from the flood-inventory map were selected, and the remaining 18 flood locations were used for justifying the outcomes. The flooded areas from the RF model were 28.39%; the rest (71.61%) were non-flooded areas. Also, the flooded areas from the FR method were 8.02%, and the non-flooded areas were 91.98%. The AUC for both methods was a similar value – 83.0%. This result is quite accurate and can be used by policymakers to prevent and manage future flooding in the Sumedang area. These results can also be used as materials for updating existing flood-susceptibility maps.

8

Automated malarial retinopathy detection using transfer learning and multi-camera retinal images

Rajendra Kurup Aswathy, Wigdahl Jeff, Benson Jeremy, Martínez-Ramón Manel, Solíz Peter, Joshi Vinayak

Biocybernetics and Biomedical Engineering

|

2023

|

Vol. 43, no. 1

109--123

EN

Cerebral malaria (CM) is a fatal syndrome found commonly in children less than 5 years old in Sub-saharan Africa and Asia. The retinal signs associated with CM are known as malarial retinopathy (MR), and they include highly specific retinal lesions such as whitening and hemorrhages. Detecting these lesions allows the detection of CM with high specificity. Up to 23% of CM, patients are over-diagnosed due to the presence of clinical symptoms also related to pneumonia, meningitis, or others. Therefore, patients go untreated for these pathologies, resulting in death or neurological disability. It is essential to have a low-cost and high-specificity diagnostic technique for CM detection, for which We developed a method based on transfer learning (TL). Models pre-trained with TL select the good quality retinal images, which are fed into another TL model to detect CM. This approach shows a 96% specificity with low-cost retinal cameras.

9

Bivariate simulation of river fow using hybrid intelligent models in sub basins of Lake Urmia, Iran

Eslamitabar Vahed, Ahmadi Farshad, Sharafati Ahmad, Rezaverdinejad Vahid

Acta Geophysica

|

2023

|

Vol. 71, no. 2

873--892

EN

In this study, the performance of continuous autoregressive moving average (CARMA), CARMA-generalized autoregressive conditional heteroscedasticity (CARMA-GARCH), random forest, support vector regression and ant colony optimization (SVR-ACO), and support vector regression and ant lion optimizer (SVR-ALO) models in bivariate simulating of discharge based on the rainfall variables in monthly time scale was evaluated over four sub-basins of Lake Urmia, located in northwestern Iran. The models were assessed in two stages: train and test. The results showed that the CARMA-GARCH hybrid model offered better performance in all cases than the stand-alone CARMA. The improvement percentages of the error rate in the CARMA model compared to the CARMA-GARCH hybrid model in the Mahabad Chai, Nazlu Chai, Siminehrood, and Zola Chai sub-basins were 9, 20, 17, and 6.4%, respectively, in the training phase. Among the models, the hybrid SVR models integrated with ACO and ALO optimization algorithms presented the best performance based on the Taylor diagram and evaluation criteria. Considering the use of ant colony and ant lion optimization algorithms to optimize the support vector regression model’s parameters, these models offered the best performance in the study area to simulate the discharge. The improvement percentages of the error rate in the SVR-ACO model compared to the CARMA-GARCH hybrid model in the Mahabad Chai, Nazlu Chai, Siminehrood, and Zola Chai sub-basins were 11, 10, 19, and 21%, respectively, in the training phase. In contrast, the random forest model provided the lowest accuracy and the highest error in discharge simulation.

10

Analyzing trend and forecast of rainfall and temperature in Valmiki Tiger Reserve, India, using non parametric test and random forest machine learning algorithm

Roshani, Sajjad Haroon, Saha Tamal Kanti, Rahaman Md Hibjur, Masroor Md, Sharma Yatendra, Pal Swades

Acta Geophysica

|

2023

|

Vol. 71, no. 1

531--552

EN

Assessment of spatiotemporal dynamics of meteorological variables and their forecast is essential in the context of climate change. Such analysis can help suggest possible solutions for flora and fauna in protected areas and adaptation strategies to make forests and communities more resilient. The present study attempts to analyze climate variability, trend and forecast of temperature and rainfall in the Valmiki Tiger Reserve, India. We utilized rainfall and temperature gridded data obtained from the Indian Meteorological Department during 1981–2020. The Mann–Kendall test and Sen’s slope estimator were employed to examine the time series trend and magnitude of change at the annual, monthly and seasonal levels. Random forest machine learning algorithm was used to estimate seasonal prediction and forecasting of rainfall and temperature trend for the next ten years (2021–2030). The predictive capacity of the model was evaluated by statistical performance assessors of coefficient of correlation, mean absolute error, mean absolute percentage error and root mean squared error. The findings revealed a significant decreasing trend in rainfall and an increasing trend in temperature. However, a declining trend for maximum temperature has been observed for winter and post-monsoon seasons. The results of seasonal forecasting exhibited a considerable decrease in rainfall and temperature across the Reserve during all the seasons. However, the temperature will increase during the summer season. The random forest machine learning algorithm has shown its effectiveness in forecasting the temperature and rainfall variables. The findings suggest that these approaches may be used at various spatial scales in different geographical locations.

11

Random forest method to identify seepage in flood embankments

Król Krzysztof, Rymarczyk Tomasz, Gołąbek Michał, Wójcik Dariusz, Niderla Konrad, Kozłowski Edward

Przegląd Elektrotechniczny

|

2022

|

R. 98, nr 2

191--194

EN

he paper presents research on the effectiveness of testing infiltration in flood embankments using electrical impedance tomography. The usefulness of the algorithm was verified and also the best results were checked. In order to test the reconstructive algorithms obtained during the research, images were generated based on simulation measurements. For this purpose, a special model of the embankment was built. In order to obtain feedback on the degree of infiltration in the flood embankment, prediction by means of the Random Forest method was used.

PL

W artykule przedstawiono badania nad efektywnością badania infiltracji w wałach przeciwpowodziowych za pomocą elektrycznej tomografii impedancyjnej. Zweryfikowano przydatność algorytmu, a także sprawdzono najlepsze wyniki. W celu przetestowania uzyskanych w trakcie badań algorytmów rekonstrukcyjnych wygenerowano obrazy na podstawie pomiarów symulacyjnych. W tym celu zbudowano specjalny model wału przeciwpowodziowego. W celu uzyskania informacji zwrotnej o stopniu przesiąkania w wale przeciwpowodziowym zastosowano predykcję za pomocą metody Random Forest.

12

Random forest based power sustainability and cost optimization in smart grid

Durairaj Danalakshmi, Wróblewski Łukasz, Sheela A., Hariharasudan A., Urbański Mariusz

Production Engineering Archives

|

2022

|

Vol. 28, Iss. 1

82--92

EN

Presently power control and management play a vigorous role in information technology and power management. Instead of non-renewable power manufacturing, renewable power manufacturing is preferred by every organization for controlling resource consumption, price reduction and efficient power management. Smart grid efficiently satisfies these requirements with the integration of machine learning algorithms. Machine learning algorithms are used in a smart grid for power requirement prediction, power distribution, failure identification etc. The proposed Random Forest-based smart grid system classifies the power grid into different zones like high and low power utilization. The power zones are divided into number of sub-zones and map to random forest branches. The sub-zone and branch mapping process used to identify the quantity of power utilized and the non-utilized in a zone. The non-utilized power quantity and location of power availabilities are identified and distributed the required quantity of power to the requester in a minimal response time and price. The priority power scheduling algorithm collect request from consumer and send the request to producer based on priority. The producer analysed the requester existing power utilization quantity and availability of power for scheduling the power distribution to the requester based on priority. The proposed Random Forest based sustainability and price optimization technique in smart grid experimental results are compared to existing machine learning techniques like SVM, KNN and NB. The proposed random forest-based identification technique identifies the exact location of the power availability, which takes minimal processing time and quick responses to the requestor. Additionally, the smart meter based smart grid technique identifies the faults in short time duration than the conventional energy management technique is also proven in the experimental results.

13

Application of machine learning tools for seismic reservoir characterization study of porosity and saturation type

Topór Tomasz, Sowiżdżał Krzysztof

Nafta-Gaz

|

2022

|

R. 78, nr 3

165--175

EN

The application of machine learning (ML) tools and data-driven modeling became a standard approach for solving many problems in exploration geology and contributed to the discovery of new reservoirs. This study explores an application of machine learning ensemble methods – random forest (RF) and extreme gradient boosting (XGBoost) to derive porosity and saturation type (gas/water) in multihorizon sandstone formations from Miocene deposits of the Carpathian Foredeep. The training of ML algorithms was divided into two stages. First, the RF algorithm was used to compute porosity based on seismic attributes and well location coordinates. The obtained results were used as an extra feature to saturation type modeling using the XGBoost algorithm. The XGBoost was run with and without well location coordinates to evaluate the influence of the spatial information for the modeling performance. The hyperparameters for each model were tuned using the Bayesian optimization algorithm. To check the training models' robustness, 10-fold cross-validation was performed. The results were evaluated using standard metrics, for regression and classification, on training and testing sets. The residual mean standard error (RMSE) for porosity prediction with RF for training and testing was close to 0.053, providing no evidence of overfitting. Feature importance analysis revealed that the most influential variables for porosity prediction were spatial coordinates and seismic attributes sweetness. The results of XGBoost modeling (variant 1) demonstrated that the algorithm could accurately predict saturation type despite the class imbalance issue. The sensitivity for XGBoost on training and testing data was high and equaled 0.862 and 0.920, respectively. The XGBoost model relied on computed porosity and spatial coordinates. The obtained sensitivity results for both training and testing sets dropped significantly by about 10% when well location coordinates were removed (variant 2). In this case, the three most influential features were computed porosity, seismic amplitude contrast, and iso-frequency component (15 Hz) attribute. The obtained results were imported to Petrel software to present the spatial distribution of porosity and saturation type. The latter parameter was given with probability distribution, which allows for identifying potential target zones enriched in gas.

PL

Metody uczenia maszynowego stanowią obecnie rutynowe narzędzie wykorzystywane przy rozwiązywaniu wielu problemów w geologii poszukiwawczej i przyczyniają się do odkrycia nowych złóż. Prezentowana praca pokazuje zastosowanie dwóch algorytmów uczenia maszynowego – lasów losowych (RF) i drzew wzmocnionych gradientowo (XGBoost) do wyznaczenia porowatości i typu nasycenia (gaz/woda) w formacjach piaskowców będących potencjalnymi horyzontami gazonośnymi w mioceńskich osadach zapadliska przedkarpackiego. Proces uczenia maszynowego został podzielony na dwa etapy. W pierwszym etapie użyto RF do obliczenia porowatości na podstawie danych pochodzących z atrybutów sejsmicznych oraz współrzędnych lokalizacji otworów. Uzyskane wyniki zostały wykorzystane jako dodatkowa cecha przy modelowaniu typu nasycenia z zastosowaniem algorytmu XGBoost. Modelowanie za pomocą XGBoost został przeprowadzone w dwóch wariantach – z wykorzystaniem lokalizacji otworów oraz bez nich w celu oceny wpływu informacji przestrzennych na wydajność modelowania. Proces strojenia hiperparametrów dla poszczególnych modeli został przeprowadzony z wykorzystaniem optymalizacji Bayesa. Wyniki procesu modelowania zostały ocenione na zbiorach treningowym i testowym przy użyciu standardowych metryk wykorzystywanych do rozwiązywania problemów regresyjnych i klasyfikacyjnych. Dodatkowo, aby wzmocnić wiarygodność modeli treningowych, przeprowadzona została 10-krotna kroswalidacja. Pierwiastek błędu średniokwadratowego (RMSE) dla wymodelowanej porowatości na zbiorach treningowym i testowym był bliski 0,053 co wskazuje na brak nadmiernego dopasowania modelu (ang. overfitting). Analiza istotności cech ujawniła, że zmienną najbardziej wpływającą na prognozowanie porowatości były współrzędne lokalizacji otworów oraz atrybut sejsmiczny sweetness. Wyniki modelowania XGBoost (wariant 1) wykazały, że algorytm jest w stanie dokładnie przewidywać typ nasycenia pomimo problemu z nierównowagą klas. Czułość wykrywania potencjalnych stref gazowych w przypadku modelu XGBoost była wysoka zarówno dla zbioru treningowego, jak i testowego (0,862 i 0,920). W swoich predykcjach model opierał się głównie na wyliczonej porowatości oraz współrzędnych otworów. Czułość dla uzyskanych wyników na zbiorze treningowym i testowym spadła o około 10%, gdy usunięto współrzędne lokalizacji otworów (wariant 2 XGBoost). W tym przypadku trzema najważniejszymi cechami były obliczona porowatość oraz atrybut sejsmiczny amplitude contrast i atrybut iso-frequency component (15 Hz). Uzyskane wyniki zostały zaimportowane do programu Petrel, aby przedstawić przestrzenny rozkład porowatości i typu nasycenia. Ten ostatni parametr został przedstawiony wraz z rozkładem prawdopodobieństwa, co dało wgląd w strefy o najwyższym potencjale gazowym.

14

Predicting the stability of open stopes using Machine Learning

Szmigiel Alicja, Apel Derek B.

Journal of Sustainable Mining

|

2022

|

Vol. 21, iss. 3

241--248

EN

The Mathews stability graph method was presented for the first time in 1980. This method was developed to assess the stability of open stopes in different underground conditions, and it has an impact on evaluating the safety of underground excavations. With the development of technology and growing experience in applying computer sciences in various research disciplines, mining engineering could significantly benefit by using Machine Learning. Applying those ML algorithms to predict the stability of open stopes in underground excavations is a new approach that could replace the original graph method and should be investigated. In this research, a Potvin database that consisted of 176 historical case studies was passed to the two most popular Machine Learning algorithms: Logistic Regression and Random Forest, to compare their predicting capabilities. The results obtained showed that those algorithms can indicate the stability of underground openings, especially Random Forest, which, in examined data, performed slightly better than Logistic Regression.

15

An assessment of machine learning and data balancing techniques for evaluating downgrade truck crash severity prediction in Wyoming

Ampadu Vincent-Michael Kwesi, Haq Muhammad Tahmidul, Ksaibati Khaled

Journal of Sustainable Development of Transport and Logistics

|

2022

|

Vol. 7, No. 2

6--24

EN

This study involved the investigation of various machine learning methods, including four classification tree-based ML models, namely the Adaptive Boosting tree, Random Forest, Gradient Boost Decision Tree, Extreme Gradient Boosting tree, and three non-tree-based ML models, namely Support Vector Machines, Multi-layer Perceptron and k-Nearest Neighbors for predicting the level of severity of large truck crashes on Wyoming road networks. The accuracy of these seven methods was then compared. The Final ROC AUC score for the optimized random forest model is 95.296 %. The next highest performing model was the k-NN with 92.780 %, M.L.P. with 87.817 %, XGBoost with 86.542 %, Gradboost with 74.824 %, SVM with 72.648 % and AdaBoost with 67.232 %. Based on the analysis, the top 10 predictors of severity were obtained from the feature importance plot. These may be classified into whether safety equipment was used, whether airbags were deployed, the gender of the driver and whether alcohol was involved.

16

Random forest in the tests of small caliber ammunition

Ampuła Dariusz

Journal of KONBiN

|

2022

|

Vol. 52, iss. 1

73--85

EN

In the introduction of this article the method of building a random forest model is presented, which can be used for both classification and regression tasks. The process of designing the random forest module was characterized, paying attention to the classification tasks module, which was used to build the author’s model. Based on the test results, a random forest model was designed for 7,62 mm ammunition with T-45 tracer projectile. Predictors were specified and values of stop parameters and process stop formulas were determined, on the basis of which a random forest module was built. An analysis of the resulting random forest model was made in terms of assessing its prediction and risk assessment. Finally, the designed random forest model has been refined by adding another 50 trees to the model. The enlarged random forest model occurred to be slightly stronger and it should be implemented.

PL

W artykule we wstępie przedstawiono metodę budowy modelu losowy las, którą można stosować zarówno do zadań klasyfikacyjnych, jak i do zadań regresyjnych. Scharakteryzowano proces projektowania modułu losowego lasu, zwracając uwagę na moduł zadań klasyfikacyjnych, który posłużył do budowy autorskiego modelu. Na podstawie posiadanych wyników badań, zaprojektowano model losowego lasu dla amunicji strzeleckiej kalibru 7,62 mm z pociskiem smugowym T-45. Wyszczególniono predyktory oraz określono wartości parametrów zatrzymania oraz formuły stopu procesu, na podstawie których zbudowano moduł losowego lasu. Dokonano analizy otrzymanego modelu losowego lasu pod kątem oceny jego trafności predykcji oraz oceny ryzyka. Na końcu, udoskonalono zaprojektowany model losowego lasu poprzez dodanie do modelu kolejnych 50 drzew. Powiększony model losowego lasu okazał się nieznacznie silniejszy i to on powinien być wdrożony do użytkowania.

17

PM 2.5 modelling during paddy stubble burning months using artificial intelligence techniques

Sangwan V., Deswal S.

Journal of Achievements in Materials and Manufacturing Engineering

|

2022

|

Vol. 110, nr 1

16--26

EN

Purpose: In this study, the artificial intelligence techniques namely Artificial Neural Network, Random Forest, and Support Vector Machine are employed for PM 2.5 modelling. The study is carried out in Rohtak city of India during paddy stubble burning months i.e., October and November. The different models are compared to check their respective efficacies and also sensitivity analysis is performed to know about the most vital parameter in PM 2.5 modelling. Design/methodology/approach: The air pollution data of October and November months from the year 2016 to 2020 was collected for the study. The months of October and November are chosen as paddy stubble burning and major festivities using fireworks occur during these months. The untoward data entries viz. zero values, blank data, etc. were eliminated from the gathered data set and thereafter 231 observations of each parameter were left for the conduct of the presented study. The different models i.e., ANN, RF, SVM, etc. had PM 2.5 as an output variable while relative humidity, sulfur dioxide, nitrogen dioxide, nitric oxide, carbon monoxide, ozone, temperature, solar radiation, wind direction and wind speed acted as input variables. The prototypes created from the training data set are verified on the testing data set. A sensitivity analysis is also done to quantify impact of various parameters on output variable i.e., PM 2.5. Findings: The performance of the SVM_RBF based model turned out to be the best with the performance parameters being the coefficient of determination, root mean square error, and mean absolute error. In the sensitivity test, sulphur dioxide (SO2) was adjudged as the most vital variable. Research limitations/implications: The quantification capacity of the generated models may go beyond the used data set of observations. Practical implications: The artificial intelligence techniques provide precise estimation and forecasting of PM 2.5 in the air during paddy stubble burning months of October and November. Originality/value: Unlike the past research work that focus on modelling of various air pollution parameters, this study in specific focuses on the modelling of most vital air pollutant i.e., PM 2.5 that too specifically during the paddy stubble burning months of October and November when the air pollution is at its peak in northern India.

18

Prediction model of public houses’ heating systems:a comparison of support vector machine methodand random forest method

Perekrest Andrii, Chenchevoi Vladimir, Chencheva Olga, Kovalenko Alexandr, Kushch-Zhyrko Mykhailo, Kalizhanova Aliya, Amirgaliyev Yedilkhan

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska

|

2022

|

T. 12, nr 3

34--39

EN

Data analysis and predicting play an important role in managing heat-supplying systems. Applying the models of predicting the systems’ parameters is possible for qualitative management, accepting appropriate decisions relating control that will be aimed at increasing energy efficiency and decreasing the amount of the consumed power source, diagnosing and defining non-typical processes in the functioning of the systems. The article deals with comparing two methods of ma-chine learning: random forest (RF) and support vector machine (SVM) for predicting the temperature of the heat-carrying agent in the heating system based on the data of electronic weather-dependent controller. The authors use the following parameters to compare the models: accuracy, source cost and the opportunity to interpret the results and non-obvious interrelations. The time spent for defining the optimal hyperparameters and conducting the SVM model training is deter-mined to exceed significantly the data of the RF parameter despite the close meanings of the root mean square error (RMSE). The change from 15-min data to once-a-minute ones is done to improve the RF model accuracy. RMSE of the RF model on the test data equals 0.41°С. The article studies the importance of the contribution of variables to the prediction accuracy.

PL

Analiza danych i prognozowanie odgrywają ważną rolę w zarządzaniu systemami zaopatrzenia w ciepło. Wykorzystanie modeli do przewidywania parametrów systemu jest możliwe do zarządzania jakością, podejmowania odpowiednich decyzji sterujących, które będą miały na celu poprawę efektywności energetycznej i zmniejszenie ilości zużywanego źródła energii elektrycznej, diagnozowania i wykrywania nietypowych procesów w funkcjonowaniu systemu. W artykule porównano dwie metody uczenia maszynowego: Random Forest (RF) i Support Vector Machine (SVM) do przewidywania temperatury czynnika grzewczego w systemie grzewczym na podstawie danych elektronicznego regulatora pogodowego. Do porównania modeli autorzy wykorzystują następujące parametry: dokładność, koszt początkowy oraz możliwość interpretacji wyników i nieoczywistych zależności. Ustalono, że czas poświęcony na wyznaczenie optymalnych hiperparametrów i wytrenowanie modelu SVM znacznie przekracza dane parametru RF, pomimo zbliżonych wartości błędu średniokwadratowego (RMSE). Zmiana z danych 15-minutowych na dane raz na minutę została dokonana w celu poprawy dokładności modelu RF. RMSE modelu RF z danych testowych wynosi 0,41°C. W pracy zbadano znaczenie wkładu zmiennych w dokładność prognozy.

19

A Machine Learning Model for Improving Building Detection in Informal Areas: A Case Study of Greater Cairo

Taha Lamyaa Gamal El-deen, Ibrahim Rania Elsayed

Geomatics and Environmental Engineering

|

2022

|

Vol. 16, no. 2

39--59

EN

Building detection in Ashwa’iyyat is a fundamental yet challenging problem, mainly because it requires the correct recovery of building footprints from images with high-object density and scene complexity. A classification model was proposed to integrate spectral, height and textural features. It was developed for the automatic detection of the rectangular, irregular structure and quite small size buildings or buildings which are close to each other but not adjoined. It is intended to improve the precision with which buildings are classified using scikit learn Python libraries and QGIS. WorldView-2 and Spot-5 imagery were combined using three image fusion techniques. The Grey-Level Co-occurrence Matrix was applied to determine which attributes are important in detecting and extracting buildings. The Normalized Digital Surface Model was also generated with 0.5-m resolution. The results demonstrated that when textural features of colour images were introduced as classifier input, the overall accuracy was improved in most cases. The results show that the proposed model was more accurate and efficient than the state-of-the-art methods and can be used effectively to extract the boundaries of small size buildings. The use of a classifier ensample is recommended for the extraction of buildings.

20

Sparse data classifier based on first-past-the-post voting system

Cudak Magdalena, Piech Mateusz, Marcjan Robert

Computer Science

|

2022

|

T. 23 (2)

277--296

EN

A point of interest (POI) is a general term for objects that describe places from the real world. The concept of POI matching (i.e., determining whether two sets of attributes represent the same location) is not a trivial challenge due to the large variety of data sources. The representations of POIs may vary depending on the basis of how they are stored. A manual comparison of objects is not achievable in real time; therefore, there are multiple solutions for automatic merging. However, there is no yet the efficient solution solves the missing of the attributes. In this paper, we propose a multi-layered hybrid classifier that is composed of machine-learning and deep-learning techniques and supported by a first-past-the-post voting system. We examined different weights for the constituencies that were taken into consideration during a majority (or supermajority) decision. As a result, we achieved slightly higher accuracy than the best current model (random forest), which also is based on voting.