Search results - BazTech

1

Assessing the efficiency of a random forest regression model for estimating water quality indicators

Zavareh Maryam, Maggioni Viviana, Zhang Xinxuan

Meteorology Hydrology and Water Management. Research and Operational Applications | 2023 | Vol. 11, Iss. 2 | 52–69

EN

This work evaluates the efficiency of Random Forest (RF) regression for predicting water quality indicators and investigates the factors affecting water quality in 11 watersheds in Virginia, the District of Columbia, and Maryland. Ten years of daily water quality data, along with hydro-meteorological information (such as precipitation) and watershed physiology and characteristics (e.g., size, soil type, land use), are used to predict dissolved oxygen (DO), specific conductivity (K), and turbidity (Tu) across the selected watersheds. The RF regression model is developed for six scenarios, with more predictors introduced in each successive scenario. Scenario 1 contains the least information (only the water quality indicators DO, K, and Tu), while scenario 6 contains all available variables. The RF model is evaluated with three statistical metrics: the relative root mean square error, the correlation coefficient, and the percentage of variance explained. In addition, the importance score of each predictor is used to rank the predictors within each scenario. The model performs excellently when DO is the predicted variable, and the model predicting K slightly outperforms the one predicting Tu. Scenario 4 (built on water quality indicators, hydro-meteorological data, watershed physiology, and land cover information) provides the best tradeoff between performance and efficiency, where efficiency is quantified as the amount of information needed to develop the model. In conclusion, the RF model indicates that land cover plays a significant role in predicting water quality indicators. In addition, the developed RF regression model is adaptable to watersheds in this region across a range of climates.
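The three evaluation metrics named in this abstract can be computed as in the minimal Python sketch below. The abstract gives no formulas, so two definitions are assumptions: relative RMSE is taken as RMSE divided by the mean of the observations, and percentage of variance explained as 100 * (1 - SS_res / SS_tot). The function names and sample values are illustrative only.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def relative_rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE divided by the mean observation (assumed normalization)."""
    return rmse(obs, pred) / (sum(obs) / len(obs))


def correlation(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def percent_variance_explained(obs: Sequence[float], pred: Sequence[float]) -> float:
    """100 * (1 - SS_res / SS_tot), an R^2-style definition (assumption)."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 100.0 * (1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical dissolved-oxygen observations and model predictions (mg/L).
    observed = [8.0, 9.0, 10.0, 11.0]
    predicted = [8.5, 9.0, 9.5, 11.0]
    print(relative_rmse(observed, predicted))
    print(correlation(observed, predicted))
    print(percent_variance_explained(observed, predicted))
```

Ranking predictors by importance, as the study does per scenario, would then just be a descending sort of whatever importance scores the RF implementation reports.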

2

An Approach to License Plate Recognition in Real Time Using Multi-stage Computational Intelligence Classifier

Kekez Michał

International Journal of Electronics and Telecommunications | 2023 | Vol. 69, No. 2 | 275–280

EN

Automatic car license plate recognition (LPR) is widely used today. It involves localizing the plate in the image, segmenting the characters, and performing optical character recognition. In this paper, a set of descriptors for image segments (characters) is proposed, together with a multi-stage technique for classifying letters and digits that uses a cascade of a neural network and several parallel Random Forest, classification tree, or rule list classifiers. The proposed solution was applied to the automated recognition of number plates composed of capital Latin letters and Arabic numerals. The paper analyzes the accuracy of the obtained classifiers and also reports the time needed to build each classifier and the time needed to classify characters with it.
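The abstract describes a cascade in which a first-stage neural network is followed by parallel Random Forest, tree, or rule-list classifiers. The sketch below illustrates only the routing idea, under simplifying assumptions: the first stage picks a hypothetical "letter" or "digit" group, and each group has its own second-stage classifier. Both stages are toy nearest-centroid stand-ins, not the paper's neural network or Random Forests, and the two-number descriptors are invented for illustration.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def nearest_centroid(centroids: Dict[str, Features]) -> Classifier:
    """Toy stand-in classifier: return the label whose centroid is closest."""
    def classify(x: Features) -> str:
        return min(
            centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
        )
    return classify


def make_cascade(router: Classifier, experts: Dict[str, Classifier]) -> Classifier:
    """Stage 1 picks a character group; stage 2 uses that group's classifier."""
    def classify(x: Features) -> str:
        group = router(x)          # e.g. "letter" or "digit"
        return experts[group](x)   # specialized classifier for the chosen group
    return classify


if __name__ == "__main__":
    # Hypothetical 2-D descriptors; real character descriptors would be richer.
    router = nearest_centroid({"digit": [0.1, 0.5], "letter": [0.9, 0.5]})
    experts = {
        "digit": nearest_centroid({"0": [0.1, 0.0], "1": [0.1, 1.0]}),
        "letter": nearest_centroid({"A": [0.9, 0.0], "B": [0.9, 1.0]}),
    }
    recognize = make_cascade(router, experts)
    print(recognize([0.2, 0.9]), recognize([0.8, 0.1]))
```

Splitting the problem this way means each second-stage model only has to separate characters within one group, which is one plausible motivation for a multi-stage design.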

3

Integrating Vegetation Indices and Spectral Features for Vegetation Mapping from Multispectral Satellite Imagery Using AdaBoost and Random Forest Machine Learning Classifiers

Saini Rashmi

Geomatics and Environmental Engineering | 2023 | Vol. 17, no. 1 | 57–74

EN

Vegetation mapping is an active research area in remote sensing. This study proposes a methodology for mapping vegetation by integrating several vegetation indices with the original spectral bands. Land Use Land Cover classification was performed with two powerful Machine Learning techniques, Random Forest and AdaBoost. The Random Forest algorithm builds multiple decision trees and combines them for the final prediction. The other technique selected for the classification, AdaBoost (adaptive boosting), combines a set of weak learners into a strong learner. Multispectral satellite data of Dehradun, India, were used. Including the selected vegetation indices produced an increase of 3.87% for Random Forest and 4.32% for AdaBoost. An Overall Accuracy (OA) of 91.23% (kappa 0.89) was obtained with the Random Forest classifier and 88.59% (kappa 0.86) with AdaBoost. Although Random Forest achieved a higher OA than AdaBoost, AdaBoost interestingly provided better class-specific accuracy for the Shrubland class. This study also evaluated the importance of each individual feature used in the classification: the NDRE, GNDVI, and RTVIcore vegetation indices and the NIR and Red-Edge spectral bands obtained higher importance scores.
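Integrating vegetation indices with spectral bands amounts to computing the indices per pixel and appending them to the band values before classification. The sketch below uses the standard normalized-difference forms of NDRE and GNDVI; RTVIcore is omitted, and the band names and sample reflectances are assumptions for illustration, not the study's data or exact band set.

```python
from typing import List


def normalized_difference(a: float, b: float) -> float:
    """Generic (a - b) / (a + b) index; returns 0.0 if a + b == 0."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndre(nir: float, red_edge: float) -> float:
    """Normalized Difference Red Edge index."""
    return normalized_difference(nir, red_edge)


def gndvi(nir: float, green: float) -> float:
    """Green Normalized Difference Vegetation Index."""
    return normalized_difference(nir, green)


def pixel_features(green: float, red: float, red_edge: float, nir: float) -> List[float]:
    """Original spectral bands followed by the derived vegetation indices."""
    return [green, red, red_edge, nir, ndre(nir, red_edge), gndvi(nir, green)]


if __name__ == "__main__":
    # Hypothetical surface reflectances for one vegetated pixel.
    print(pixel_features(green=0.08, red=0.05, red_edge=0.30, nir=0.50))
```

The resulting feature vectors would then be passed to whichever classifier (Random Forest or AdaBoost) is being evaluated.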

4

A comparative study on performance of basic and ensemble classifiers with various datasets

Gunakala Archana, Shahid Afzal Hussain

Applied Computer Science | 2023 | Vol. 19, no 1 | 107–132

EN

Classification plays a critical role in machine learning (ML) systems for processing images, text, and high-dimensional data. The primary goal of classification is to predict class labels from training data. An optimal model for a particular classification problem is chosen based on the model's performance and execution time. This paper compares and analyzes the performance of basic as well as ensemble classifiers using 10-fold cross-validation and also discusses their essential concepts, advantages, and disadvantages. Five basic classifiers, namely Naïve Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), together with an ensemble of all five and a few other combinations, are compared on five University of California Irvine (UCI) ML Repository datasets and a Diabetes Health Indicators dataset from the Kaggle repository. Accuracy, Recall, Precision, Area Under Curve (AUC), and F-Score are used to analyze and compare the classifiers. Experimental results showed that SVM performs best on two of the six datasets (Diabetes Health Indicators and Waveform), RF performs best on the Arrhythmia, Sonar, and Tic-tac-toe datasets, and the ensemble combination DT+SVM+RF performs best on the Ionosphere dataset, with respective accuracies of 72.58%, 90.38%, 81.63%, 73.59%, 94.78%, and 94.01%. The proposed ensemble combinations outperformed the conventional models on a few datasets.
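Two building blocks of this comparison, 10-fold cross-validation and combining several classifiers into an ensemble, can be sketched as below. The abstract does not say how ensemble members are combined, so simple majority voting (ties going to the label voted by the earliest-listed classifier) is an assumption. The folds here are contiguous and unshuffled for clarity; in practice the data are usually shuffled or stratified first.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    splits, start = [], 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-classifier label lists sample by sample using majority voting."""
    combined = []
    for votes in zip(*predictions):  # votes for one sample, in classifier order
        counts = Counter(votes)
        combined.append(max(votes, key=counts.__getitem__))  # first most-voted label wins ties
    return combined


if __name__ == "__main__":
    folds = k_fold_indices(25, k=10)
    print([len(test) for _, test in folds])
    # Hypothetical predictions of three classifiers (e.g. DT, SVM, RF) on three samples.
    print(majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]]))
```

Each classifier would be trained on `train` and scored on `test` for every fold, and the ten scores averaged.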

5

The decline of Svalbard land-fast sea ice extent as a result of climate change

Urbański Jacek A., Litwicka Dagmara

Oceanologia | 2022 | No. 64 (3) | 535–547

EN

The Svalbard Archipelago has experienced some of the most severe temperature increases in the Arctic over the last three decades. This temperature rise has accelerated sea-ice melting along the coast of the archipelago, bringing changes to the local environment. Given the importance of the near-future distribution of land-fast sea ice along the Svalbard coast, the available observations of ice extent between 1973 and 2018 are used herein to create a random forest (RF) model that predicts the daily ice extent and its spatial distribution from the cumulative number of freezing and thawing degree days and the duration of the ice season. Two RF models are constructed, using either regression or classification algorithms. The regression model estimates the extent of land-fast ice with a root mean square error (RMSE) of 800 km², while the classification model creates a cluster of submodels to forecast the spatial distribution of land-fast ice with less than 10% error. The models also enable reconstruction of the past ice extent and prediction of the near-future extent from standard meteorological data, and can even analyze the real-time spatial variability of land-fast ice. On average, the minimum two-monthly extent of land-fast sea ice along the Svalbard coast was about 12,000 km² between 1973 and 2000. In 2005–2019, however, the ice extent declined to about 6,000 km². A further increase in mean winter air temperatures by two degrees, which is forecast within 10 to 20 years, will result in a minimum two-monthly land-fast ice extent of about 1,500 km², indicating a trend of declining land-fast ice extent in this area.
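The two temperature predictors in this abstract can be derived from daily mean air temperatures as running sums. This minimal sketch assumes a base temperature of 0 °C (some sea-ice studies use the freezing point of seawater instead) and a series that starts at the beginning of the ice season; the authors' exact definitions are not stated in the abstract.

```python
from typing import Iterable, List, Tuple


def cumulative_degree_days(
    daily_mean_temps: Iterable[float], base: float = 0.0
) -> Tuple[List[float], List[float]]:
    """Running totals of freezing degree days (FDD) and thawing degree days (TDD)."""
    fdd_series, tdd_series = [], []
    fdd = tdd = 0.0
    for t in daily_mean_temps:
        if t < base:
            fdd += base - t  # degrees below the base accumulate as freezing
        else:
            tdd += t - base  # degrees at or above the base accumulate as thawing
        fdd_series.append(fdd)
        tdd_series.append(tdd)
    return fdd_series, tdd_series


if __name__ == "__main__":
    # Hypothetical daily mean air temperatures (°C) for the first five days of a season.
    fdd, tdd = cumulative_degree_days([-5.0, -3.0, 0.0, 2.0, -1.0])
    print(fdd)  # [5.0, 8.0, 8.0, 8.0, 9.0]
    print(tdd)  # [0.0, 0.0, 0.0, 2.0, 2.0]
```

Each day's pair of running totals, together with the day count since the season began, would form one input row for a daily ice-extent model of the kind described.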

6

Different Classifier Approaches Used For Fingerprint Classification

Tiwari Meena, Mishra Ashish

Annals of Computer Science and Information Systems | 2022 | Vol. 33 | 249–253

EN

Fingerprints play an important role in public safety and criminal investigations, for example in legal investigations, law enforcement, cultural access, and social security. They can also help give people a comfortable and secure life. Various gender classification strategies have been proposed. In this article, Naive Bayes, SVM, Logistic Regression, and Random Forest classifiers are applied to fingerprint-based gender classification to determine which gives the best results. Of the compared classifiers, Random Forest obtained the best results, with 98% accuracy, compared with Naive Bayes, SVM, and Logistic Regression. Random Forest is therefore the most sensitive of these classifiers for gender classification.

7

Monitoring Vegetation Cover Changes by Sentinel-1 Radar Images Using Random Forest Classification Method

Tran Van Anh, Le Thi Le, Nguyen Nhu Hung, Le Thanh Nghi, Tran Hong Hanh

Inżynieria Mineralna | 2021 | no. 2 | 441–451

EN

Vietnam is an Asian country with a hot and humid tropical climate throughout the year. Forests account for more than 40% of the total land area and have very rich and diverse vegetation. Monitoring changes in vegetation cover is important yet challenging, given such large areas and varying climatic conditions. A traditional remote sensing technique for monitoring vegetation cover uses optical satellite images. However, in the presence of cloud cover, analyses based on optical satellite images are not reliable. In such a scenario, radar images are a useful alternative because radar pulses penetrate clouds, day or night. In this study, we used multi-temporal C-band satellite images to monitor vegetation cover changes for an area in the Dau Tieng and Ben Cat districts of Binh Duong province, Mekong Delta, Vietnam. Using a collection of 46 images acquired between March 2015 and February 2017, changes in five land cover types, including vegetation loss and replanting in 2017, were analyzed for two cases, one using 9 images from the dry seasons of 2015, 2016, and 2017 and one using all 46 images, with Random Forest classifiers of 100, 200, 300, and 500 trees. The model with nine images and 300 trees gave the best accuracy, with an overall accuracy of 98.4% and a Kappa of 0.97. The results demonstrate that Sentinel-1 with VH polarization gives good accuracy for detecting vegetation cover change. Therefore, Sentinel-1 can also be used to generate reliable land cover maps suitable for different applications.
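Overall accuracy and the Kappa coefficient reported in this abstract are both computed from a classification confusion matrix. A minimal sketch of Cohen's kappa is below; the 2×2 matrix in the usage example is hypothetical, not the study's result.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(cm: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n  # overall accuracy
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - expected) / (1 - expected)


if __name__ == "__main__":
    oa, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
    print(oa, kappa)
```

Kappa discounts the agreement expected by chance, which is why it is usually lower than overall accuracy, as in the 98.4% / 0.97 pair above.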

8

Ensemble-based Method of Fraud Detection at Self-checkouts in Retail

Vitynskyi P., Tkachenko R., Izonin I.

ECONTECHMOD: An International Quarterly Journal on Economics of Technology and Modelling Processes | 2019 | Vol. 8, No 2 | 3–8

EN

The authors consider the problem of fraud detection at self-checkouts in retail with an imbalanced data set. A new ensemble-based method is proposed to solve it effectively. The developed method involves two main steps: preprocessing procedures and the Random Forest algorithm. The preprocessing stage applies the following procedures to the input data in sequence: scaling by the maximal element in each column combined with row-wise scaling by the Euclidean norm, weighting by correlation, and polynomial extension. For the polynomial extension, a second-degree Ito decomposition is used. The method was simulated on real data, and performance was evaluated using a cost matrix. An experimental comparison of the developed ensemble-based method with a number of existing methods (both simple and ensemble) shows that the developed method performs best. Experiments that vary the parameters of the Random Forest, both for the basic algorithm and for the developed method, show a significant improvement in the investigated efficiency measures for the latter. This improvement results from applying all steps of the preprocessing stage of the developed method.
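The preprocessing chain described in this abstract can be sketched in plain Python as follows. Several details are assumptions not stated in the abstract: columns are scaled by their maximal absolute value, the correlation weighting multiplies each column by the absolute Pearson correlation between that column and the target, and an ordinary second-degree polynomial expansion (original features plus all pairwise products) stands in for the Ito decomposition, whose exact form may differ. The Random Forest step is not shown.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Sequence

Matrix = List[List[float]]


def scale_by_column_max(X: Sequence[Sequence[float]]) -> Matrix:
    """Divide each column by its maximal absolute value (all-zero columns left as-is)."""
    maxima = [max(abs(row[j]) for row in X) or 1.0 for j in range(len(X[0]))]
    return [[v / m for v, m in zip(row, maxima)] for row in X]


def normalize_rows_l2(X: Sequence[Sequence[float]]) -> Matrix:
    """Scale each row to unit Euclidean norm (all-zero rows left as-is)."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; 0.0 when either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / den if den else 0.0


def weight_by_correlation(X: Sequence[Sequence[float]], y: Sequence[float]) -> Matrix:
    """Multiply each column by |corr(column, y)| (hypothetical weighting rule)."""
    weights = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return [[v * w for v, w in zip(row, weights)] for row in X]


def quadratic_expansion(row: Sequence[float]) -> List[float]:
    """Original features plus all products x_i * x_j with i <= j."""
    pairs = combinations_with_replacement(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in pairs]


def preprocess(X: Sequence[Sequence[float]], y: Sequence[float]) -> Matrix:
    """Apply the steps in the order listed in the abstract."""
    Z = normalize_rows_l2(scale_by_column_max(X))
    Z = weight_by_correlation(Z, y)
    return [quadratic_expansion(row) for row in Z]


if __name__ == "__main__":
    # Hypothetical transaction features and fraud labels (1 = fraud).
    X = [[1.0, 10.0], [2.0, 5.0], [4.0, 20.0]]
    y = [0.0, 0.0, 1.0]
    for row in preprocess(X, y):
        print(row)
```

With two input features the expansion yields five columns per row; the number of added products grows quadratically with the feature count, which is worth keeping in mind for wider data.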

9

Comparative analysis of data mining algorithms applied to the context of school dropout

Oliveira Vasconcelos Nathanael, Colaço Júnior Methanias, S. Almeida Thiago, Matheus da Silva Victor

Annals of Computer Science and Information Systems | 2019 | Vol. 20 | 3–10

EN

Student dropout is one of the major problems afflicting educational institutions; the losses caused by students abandoning their studies are social, academic, and economic. The search for its causes has been the subject of work and educational research around the world, and several organizations seek strategic decisions to control the dropout rate. The goal of this work is to evaluate the effectiveness of the data mining algorithms most used in education. An "in vivo" controlled experiment was planned and performed to compare the efficacy of the selected classifiers. The Random Forest and SVM algorithms stood out in this context, with statistically similar average accuracy (80.36%, 81.18%), precision (80.79%, 80.25%), recall (76.50%, 77.51%), and f-measure (78.86%, 78.81%). The results showed evidence of significant differences between the algorithms, and also showed that, although SVM had the best accuracy and recall, its results were statistically similar to those of Random Forest.
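The four metrics reported in this abstract can be computed for one evaluation run with a small helper such as the one below. Treating "dropout" as the positive class (encoded as 1) is an assumption, and the labels in the example are hypothetical; the study itself reports averages.

```python
from typing import Hashable, Sequence, Tuple


def classification_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, f-measure) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return accuracy, precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical labels: 1 = student dropped out, 0 = student stayed.
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Because the harmonic mean is nonlinear, an F-measure averaged over several runs need not equal the F-measure of the averaged precision and recall.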