
Results found: 11

Search results
Searched for:
in keywords: Random Forest
EN
This work evaluates the efficiency of Random Forest (RF) regression for predicting water quality indicators and investigates factors affecting water quality in 11 watersheds in Virginia, the District of Columbia, and Maryland. Ten years of daily water quality data, along with hydro-meteorological information (such as precipitation) and watershed physiology and characteristics (e.g., size, soil type, land use), are used to predict dissolved oxygen (DO), specific conductivity (K), and turbidity (Tu) across the selected watersheds. The RF regression model is developed for six scenarios, with an increasing number of predictors introduced in each scenario. The first scenario contains the smallest amount of information (water quality indicators DO, K and Tu), while scenario 6 contains all the available variables. The RF model is evaluated based on three statistical metrics: the relative root mean square error, the correlation coefficient, and the percentage of variance explained. In addition, each predictor's importance score is used to rank the predictors within each scenario. The model shows excellent performance for DO as the predicted variable. The model predicting K slightly outperforms the one predicting Tu. Scenario 4 (built on water quality indicators, hydro-meteorological data, watershed physiology and land cover information) provided the best tradeoff between performance and efficiency (quantified in terms of the amount of information needed to develop the model). In conclusion, based on the RF model, land cover plays a significant role in predicting water quality indicators. In addition, the developed RF regression model is adaptable to watersheds in this region over a range of climates.
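The three evaluation metrics named in this abstract can be sketched in plain Python. This is only an illustration: the abstract does not give the exact formulas, so the definitions below (RMSE normalized by the mean observation, Pearson correlation, and variance explained as 100 × (1 − SS_res / SS_tot)) are assumptions, and the function names are hypothetical.

```python
import math


def relative_rmse(observed, predicted):
    """RMSE divided by the mean observation (one common definition; assumed here)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (std_o * std_p)


def percent_variance_explained(observed, predicted):
    """100 * (1 - residual sum of squares / total sum of squares)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 100.0 * (1.0 - ss_res / ss_tot)


# Made-up dissolved oxygen values (mg/L), for illustration only.
do_observed = [8.0, 9.0, 10.0, 11.0]
do_predicted = [8.5, 9.5, 10.5, 11.5]
print(relative_rmse(do_observed, do_predicted))
print(pearson_r(do_observed, do_predicted))
print(percent_variance_explained(do_observed, do_predicted))
```

A constant bias, as in the made-up data, leaves the correlation at 1 while lowering the variance explained, which is one reason to report several metrics together.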
EN
Student dropout is one of the major problems afflicting educational institutions; the losses caused by students abandoning their studies are social, academic, and economic. The search for its causes has been the subject of educational research around the world, and many organizations seek strategic decisions to control the dropout rate. The goal of this work is to evaluate the effectiveness of the data mining algorithms most used in education. An "in vivo" controlled experiment was planned and performed to compare the efficacy of the selected classifiers. The Random Forest and SVM algorithms stood out in this context, with statistically similar average accuracy (80.36%, 81.18%), precision (80.79%, 80.25%), recall (76.50%, 77.51%), and F-measure (78.86%, 78.81%). The results showed evidence of significant differences between the algorithms, and also showed that, although the SVM had the best accuracy and recall, its results were statistically similar to those of Random Forest.
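For reference, the four metrics reported in this abstract follow from the counts in a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted dropouts that really dropped out
    recall = tp / (tp + fn)     # share of real dropouts that were detected
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for one test fold.
print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

Averages over several folds, as reported in the abstract, need not satisfy the F-measure formula exactly, because the mean of per-fold F-measures generally differs from the F-measure of the mean precision and recall.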
The decline of Svalbard land-fast sea ice extent as a result of climate change
EN
The Svalbard Archipelago has experienced some of the most severe temperature increases in the Arctic in the last three decades. This temperature rise has accelerated sea-ice melting along the coast of the archipelago, thus bringing changes to the local environment. In view of the importance of the near-future distribution of land-fast sea ice along the Svalbard coast, the available observation data on the ice extent between 1973 and 2018 are used herein to create a random forest (RF) model for predicting the daily ice extent and its spatial distribution according to the cumulative number of freezing and thawing degree days and the duration of the ice season. Two RF models are constructed by using either regression or classification algorithms. The regression model makes it possible to estimate the extent of land-fast ice with a root mean square error (RMSE) of 800 km², while the classification model creates a cluster of submodels in order to forecast the spatial distribution of land-fast ice with less than 10% error. The models also enable the reconstruction of the past ice extent, and the prediction of the near-future extent, from standard meteorological data, and can even analyze the real-time spatial variability of land-fast ice. On average, the minimum two-monthly extent of land-fast sea ice along the Svalbard coast was about 12,000 km² between 1973 and 2000. In 2005–2019, however, the ice extent declined to about 6,000 km². A further increase in mean winter air temperatures by two degrees, which is forecast in 10 to 20 years, will result in a minimum two-monthly land-fast ice extent of about 1,500 km², thus indicating a trend of declining land-fast ice extent in this area.
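The predictors mentioned in this abstract, cumulative freezing and thawing degree days, can be derived from daily mean air temperatures. The sketch below is a minimal illustration: the 0 °C base temperature and the function name are assumptions, and the paper may define the threshold or the accumulation period differently.

```python
def cumulative_degree_days(daily_mean_temps_c, base_c=0.0):
    """Return running totals of freezing and thawing degree days.

    A day colder than the base adds (base - T) to the freezing total;
    any other day adds (T - base) to the thawing total.
    """
    fdd_total, tdd_total = 0.0, 0.0
    fdd_series, tdd_series = [], []
    for temp in daily_mean_temps_c:
        if temp < base_c:
            fdd_total += base_c - temp
        else:
            tdd_total += temp - base_c
        fdd_series.append(fdd_total)
        tdd_series.append(tdd_total)
    return fdd_series, tdd_series


# Made-up daily mean temperatures in degrees Celsius.
fdd, tdd = cumulative_degree_days([-5.0, -3.0, 2.0, 1.0, -4.0])
print(fdd)  # [5.0, 8.0, 8.0, 8.0, 12.0]
print(tdd)  # [0.0, 0.0, 2.0, 3.0, 3.0]
```

Each day's pair of running totals, together with the elapsed length of the ice season, would then form one row of predictors for a regression or classification model.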
EN
When building predictive models in analytical CRM, researchers often encounter the problem of imbalanced classes (skewed distributions of the dependent variable): the number of observations belonging to one category of the dependent variable is much lower than the number belonging to the other category. The problem arises in areas such as churn analysis, customer acquisition models, and cross- and up-selling models. The purpose of the paper is to present a predictive model built to predict the response of Internet users to banner advertising. The dataset used in the study came from an online social network which offers advertisers banner campaigns targeting its users. The advertising campaign of a cosmetics company was carried out in the autumn of 2010 and was mainly targeted at young women. Each user of the service was described by 115 independent variables, of which 3 were demographic (sex, age, education) and the remaining 112 referred to the user's online activity. The problem of imbalanced classes arose during model building because of the low number of users who clicked on the banner ad. The number of cases amounted to 81,000, while the number of positive reactions to the banner was 207, which constitutes approximately 0.25% of all cases. Two popular data mining tools were used in the study: C&RT decision trees and Random Forest. The second goal of the paper is to compare the performance of the predictive models based on these two tools.
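The counts given in this abstract show why plain accuracy is misleading under such imbalance. The arithmetic below uses only the two figures stated above (81,000 cases, 207 clicks); the "balanced" class weight at the end is a common heuristic shown for illustration and is not claimed to be what the authors used.

```python
n_cases = 81_000    # total number of cases, from the abstract
n_positive = 207    # users who clicked the banner, from the abstract
n_negative = n_cases - n_positive

# Share of positive cases, in percent (about 0.26% from the stated counts).
positive_rate_pct = 100.0 * n_positive / n_cases

# A trivial model that always predicts "no click" is almost always right,
# yet it never finds a single clicker.
baseline_accuracy = n_negative / n_cases
baseline_recall = 0 / n_positive

# Heuristic class weights, n_cases / (n_classes * class_count):
# an assumption for illustration, not the paper's method.
weight_positive = n_cases / (2 * n_positive)
weight_negative = n_cases / (2 * n_negative)

print(f"positive rate: {positive_rate_pct:.2f}%")
print(f"always-negative accuracy: {baseline_accuracy:.4f}, recall: {baseline_recall}")
print(f"class weights: positive={weight_positive:.1f}, negative={weight_negative:.3f}")
```

With a baseline accuracy above 99.7%, metrics such as recall, precision, or lift on the positive class are more informative when comparing the two models.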
EN
The authors consider the problem of fraud detection at retail self-checkouts when the data set is imbalanced. A new ensemble-based method is proposed to solve it effectively. The method involves two main steps: preprocessing and the Random Forest algorithm. The preprocessing stage applies the following procedures to the input data in sequence: scaling by the maximal element in a column with row-wise scaling by the Euclidean norm, weighting by correlation, and polynomial extension. The Ito decomposition of the second degree is used for the polynomial extension. The method was simulated on real data, and its performance was evaluated using a cost matrix. An experimental comparison with a number of existing methods (both single models and ensembles) shows that the developed method performs best. Experiments that vary the parameters of the Random Forest, both for the basic algorithm and for the developed method, show a significant improvement in the investigated efficiency measures for the latter. This improvement results from all the steps of the method's preprocessing stage.
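Part of the preprocessing chain described in this abstract can be sketched as follows. The sketch rests on several assumptions: column scaling divides by the largest absolute value, the second-degree extension is taken to mean the original features plus all pairwise products (including squares), and the correlation-weighting step is omitted because the abstract does not specify it. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def scale_columns_by_max(rows):
    """Divide each column by its largest absolute value (columns of zeros are left as is)."""
    n_cols = len(rows[0])
    col_max = [max(abs(r[j]) for r in rows) or 1.0 for j in range(n_cols)]
    return [[r[j] / col_max[j] for j in range(n_cols)] for r in rows]


def scale_rows_by_norm(rows):
    """Divide each row by its Euclidean norm (all-zero rows are left as is)."""
    scaled = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r)) or 1.0
        scaled.append([v / norm for v in r])
    return scaled


def second_degree_extension(row):
    """Append all products x_i * x_j with i <= j to the original features (assumed form)."""
    pairs = combinations_with_replacement(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in pairs]


# Made-up feature rows, for illustration only.
data = [[2.0, 4.0], [1.0, 8.0]]
prepared = scale_rows_by_norm(scale_columns_by_max(data))
extended = [second_degree_extension(r) for r in prepared]
print(extended)
```

For n input features this extension produces n + n(n + 1)/2 columns, so it grows quickly; the extended rows would then be passed to a Random Forest classifier.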
EN
Automatic car license plate recognition (LPR) is widely used nowadays. It involves plate localization in the image, character segmentation and optical character recognition. In this paper, a set of descriptors of image segments (characters) is proposed, as well as a technique for multi-stage classification of letters and digits using a cascade of a neural network and several parallel Random Forest, classification tree, or rule list classifiers. The proposed solution was applied to automated recognition of number plates composed of capital Latin letters and Arabic numerals. The paper presents an analysis of the accuracy of the obtained classifiers. The time needed to build the classifier and the time needed to classify characters using it are also presented.
EN
Vegetation mapping is an active research area in the domain of remote sensing. This study proposes a methodology for mapping vegetation by integrating several vegetation indices with the original spectral bands. The Land Use Land Cover classification was performed with two powerful Machine Learning techniques, namely Random Forest and AdaBoost. The Random Forest algorithm builds multiple decision trees and combines them for the final prediction. AdaBoost (adaptive boosting), the other technique selected for the classification, converts a set of weak learners into a strong learner. Multispectral satellite data of Dehradun, India, was utilised. The results demonstrate an increase of 3.87% and 4.32% after inclusion of the selected vegetation indices for Random Forest and AdaBoost respectively. An Overall Accuracy (OA) of 91.23% (kappa value of 0.89) and 88.59% (kappa value of 0.86) was obtained with the Random Forest and AdaBoost classifiers respectively. Although Random Forest achieved a greater OA than AdaBoost, AdaBoost provided better class-specific accuracy for the Shrubland class. Furthermore, this study also evaluated the importance of each individual feature used in the classification. Results demonstrated that the NDRE, GNDVI, and RTVIcore vegetation indices, and the NIR and Red-Edge spectral bands, obtained higher importance scores.
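Two of the indices named in this abstract, NDRE and GNDVI, are normalized differences of band reflectances, as is the widely used NDVI. The sketch below uses their standard definitions; the reflectance values are made up, and RTVIcore is left out because its formula is not given in the abstract.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    total = band_a + band_b
    return (band_a - band_b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def gndvi(nir, green):
    """Green Normalized Difference Vegetation Index."""
    return normalized_difference(nir, green)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return normalized_difference(nir, red_edge)


# Made-up surface reflectances for a single pixel.
pixel = {"green": 0.08, "red": 0.05, "red_edge": 0.20, "nir": 0.45}
features = [
    pixel["green"], pixel["red"], pixel["red_edge"], pixel["nir"],
    ndvi(pixel["nir"], pixel["red"]),
    gndvi(pixel["nir"], pixel["green"]),
    ndre(pixel["nir"], pixel["red_edge"]),
]
print(features)
```

Stacking the indices next to the original bands, as in `features`, gives the kind of combined input vector that a per-pixel classifier can be trained on.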
EN
Classification plays a critical role in machine learning (ML) systems for processing images, text and high-dimensional data. Predicting class labels from training data is the primary goal of classification. An optimal model for a particular classification problem is chosen based on the model's performance and execution time. This paper compares and analyzes the performance of basic as well as ensemble classifiers using 10-fold cross validation and also discusses their essential concepts, advantages, and disadvantages. In this study, five basic classifiers, namely Naïve Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), the ensemble of all five classifiers, and a few more combinations are compared on five University of California Irvine (UCI) ML Repository datasets and a Diabetes Health Indicators dataset from the Kaggle repository. To analyze and compare the performance of the classifiers, evaluation metrics such as Accuracy, Recall, Precision, Area Under Curve (AUC) and F-Score are used. Experimental results showed that SVM performs best on two of the six datasets (Diabetes Health Indicators and waveform), RF performs best on the Arrhythmia, Sonar, and Tic-tac-toe datasets, and the best ensemble combination is DT+SVM+RF on the Ionosphere dataset, with respective accuracies of 72.58%, 90.38%, 81.63%, 73.59%, 94.78% and 94.01%. The proposed ensemble combinations outperformed the conventional models on a few datasets.
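Two ideas from this abstract, 10-fold cross validation and combining classifiers into an ensemble, can be illustrated without any ML library. The sketch assumes unshuffled, unstratified folds and a simple majority vote; the abstract does not say how the folds were drawn or how the ensemble combines its members, so both choices and both function names are assumptions.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Every sample is used for testing exactly once across the 10 folds.
for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), len(test_idx))

# Hypothetical label predictions from three classifiers (e.g. DT, SVM, RF).
dt_pred = [0, 1, 1]
svm_pred = [0, 1, 0]
rf_pred = [1, 1, 0]
print(majority_vote([dt_pred, svm_pred, rf_pred]))  # [0, 1, 0]
```

In practice the indices are usually shuffled, and often stratified by class, before being split into folds.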
EN
Fingerprints play an important role in public safety and criminal investigations, for example in legal investigations, law enforcement, cultural access, and social security. They can also help give people a comfortable and secure life. Various gender classification strategies have been proposed. In this article, the fingerprint algorithm uses the Naive Bayes, SVM, Logistic Regression and Random Forest classifiers to obtain the best gender classification results; a new fingerprint method can be built from these classifiers. Of the classifiers compared, Random Forest gave the best results, with 98% accuracy compared to Naive Bayes, SVM and Logistic Regression. Random Forest is the most sensitive to gender classification.
EN
In the age of social media, thousands of messages are exchanged every second. Analyzing such unstructured data to identify specific emotions is a challenging task. Emotion analysis involves evaluating and classifying text into emotion classes such as Happy, Sad, Anger, Disgust, Fear, and Surprise, as defined by the dimensional models of emotion described in psychological theory (www 1; Russell, 2005). The main goal of this paper is to cover the COVID-19 pandemic situation in India and its impact on human emotions. As people very often express their state of mind through social media, analyzing and tracking their emotions can help government and local authorities take the required measures. We analyzed different machine learning classification models, namely Naïve Bayes, Support Vector Machine, Random Forest Classifier, Decision Tree and Logistic Regression, with 10-fold cross validation to find the top ML models for emotion classification. After tuning the hyperparameters, Logistic Regression was the best-suited model, with an accuracy of 77% on the given datasets. We worked with an algorithm-based supervised ML technique to get the expected result. Although multiple studies were conducted earlier along the same lines, none of them performed a comparative study among different ML techniques or used hyperparameter tuning to optimize the results. Moreover, this study uses a dataset from the most recent COVID-19 pandemic situation, which is itself unique. We captured Twitter data for a duration of 45 days with the hashtags #COVID19India OR #COVID19 and analyzed the data using Logistic Regression to find out how emotions changed over time based on certain social factors.
EN
Vietnam is an Asian country with a hot and humid tropical climate throughout the year. Forests account for more than 40% of the total land area and have very rich and diverse vegetation. Monitoring changes in the vegetation cover is important yet challenging, considering such large, varied areas and climatic conditions. A traditional remote sensing technique for monitoring vegetation cover involves the use of optical satellite images. However, in the presence of cloud cover, analyses based on optical satellite images are not reliable. In such a scenario, radar images are a useful alternative because radar pulses penetrate clouds, regardless of day or night. In this study, we used multi-temporal C-band satellite images to monitor vegetation cover changes for an area in the Dau Tieng and Ben Cat districts of Binh Duong province, Mekong Delta, Vietnam. With a collection of 46 images acquired between March 2015 and February 2017, the changes of five land cover types, including vegetation loss and replanting in 2017, were analyzed in two cases: using 9 images from the dry seasons of the 3 years 2015, 2016 and 2017, and using all 46 images. In each case a Random Forest classifier was run with 100, 200, 300 and 500 trees. The model with nine images and 300 trees gave the best result, with an overall accuracy of 98.4% and a Kappa of 0.97. The results demonstrated that, using VH polarization, Sentinel-1 gives quite good accuracy for vegetation cover change. Therefore, Sentinel-1 can also be used to generate reliable land cover maps suitable for different applications.
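The two accuracy figures quoted in this abstract, overall accuracy and Kappa, are both derived from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the function name and the two-class matrix are made up for illustration and do not reproduce the study's five land cover classes.

```python
def overall_accuracy_and_kappa(matrix):
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    # Chance agreement: products of row totals and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n_classes)
    ) / (total ** 2)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix.
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [5, 45]])
print(accuracy, kappa)
```

Kappa corrects the overall accuracy for agreement expected by chance, which is why it is lower than the accuracy whenever the classifier is imperfect.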