Search results - BazTech

1

Stochastic schemata exploiter-based optimization of hyper-parameters for XGBoost

Makino Hiroya, Kita Eisuke

Computer Assisted Methods in Engineering and Science | 2024 | Vol. 31, no. 1

113--132

EN

XGBoost is a well-known open-source software library that provides a regularizing gradient boosting framework. Although it is widely used in machine learning, its performance depends on the choice of hyper-parameters. This study focuses on optimizing the hyper-parameters of XGBoost with the Stochastic Schemata Exploiter (SSE), an evolutionary algorithm that has been successfully applied to combinatorial optimization problems. The original SSE algorithm is modified for hyper-parameter optimization. Compared with a simple Genetic Algorithm, SSE has two interesting features: quick convergence and a small number of control parameters. To confirm its validity, the proposed algorithm is compared with other hyper-parameter optimization algorithms such as Gradient Boosted Regression Trees (GBRT), Tree-structured Parzen Estimator (TPE), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and Random Search. The numerical results show that SSE has a good convergence property, even with fewer control parameters than the other methods.
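As a point of reference for the comparison above, Random Search (one of the baselines named in the abstract) can be sketched in a few lines. This is a minimal illustration, not the authors' SSE algorithm: the search space, its ranges, and `toy_objective` (a stand-in for a cross-validated XGBoost loss) are assumptions made for the example.

```python
import random

# Hypothetical search space. The parameter names follow common XGBoost
# usage, but the ranges are illustrative assumptions, not the paper's.
SEARCH_SPACE = {
    "max_depth": (2, 10),          # integer range
    "learning_rate": (0.01, 0.3),  # continuous range
    "subsample": (0.5, 1.0),       # continuous range
}


def sample_params(rng):
    """Draw one hyper-parameter configuration uniformly at random."""
    lo, hi = SEARCH_SPACE["max_depth"]
    params = {"max_depth": rng.randint(lo, hi)}
    for name in ("learning_rate", "subsample"):
        lo, hi = SEARCH_SPACE[name]
        params[name] = rng.uniform(lo, hi)
    return params


def toy_objective(params):
    """Stand-in for a validation loss (assumption: lower is better).

    In a real study this would train a model with `params` and return
    its cross-validated error.
    """
    return ((params["max_depth"] - 6) ** 2 / 16
            + (params["learning_rate"] - 0.1) ** 2
            + (params["subsample"] - 0.8) ** 2)


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

Calling `random_search(toy_objective, n_trials=200)` returns the best configuration found and its loss. Random Search has a single control parameter (the trial budget), which makes it a natural baseline when the number of control parameters is one of the criteria being compared.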

2

Classification of cognitive states using clustering-split time series framework

Ramakrishna J. Siva, Ramasangu Hariharan

Computer Assisted Methods in Engineering and Science | 2024 | Vol. 31, no. 2

241--260

EN

Over the last two decades, functional Magnetic Resonance Imaging (fMRI) has provided an immense amount of data about the dynamics of the brain. Ongoing developments in machine learning suggest improvements in the performance of fMRI data analysis. Clustering is one of the critical techniques in machine learning: unsupervised clustering techniques partition data objects into different groups. Supervised classification techniques applied to fMRI data facilitate the decoding of cognitive states while a subject is engaged in a cognitive task. Because fMRI data are high-dimensional, sparse, and noisy, designing a classifier model for estimating cognitive states is challenging, and feature selection and feature extraction are critical aspects of fMRI data analysis. In this work, we combine Hierarchical Consensus Clustering (HCC) with the Statistics of Split Timeseries (SST) framework to estimate cognitive states. The performance of the proposed HCC-SST model has been verified on StarPlus fMRI data. The experimental results show that the proposed classifier model achieves 99% classification accuracy with a smaller number of voxels and lower computational cost.

3

Overcoming Overfitting Challenges with HOG Feature Extraction and XGBoost-Based Classification for Concrete Crack Monitoring

Barkiah Ida, Sari Yuslena

International Journal of Electronics and Telecommunications | 2023 | Vol. 69, No. 3

571--577

EN

This study proposes a method that combines Histogram of Oriented Gradients (HOG) feature extraction and Extreme Gradient Boosting (XGBoost) classification to address the challenges of concrete crack monitoring, in particular the common issue of overfitting in machine learning models. The research uses a dataset of 40,000 images of concrete cracks, with HOG feature extraction to identify relevant patterns. Classification is performed using the ensemble method XGBoost, with a focus on optimizing its hyperparameters. The study evaluates XGBoost against other ensemble methods, such as Random Forest and AdaBoost; the results show that XGBoost outperforms them in accuracy, precision, recall, and F1-score. With optimized hyperparameters, the proposed method obtains an accuracy of 96.95%, a recall of 96.10%, a precision of 97.90%, and an F1-score of 97%. Tuning the number-of-trees hyperparameter showed that 1200 trees yield the best performance. The results demonstrate the efficacy of HOG-based feature extraction and XGBoost for accurate and dependable classification of concrete cracks, overcoming the overfitting issues typically encountered in such tasks.
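The four metrics reported above are related through the confusion matrix, and the F1-score is the harmonic mean of precision and recall. The sketch below uses the standard definitions; the function names are chosen for illustration. It also checks that the reported F1 of 97% is consistent with the reported precision (97.90%) and recall (96.10%).

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = f1_from(precision, recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check with the figures quoted in the abstract:
# precision 0.979 and recall 0.961 give an F1 of about 0.97.
reported_f1 = f1_from(0.979, 0.961)
```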

4

Comprehensive machine learning and deep learning approaches for Parkinson's disease classification and severity assessment

Majdoubi Oumaima, Benba Achraf, Hammouch Ahmed

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska | 2023 | Vol. 13, no. 4

15--20

EN

In this study, we aimed to adopt a comprehensive approach to categorize and assess the severity of Parkinson's disease by leveraging techniques from both machine learning and deep learning. We thoroughly evaluated the effectiveness of various models, including XGBoost, Random Forest, Multi-Layer Perceptron (MLP), and Recurrent Neural Network (RNN), utilizing classification metrics. We generated detailed reports to facilitate a comprehensive comparative analysis of these models. Notably, XGBoost demonstrated the highest precision at 97.4%. Additionally, we took a step further by developing a Gated Recurrent Unit (GRU) model with the purpose of combining predictions from alternative models. We assessed its ability to predict the severity of the ailment. To quantify the precision levels of the models in disease classification, we calculated severity percentages. Furthermore, we created a Receiver Operating Characteristic (ROC) curve for the GRU model, simplifying the evaluation of its capability to distinguish among various severity levels. This comprehensive approach contributes to a more accurate and detailed understanding of Parkinson's disease severity assessment.

PL

W tym badaniu naszym celem było przyjęcie kompleksowego podejścia do kategoryzacji i oceny ciężkości choroby Parkinsona poprzez wykorzystanie technik zarówno uczenia maszynowego, jak i głębokiego uczenia. Dokładnie oceniliśmy skuteczność różnych modeli, w tym XGBoost, Random Forest, Multi-Layer Perceptron (MLP) i Recurrent Neural Network (RNN), wykorzystując wskaźniki klasyfikacji. Wygenerowaliśmy szczegółowe raporty, aby ułatwić kompleksową analizę porównawczą tych modeli. Warto zauważyć, że XGBoost wykazał najwyższą precyzję na poziomie 97,4%. Ponadto poszliśmy o krok dalej, opracowując model Gated Recurrent Unit (GRU) w celu połączenia przewidywań z alternatywnych modeli. Oceniliśmy jego zdolność do przewidywania nasilenia dolegliwości. Aby określić ilościowo poziomy dokładności modeli w klasyfikacji chorób, obliczyliśmy wartości procentowe nasilenia. Ponadto stworzyliśmy krzywą charakterystyki operacyjnej odbiornika (ROC) dla modelu GRU, upraszczając ocenę jego zdolności do rozróżniania różnych poziomów nasilenia. To kompleksowe podejście przyczynia się do dokładniejszego i bardziej szczegółowego zrozumienia oceny ciężkości choroby Parkinsona.

5

Ensemble of feature extraction methods to improve the structural damage classification in a wind turbine foundation

Leon-Medina Jersson X., Parés Núria, Anaya Maribel, Tibaduiza Diego A., Pozo Francesc

Bulletin of the Polish Academy of Sciences. Technical Sciences | 2023 | Vol. 71, no. 3

art. no. e144606

EN

The condition monitoring of offshore wind power plants is an important topic that remains open. This monitoring aims to lower the maintenance cost of these plants. One of the main components of a wind power plant is the wind turbine foundation. This study describes a data-driven structural damage classification methodology applied to a wind turbine foundation. The vibration response of the structure was captured using an accelerometer network. After arranging the obtained data, a feature vector of 58,008 features was obtained. An ensemble approach of feature extraction methods was applied to obtain a new set of features: Principal Component Analysis (PCA) and Laplacian eigenmaps were each used separately as dimensionality reduction methods, and the union of the resulting features forms a reduced feature matrix. The reduced feature matrix is used as input to train an Extreme Gradient Boosting (XGBoost) classification model. Four different damage scenarios were applied to the structure; together with the healthy structure, this gives 5 classes in total, all of which were correctly classified. Five-fold cross-validation is used to obtain the final classification accuracy. As a result, a classification accuracy of 100% was obtained after applying the developed damage classification methodology to an offshore wind turbine jacket-type foundation benchmark structure.
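The five-fold cross-validation step mentioned above can be sketched as follows. This is a generic illustration in plain Python, not the authors' pipeline: `fit_predict` is a hypothetical callback standing in for "train a classifier on the training fold and predict the test fold", and the shuffling seed is an assumption.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(fit_predict, X, y, k=5, seed=0):
    """Average test accuracy over k folds.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains a model and returns predicted labels for X_test.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k
```

Each sample appears in exactly one test fold, so the averaged score uses every sample for testing once while never testing a model on data it was trained on.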

6

The Use of the XGBoost and Kriging Methods in the Prediction of the Microstructure of CGI Cast Iron

Sztangret Łukasz, Olejarczyk-Wożeńska Izabela, Regulski Krzysztof, Gumienny Grzegorz, Mrzygłód Barbara

Archives of Foundry Engineering | 2023 | Vol. 23, iss. 4

22--33

EN

Compacted Graphite Iron (CGI) is a unique casting material characterized by its graphite form and extensive matrix contact surface. This type of cast iron has a tendency towards direct ferritization and possesses a complex set of intriguing properties. The use of data mining methods in modern foundry material development facilitates the achievement of improved product quality parameters. When designing a new product, it is always necessary to have a comprehensive understanding of the influence of alloying elements on the microstructure and consequently on the properties of the analyzed material. Empirical studies allow for a qualitative assessment of the above-mentioned relationships, but it is the use of intelligent computational techniques that allows for the construction of an approximate model of the microstructure and, consequently, precise predictions. The formulated prognostic model supports technological decisions during the casting design phase and is considered as the first step in the selection of the appropriate material type.

7

Application of machine learning tools for seismic reservoir characterization study of porosity and saturation type

Topór Tomasz, Sowiżdżał Krzysztof

Nafta-Gaz | 2022 | Vol. 78, no. 3

165--175

EN

The application of machine learning (ML) tools and data-driven modeling has become a standard approach for solving many problems in exploration geology and has contributed to the discovery of new reservoirs. This study explores the application of two ML ensemble methods, random forest (RF) and extreme gradient boosting (XGBoost), to derive porosity and saturation type (gas/water) in multihorizon sandstone formations from Miocene deposits of the Carpathian Foredeep. The training of the ML algorithms was divided into two stages. First, the RF algorithm was used to compute porosity based on seismic attributes and well location coordinates. The obtained results were then used as an extra feature for saturation type modeling with the XGBoost algorithm. XGBoost was run with and without well location coordinates to evaluate the influence of spatial information on modeling performance. The hyperparameters for each model were tuned using the Bayesian optimization algorithm. To check the robustness of the trained models, 10-fold cross-validation was performed. The results were evaluated on training and testing sets using standard metrics for regression and classification. The root mean square error (RMSE) for porosity prediction with RF was close to 0.053 for both training and testing, providing no evidence of overfitting. Feature importance analysis revealed that the most influential variables for porosity prediction were the spatial coordinates and the seismic attribute sweetness. The results of XGBoost modeling (variant 1) demonstrated that the algorithm could accurately predict saturation type despite the class imbalance issue. The sensitivity of XGBoost was high on both training and testing data, equal to 0.862 and 0.920, respectively. The XGBoost model relied on computed porosity and spatial coordinates. The sensitivity for both training and testing sets dropped significantly, by about 10%, when well location coordinates were removed (variant 2). In this case, the three most influential features were computed porosity, seismic amplitude contrast, and the iso-frequency component (15 Hz) attribute. The obtained results were imported into Petrel software to present the spatial distribution of porosity and saturation type. The latter parameter was given with a probability distribution, which allows potential target zones enriched in gas to be identified.

PL

Metody uczenia maszynowego stanowią obecnie rutynowe narzędzie wykorzystywane przy rozwiązywaniu wielu problemów w geologii poszukiwawczej i przyczyniają się do odkrycia nowych złóż. Prezentowana praca pokazuje zastosowanie dwóch algorytmów uczenia maszynowego – lasów losowych (RF) i drzew wzmocnionych gradientowo (XGBoost) do wyznaczenia porowatości i typu nasycenia (gaz/woda) w formacjach piaskowców będących potencjalnymi horyzontami gazonośnymi w mioceńskich osadach zapadliska przedkarpackiego. Proces uczenia maszynowego został podzielony na dwa etapy. W pierwszym etapie użyto RF do obliczenia porowatości na podstawie danych pochodzących z atrybutów sejsmicznych oraz współrzędnych lokalizacji otworów. Uzyskane wyniki zostały wykorzystane jako dodatkowa cecha przy modelowaniu typu nasycenia z zastosowaniem algorytmu XGBoost. Modelowanie za pomocą XGBoost został przeprowadzone w dwóch wariantach – z wykorzystaniem lokalizacji otworów oraz bez nich w celu oceny wpływu informacji przestrzennych na wydajność modelowania. Proces strojenia hiperparametrów dla poszczególnych modeli został przeprowadzony z wykorzystaniem optymalizacji Bayesa. Wyniki procesu modelowania zostały ocenione na zbiorach treningowym i testowym przy użyciu standardowych metryk wykorzystywanych do rozwiązywania problemów regresyjnych i klasyfikacyjnych. Dodatkowo, aby wzmocnić wiarygodność modeli treningowych, przeprowadzona została 10-krotna kroswalidacja. Pierwiastek błędu średniokwadratowego (RMSE) dla wymodelowanej porowatości na zbiorach treningowym i testowym był bliski 0,053 co wskazuje na brak nadmiernego dopasowania modelu (ang. overfitting). Analiza istotności cech ujawniła, że zmienną najbardziej wpływającą na prognozowanie porowatości były współrzędne lokalizacji otworów oraz atrybut sejsmiczny sweetness. Wyniki modelowania XGBoost (wariant 1) wykazały, że algorytm jest w stanie dokładnie przewidywać typ nasycenia pomimo problemu z nierównowagą klas. 
Czułość wykrywania potencjalnych stref gazowych w przypadku modelu XGBoost była wysoka zarówno dla zbioru treningowego, jak i testowego (0,862 i 0,920). W swoich predykcjach model opierał się głównie na wyliczonej porowatości oraz współrzędnych otworów. Czułość dla uzyskanych wyników na zbiorze treningowym i testowym spadła o około 10%, gdy usunięto współrzędne lokalizacji otworów (wariant 2 XGBoost). W tym przypadku trzema najważniejszymi cechami były obliczona porowatość oraz atrybut sejsmiczny amplitude contrast i atrybut iso-frequency component (15 Hz). Uzyskane wyniki zostały zaimportowane do programu Petrel, aby przedstawić przestrzenny rozkład porowatości i typu nasycenia. Ten ostatni parametr został przedstawiony wraz z rozkładem prawdopodobieństwa, co dało wgląd w strefy o najwyższym potencjale gazowym.

8

Flight delay prediction based with machine learning

Hatıpoğlu Irmak, Tosun Ömür, Tosun Nedret

LogForum | 2022 | Vol. 18, no. 1

97--107

EN

Background: The delay of a planned flight causes many undesirable consequences, such as higher costs, lower customer satisfaction, and environmental pollution. The only way to prevent these problems before they occur is to know which flights will be delayed. The aim of this study is to predict delayed flights using machine learning techniques, which have become widespread with the development of computing capacity and data storage systems. Methods: Predictions are made with three up-to-date machine learning techniques based on Gradient Boosting: XGBoost, LightGBM, and CatBoost. A Bayesian technique is used for hyper-parameter tuning. In addition, the Synthetic Minority Over-Sampling Technique (SMOTE) is used, because the majority of flights are on time and delayed flights, which constitute a minority class, may adversely affect the results. The results are analyzed and reported with and without SMOTE. Results: The application, which was run on a data set containing all of an international airline's flights for one year (18148 flights), showed that flight delays can be predicted with high accuracy. Conclusions: The application of machine learning techniques to anticipate flight delays is new, but it has a lot of potential. If delays, which can cause many problems, are correctly estimated, companies will be able to avert those problems before they develop. As a result, concrete advantages such as lower costs and higher customer satisfaction will emerge. Improvements will be made at the most vulnerable point in the aviation business.
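The idea behind SMOTE, as used above to rebalance the minority class of delayed flights, can be sketched in plain Python. This is a simplified illustration of the basic technique (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation used in the study; the function name, the default `k`, and the seed are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of feature vectors (lists of floats) from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling adds variety within the minority region instead of duplicating existing rows. Oversampling is normally applied to the training data only, so that evaluation still reflects the original class balance.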

9

Data analysis-based time series forecast for managing household electricity consumption

Bezzar Nour El-Houda, Laimeche Lakhdar, Meraoumia Abdallah, Houam Lotfi

Demonstratio Mathematica | 2022 | Vol. 55, no. 1

900--921

EN

Recently, electricity consumption forecasting has attracted much research due to its importance in our daily life as well as in economic activities. This process is seen as one of the ways to manage future electricity needs, including anticipating the supply-demand balance, especially at peak times, and helping the customer make real-time decisions about their consumption. Therefore, based on statistical techniques (ST) and/or artificial intelligence (AI), many forecasting models have been developed in the literature, but unfortunately, in addition to poor choice of the appropriate model, time series datasets were used directly without being seriously analyzed. In this article, we have proposed an efficient electricity consumption prediction model that takes into account the shortcomings mentioned earlier. Therefore, the database was analyzed to address all anomalies such as non-numeric values, aberrant, and missing values. In addition, by analyzing the correlation between the data, the possible periods for forecasting electricity consumption were determined. The experimental results carried out on the Individual Household Electricity Power Consumption dataset showed a clear superiority of the proposed model over most of the ST and/or AI-based models proposed in the literature.

10

Geological mapping using extreme gradient boosting and the deep neural networks: application to silet area, central Hoggar, Algeria

Elbegue Abderrahmane Aref, Allek Karim, Zeghouane Hocine

Acta Geophysica | 2022 | Vol. 70, no. 4

1581--1599

EN

Nowadays, machine learning algorithms are considered a powerful tool for analyzing big and complex data due to their ability to deliver accurate and fast results. The main objective of the present study is to demonstrate the effectiveness of the extreme gradient boosting (XGBoost) method, and of the data types employed, for geological mapping in the Saharan region. To reveal the potential of XGBoost, we conducted two experiments. The first used different combinations of airborne gamma-ray spectrometry data, airborne magnetic data, Landsat 8 data, and a digital elevation model to train 9 XGBoost models, in order to determine the sensitivity of each data type in capturing the lithological rock classes. The second compared XGBoost with deep neural networks (DNN) to assess its potential against other machine learning algorithms. Compared with the existing geological map, XGBoost shows great potential for geological mapping: it achieved a correlation score of 78%, with igneous and metamorphic rocks more easily identified than sedimentary rocks. In addition, using different data combinations reveals the utility of airborne magnetic data for discriminating some lithological units. It also reveals the potential of the apparent density, derived from airborne magnetic data, to improve the algorithm's accuracy by up to 20%. Furthermore, the second experiment indicates that XGBoost is a better choice than the DNN for the geological mapping task. The predicted map shows that the XGBoost method provides an efficient tool for updating existing geological maps and for producing new geological maps in the region with well-outcropped rocks.

11

Feature engineering combined with 1-D convolutional neural network for improved mortality prediction

Verma Rohit, Maheshwari Saumil, Shukla Anupam

Bio-Algorithms and Med-Systems | 2020 | Vol. 16, no. 4

art. no. 20200056

EN

Objectives: Appropriate care for patients admitted to intensive care units (ICUs) is becoming increasingly important, which motivates the use of machine learning models. Real-time prediction of the mortality of patients admitted to the ICU has the potential to provide physicians with interpretable results. The growing crisis in healthcare, including soaring costs, unsafe care, misdirected care, fragmented care, chronic diseases, and the evolution of epidemic diseases, demands the application of automated, real-time data processing to assure an improved quality of life. ICUs generate a wealth of useful data in the form of Electronic Health Records (EHR). These data allow for the development of a prediction tool with perfect knowledge backing. Method: We aimed to build a mortality prediction model on the 2012 Physionet Challenge mortality prediction database of 4,000 patients admitted to the ICU. The challenges in the dataset, such as high dimensionality, imbalanced distribution, and missing values, were tackled with analytical methods and tools via feature engineering and new variable construction. The objective of the research is to utilize the relations among the clinical variables and construct new variables that would establish the effectiveness of a 1-Dimensional Convolutional Neural Network (1-D CNN) with constructed features. Results: Its performance is compared, in terms of Area Under Curve (AUC), with traditional machine learning algorithms such as the XGBoost classifier, Light Gradient Boosting Machine (LGBM) classifier, Support Vector Machine (SVM), Decision Tree (DT), K-Neighbours Classifier (K-NN), and Random Forest Classifier (RF), and with recurrent models such as Long Short-Term Memory (LSTM) and LSTM-attention. The investigation reveals the best AUC of 0.848 using the 1-D CNN model. Conclusion: The relationships between the various features were recognized, and new features were constructed from existing ones. Multiple models were tested and compared on different metrics.
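The AUC used above to compare models has a simple pairwise interpretation: it is the probability that a randomly chosen positive case (here, a patient who died) receives a higher predicted score than a randomly chosen negative case. A minimal sketch of that definition follows; the function name is chosen for illustration, and the quadratic loop is meant for clarity, not efficiency.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 ground-truth values; scores: predicted scores (higher
    means more likely positive). Ties between a positive and a negative
    score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because it depends only on the ranking of scores, AUC is not distorted by the choice of a decision threshold, which is one reason it is commonly reported on imbalanced data such as mortality outcomes.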

12

Developing an XGBoost model to predict blast‑induced peak particle velocity in an open‑pit mine: a case study

Nguyen Hoang, Bui Xuan‑Nam, Bui Hoang‑Bac, Cuong Dao Trong

Acta Geophysica | 2019 | Vol. 67, no. 2

477--490

EN

Ground vibration is one of the most undesirable effects induced by blasting operations in open-pit mines, and it can cause damage to surrounding structures. Therefore, predicting ground vibration is important to reduce the environmental effects of mine blasting. In this study, an eXtreme gradient boosting (XGBoost) model was developed to predict peak particle velocity (PPV) induced by blasting in Deo Nai open-pit coal mine in Vietnam. Three models, namely, support vector machine (SVM), random forest (RF), and k-nearest neighbor (KNN), were also applied for comparison with XGBoost. To employ these models, 146 datasets from 146 blasting events in Deo Nai mine were used. Performance of the predictive models was evaluated using root-mean-squared error (RMSE) and coefficient of determination (R²). The results indicated that the developed XGBoost model with RMSE = 1.554, R² = 0.955 on training datasets, and RMSE = 1.742, R² = 0.952 on testing datasets exhibited higher performance than the SVM, RF, and KNN models. Thus, XGBoost is a robust algorithm for building a PPV predictive model. The proposed algorithm can be applied to other open-pit coal mines with conditions similar to those in Deo Nai.
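The two evaluation measures named above follow standard definitions, sketched below for reference. The function names are chosen for illustration, and the sample values in the comment are made up for the example, not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only: predictions that are each off by exactly 1.
example_rmse = rmse([1, 2, 3, 4], [2, 3, 4, 5])
example_r2 = r_squared([1, 2, 3, 4], [2, 3, 4, 5])
```

RMSE is expressed in the units of the target (here, those of PPV), while R² is unitless, so the two are usually reported together, on both training and testing sets, as in the abstract.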