Search results
Searched in keywords: feature selection
Results found: 156
EN
Feature Selection (FS) is an essential research topic in machine learning. FS, the process of identifying relevant features and removing irrelevant and redundant ones, addresses the high-dimensionality problem by selecting the best-performing feature subset. In the literature, many feature selection techniques approach the task as a search problem, where each state in the search space is a possible feature subset. In this paper, we introduce a new feature selection method based on reinforcement learning. First, decision tree branches are used to traverse the search space. Second, a transition similarity measure is proposed to ensure the exploration-exploitation trade-off. Finally, the informative features are those most involved in constructing the best branches. The performance of the proposed approach is evaluated on nine standard benchmark datasets. The results, using the AUC score, show the effectiveness of the proposed system.
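The paper's decision-tree traversal and transition similarity measure are specific to its RL scheme and are not reproduced here. As a hedged illustration of searching the subset space and scoring candidates by AUC, a minimal greedy forward-selection wrapper might look like the following (the sum-of-features scorer is an assumption for the example, standing in for a trained model):

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def score_subset(rows, labels, subset):
    """Toy scorer (an assumption for the example): rank samples by the
    sum of the chosen feature values, then measure AUC of that ranking."""
    scores = [sum(r[j] for j in subset) for r in rows]
    return auc(scores, labels)

def greedy_forward_selection(rows, labels, k):
    """Grow the subset one feature at a time, keeping the best addition."""
    chosen = []
    remaining = list(range(len(rows[0])))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda j: score_subset(rows, labels, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

In a real wrapper the scorer would retrain a classifier per candidate subset; the greedy loop and AUC criterion stay the same.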
EN
This paper investigates and proposes a new clustering method that takes into account the timing characteristics of frequently used feature words and the semantic similarity of microblog short texts, as well as designing and implementing microblog topic detection based on the clustering results. The aim of the proposed research is to provide a new cluster-overlap reduction method based on divisions of semantic memberships, to address limited semantic expression and the diversity of short microblog contents. First, by defining the time-series frequent word set of the microblog text, a feature word selection method for hot topics is given; then, based on the time-series recurring feature word set, the initial clustering of the microblog is obtained.
EN
With the advent of social media, the volume of photographs uploaded on the internet has increased exponentially, making the task of efficiently recognizing and retrieving human facial images essential. In this work, a feature selection approach for recognizing and retrieving human face images using a hybrid cheetah optimization algorithm is proposed. Deep features are extracted from the images using deep convolutional neural networks. The hybrid cheetah optimization algorithm, an improved version of the cheetah optimization algorithm fused with a genetic algorithm, is used to choose optimal features from the extracted deep features. The chosen features are used for finding the best-matching images in the image database. Image matching is performed by an approximate nearest neighbor search for the query image over the image database, and similar images are retrieved. By constructing a k-NN graph for the images, the efficiency of image retrieval is enhanced. The proposed system's performance is evaluated against benchmark datasets such as LFW, Multi-PIE, ColorFERET, DigiFace-1M and CelebA. The evaluation results show that the proposed methodology is superior to various existing methodologies.
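The k-NN graph idea can be sketched independently of the optimizer. The following is a minimal brute-force illustration; cosine similarity and the greedy graph walk are assumptions for the example, not the paper's exact retrieval pipeline:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn_graph(vectors, k):
    """Brute-force k-NN graph: each node links to its k most similar peers."""
    graph = {}
    for i, v in enumerate(vectors):
        sims = sorted(((cosine_sim(v, w), j) for j, w in enumerate(vectors) if j != i),
                      reverse=True)
        graph[i] = [j for _, j in sims[:k]]
    return graph

def retrieve(query, vectors, graph, entry=0, hops=2):
    """Greedy walk on the graph from an entry node toward the query."""
    best = entry
    for _ in range(hops):
        candidates = [best] + graph[best]
        best = max(candidates, key=lambda j: cosine_sim(query, vectors[j]))
    return best
```

Production approximate-NN libraries replace the brute-force construction with pruned graph building, but the query-time walk has this shape.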
EN
The paper presents special forms of an ensemble of classifiers for the analysis of medical images based on the application of deep learning. The study analyzes different structures of convolutional neural networks applied to the recognition of two types of medical images: dermoscopic images for melanoma and mammograms for breast cancer. Two approaches to ensemble creation are proposed. In the first approach, the images are processed by a convolutional neural network and the flattened vector of image descriptors is subjected to feature selection using different selection methods. As a result, different sets of a limited number of diagnostic features are generated. In the next stage, these sets of features form the input attributes for classical classifiers: support vector machine, a random forest of decision trees, and softmax. By combining different selection methods with these classifiers, an ensemble classification system is created and integrated by majority voting. In the second approach, different structures of convolutional neural networks are directly applied as the members of the ensemble. The efficiency of the proposed classification systems is investigated and compared on medical data representing dermoscopic images of melanoma and breast cancer mammogram images. Thanks to the fusion of the results of many classifiers forming an ensemble, accuracy and all other quality measures are significantly increased for both types of medical images.
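The majority-voting integration step can be sketched in a few lines; this is a generic illustration of fusing per-classifier label predictions, not the paper's exact ensemble:

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse predictions from several classifiers into one verdict per sample.

    predictions[i] is the label list produced by classifier i; ties are
    resolved in favour of the label seen first among the votes.
    """
    fused = []
    for sample_votes in zip(*predictions):
        fused.append(Counter(sample_votes).most_common(1)[0][0])
    return fused
```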
EN
This article presents a model based on machine learning for the selection of the characteristics that most influence the low industrial yield of cane sugar production in Cuba. The set of data used in this work corresponds to a period of ten years of sugar harvests, from 2010 to 2019. A process of understanding the business and of understanding and preparing the data is carried out. The accuracy of six rule learning algorithms is evaluated: CONJUNCTIVERULE, DECISIONTABLE, RIDOR, FURIA, PART and JRIP. The results obtained allow us to identify R417, R379, R378, R419a, R410, R613, R1427 and R380 as the indicators that most influence low industrial performance.
EN
Reliability is one of the key factors used to gauge software quality. Software defect prediction (SDP) is one of the most important factors affecting the measurement of software reliability, and the high dimensionality of the features has a direct effect on the accuracy of SDP models. The objective of this paper is to propose a hybrid binary whale optimization algorithm (BWOA) based on taper-shaped transfer functions for solving the feature selection and dimension reduction problem, with a KNN classifier, as a new software defect prediction method. The values of a real vector representing the individual encoding are converted to a binary vector using four types of taper-shaped transfer functions, to enhance the performance of BWOA in reducing the dimension of the search space. The performance of the suggested method (T-BWOA-KNN) was evaluated using eleven standard software defect prediction datasets from the PROMISE and NASA repositories with the K-Nearest Neighbor (KNN) classifier, and seven evaluation metrics were used to assess its effectiveness. The experimental results show that T-BWOA-KNN produced promising results compared to other methods, including ten methods from the literature and four types of T-BWOA with the KNN classifier. In addition, the obtained results are compared and analyzed with other methods from the literature in terms of the average number of selected features (SF) and accuracy rate (ACC) using the Kendall W test. For most datasets, T-BWOA-KNN produced promising performance compared with the other methods.
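The paper's four taper-shaped transfer functions are not reproduced here. An S-shaped (sigmoid) transfer, a common alternative in binary metaheuristics, illustrates the real-to-binary mapping step: each continuous coordinate becomes a probability of selecting the corresponding feature.

```python
import math
import random

def s_transfer(x):
    """S-shaped (sigmoid) transfer: maps a real coordinate to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng):
    """Sample a binary feature mask from a continuous position vector:
    coordinate j selects feature j with probability s_transfer(position[j])."""
    return [1 if rng.random() < s_transfer(x) else 0 for x in position]
```

Swapping `s_transfer` for a taper-shaped or V-shaped function changes only the probability curve; the sampling step stays the same.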
EN
Many countries have adopted a public health approach that aims to address the particular challenges faced during the coronavirus disease 2019 (COVID-19) pandemic. Researchers mobilized to manage and limit the spread of the virus, and multiple artificial intelligence-based systems have been designed to automatically detect the disease, among them voice-based ones, since the virus has a major impact on voice production due to respiratory system dysfunction. In this paper, we investigate and analyze the effectiveness of cough analysis to accurately detect COVID-19 by distinguishing positive COVID patients from healthy controls. After extracting the gammatone cepstral coefficients (GTCC) and the Mel-frequency cepstral coefficients (MFCC), we performed feature selection (FS) and classification with multiple machine learning algorithms. By combining all features and the 3-nearest-neighbor (3NN) classifier, we achieved the highest classification results: the model is able to detect COVID-19 patients with accuracy and an F1-score above 98 percent. When applying FS, the highest accuracy and F1-score were achieved by the same model with the ReliefF algorithm; only 1 percent of accuracy is lost by keeping just 12 of the original 53 features.
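ReliefF, the FS algorithm that performed best here, extends the classic two-class Relief scheme with k nearest neighbours and multi-class handling. A minimal Relief sketch (a simplification, not full ReliefF) conveys the core idea of rewarding features that separate a sample from its nearest miss more than from its nearest hit:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def relief_weights(rows, labels):
    """Two-class Relief: a feature earns weight when it differs on the
    nearest miss (other class) more than on the nearest hit (same class)."""
    n_feat = len(rows[0])
    weights = [0.0] * n_feat
    for i, x in enumerate(rows):
        hits = [r for j, r in enumerate(rows) if j != i and labels[j] == labels[i]]
        misses = [r for j, r in enumerate(rows) if labels[j] != labels[i]]
        near_hit = min(hits, key=lambda r: sq_dist(r, x))
        near_miss = min(misses, key=lambda r: sq_dist(r, x))
        for f in range(n_feat):
            weights[f] += abs(x[f] - near_miss[f]) - abs(x[f] - near_hit[f])
    return weights
```

Features are then ranked by weight and the top ones kept, which is how a subset such as 12 of 53 features would be chosen.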
EN
This paper introduces an early prognostic model that attempts to predict patient severity for ICU admission and to detect the most significant features affecting the prediction process, using clinical blood data. The proposed model predicts ICU admission for high-severity patients during the first two hours of hospital admission, which would assist clinicians in decision-making and enable the efficient use of hospital resources. The Hunger Games search (HGS) meta-heuristic algorithm and a support vector machine (SVM) have been integrated to build the proposed prediction model and to select the most informative features from the blood test data. Experiments have shown that using HGS to select features for the SVM classifier achieved excellent results compared with four other meta-heuristic algorithms. The model using the features selected by the HGS algorithm accomplished the topmost results (98.6% and 96.5% for the best and mean accuracy, respectively) compared to using the features selected by other popular optimization algorithms.
EN
Snoring is a typical and intuitive symptom of obstructive sleep apnea hypopnea syndrome (OSAHS), a sleep-related respiratory disorder with adverse effects on people's lives. Detecting snoring sounds in whole-night recordings is the first and most important step in the snoring analysis of OSAHS. An automatic snoring detection system based on the wavelet packet transform (WPT) with an eXtreme Gradient Boosting (XGBoost) classifier is proposed in the paper, which recognizes snoring sounds from episodes enhanced by a generalized subspace noise reduction algorithm. A feature selection technique based on correlation analysis is applied to select the most discriminative WPT features. The selected features yield a high sensitivity of 97.27% and a precision of 96.48% on the test set. The recognition performance demonstrates that the WPT is effective in the analysis of snoring and non-snoring sounds, and that the difference is exhibited much more comprehensively by sub-bands with smaller frequency ranges. The distribution of snoring sound lies mainly in the middle and low frequency parts, and there is also an evident difference between snoring and non-snoring sounds in the high frequency part.
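Selecting the most discriminative features by correlation analysis can be illustrated generically: rank features by the absolute Pearson correlation of their values with the class label. This is a common filter criterion and an assumption for the example; the paper's exact correlation analysis may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(rows, labels, k):
    """Keep the k features whose values correlate most with the class label."""
    n_feat = len(rows[0])
    scored = sorted(((abs(pearson([r[f] for r in rows], labels)), f)
                     for f in range(n_feat)), reverse=True)
    return [f for _, f in scored[:k]]
```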
EN
Industrial Internet of Things (IIoT) is a rapidly growing field, where interconnected devices and systems are used to improve operational efficiency and productivity. However, the extensive connectivity and data exchange in the IIoT environment make it vulnerable to cyberattacks. Intrusion detection systems (IDS) are used to monitor IIoT networks and identify potential security breaches. Feature selection is an essential step in the IDS process, as it can reduce computational complexity and improve the accuracy of the system. In this research paper, we propose a hybrid feature selection approach for intrusion detection in the IIoT environment using Shapley values and a genetic algorithm-based automated preprocessing technique which has three automated steps including imputation, scaling and feature selection. Shapley values are used to evaluate the importance of features, while the genetic algorithm-based automated preprocessing technique optimizes feature selection. We evaluate the proposed approach on a publicly available dataset and compare its performance with existing state-of-the-art methods. The experimental results demonstrate that the proposed approach outperforms existing methods, achieving high accuracy, precision, recall, and F1-score. The proposed approach has the potential to enhance the performance of IDS in the IIoT environment and improve the overall security of critical industrial systems.
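Shapley values measure each feature's average marginal contribution to a value function across all feature orderings. Practical SHAP implementations approximate this, but for a handful of features it can be computed exactly by enumeration; the additive value function in the test is an assumption for the example:

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: a feature's average marginal contribution to
    the value function, averaged over every ordering of the features."""
    perms = list(permutations(features))
    phi = {f: 0.0 for f in features}
    for order in perms:
        seen = set()
        for f in order:
            phi[f] += value(seen | {f}) - value(seen)
            seen.add(f)
    return {f: total / len(perms) for f, total in phi.items()}
```

In a feature-selection setting, `value` would be a model-quality score on the given subset, and low-Shapley features would be dropped before the GA-tuned preprocessing.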
EN
Non-Intrusive Load Monitoring (NILM) supports decisions aimed at reducing residential and commercial electricity consumption; its main task is the identification of electrical appliances. This identification can be based on the analysis of events occurring in the home installation or on the analysis of its steady state. In the case of steady-state analysis, it is necessary to select electrical parameters that uniquely describe the operating equipment. This paper presents an analysis of a wide spectrum of electrical parameters (current, voltage, powers and harmonics of these signals, THD, CF, PF) in order to indicate those that are characterized by the greatest consistency within a given device and the greatest separability from other devices. The parameters selected in this way were then used to identify operating electrical appliances.
EN
The growing amount of collected and processed data means that there is a need to control access to these resources. Very often, this type of control is carried out on the basis of biometric analysis. The article proposes a new user authentication method based on a spatial analysis of finger movement. This movement creates a sequence of data that is registered by a motion recording device. The presented approach combines spatial analysis of the positions of all fingers over time, and the proposed method is able to exploit the specific, often distinctive finger movements of each user. The experimental results confirm the effectiveness of the method in biometric applications. In this paper, we also introduce an effective method of feature selection based on Hotelling's T2 statistic. This approach allows selecting the best distinctive features of each object from the set of all objects in the database, which is possible thanks to the appropriate preparation of the input data.
EN
We consider the positive-unlabelled multi-label scenario, in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive; the absence of the label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce weights of observations: the idea is to assign larger weights to observations for which there is consistency between the values of the true target variable and the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which leads to relatively simple forms of weights. In the experiments, we analyze the predictive power of the methods considered under different labelling schemes.
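The weighted empirical risk idea can be sketched with a plain weighted logistic regression trained by stochastic gradient descent. The bound-based weights below, which up-weight labelled positives by 1/e_max, are a simplified stand-in for the paper's propensity-score weights, and e_max = 0.8 is an assumed bound for the example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logistic(rows, labels, weights, lr=0.5, epochs=200):
    """Stochastic gradient descent on a weighted empirical risk: each
    sample's log-loss gradient is scaled by its observation weight."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y, wt in zip(rows, labels, weights):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = wt * (p - y)  # weighted log-loss gradient w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def bound_weights(surrogates, e_max=0.8):
    """Simplified stand-in for propensity weighting: up-weight labelled
    positives by 1 / e_max, an assumed upper bound on the propensity."""
    return [1.0 / e_max if s == 1 else 1.0 for s in surrogates]
```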
EN
To mitigate and manage risk failures due to Internet of Things (IoT) attacks, many machine learning (ML) and deep learning (DL) solutions have been used to detect attacks, but they mostly suffer from the problem of high dimensionality. The problem is even more acute for resource-starved IoT nodes working with high-dimensional data. Motivated by this problem, in the present work a priority-based Gray Wolf Optimizer is proposed for effectively reducing the input feature vector of the dataset. At each iteration, all the wolves leverage the relative importance of their leader wolves' position vectors when updating their own positions. A new inclusive fitness function is also proposed, which incorporates all the important quality metrics along with the accuracy measure. SVM is used to initialize the proposed PrGWO population, and kNN is used as the fitness wrapper technique. The proposed approach is tested on the NSL-KDD, DS2OS and BoT-IoT datasets, and the best accuracies are found to be 99.60%, 99.71% and 99.97% with 12, 6 and 9 features respectively, which is better than most existing algorithms.
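The priority-based position update and kNN wrapper are specific to PrGWO and not reproduced here. A skeletal binary Gray Wolf-style search, with a toy fitness of relevance gain minus a feature-count penalty (an assumption for the example), shows the overall loop of leader wolves guiding followers over binary masks:

```python
import random

def fitness(mask, relevance):
    """Toy fitness (an assumption for the example): total relevance of the
    selected features minus a small penalty per selected feature."""
    gain = sum(r for m, r in zip(mask, relevance) if m)
    return gain - 0.1 * sum(mask)

def binary_gwo(relevance, n_wolves=10, iters=30, seed=1):
    """Skeletal binary Gray Wolf-style search over feature masks."""
    rng = random.Random(seed)
    n = len(relevance)
    wolves = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_wolves)]
    for _ in range(iters):
        # the three fittest wolves lead; the rest follow their majority vote
        wolves.sort(key=lambda w: fitness(w, relevance), reverse=True)
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        for i in range(3, n_wolves):
            wolves[i] = [int((alpha[j] + beta[j] + delta[j] >= 2)
                             ^ (rng.random() < 0.05))  # small mutation rate
                         for j in range(n)]
    return max(wolves, key=lambda w: fitness(w, relevance))
```

In a wrapper FS setting the toy fitness would be replaced by classifier accuracy (here kNN) on the masked feature set, which is where the bulk of the runtime goes.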
EN
An automated decision-making method for the soil conditioning of earth pressure balance machines (EPBM) that offers high accuracy and efficiency, is applicable to changeable geological conditions, and takes drive parameters into consideration has been lacking. A hybrid method combining Gradient Boosting Decision Tree (GBDT) and the random forest algorithm is proposed in this paper to make decisions on soil conditioning using foam, realizing automated decision-making. Relevant parameters include decision parameters (geological parameters and drive parameters) and target parameters (dosage of foam). GBDT, an efficient algorithm based on decision trees, is used to determine the weights of the geological parameters, forming three parameter sets. Three decision-making models are then established using random forest, a high-accuracy algorithm based on decision trees, and the optimal model is obtained by Bayesian optimization. The model has clear advantages in accuracy compared with other methods, can realize real-time decision-making with high accuracy under changeable geological conditions, and reduces experimental cost.
EN
This paper provides a comprehensive assessment of basic feature selection (FS) methods that have originated from nature-inspired (NI) meta-heuristics; two well-known filter-based FS methods are also included for comparison. The performances of the considered methods are compared on four balanced, high-dimensional, real-world text data sets regarding the accuracy, the number of selected features, and computation time. This study differs from existing studies in terms of the extent of experimental analyses that were performed under different circumstances where the classifier, feature model, and term-weighting scheme were different. The results of the extensive experiments indicated that basic NI algorithms produce slightly different results than filter-based methods for the text FS problem. However, filter-based methods often provide better results by using lower numbers of features and computation times.
EN
The paper considers the problem of increasing the generalization ability of classification systems by creating an ensemble of classifiers based on the CNN architecture. Different structures of the ensemble are considered and compared. Deep learning fulfills an important role in the developed system. The numerical descriptors created in the last locally connected convolution layer of the CNN, flattened to the form of a vector, are subjected to a few different selection mechanisms. Each of them chooses an independent set of features, selected according to the applied assessment technique. Their results are combined with three classifiers: softmax, support vector machine, and random forest of decision trees. All of them simultaneously perform the same classification task, and their results are integrated into the final verdict of the ensemble. Different arrangements of the ensemble are considered and tested on the recognition of facial images. Two databases are used in the experiments: one composed of 68 classes of greyscale images and the second of 276 classes of color images. The results of the experiments show a high improvement in class recognition resulting from the application of a properly designed ensemble.
EN
The purpose of this study is to develop a hybrid algorithm for feature selection and classification of masses in digital mammograms based on the Crow search algorithm (CSA) and Harris hawks optimization (HHO). The proposed CSAHHO algorithm finds the best features depending on their fitness value, which is determined by an artificial neural network. Using artificial neural network and support vector machine classifiers, the best features determined by CSAHHO are utilized to classify masses in mammograms as benign or malignant. The performance of the suggested method is assessed using 651 mammograms. Experimental findings show that the proposed CSAHHO outperforms the original CSA and HHO algorithms when evaluated using an ANN, achieving an accuracy of 97.85% with a kappa value of 0.9569 and an area under the curve AZ = 0.982 ± 0.006. Furthermore, benchmark datasets are used to test the feasibility of the suggested approach, which is then compared with four state-of-the-art algorithms. The findings indicate that CSAHHO achieves high performance with the smallest number of features and supports enhanced breast cancer diagnosis.
Diagnosis of Parkinson’s disease based on SHAP value feature selection
EN
To address the high feature dimensionality of Parkinson’s disease medical data, this paper introduces the SHapley Additive exPlanations (SHAP) value for feature selection on a Parkinson’s disease medical dataset. SHAP values are combined with four classifiers, namely deep forest (gcForest), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and random forest (RF), and applied to Parkinson’s disease diagnosis. First, the classifier is used to calculate the SHAP-value contribution of each feature; the features with significant contributions to the classification task are then selected, and the data after feature selection are used as classifier input to diagnose the Parkinson’s disease dataset. The experimental results show that, compared to the F-score, analysis of variance (ANOVA-F) and mutual information (MI) feature selection methods, the four models based on SHAP-value feature selection achieved good classification results. The SHAP-gcForest model achieves a classification accuracy of 91.78% and an F1-score of 0.945 when 150 features are selected. The SHAP-LightGBM model achieves a classification accuracy of 91.62% and an F1-score of 0.945 when 50 features are selected. Its classification effectiveness is second only to the SHAP-gcForest model, but SHAP-LightGBM is more computationally efficient than SHAP-gcForest. Finally, the effectiveness of the proposed method is verified by comparison with the results of existing literature. The findings demonstrate that machine learning with the SHAP-value feature selection method has good classification performance in the diagnosis of Parkinson’s disease, and provides a reference for physicians in the diagnosis and prevention of Parkinson’s disease.
EN
Health problems directly or indirectly caused by cardiac arrhythmias may threaten life. The analysis of electrocardiogram (ECG) signals is an important diagnostic tool for assessing cardiac function in clinical research and disease diagnosis. To date, various soft computing methods and techniques have been proposed for the analysis of ECG signals. In this study, a new ensemble learning based method is proposed that automatically classifies the arrhythmic heartbeats of the ECG signal according to both category-based and patient-based evaluation plans. A two-stage median filter was used to remove the baseline wander from the ECG signal, and the locations of fiducial points of the ECG signal were determined using the developed QRS complex detection method. Four different feature extraction methods were utilized, including a newly proposed technique based on the power spectral density. Hybrid sub-feature sets were constructed using a wrapper-based feature selection algorithm. A new ensemble learning (EL) method using a stacking algorithm has been proposed, with multi-layer perceptron (MLP) and random forest (RF) as base learners and linear regression (LR) as the meta learner. Average performance values for category-based arrhythmic heartbeat classification were 99.88% accuracy, 99.08% sensitivity, 99.94% specificity and 99.08% positive predictivity (+P); for patient-based classification they were 99.72% accuracy, 99.30% sensitivity, 99.83% specificity and 99.30% positive predictivity (+P). It is thus concluded that the proposed method achieves higher performance than similar studies in the literature.