Results found: 20
Search results
Searched for:
in keywords: feature selection (wybór funkcji)
EN
With the advent of social media, the volume of photographs uploaded to the internet has increased exponentially, which makes efficient recognition and retrieval of human facial images essential. In this work, a feature selection approach for recognizing and retrieving human face images using a hybrid cheetah optimization algorithm is proposed. Deep features are extracted from the images using deep convolutional neural networks. The hybrid cheetah optimization algorithm, an improved version of the cheetah optimization algorithm fused with a genetic algorithm, is used to choose the optimum features from the extracted deep features. The chosen features are used for finding the best-matching images in the image database. Image matching is performed by an approximate nearest neighbor search for the query image over the image database, and similar images are retrieved. Constructing a k-NN graph for the images enhances the efficiency of image retrieval. The performance of the proposed system is evaluated on benchmark datasets such as LFW, Multi-PIE, ColorFERET, DigiFace-1M and CelebA. The evaluation results show that the proposed methodology is superior to various existing methodologies.
EN
There are two main approaches to the challenge of finding the best filter or embedded feature selection (FS) algorithm: searching for the single best FS algorithm and creating an ensemble of all available FS algorithms. However, in practice, these two processes usually occur as part of a larger machine learning pipeline and not separately. We posit that, due to the influence of the filter FS on the embedded FS, one should aim to optimize both of them as a single FS pipeline rather than separately. We propose a meta-learning approach, called FSPL, that automatically finds the best filter and embedded FS pipeline for a given dataset. We demonstrate the performance of FSPL on n = 90 datasets, obtaining 0.496 accuracy for the optimal FS pipeline, revealing an improvement of up to 5.98 percent in the model’s accuracy compared to the second-best meta-learning method.
EN
Depression is one of the primary causes of global mental illnesses and an underlying reason for suicide. The user generated text content available in social media forums offers an opportunity to build automatic and reliable depression detection models. The core objective of this work is to select an optimal set of features that may help in classifying depressive contents posted on social media. To this end, a novel multi-objective feature selection technique (EFS-pBGSK) and machine learning algorithms are employed to train the proposed model. The novel feature selection technique incorporates a binary gaining-sharing knowledge-based optimization algorithm with population reduction (pBGSK) to obtain the optimized features from the original feature space. The extensive feature selector (EFS) is used to filter out the excessive features based on their ranking. Two text depression datasets collected from Twitter and Reddit forums are used for the evaluation of the proposed feature selection model. The experimentation is carried out using naive Bayes (NB) and support vector machine (SVM) classifiers for five different feature subset sizes (10, 50, 100, 300 and 500). The experimental outcome indicates that the proposed model can achieve superior performance scores. The top results are obtained using the SVM classifier for the SDD dataset with 0.962 accuracy, 0.929 F1 score, 0.0809 log-loss and 0.0717 mean absolute error (MAE). As a result, the optimal combination of features selected by the proposed hybrid model significantly improves the performance of the depression detection system.
EN
The paper considers the problem of increasing the generalization ability of classification systems by creating an ensemble of classifiers based on the CNN architecture. Different structures of the ensemble are considered and compared. Deep learning plays an important role in the developed system. The numerical descriptors created in the last locally connected convolution layer of the CNN, flattened to the form of a vector, are subjected to a few different selection mechanisms. Each of them chooses an independent set of features, selected according to the applied assessment technique. The selected feature sets are combined with three classifiers: softmax, support vector machine, and a random forest of decision trees. All of them perform the same classification task simultaneously. Their results are integrated into the final verdict of the ensemble. Different arrangements of the ensemble are considered and tested on the recognition of facial images. Two different databases are used in the experiments: one composed of 68 classes of greyscale images and the other of 276 classes of color images. The results of the experiments show a high improvement in class recognition resulting from the application of a properly designed ensemble.
EN
A continuous heart disease monitoring system is one of the significant applications of the Internet of Things (IoT). This goal might be achieved by combining sophisticated expert systems with extensive healthcare data on heart diseases. Several machine learning-based methods have recently been proposed for predicting and diagnosing cardiac illness. However, these algorithms are unable to manage high-dimensional information due to the lack of a smart framework that can combine several sources to anticipate cardiac illness. The Fuzzy-Long Short Term Memory (LSTM) model is used in this work to present a unique IoT-enabled heart disease prediction method. The benchmark data for the experiment came from public sources and were collected via wearable IoT devices. An improved Harris Hawks Optimization (HHO), called Population and Fitness-based HHO (PF-HHO), is utilized to select the best features, with the objective function of maximizing correlation within the same class and minimizing correlation among different classes. The scientific contributions of the healthcare monitoring system described here help to improve the efficiency of heart disease healthcare and may also reduce the death rate. An important part of this continuous healthcare model is the real-world monitoring system. The simulation outcomes showed that the recommended approach is more successful at predicting heart illness than existing technologies.
EN
In recent years, solar energy forecasting has been increasingly embraced as a sustainable low-energy solution to environmental awareness. It is a subject of interest to the scientific community, and machine learning techniques have proven to be a powerful means to construct an automatic learning model for an accurate prediction. Along with the various machine learning and data mining utilities applied to solar energy prediction, the process of feature selection is becoming an ultimate requirement for improving model building efficiency. In this paper, we consider the feature selection (FS) approach potential. We provide a detailed taxonomy of various feature selection techniques and examine their usability and ability to deal with a solar energy forecasting problem, given meteorological and geographical data. We focus on filter-based, wrapper-based, and embedded-based feature selection methods. We use the reduced number of selected features, stability, and regression accuracy and compare feature selection techniques. Moreover, the experimental results demonstrate how the feature selection methods studied can considerably improve the prediction process and how the selected features vary by method, depending on the given data constraints.
EN
Recent research on Parkinson's disease (PD) detection has shown that vocal disorders are linked to symptoms in 90% of PD patients at early stages. Thus, there is an interest in applying vocal features to the computer-assisted diagnosis and remote monitoring of patients with PD at early stages. The contribution of this research is an increase in accuracy and a reduction in the number of selected vocal features in PD detection while using the newest and largest public dataset available. Whereas the number of features in this public dataset is 754, the number of features selected for classification ranges from 8 to 20 after wrapper-based feature subset selection. Four classifiers (k nearest neighbor, multi-layer perceptron, support vector machine and random forest) are applied to vocal-based PD detection. The proposed approach shows an accuracy of 94.7%, sensitivity of 98.4%, specificity of 92.68% and precision of 97.22%. The best accuracy is obtained with a support vector machine, and it is higher than the accuracy reported in the first work to use the same dataset. In addition, the corresponding computational complexity is further reduced by selecting no more than 20 features.
Czech parliament meeting recordings as ASR training data
EN
I present a way to leverage the stenographed recordings of the Czech parliament meetings for purposes of training a speech-to-text system. The article presents a method for scraping the data, acquiring word-level alignment and selecting reliable parts of the imprecise transcript. Finally, I present an ASR system trained on these and other data.
EN
Feature selection, a procedure that selects a subset of the original features, is a main step in classification systems and one of the major challenges in text categorization. The high dimensionality of the feature space increases the complexity of the text categorization process. This paper presents a novel feature selection method based on particle swarm optimization to improve the performance of text categorization. Particle swarm optimization is inspired by the social behavior of fish schooling or bird flocking. The complexity of the proposed method is very low due to the application of a simple classifier. The performance of the proposed method is compared with the performance of other methods on the Reuters-21578 data set. Experimental results show the superiority of the proposed method.
EN
Pattern recognition algorithms were used to differentiate between two forms of Emery-Dreifuss muscular dystrophy (EDMD), i.e. autosomal-dominant laminopathy (AD-EDMD) and X-linked emerinopathy (X-EDMD). The levels of selected matrix metalloproteinases (MMPs) and their tissue inhibitors (TIMPs) in the serum of EDMD patients and healthy subjects were treated as features. In conclusion, MMP and TIMP levels are helpful in identifying EDMD patients and the progress of the disease.
Nonparametric methods of supervised classification
EN
Selected nonparametric methods of statistical pattern recognition are described. Some of them are modifications of the well-known k-NN rule. This group includes a fuzzy k-NN rule, a pair-wise k-NN rule and a corrected k-NN rule. They can improve classification quality as compared with the standard k-NN rule. For cases in which these modifications would yield error rates that are too large, an approach based on the determination of class areas is proposed. The idea of class areas can also be used to construct a multistage classifier. A separate feature selection can be performed in each stage. The modifications of the k-NN rule and the methods based on the determination of class areas can be too slow in some applications; therefore, algorithms for reference set reduction and condensation, for the simple NN rule, are proposed. To construct fast classifiers, it is also worth considering pair-wise linear classifiers. The presented idea can be used both when the class pairs are linearly separable and when they are not.
EN
A proposed hybrid genetic algorithm (GA) approach for feature selection combined with support vector machines for regression (SVMR) was applied in this paper to optimise a data set of fibre properties and predict the yarn tenacity property. This hybrid approach was compared with a noisy model of SVMR that used the whole data set of fibre properties as input in the prediction. The GA for feature selection was used as the preprocessing stage that aimed to find and select the best attributes or variables that most affect or are related to the prediction of yarn tenacity. The hybrid approach showed better predictive performance than the noisy model. However, the results indicated the suitability of GA for feature selection in the choice of the best fibre property attributes that give the preferred performance and high accuracy in the prediction of yarn tenacity.
PL
The proposed hybrid system, which combines genetic algorithms with a support vector machine for regression (SVMR), was applied to optimise a data set of physical fibre properties for predicting yarn strength properties. In this hybrid approach, the proposed SVMR model was compared with a “noisy” model that used the full set of physical fibre property data as input for the prediction. Genetic algorithms for feature selection were used in the preprocessing stage, whose aim was to find and select the best variables, those most strongly related to the prediction of yarn strength. The hybrid approach gave better yarn strength predictions than the “noisy” model. However, the results showed that genetic algorithms are also very useful for selecting the most favourable fibre properties, making it possible to obtain high accuracy in predicting yarn strength.
EN
Machine learning is used in regression and classification tasks. In classification, the high dimensionality of the classified objects is one of the essential problems. Classification is performed on the basis of feature values, and these features reflect the dimensions of the object being classified. The article presents the feature selection algorithms that were applied to reduce the “curse of dimensionality” problem.
EN
The study being presented is a continuation of the previous studies that consisted in the adaptation and use of the Levenshtein method in a signature recognition process. Three methods based on the normalized Levenshtein measure were taken into consideration. The studies included an analysis and selection of appropriate signature features, on the basis of which the authenticity of a signature was verified later. A statistical apparatus was used to perform a comprehensive analysis. The χ² independence test was applied. It allowed determining the relationship between signature features and the error returned by the classifier.
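The abstract does not say how the Levenshtein measure is normalized, so the following is only a minimal sketch under one common assumption: the edit distance is divided by the length of the longer sequence. The function names `levenshtein` and `normalized_levenshtein` are illustrative and do not come from the paper.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute; each costs 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    """Assumed normalization: distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest
```

Because the functions accept any sequences, a signature could be passed as a list of quantized feature values rather than a string; that encoding is likewise an assumption.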
EN
Two kinds of classifier based on the k-NN rule, the standard and the parallel version, were used to recognize the severity of ALS. For the second classifier version, feature selection was done separately for each pair of classes. The error rate, estimated by the leave-one-out method, was used as the criterion both for determining the optimum values of k and for feature selection. All features selected in this manner were used in the standard and in the parallel classifier based on the k-NN rule. Furthermore, for verification purposes only, a linear classifier was applied. For this kind of classifier, the error rates were calculated by using the training set also as the testing set. The linear classifier was trained by the error correction algorithm with a modified stop condition. The data set concerned healthy subjects and patients with amyotrophic lateral sclerosis (ALS). A set of several biomarkers, such as erythropoietin, matrix metalloproteinases and their tissue inhibitors, measured in serum and cerebrospinal fluid (CSF), were treated as features. It was shown that the CSF biomarkers were very sensitive to ALS progress.
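The idea of using the leave-one-out error rate as a single criterion for choosing both k and the feature subset can be sketched as follows. This is an illustration for the standard k-NN rule only, assuming squared Euclidean distance and an exhaustive search over feature subsets; the toy data and the names `loo_error` and `select_features_and_k` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def loo_error(samples, labels, features, k):
    """Leave-one-out error rate of k-NN restricted to the given feature indices."""
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        distances = []
        for j, (z, label) in enumerate(zip(samples, labels)):
            if j == i:
                continue  # the classified sample is left out of the reference set
            d = sum((x[f] - z[f]) ** 2 for f in features)
            distances.append((d, j, label))
        distances.sort()
        votes = Counter(label for _, _, label in distances[:k])
        if votes.most_common(1)[0][0] != y:
            errors += 1
    return errors / len(samples)


def select_features_and_k(samples, labels, k_values):
    """Return (error, features, k) with the lowest leave-one-out error."""
    n_features = len(samples[0])
    best = None
    for size in range(1, n_features + 1):
        for features in combinations(range(n_features), size):
            for k in k_values:
                err = loo_error(samples, labels, features, k)
                if best is None or err < best[0]:
                    best = (err, features, k)
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
toy_samples = [(0.0, 5.0), (0.1, -3.0), (0.2, 4.0),
               (1.0, -4.0), (1.1, 5.0), (1.2, -3.0)]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_best = select_features_and_k(toy_samples, toy_labels, k_values=[1, 3])
```

Exhaustive search is feasible only for a handful of features; with more features a greedy or ranking-based search would be the usual substitute.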
EN
The problem raised in this article is the selection of the most important components from multispectral images for the purpose of skin tumor tissue detection. It turned out that a 21-channel spectrum makes it possible to separate healthy and tumor regions almost perfectly. The disadvantage of this method is the duration of a single picture acquisition, because this process requires keeping the device very stable. In the paper, two approaches to the problem are presented: a hill climbing strategy and some ranking methods.
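A hill climbing strategy for channel selection could look roughly like the sketch below. The abstract gives no details, so the neighbourhood (subsets that differ by one channel), the empty starting subset and the toy scoring function are all assumptions; in practice `score` would be something like a classifier's accuracy on the selected channels.

```python
def hill_climb_selection(n_channels, score):
    """Greedy hill climbing over channel subsets.

    Starts from the empty subset and repeatedly moves to the best neighbour
    (one channel added or removed) until no neighbour improves the score.
    """
    selected = frozenset()
    best_score = score(selected)
    while True:
        neighbours = [selected ^ {c} for c in range(n_channels)]
        candidate = max(neighbours, key=score)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            return selected, best_score
        selected, best_score = candidate, candidate_score


# Hypothetical score: per-channel usefulness minus a fixed cost per channel,
# standing in for the trade-off between accuracy and acquisition time.
TOY_USEFULNESS = [0.5, 0.05, 0.3, 0.02]
TOY_COST = 0.1


def toy_score(subset):
    return sum(TOY_USEFULNESS[c] for c in subset) - TOY_COST * len(subset)


toy_channels, toy_value = hill_climb_selection(len(TOY_USEFULNESS), toy_score)
```

Hill climbing can stop at a local optimum, which is one reason to compare it with ranking methods.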
EN
The paper provides a preview of some work in progress on a computer system to support breast cancer diagnosis. The diagnostic approach is based on microscope images of the FNB (Fine Needle Biopsy) and assumes distinguishing malignant from benign cases. The studies conducted focus on two different problems: the first concerns the extraction of morphometric parameters of nuclei present in cytological images, and the other concentrates on classification of the nature of the breast cancer using selected features. Studies in both areas are conducted in parallel. This work is devoted to the problem of feature selection from the set of determined features in order to maximize the accuracy of classification. Morphometric features are derived directly from digital scans of breast fine needle biopsy slides and are computed for segmented nuclei. The quality of the feature space is measured with four different classification methods. In order to illustrate the effectiveness of the approach, the automatic system of malignancy classification was applied to a set of medical images with promising results.
EN
An objective of the work is to demonstrate some difficulties with the construction of a classifier based on the k-NN rule. The standard k-NN classifier and the parallel k-NN classifier have been chosen as the two most powerful approaches. Classifiers of this kind have been applied to automatic recognition of the degree of diaphragm paralysis. The classifier construction consists in determining the number of nearest neighbors, selecting features and estimating the classification quality. Three classes of muscle pathology, including the control class, and five ventilatory parameters are taken into account. The data concern a model of the diaphragm pathology in a cat. The animals were forced to breathe in three different experimental situations: air, hypercapnic and hypoxic conditions. A separate classifier is constructed for each of these situations. The calculation of the misclassification rate is based on the leave-one-out method and on the testing set method. Several computational experiments are suggested for correct feature selection, the choice of classifier type and the estimation of the misclassification probability.
EN
A main objective of the work was to present a new statistical approach to the analysis of respiration data. Breathing with an intact and with a denervated diaphragm was compared. The respiration process was described by three parameters: breathing frequency, tidal volume, and minute ventilation. The experimental data concerned a group of twelve anaesthetised cats. These data were analysed by a modification of the well-known k nearest neighbour rule (k-NN), adopted from statistical pattern recognition theory. The three ventilatory parameters were used to recognise whether we are dealing with the normal or the pathological case. A certain percentage of misclassifications must be taken into account. This misclassification rate is a measure of how strong the dependence is between the ventilation parameters and the preservation of the diaphragm innervation. The proposed method promises good differentiation of the two compared ways of respiration. It offers a nearly five times smaller misclassification rate as compared with the standard k-NN rule.
EN
The paper deals with determining the influence of the LPS factor and the significance of Na+-containing and Na+-free HEPES solution on the behavior of microglial cells cultured in vitro. The behavior of microglial cells is characterized by 14 parameters. The dependence between these parameters and the presence of the LPS factor or sodium ions has been studied by use of the “k nearest neighbor” (k-NN) rule taken from pattern recognition theory. The obtained computational results were verified by the Fisher test.