
Results found: 77

Search results
Searched for:
in keywords: selekcja cech (feature selection)
PL
Non-intrusive load monitoring (NILM) is a system that supports decisions aimed at reducing electricity consumption in households and commercial buildings. The main task of such a system is to identify electrical appliances, either by analysing events occurring in the home installation or by analysing its steady state. In steady-state analysis, it is important to select electrical parameters that unambiguously describe the appliances in operation. The paper presents an analysis of a wide spectrum of electrical parameters (current, voltage, powers and the harmonics of these signals, THD, CF, PF) in order to indicate which of them show the greatest stability within a given appliance and the greatest separability from other appliances. The parameters selected in this way were then used to identify the electrical appliances in operation.
EN
The main objective of Non-Intrusive Load Monitoring (NILM) electrical appliance identification is to reduce residential and commercial electricity consumption. This identification can be based on the analysis of events occurring in the home system or by analyzing its steady state. In the case of steady-state analysis, it is necessary to select electrical parameters that uniquely describe the electrical equipment in operation. This paper presents an analysis of a wide spectrum of electrical parameters (current, voltage, powers and harmonics of these signals, THD, CF, PF) in order to indicate those that are characterized by the greatest consistency within a given device and the greatest separability from other devices. Parameters selected in this way were used in the next step to identify working electrical devices.
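The selection criterion described above (stable within one device, separable between devices) can be sketched with a Fisher-style ratio. This is only an illustration under stated assumptions: the abstract does not say which measure the authors used, and the function name `fisher_score`, the device names and all readings below are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(samples_by_device):
    """Variance of the per-device means divided by the mean within-device variance.

    A high value suggests a parameter that is stable inside each device
    and well separated between devices.
    """
    device_means = [mean(values) for values in samples_by_device.values()]
    within = mean(pvariance(values) for values in samples_by_device.values())
    between = pvariance(device_means)
    return between / within if within > 0 else float("inf")

# Hypothetical steady-state readings (made-up numbers, not from the paper).
power_factor = {
    "kettle": [0.99, 1.00, 0.99],
    "laptop": [0.55, 0.60, 0.58],
    "fridge": [0.80, 0.82, 0.81],
}
thd_voltage = {
    "kettle": [2.1, 2.6, 1.9],
    "laptop": [2.3, 1.8, 2.5],
    "fridge": [2.0, 2.4, 2.2],
}

scores = {"PF": fisher_score(power_factor), "THD_U": fisher_score(thd_voltage)}
best = max(scores, key=scores.get)
```

In this toy data the power factor differs clearly between devices while the voltage THD does not, so the ratio favours the power factor.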
EN
We consider the positive-unlabelled multi-label scenario in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive. The absence of the label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce weights of observations. The idea is to assign larger weights to observations for which there is a consistency between the values of the true target variable and the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both the methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which leads to relatively simple forms of weights. In the experiments we analyze the predictive power of the methods considered for different labelling schemes.
EN
Antioxidant proteins are closely associated with disease control owing to their capability to eliminate excess free radicals. Interest in the accurate identification of antioxidant proteins is growing because of their therapeutic significance. However, despite the rapid increase of such disease in the human body, the machine learning algorithms applied so far have performed inadequately in identifying antioxidant proteins. Therefore, to measure the effect of antioxidant proteins on the human body, a reliable intelligent model is indispensable for researchers. In this study, primary protein sequences are formulated using evolutionary and sequence-based numerical descriptors. Evolutionary features are collected using a bigram position-specific scoring matrix, while K-space amino acid pair (KSAAP) and dipeptide composition are used to extract sequential information. Furthermore, to reduce computational time and to eliminate irrelevant and noisy features, a sequential forward selection and support vector machine (SFS-SVM) based ensemble approach is applied to select optimal features. Finally, several classification learning methods of distinct nature are applied to choose a suitable operational engine for the model. SVM using the optimal features achieved an accuracy of 97.54% on the training dataset and 93.71% on the independent dataset. The proposed model outperformed the existing computational models. It is expected that the developed model may play a useful role in academic research as well as in proteomics and drug development. The source code and all datasets are publicly available at https://github.com/salman-khan-mrd/Antioxident_proteins.
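Of the descriptors named above, dipeptide composition is the simplest to illustrate: it is the relative frequency of each of the 400 ordered amino-acid pairs in a sequence. The sketch below is not taken from the authors' repository; the function name and the example sequence are hypothetical, and pairs containing non-standard residues are simply skipped, which is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides

def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide frequencies for one sequence."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for first, second in zip(sequence, sequence[1:]):
        pair = first + second
        if pair in counts:  # skip pairs with non-standard residues (assumption)
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]

# Hypothetical short peptide, used only to show the shape of the output.
features = dipeptide_composition("MKVLAAGIVK")
```

The resulting fixed-length vector can be fed to any classifier regardless of the original sequence length.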
EN
The purpose of this study is to develop a hybrid algorithm for feature selection and classification of masses in digital mammograms based on the crow search algorithm (CSA) and Harris hawks optimization (HHO). The proposed CSAHHO algorithm finds the best features according to their fitness value, which is determined by an artificial neural network (ANN). The best features determined by CSAHHO are then used by ANN and support vector machine classifiers to classify masses in mammograms as benign or malignant. The performance of the proposed method is assessed using 651 mammograms. Experimental findings show that CSAHHO performs best compared with the original CSA and HHO algorithms when evaluated using the ANN. It achieves an accuracy of 97.85% with a kappa value of 0.9569 and an area under the curve of Az = 0.982 ± 0.006. Furthermore, benchmark datasets are used to test the feasibility of the proposed approach, which is compared with four state-of-the-art algorithms. The findings indicate that CSAHHO achieves high performance with the fewest features and can help enhance breast cancer diagnosis.
5
Diagnosis of Parkinson’s disease based on SHAP value feature selection
EN
To address the high feature dimensionality of Parkinson’s disease medical data, this paper introduces SHapley Additive exPlanations (SHAP) values for feature selection on a Parkinson’s disease medical dataset. SHAP values are combined with four classifiers: deep forest (gcForest), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and random forest (RF), and the resulting models are applied to Parkinson’s disease diagnosis. First, the classifier is used to compute the SHAP value of each feature, which measures the magnitude of its contribution; then the features that contribute significantly to the classification task are selected; finally, the data after feature selection are used as input to the classifier to classify the Parkinson’s disease dataset. The experimental results show that, compared with the F-score, analysis of variance (ANOVA-F) and mutual information (MI) feature selection methods, the four models based on SHAP-value feature selection achieve good classification results. The SHAP-gcForest model achieves a classification accuracy of 91.78% and an F1-score of 0.945 when 150 features are selected. The SHAP-LightGBM model achieves a classification accuracy of 91.62% and an F1-score of 0.945 when 50 features are selected. Its classification effectiveness is second only to that of the SHAP-gcForest model, but it is more computationally efficient. Finally, the effectiveness of the proposed method is verified by comparing it with the results of existing literature. The findings demonstrate that machine learning with SHAP-value feature selection has good classification performance in the diagnosis of Parkinson’s disease and provides a reference for physicians in the diagnosis and prevention of the disease.
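To make the idea of ranking features by Shapley contribution concrete, the sketch below computes exact Shapley values for a tiny hand-written model by averaging marginal contributions over all feature orderings. It is an illustration only: the paper uses the SHAP method with tree-based classifiers on hundreds of features, where brute-force enumeration is infeasible; `toy_model`, the input and the all-zero baseline are hypothetical.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]          # switch feature i from baseline to its value
            value = model(current)
            phi[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [p / len(orders) for p in phi]

def toy_model(v):
    # Hypothetical model: one additive term and one interaction term.
    return 2.0 * v[0] + 0.5 * v[1] * v[2]

phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
# Rank features by absolute contribution; a selector would keep the top k.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
```

The values sum to the difference between the model output at `x` and at the baseline, and the interaction term is split equally between the two features involved.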
EN
Health problems caused directly or indirectly by cardiac arrhythmias may be life-threatening. The analysis of electrocardiogram (ECG) signals is an important diagnostic tool for assessing cardiac function in clinical research and disease diagnosis. To date, various soft computing methods and techniques have been proposed for the analysis of ECG signals. In this study, a new ensemble learning based method is proposed that automatically classifies the arrhythmic heartbeats of an ECG signal according to category-based and patient-based evaluation plans. A two-stage median filter was used to remove baseline wander from the ECG signal. The locations of the fiducial points of the ECG signal were determined using the developed QRS complex detection method. Four different feature extraction methods were used, including a newly proposed technique based on the power spectral density. Hybrid sub-feature sets were constructed using a wrapper-based feature selection algorithm. The new ensemble learning (EL) method uses a stacking algorithm, with multi-layer perceptron (MLP) and random forest (RF) as base learners and linear regression (LR) as the meta learner. For category-based arrhythmic heartbeat classification, the proposed method achieved average values of 99.88% accuracy, 99.08% sensitivity, 99.94% specificity and 99.08% positive predictivity (+P). For patient-based classification, the average values were 99.72% accuracy, 99.30% sensitivity, 99.83% specificity and 99.30% positive predictivity (+P). It is concluded that the proposed method performs better than similar studies in the literature.
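The two-stage median filter mentioned above can be sketched as follows: two median filters applied in sequence estimate the slowly varying baseline, which is then subtracted from the signal. The window lengths of 200 ms and 600 ms are a commonly used choice and are an assumption here, since the abstract does not state them; the function names and the synthetic signal are hypothetical.

```python
from statistics import median

def median_filter(signal, width):
    """Sliding median with an odd window, truncated at the signal edges."""
    half = width // 2
    return [
        median(signal[max(0, i - half): min(len(signal), i + half + 1)])
        for i in range(len(signal))
    ]

def remove_baseline(ecg, fs, first_ms=200, second_ms=600):
    """Estimate the baseline with two cascaded median filters and subtract it."""
    w1 = int(fs * first_ms / 1000) | 1   # force odd window lengths
    w2 = int(fs * second_ms / 1000) | 1
    baseline = median_filter(median_filter(ecg, w1), w2)
    return [sample - base for sample, base in zip(ecg, baseline)]

# Synthetic example: constant offset of 0.5 plus one narrow spike.
fs = 100
ecg = [0.5] * 200
ecg[100] = 1.5
clean = remove_baseline(ecg, fs)
```

On this synthetic input the offset is removed while the narrow spike keeps its height, which is the property that makes median filtering attractive for preserving QRS complexes.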
7
A weighted wrapper approach to feature selection
EN
This paper considers feature selection as a problem of an aggregation of three state-of-the-art filtration methods: Pearson’s linear correlation coefficient, the ReliefF algorithm and decision trees. A new wrapper method is proposed which, on the basis of a fusion of the above approaches and the performance of a classifier, is capable of creating a distinct, ordered subset of attributes that is optimal based on the criterion of the highest classification accuracy obtainable by a convolutional neural network. The introduced feature selection uses a weighted ranking criterion. In order to evaluate the effectiveness of the solution, the idea is compared with sequential feature selection methods that are widely known and used wrapper approaches. Additionally, to emphasize the need for dimensionality reduction, the results obtained on all attributes are shown. The verification of the outcomes is presented in the classification tasks of repository data sets that are characterized by a high dimensionality. The presented conclusions confirm that it is worth seeking new solutions that are able to provide a better classification result while reducing the number of input features.
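One simple way to fuse several filter rankings with weights is a weighted sum of rank positions, sketched below. This is not the criterion from the paper, whose exact form the abstract does not give; the scores, the weights and the function names are hypothetical, and in a wrapper setting the weights would typically be tuned against classifier accuracy.

```python
def ranks(scores):
    """Map each feature to its rank position (1 = highest score)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(order, start=1)}

def weighted_ranking(filter_scores, weights):
    """Order features by the weighted sum of their ranks across filters."""
    names = list(next(iter(filter_scores.values())))
    per_filter = {f: ranks(s) for f, s in filter_scores.items()}
    combined = {
        n: sum(weights[f] * per_filter[f][n] for f in per_filter) for n in names
    }
    return sorted(names, key=combined.get)

# Hypothetical relevance scores from three filters for three features.
filter_scores = {
    "pearson": {"f1": 0.9, "f2": 0.4, "f3": 0.1},
    "relieff": {"f1": 0.2, "f2": 0.5, "f3": 0.3},
    "tree":    {"f1": 0.6, "f2": 0.3, "f3": 0.1},
}
weights = {"pearson": 0.5, "relieff": 0.2, "tree": 0.3}  # assumed values
ordered = weighted_ranking(filter_scores, weights)
```

A wrapper would then evaluate the classifier on growing prefixes of `ordered` and keep the prefix with the highest accuracy.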
EN
Cardiovascular disease is the leading cause of death worldwide. Diagnosis is made by non-invasive methods, but these are far from comfortable, rapid, and accessible to everyone. Speech analysis is an emerging non-invasive diagnostic tool, and many studies have shown that it is effective in speech recognition and in detecting Parkinson's disease. Can it also be effective in differentiating between patients with cardiovascular disease and healthy people? The present work addresses this question using a collected database of 75 people, 35 of whom suffer from cardiovascular disease and 40 of whom are healthy. From each person we took three vocal recordings of sustained vowels (aaaaa…, ooooo… and iiiii…). By measuring dysphonia in speech, we extracted 26 features, which were used to train three types of classifiers: k-nearest neighbor, support vector machine, and naive Bayes. The methods were tested for accuracy and stability; the best result, 81% accuracy, was obtained with the k-nearest-neighbor classifier.
9
Feature assisted cervical cancer screening through DIC cell images
EN
The mortality rate of cervical cancer is increasing alarmingly. Conventional cytological methods are not always effective in diagnosing cancer at an early stage. Several label-free, quantitative screening approaches are emerging rapidly for fast and accurate detection of cervical cancer. Differential interference contrast (DIC) imaging is one such label-free method for detecting cellular abnormality. Combining DIC imaging with a prediction algorithm enables the development of an efficient computer-aided diagnosis (CAD) system for early-stage cervical cancer detection. In the present study, the DIC dataset is categorized into 2 classes (abnormal and normal) and 3 classes (normal, pre-cancer, and squamous cell carcinoma). After segmentation of the cells using the modified valley-based Otsu’s thresholding method, three classifiers, namely support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbour (k-NN), are applied. Further, to improve classification performance, principal component analysis (PCA) is applied for feature selection. The experimental results reveal that the SVM classifier has the greatest accuracy: 0.97 for 2-class classification and 0.90 for 3-class classification.
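For orientation, the sketch below implements the standard Otsu threshold, which picks the grey level that maximises the between-class variance of the histogram. The paper uses a modified valley-based variant whose details are not given in the abstract, so this is the plain textbook method, not the authors' segmentation; the pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # number of pixels in the lower class
        sum0 += t * hist[t]      # intensity sum of the lower class
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Hypothetical image flattened to a list: dark background, bright cells.
pixels = [10] * 50 + [12] * 50 + [200] * 40 + [210] * 60
threshold = otsu_threshold(pixels)
mask = [p > threshold for p in pixels]  # True marks foreground pixels
```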
EN
Biomarkers are among the most essential biomolecules found in the human body; with them, the abnormal biological processes and disease states of each patient can be accurately determined. Biomarkers are now frequently used in clinical trials to identify cancer patients. This work analyses the significance of miRNA biomarkers in liver cancer detection. For this analysis, a deep learning technique is introduced along with optimization algorithms. Six filter-based approaches are considered for feature selection: Chi-Squared (Chi2), Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), RelieF (RF) and RF-W. Two high-ranked features from these selected features are extracted by the Modified Social Ski-Driver optimization (MSSO) algorithm. Using these high-ranked features, liver cancer tissues are detected by a Sunflower Optimization-based deep neural network (DSFNN). The analysis concludes that miRNA biomarkers with a higher rank provide better cancer detection results than low-ranked biomarkers. Ten clinically verified miRNA biomarkers are selected for the detection process. The data required for liver cancer detection are taken from the NCBI-GEO database. The performance of the entire detection process is evaluated by accuracy, sensitivity, precision, specificity, and area under the curve (AUC). Furthermore, we determined that using 10, 5, and 3 clinically verified miRNAs provides better cancer detection results than using other miRNAs. Among all clinically verified miRNAs, the three selected biomarkers (hsa-mir-10b, hsa-let-7c, hsa-mir-145) attained a higher recognition result. The performance of the proposed DSFNN is compared with five other algorithms on both training and validation datasets.
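Two of the filters listed above, information gain and symmetrical uncertainty, follow directly from Shannon entropy: IG(X; Y) = H(X) + H(Y) - H(X, Y) and SU(X, Y) = 2 IG(X; Y) / (H(X) + H(Y)). The sketch below computes both for discrete features; the discretised expression levels and labels are hypothetical and not taken from the NCBI-GEO data.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def information_gain(feature, target):
    """IG(X; Y) = H(X) + H(Y) - H(X, Y)."""
    joint = list(zip(feature, target))
    return entropy(feature) + entropy(target) - entropy(joint)

def symmetrical_uncertainty(feature, target):
    """SU = 2 * IG / (H(X) + H(Y)), normalised to the range [0, 1]."""
    denom = entropy(feature) + entropy(target)
    return 2 * information_gain(feature, target) / denom if denom else 0.0

# Hypothetical discretised expression levels for two miRNAs and tumour labels.
target = [1, 1, 0, 0]
mirna_a = ["hi", "hi", "lo", "lo"]   # perfectly aligned with the label
mirna_b = ["hi", "lo", "hi", "lo"]   # independent of the label
su_a = symmetrical_uncertainty(mirna_a, target)
su_b = symmetrical_uncertainty(mirna_b, target)
```

A filter-based selector would rank the features by such a score and keep the highest-ranked ones.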
EN
The electroencephalogram (EEG) is one of the most important signals for the diagnosis of Autism Spectrum Disorder (ASD). Challenges include feature selection and the presence of artifacts in EEG signals. This article aims to present a robust method for early diagnosis of ASD from the EEG signal. The study population consists of 34 children with ASD aged 3–12 years and 11 healthy children in the same age range. The proposed approach describes the EEG signal using linear and nonlinear features such as Power Spectrum, Wavelet Transform, Fast Fourier Transform (FFT), Fractal Dimension, Correlation Dimension, Lyapunov Exponent, Entropy, Detrended Fluctuation Analysis and Synchronization Likelihood. In addition, Density Based Clustering is used for artifact removal and robustness. Feature selection is applied based on different criteria: Mutual Information (MI), Information Gain (IG), Minimum-Redundancy Maximum-Relevancy (mRmR) and Genetic Algorithm (GA). Finally, K-Nearest-Neighbor (KNN) and Support Vector Machine (SVM) classifiers are used for the final decision. The classification accuracy of the approach is 90.57% using SVM and 72.77% using KNN, and its sensitivity is 99.91% for SVM and 91.96% for KNN. Experiments also show that the DFA, LE, Entropy and SL features have considerable influence on the classification accuracy.
PL
The article presents the problem of assessing the vocal apparatus in persons with neurodegenerative diseases. As part of the research, the acoustic signal was described using parameters extracted with commonly used acoustic analysis methods. A preliminary assessment of the usefulness of the extracted features was then carried out using selected statistical measures, in the context of their possible use in systems aimed at the early detection of so-called dementia states.
EN
The paper presents the problem of evaluating the speech organ in persons with neurodegenerative changes. As part of the research, the acoustic signal was described using parameters extracted with commonly used methods of acoustic analysis. Next, a preliminary assessment of the usefulness of the extracted features was carried out using selected statistical measures, in the context of their possible use in systems aimed at early detection of so-called dementia states.
EN
The radiological test is cost-effective, widely available, allows for the visualisation of large areas of the skeleton and can identify long bones potentially at risk of fracture at osteolysis sites. Therefore, radiology is often used in the early stages of multiple myeloma, in the detection and characterisation of complications, and in the assessment of the patient's response to treatment. The accuracy of this method can be improved through the use of appropriate algorithms for computer image processing and analysis. In the study, a feature vector was extracted from humerus CR images, yielding 279 image descriptors. Hellwig's method was applied in the selection process to find the feature combinations with the largest integral index of information capacity. To evaluate these combinations, 11 classifiers were built and tested. As a result, 2 feature sets were identified that provided the highest classification accuracy in combination with the k-NN classifier: the 9-NN classifier was used for the first combination (2 features) and the 5-NN classifier for the second (3 features). Depending on the quality index used, the results were as follows: overall classification accuracy 93%, sensitivity 92%, specificity 96%, positive predictive value 96% and negative predictive value 93%. The results show that (1) humerus CR images may be useful in the detection of bone damage caused by multiple myeloma, and (2) Hellwig's method is effective for feature selection in this kind of image.
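Hellwig's method scores a combination of features by its integral information capacity: for each feature j in the combination, the squared correlation with the target is divided by the sum of absolute correlations between feature j and all features in the combination (including itself), and these terms are summed. The sketch below evaluates every combination exhaustively, which is only practical for a handful of features and is an assumption about the search strategy; with 279 descriptors, as in the study, the candidate set would have to be restricted. The correlation values are hypothetical.

```python
from itertools import combinations

def hellwig_capacity(subset, r0, R):
    """Integral information capacity of a feature combination.

    r0[j]   correlation of feature j with the target
    R[i][j] correlation between features i and j (R[j][j] == 1)
    """
    return sum(
        r0[j] ** 2 / sum(abs(R[i][j]) for i in subset) for j in subset
    )

def best_combination(r0, R):
    """Exhaustively search all non-empty combinations (small inputs only)."""
    features = range(len(r0))
    candidates = (
        c for k in range(1, len(r0) + 1) for c in combinations(features, k)
    )
    return max(candidates, key=lambda c: hellwig_capacity(c, r0, R))

# Hypothetical correlations: features 0 and 1 are strongly redundant.
r0 = [0.8, 0.7, 0.3]
R = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
best = best_combination(r0, R)
```

In this example the method prefers features 0 and 2 together over the redundant pair 0 and 1, even though feature 1 is individually more correlated with the target than feature 2.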
EN
Medical imaging technologies provide an increasing number of opportunities for disease prediction and prognosis. Specifically, imaging biomarkers can quantify entire tumor phenotypes to enhance prediction. Machine learning can be used to mine and analyze these biomarkers and to establish predictive models for clinical applications. Several studies have applied various machine learning methods to imaging-biomarker-based clinical prediction of different diseases. Here we seek to evaluate different machine learning methods in pediatric posterior fossa tumor prediction. We present a machine learning based framework for analysing magnetic resonance imaging biomarkers of two kinds of pediatric posterior fossa tumors. In detail, three feature extraction methods are used to obtain 300 imaging biomarkers. Ten feature selection methods and 11 classifiers are evaluated by their quantified predictive performance and stability; the consistency of feature importance and the influence of the experimental factors are also analyzed. Our results demonstrate that the CFS feature selection method (accuracy: 83.85 ± 5.51%, stability: [0.84, 0.06]) and the SVM classifier (accuracy: 85.38 ± 3.47%, RSD: 4.77%) perform relatively better than the others and should be preferred. Among all the biomarkers, 17 texture features seem to be more important. Multifactor analysis indicates that the choice of classifier contributes most to the variability in performance (37.25%). The machine learning based framework is efficient for biomarker analysis of pediatric posterior fossa tumors and could provide valuable references and decision support for assisted clinical diagnosis.
EN
Early detection of breast cancer plays a crucial role in the planning and outcome of the associated treatment. The purpose of this article is threefold: (i) to investigate whether clinical features obtained from routine blood analysis combined with anthropometric measurements can be used to predict breast cancer using machine learning techniques; (ii) to explore the role of various machine learning components, such as feature selection, data division protocols and classification, in determining suitable biomarkers for breast cancer prediction; and (iii) to evaluate a recent database of clinical and anthropometric measurements acquired from normal individuals and individuals suffering from breast cancer. A database consisting of anthropometric and clinical attributes is used in the experiments. Various feature selection and statistical significance analysis methods are used to determine the relevance of the features. Furthermore, popular classifiers such as kernel-based support vector machine (SVM), naïve Bayes, linear discriminant, quadratic discriminant, logistic regression, K-nearest neighbor (K-NN) and random forest were implemented and evaluated for breast cancer risk prediction using these features. The feature selection results indicate that, among the nine features considered in this study, glucose, age and resistin are the most relevant and effective biomarkers for breast cancer prediction. When these three features are used for classification, the medium K-NN classifier achieves the highest classification accuracy of 92.105%, followed by the medium Gaussian SVM with 83.684%, under the hold-out data division protocol.
EN
Diabetes mellitus (DM) is one of the most widespread and rapidly growing diseases. As it advances, DM-related complications are also increasing. We used characteristic features of the toe photoplethysmogram (PPG) to detect type-2 DM using a support vector machine (SVM). We collected toe PPG signals from 58 healthy and 83 type-2 DM subjects. From each PPG signal, 37 different features were extracted for classification. To improve the performance of the SVM and reduce noisy data, we employed a hybrid feature selection technique that reduces the feature set from 37 to 10 on the basis of majority voting. Using the 10 selected features, we obtained an accuracy of 97.87%, a sensitivity of 98.78% and a specificity of 96.61%. To validate the method further, a random population test is needed so that it can be used as a non-invasive screening tool. The photoplethysmogram is an economical, technically easy and completely non-invasive method for both physician and subject. With the high accuracy obtained, we hope that our work will help clinicians in screening for diabetes and adopting a suitable treatment plan to prevent end-organ damage.
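Feature selection by majority voting can be sketched as follows: several selection methods each nominate a subset, and a feature is kept when more than half of the methods chose it. The abstract does not say which methods took part in the vote, so the three subsets and the PPG feature names below are hypothetical.

```python
from collections import Counter

def majority_vote(selections, min_votes=None):
    """Keep features chosen by at least min_votes selectors (default: a majority)."""
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, count in votes.items() if count >= min_votes)

# Hypothetical subsets nominated by three different selection methods.
selections = [
    {"pulse_width", "rise_time", "area"},
    {"rise_time", "area", "peak_amplitude"},
    {"area", "pulse_width", "skewness"},
]
kept = majority_vote(selections)
```

Raising `min_votes` to the number of selectors keeps only the features on which all methods agree.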
EN
Modern cancer diagnostics is based heavily on cytological examinations. Unfortunately, visual inspection of cytological preparations under the microscope is a tedious and time-consuming process. Moreover, intra- and inter-observer variations in cytological diagnosis are substantial. Cytological diagnostics can be facilitated and objectified by using automatic image analysis and machine learning methods. Computerized systems usually preprocess cytological images, segment and detect nuclei, extract and select features, and finally classify the sample. In spite of the fact that a lot of different computerized methods and systems have already been proposed for cytology, they are still not routinely used because there is a need for improvement in their accuracy. This contribution focuses on computerized breast cancer classification. The task at hand is to classify cellular samples coming from fine-needle biopsy as either benign or malignant. For this purpose, we compare 5 methods of nuclei segmentation and detection, 4 methods of feature selection and 4 methods of classification. Nuclei detection and segmentation methods are compared with respect to recall and the F1 score based on the Jaccard index. Feature selection and classification methods are compared with respect to classification accuracy. Nevertheless, the main contribution of our study is to determine which features of nuclei indicate reliably the type of cancer. We also check whether the quality of nuclei segmentation/detection significantly affects the accuracy of cancer classification. It is verified using the test set that the average accuracy of cancer classification is around 76%. Spearman’s correlation and chi-square test allow us to determine significantly better features than the feature forward selection method.
EN
In the era of big data, solutions capable of efficient data reduction are desired. This paper presents a summary of research on an algorithm for complementation of a Boolean function, which is fundamental for logic synthesis and data mining. The existing problems and their proposed solutions are examined in turn, including an analysis of current implementations of the algorithm. Then, methods to speed up the computation and an efficient parallel implementation of the algorithm are shown; they include optimization of data representation, recursive decomposition, merging, and removal of redundant data. Besides discussing computational complexity, the paper compares the processing times of the proposed solution with those of well-known analysis and data mining systems. Although the presented idea is focused on searching for all possible solutions, it can be restricted to finding just those of the smallest size. Both approaches have great application potential, including proving mathematical theorems, logic synthesis (especially index generation functions), and data processing and mining tasks such as feature selection, data discretization and rule generation. The problem considered is NP-hard, and it is easy to point to examples that are not solvable within the expected amount of time. However, the solution allows the barrier of computations to be moved one step further. For example, the algorithm is currently the only one that can calculate all minimal sets of features for a few standard benchmarks. Unlike many existing methods, the algorithm additionally works with undetermined values. The result of this research is easily extendable experimental software that is the fastest among the tested solutions and data mining systems.
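To show what "all minimal sets of features" means, the sketch below finds them by brute force on a tiny decision table: a feature set is sufficient if no two rows agree on it while having different decisions, and it is minimal if no proper subset is sufficient. This is not the complementation algorithm from the paper, it does not handle undetermined values, and its cost grows exponentially with the number of features; the table is hypothetical.

```python
from itertools import combinations

def is_sufficient(rows, decisions, subset):
    """True if rows that agree on the subset never differ in their decision."""
    seen = {}
    for row, decision in zip(rows, decisions):
        key = tuple(row[i] for i in subset)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def minimal_feature_sets(rows, decisions):
    """All inclusion-minimal sufficient feature subsets, by increasing size."""
    n = len(rows[0])
    found = []
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            # Skip supersets of an already found set: they cannot be minimal.
            if any(set(f) <= set(subset) for f in found):
                continue
            if is_sufficient(rows, decisions, subset):
                found.append(subset)
    return found

# Hypothetical decision table with three binary features.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
minimal_sets = minimal_feature_sets(rows, decisions)
```

Here feature 0 alone and feature 2 alone each determine the decision, while feature 1 carries no information.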
PL
Hypotheses concerning disorders of the visual pathway are formulated on the basis of an assessment of visual evoked potentials arising from stimulation of the eye with an external light source. The diagnostic process is complex and complicated, and therefore requires experience and good perception from the physician. This article presents a decision-support system which, on the test set, achieves 100.00% sensitivity in a group of 49 cases with a 14.38% probability of false alarm in a group of 153 diagnostic cases.
EN
Hypotheses regarding visual pathway disorders are formulated on the basis of visual evoked potentials arising from stimulation of the eye by an external light source. The diagnostic process is complex and complicated and therefore requires experience and good perception on the part of the doctor. This article presents a system supporting the decision-making process, which is characterized on the test set by 100.00% sensitivity in a group of 49 cases with a 14.38% probability of false alarm in a group of 153 diagnostic cases.
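The two figures quoted above are standard confusion-matrix rates: sensitivity is TP / (TP + FN) and the false alarm probability is FP / (FP + TN). The sketch below reproduces them under two assumptions that are not stated in the abstract: that the 153 cases are the ones without the disorder, and that 22 of them were flagged (22 / 153 rounds to the reported 14.38%).

```python
def sensitivity(tp, fn):
    """Fraction of true cases that the system detects."""
    return tp / (tp + fn)

def false_alarm_rate(fp, tn):
    """Fraction of negative cases that the system wrongly flags."""
    return fp / (fp + tn)

# 49 and 153 come from the abstract; the split 22 / 131 is inferred, not reported.
tp, fn = 49, 0
fp, tn = 22, 131

sens_percent = round(100 * sensitivity(tp, fn), 2)
far_percent = round(100 * false_alarm_rate(fp, tn), 2)
```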
20
Metody przetwarzania sygnału EOG na użytek pomiaru stopnia zmęczenia osób (EOG signal processing methods for measuring the degree of fatigue in persons)
PL
The aim of the work reported in the article was to investigate whether a simplified EOG recorder can be used to detect and parameterize blinks. The detected blinks, described by coefficients, are to serve in later studies as features for determining a person's degree of fatigue. The detection methods and the algorithms for determining blink parameters were tested on 26 people. Blink detection effectiveness is 91%. The proposed algorithm enables automatic description of a single blink using 6 reliable coefficients.
EN
The purpose of the work reported in this article was to investigate the possibility of using a simplified EOG recorder for the detection and parameterization of blinks. The detected blinks, described by coefficients, are to be used in later studies as features for determining a person's degree of fatigue. The developed detection methods and the algorithms for determining blink parameters were tested on 26 people. Blink detection is 91% effective. The proposed algorithm enables automatic description of a single blink using 6 reliable coefficients.