Results found: 83

Search results
Searched for:
in keywords: selekcja cech (feature selection)
EN
Liquid-based cytology (LBC) is a widely used tool for cervical cancer diagnosis. However, the accuracy and efficiency of LBC-based cervical cancer classification are still limited by the lack of standardized, scalable, and objective cytological assessment protocols. To address these gaps, this study develops and evaluates a machine learning framework that integrates various feature extraction techniques, feature selection methods, and machine learning classifiers to improve cervical cancer detection. The results demonstrate that handcrafted and local binary pattern features achieve the best overall performance, with SVM, gradient boosting and histogram-based gradient boosting reaching 95.92% accuracy, highlighting the strength of combining morphological and texture descriptors to maximize their discriminative potential. Moreover, we provide a systematic comparison of different classification pipelines, offering insights into the feasibility of hybrid approaches, particularly in resource-constrained medical environments. The promising results obtained in this study highlight the potential impact of machine learning in modern medical diagnostics, providing a clinically relevant, highly accurate, and efficient classification method for LBC slides.
EN
Parkinson's disease (PD) is a progressive neurological disorder that affects millions worldwide, leading to motor dysfunction and significant reductions in quality of life. Early diagnosis is pivotal for initiating timely treatment and improving long-term patient outcomes, yet existing diagnostic methods, which often rely on clinical evaluations and imaging, are prone to delays and varying accuracy. This study presents an innovative, non-invasive approach to early PD detection through the analysis of handwriting patterns, offering a potential alternative to traditional diagnostic techniques. Leveraging a publicly available and meticulously normalized handwriting dataset, our approach applies advanced data processing methods to identify subtle neuromotor impairments associated with PD. Through the integration of robust feature selection processes and cutting-edge machine learning models, we achieved an accuracy of 83.02%, highlighting the method's reliability. The findings suggest that this approach could significantly enhance early PD detection, leading to more personalized therapeutic strategies that align with the stages of disease progression and potentially delaying the onset of severe symptoms.
EN
This research aims to develop a new transfer function that transforms continuous space into binary space for use with the Polar Lights Optimizer (PLO) algorithm on the feature selection problem. The PLO algorithm simulates the behaviour of the aurora borealis to balance exploration and exploitation of the binary space. A new transfer function, called the tent-shaped transfer function, has been incorporated into the algorithm to improve its performance. The proposed function was tested on seven datasets and compared with traditional transfer functions such as the S-shaped and V-shaped function families. The results showed that the tent-shaped transfer function outperforms them in terms of feature selection accuracy and reduces the number of features more effectively, which enhances the algorithm's ability to improve performance and reduce computational complexity.
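The S-shaped and V-shaped transfer function families mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used sigmoid and |tanh| forms and the usual "set bit" and "flip bit" update rules; the tent-shaped function proposed in the paper is not specified in the abstract and is not reproduced here. All function names are illustrative.

```python
import math
import random


def s_shaped(x: float) -> float:
    """Sigmoid transfer: probability that the feature bit is set to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def v_shaped(x: float) -> float:
    """|tanh| transfer: probability that the current feature bit is flipped."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    # S-shaped rule: each bit becomes 1 with probability s_shaped(x).
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    # V-shaped rule: each current bit is flipped with probability v_shaped(x).
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]


rng = random.Random(0)
position = [-2.0, 0.0, 3.5]          # continuous search-agent position (made up)
mask = binarize_s(position, rng)     # binary feature mask
unchanged = binarize_v([0.0, 0.0], [1, 0], rng)  # zero position never flips
```

In a wrapper setting, the resulting mask would select the columns passed to the classifier that scores each candidate solution.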
EN
In this paper, we propose a new hybrid approach that combines the Generalized Normal Distribution Optimization Algorithm (GNDOA) with fuzzy C-Means clustering (FCM). It is designed for processing unsupervised datasets. The idea aims to advance conventional feature selection and clustering techniques. The proposed GNDOA-FCM uses the normalized normal distribution concept along with FCM to produce more accurate and efficient clustering outputs, leading to accelerated detection in the survey region. The Calinski-Harabasz index helps find the number of clusters that gives high compactness within each cluster and good separation from other clusters. The performance of the proposed hybrid GNDOA-FCM approach is tested extensively on different benchmark datasets. The results are compared with existing clustering methods using evaluation metrics such as silhouette score and feature selection accuracy. Experimental results show that the proposed method can be flexibly configured to obtain higher clustering quality and is more effective than conventional techniques.
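As a rough illustration of the Calinski-Harabasz index used above to compare clusterings, the following sketch computes it in plain Python from points and hard cluster labels. It is not the authors' GNDOA-FCM code; fuzzy memberships would first have to be converted to hard labels, which is an assumption here, and the toy data are made up.

```python
def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    n = len(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = sum(len(m) * sqdist(mean(m), overall) for m in clusters.values())
    within = sum(sqdist(p, mean(m)) for m in clusters.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))


points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = calinski_harabasz(points, [0, 0, 1, 1])  # compact, well separated
bad = calinski_harabasz(points, [0, 1, 0, 1])   # clusters overlap
```

To pick the number of clusters, the index would be evaluated for each candidate clustering and the one with the largest value kept.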
EN
Mental arithmetic can be helpful for the evaluation of neurodevelopmental disorders arising from atypical development of the brain. We propose a novel explainable machine learning method that uses electroencephalography to distinguish mental arithmetic calculation tasks from resting brain states, and good calculations from bad ones. Empirical mode decomposition features are extracted from the intrinsic mode functions of the average signals of all trials. The features most relevant to the mental arithmetic tasks are ranked by a random forest-based recursive feature elimination method. For the first time in the literature, these features identify the changes in frequency bands of the brain rhythms, such as delta, theta, and alpha, during mental tasks. These unique explainable features are also used to identify brain areas, such as the frontal, temporal, and occipital lobes, involved in mental arithmetic tasks. Moreover, our approach identifies the memory regions involved and shows that bad calculations excite brain areas mostly related to emotions such as frustration and anxiety caused by stressful mental arithmetic. Using a random forest classifier, this method outperformed the state of the art, achieving classification accuracies of 99.30% and 98.33% for resting vs calculation and good vs bad calculation tasks, respectively. Our method also outperformed the state of the art in handling inter-subject variability, achieving classification accuracies of 98.17 ± 0.47% and 97.19 ± 0.95% for resting vs calculation and good vs bad calculation tasks, respectively.
EN
The main objective of Non-Intrusive Load Monitoring (NILM) electrical appliance identification is to reduce residential and commercial electricity consumption. This identification can be based on the analysis of events occurring in the home system or by analyzing its steady state. In the case of steady-state analysis, it is necessary to select electrical parameters that uniquely describe the electrical equipment in operation. This paper presents an analysis of a wide spectrum of electrical parameters (current, voltage, powers and harmonics of these signals, THD, CF, PF) in order to indicate those that are characterized by the greatest consistency within a given device and the greatest separability from other devices. Parameters selected in this way were used in the next step to identify working electrical devices.
EN
We consider the positive-unlabelled multi-label scenario in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive. The absence of the label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce weights of observations. The idea is to assign larger weights to observations for which there is a consistency between the values of the true target variable and the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both the methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which leads to relatively simple forms of weights. In the experiments we analyze the predictive power of the methods considered for different labelling schemes.
EN
Antioxidant proteins have been found to be closely associated with disease control owing to their capability to eliminate excess free radicals. Accurate identification of antioxidant proteins is attracting growing interest because of their therapeutic significance. However, although several machine learning algorithms have been applied in response to the rapid increase of such disease in the human body, they have performed inadequately in identifying antioxidant proteins. A reliable intelligent model is therefore indispensable for researchers measuring the effectiveness of antioxidant proteins in the human body. In this study, primary protein sequences are formulated using evolutionary and sequence-based numerical descriptors. Evolutionary features are collected using a bigram position-specific scoring matrix, while K-space amino acid pair (KSAAP) and dipeptide composition are used to extract sequential information. Furthermore, to reduce computational time and eliminate irrelevant and noisy features, a Sequential forward selection and Support vector machine (SFS-SVM) based ensemble approach is applied to select optimal features. Finally, several classification learning methods of distinct nature are applied to choose a suitable operational engine for the model. After evaluating the empirical results, SVM using the optimal features achieved accuracies of 97.54% and 93.71% on the training and independent datasets, respectively. The proposed model outperformed the existing computational models and reported the highest performance. It is expected that the developed model may play a useful role in academic research as well as in proteomics and drug development. The source code and all datasets are publicly available at https://github.com/salman-khan-mrd/Antioxident_proteins.
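The sequence-based descriptors named above can be sketched in a few lines of Python. This is an illustrative implementation of the usual definitions, not the code from the linked repository: with a gap of k = 0 the function yields the 400-dimensional dipeptide composition, and with k > 0 it yields k-spaced amino acid pair frequencies. The function name and the toy sequence are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def k_spaced_pair_composition(sequence: str, k: int = 0) -> dict:
    """Frequency of each residue pair separated by k intervening residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(sequence) - k - 1  # number of pair positions in the sequence
    for i in range(total):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # non-standard residues are skipped
            counts[pair] += 1
    if total <= 0:
        return {p: 0.0 for p in pairs}
    return {p: counts[p] / total for p in pairs}


dipeptide = k_spaced_pair_composition("ACAC", k=0)  # adjacent pairs
gap_one = k_spaced_pair_composition("ACAC", k=1)    # one residue in between
```

Concatenating the vectors for several values of k gives a feature matrix on which a wrapper method such as sequential forward selection can then operate.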
EN
The purpose of this study is to develop a hybrid algorithm for feature selection and classification of masses in digital mammograms based on the Crow search algorithm (CSA) and Harris hawks optimization (HHO). The proposed CSAHHO algorithm finds the best features according to their fitness value, which is determined by an artificial neural network (ANN). The best features determined by CSAHHO are then used with ANN and support vector machine classifiers to classify masses in mammograms as benign or malignant. The performance of the suggested method is assessed using 651 mammograms. Experimental findings show that the proposed CSAHHO performs best compared with the original CSA and HHO algorithms when evaluated using the ANN. It achieves an accuracy of 97.85% with a kappa value of 0.9569 and an area under the curve AZ = 0.982 ± 0.006. Furthermore, benchmark datasets are used to test the feasibility of the suggested approach, which is then compared with four state-of-the-art algorithms. The findings indicate that CSAHHO achieves high performance with the smallest number of features and can support breast cancer diagnosis.
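The abstract reports accuracy together with a kappa value. As a reminder of how the two relate, here is a small sketch that computes both from a confusion matrix, assuming the kappa in question is Cohen's kappa. The counts are invented for illustration and are not the study's data.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    n = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / n


def cohen_kappa(confusion):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = sum(sum(row) for row in confusion)
    size = len(confusion)
    observed = accuracy(confusion)
    # Chance agreement from the row (true) and column (predicted) totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Rows: true benign / malignant; columns: predicted benign / malignant.
toy = [[45, 5],
       [5, 45]]
acc = accuracy(toy)
kappa = cohen_kappa(toy)
```

For a balanced two-class problem the chance agreement is 0.5, so kappa works out to twice the accuracy minus one.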
10
Diagnosis of Parkinson’s disease based on SHAP value feature selection
EN
To address the problem of high feature dimensionality in Parkinson’s disease medical data, this paper introduces SHapley Additive exPlanations (SHAP) values for feature selection on a Parkinson’s disease medical dataset. SHAP values are combined with four classifiers, namely deep forest (gcForest), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and random forest (RF), which are then applied to Parkinson’s disease diagnosis. First, the classifier is used to calculate the magnitude of each feature's contribution as measured by its SHAP value; then the features that contribute significantly to the classification task are selected; finally, the data after feature selection are used as input to the classifier to diagnose Parkinson’s disease. The experimental results show that, compared to the F-score, analysis of variance (Anova-F) and mutual information (MI) feature selection methods, the four models based on SHAP-value feature selection achieved good classification results. The SHAP-gcForest model achieves a classification accuracy of 91.78% and an F1-score of 0.945 when 150 features are selected. The SHAP-LightGBM model achieves a classification accuracy of 91.62% and an F1-score of 0.945 when 50 features are selected. Its classification effectiveness is second only to the SHAP-gcForest model, but it is more computationally efficient. Finally, the effectiveness of the proposed method is verified by comparing it with results from the existing literature. The findings demonstrate that machine learning with SHAP-value feature selection has good classification performance in the diagnosis of Parkinson’s disease and provides a reference for physicians in its diagnosis and prevention.
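The selection step described above (rank features by the magnitude of their SHAP contribution, keep the top ones) can be sketched as follows. The sketch assumes the SHAP values have already been computed by an explainer for the trained classifier and are available as a samples-by-features matrix; ranking by mean absolute SHAP value is a common convention and may differ from the paper's exact criterion. The feature names and numbers are hypothetical.

```python
def top_k_by_mean_abs_shap(shap_values, feature_names, k):
    """Return the k feature names with the largest mean |SHAP value|."""
    n_samples = len(shap_values)
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    order = sorted(range(len(feature_names)),
                   key=lambda j: importance[j], reverse=True)
    return [feature_names[j] for j in order[:k]]


# Hypothetical SHAP matrix: 2 samples x 3 voice features.
shap_matrix = [[0.10, -0.50, 0.00],
               [-0.20, 0.40, 0.05]]
names = ["jitter", "shimmer", "hnr"]
selected = top_k_by_mean_abs_shap(shap_matrix, names, k=2)
```

The reduced dataset containing only the selected columns would then be passed back to the classifier for the final diagnosis step.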
EN
Health problems caused directly or indirectly by cardiac arrhythmias may be life-threatening. The analysis of electrocardiogram (ECG) signals is an important diagnostic tool for assessing cardiac function in clinical research and disease diagnosis. To date, various Soft Computing methods and techniques have been proposed for the analysis of ECG signals. In this study, a new Ensemble Learning based method is proposed that automatically classifies the arrhythmic heartbeats of an ECG signal according to category-based and patient-based evaluation plans. A two-stage median filter was used to remove the baseline wander from the ECG signal. The locations of the fiducial points of the ECG signal were determined using the developed QRS complex detection method. Within the scope of this study, four different feature extraction methods were utilized, including a newly proposed technique based on the Power Spectral Density. Hybrid sub-feature sets were constructed using a wrapper-based feature selection algorithm. A new method based on Ensemble Learning (EL) using a stacking algorithm has been proposed, with Multi-layer Perceptron (MLP) and Random Forest (RF) as base learners and Linear Regression (LR) as the meta learner. For category-based arrhythmic heartbeat classification, the proposed method achieved an average accuracy of 99.88%, sensitivity of 99.08%, specificity of 99.94% and positive predictivity (+P) of 99.08%. For patient-based arrhythmic heartbeat classification, the average values were 99.72% accuracy, 99.30% sensitivity, 99.83% specificity and 99.30% positive predictivity (+P). It is thus concluded that the proposed method achieves higher performance than similar studies in the literature.
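The two-stage median filter used above for baseline wander removal can be sketched in plain Python. The window lengths of 200 ms and 600 ms are a widely used choice and are an assumption here, since the abstract does not state them; the function names and the toy signal are also illustrative.

```python
from statistics import median


def median_filter(signal, width):
    """Sliding median with an odd window; windows are truncated at the edges."""
    half = width // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def remove_baseline(signal, fs, first_ms=200, second_ms=600):
    """Estimate the baseline with two cascaded median filters and subtract it."""
    w1 = int(fs * first_ms / 1000) | 1   # force an odd window length
    w2 = int(fs * second_ms / 1000) | 1
    baseline = median_filter(median_filter(signal, w1), w2)
    return [s - b for s, b in zip(signal, baseline)]


# Toy signal: constant offset of 1.0 with a single narrow spike.
ecg = [1.0] * 50
ecg[25] = 10.0
corrected = remove_baseline(ecg, fs=100)
```

The first, shorter window suppresses narrow waves such as the QRS complex; the second, longer window smooths what remains into a baseline estimate that is subtracted from the original signal.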
12
A weighted wrapper approach to feature selection
EN
This paper considers feature selection as a problem of an aggregation of three state-of-the-art filtration methods: Pearson’s linear correlation coefficient, the ReliefF algorithm and decision trees. A new wrapper method is proposed which, on the basis of a fusion of the above approaches and the performance of a classifier, is capable of creating a distinct, ordered subset of attributes that is optimal based on the criterion of the highest classification accuracy obtainable by a convolutional neural network. The introduced feature selection uses a weighted ranking criterion. In order to evaluate the effectiveness of the solution, the idea is compared with sequential feature selection methods that are widely known and used wrapper approaches. Additionally, to emphasize the need for dimensionality reduction, the results obtained on all attributes are shown. The verification of the outcomes is presented in the classification tasks of repository data sets that are characterized by a high dimensionality. The presented conclusions confirm that it is worth seeking new solutions that are able to provide a better classification result while reducing the number of input features.
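One simple way to fuse several filter rankings with a weighted criterion is sketched below. This is an assumed weighted-sum-of-ranks scheme for illustration only; the paper's actual weighting rule, and how its weights are tuned by the classifier, are not given in the abstract. Method names, feature names and weights are hypothetical.

```python
def weighted_rank_fusion(rankings, weights):
    """Combine per-method rankings (best first) into one ordered feature list.

    Each feature receives the weighted sum of its positions; a lower score
    means a better fused rank. Ties are broken alphabetically.
    """
    scores = {}
    for method, ordered_features in rankings.items():
        for position, feature in enumerate(ordered_features):
            scores[feature] = scores.get(feature, 0.0) + weights[method] * position
    return sorted(scores, key=lambda f: (scores[f], f))


rankings = {
    "pearson": ["a", "b", "c"],
    "relieff": ["b", "a", "c"],
    "tree": ["b", "c", "a"],
}
equal = weighted_rank_fusion(rankings, {"pearson": 1, "relieff": 1, "tree": 1})
skewed = weighted_rank_fusion(rankings, {"pearson": 5, "relieff": 1, "tree": 1})
```

In a wrapper setting, the weights would be chosen so that the classifier trained on the leading features of the fused ranking reaches the highest accuracy.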
EN
Cardiovascular disease is the leading cause of death worldwide. Diagnosis is made by non-invasive methods, but these are far from comfortable, rapid, or accessible to everyone. Speech analysis is an emerging non-invasive diagnostic tool, and many studies have shown that it is effective in speech recognition and in detecting Parkinson's disease. Can it also be effective for differentiating between patients with cardiovascular disease and healthy people? This work answers that question using a database collected from 75 people, 35 of whom suffer from cardiovascular disease and 40 of whom are healthy. From each person we took three vocal recordings of sustained vowels ("aaaaa…", "ooooo…" and "iiiii…"). By measuring dysphonia in speech, we extracted 26 features, with which we trained three types of classifiers: the k-nearest-neighbor classifier, the support vector machine classifier, and the naive Bayes classifier. The methods were tested for accuracy and stability, and the best result, 81% accuracy, was obtained with the k-nearest-neighbor classifier.
14
Feature assisted cervical cancer screening through DIC cell images
EN
The mortality rate of cervical cancer is increasing alarmingly. Conventional cytological methods are not always efficient at diagnosing cancer at an early stage. Several label-free, quantitative screening approaches are emerging rapidly for fast and accurate detection of cervical cancer. Differential interference contrast (DIC) imaging is one such label-free method for the detection of cellular abnormality. Combining DIC imaging with a prediction algorithm enables the development of an efficient computer-aided diagnosis (CAD) system for cervical cancer detection at an early stage. In the present study, the DIC dataset is categorized into 2 classes (abnormal and normal) and 3 classes (normal, pre-cancer, and squamous cell carcinoma). After segmentation of the cells using the modified valley-based Otsu’s thresholding method, three classifiers, namely support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbour (k-NN), are applied. Further, to improve classification performance, principal component analysis (PCA) is applied for feature selection. The experimental results reveal that the SVM classifier has the greatest accuracy, 0.97 for 2-class classification and 0.90 for 3-class classification.
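For readers unfamiliar with the segmentation step, the following sketch implements the standard Otsu threshold on a flat list of grey levels. It is the classic algorithm only; the modified valley-based variant used in the study is not described in the abstract and is not reproduced. The toy pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t maximising the between-class variance,
    where class 0 holds pixels <= t and class 1 holds pixels > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight0 = 0      # number of pixels in class 0
    sum0 = 0         # sum of grey levels in class 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        if weight0 == 0:
            continue             # class 0 still empty
        weight1 = total - weight0
        if weight1 == 0:
            break                # class 1 empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between = weight0 * weight1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Toy bimodal "image": dark background at 10, bright cells at 200.
image = [10] * 50 + [200] * 50
threshold = otsu_threshold(image)
cell_mask = [1 if p > threshold else 0 for p in image]
```

Because ties keep the first maximiser, the threshold for the toy image sits at the top of the dark mode, and every bright pixel ends up in the cell mask.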
EN
Biomarkers are among the most essential biomolecules found in the human body; with them, the abnormal biological processes and disease states of each patient can be accurately determined. Nowadays, biomarkers are frequently used during clinical trials to identify cancer patients. In this work, the significance of miRNA biomarkers in liver cancer detection is analysed. For this analysis, a deep learning technique is introduced along with optimization algorithms. Six different filter-based approaches are considered for feature selection: Chi-Squared (Chi2), Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), ReliefF (RF) and RF-W. The two highest-ranked features among those selected are extracted by the Modified Social Ski-Driver optimization (MSSO) algorithm. With these high-ranked features, liver cancer tissues are accurately detected by a Sunflower Optimization-based deep neural network (DSFNN) approach. The analysis concludes that a miRNA biomarker with a higher rank provides better cancer detection results than lower-ranked biomarkers. In this work, 10 different clinically verified miRNA biomarkers are selected for the detection process. The data required for liver cancer detection are taken from the NCBI-GEO database. The performance of the entire cancer detection process is evaluated by accuracy, sensitivity, precision, specificity, and area under the curve (AUC). Furthermore, we determined that using 10, 5, and 3 clinically verified miRNAs provides better cancer detection results than using other miRNAs. Among all clinically verified miRNAs, the three selected biomarkers (hsa-mir-10b, hsa-let-7c, hsa-mir-145) attained the highest recognition results. The performance of the proposed DSFNN is compared with five different algorithms on both training and validation datasets.
EN
The electroencephalogram (EEG) is one of the most important signals for the diagnosis of Autism Spectrum Disorder (ASD). Challenges include feature selection and the presence of artifacts in EEG signals. This article aims to present a robust method for early diagnosis of ASD from the EEG signal. The study population consists of 34 children with ASD aged 3–12 years and 11 healthy children in the same age range. The proposed approach uses linear and nonlinear features such as Power Spectrum, Wavelet Transform, Fast Fourier Transform (FFT), Fractal Dimension, Correlation Dimension, Lyapunov Exponent, Entropy, Detrended Fluctuation Analysis and Synchronization Likelihood to describe the EEG signal. In addition, Density Based Clustering is used for artifact removal and robustness. Feature selection is applied based on different criteria such as Mutual Information (MI), Information Gain (IG), Minimum-Redundancy Maximum-Relevancy (mRmR) and Genetic Algorithm (GA). Finally, K-Nearest-Neighbor (KNN) and Support Vector Machine (SVM) classifiers are used for the final decision. The investigation indicates that the classification accuracy of the approach is 90.57% using SVM and 72.77% using KNN. Moreover, the sensitivity of the proposed method is 99.91% for SVM and 91.96% for KNN. Experiments also show that the DFA, LE, Entropy and SL features have considerable influence on classification accuracy.
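Of the selection criteria listed above, mutual information is the simplest to show in code. The sketch below estimates it for discrete (or already discretised) variables from empirical frequencies; continuous EEG features would need binning first, which is an assumption here. The toy feature and label vectors are made up.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


labels = [0, 0, 1, 1]
informative = mutual_information([0, 0, 1, 1], labels)    # copies the label
uninformative = mutual_information([0, 1, 0, 1], labels)  # independent of it
```

A filter method ranks features by this score against the class label; mRMR additionally penalises features that share high mutual information with ones already selected.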
EN
The paper addresses the evaluation of the vocal apparatus in people with neurodegenerative diseases. As part of the research, the acoustic signal was described using parameters extracted with commonly used methods of acoustic analysis. Next, a preliminary assessment of the usefulness of the extracted features was carried out with selected statistical measures, in the context of their possible use in systems aimed at early detection of so-called dementia states.
EN
The radiological test is cost-effective, widely available, allows large areas of the skeleton to be visualised and can identify long bones potentially at risk of fracture at osteolysis sites. Radiology is therefore often used in the early stages of multiple myeloma, in the detection and characterisation of complications, and in the assessment of the patient's response to treatment. The accuracy of this method can be improved through the use of appropriate computer image processing and analysis algorithms. In the study, a feature vector was extracted from humerus CR images, yielding 279 image descriptors. Hellwig's method was applied in the selection process to find the feature combinations with the largest integral index of information capacity. To evaluate these combinations, 11 classifiers were built and tested. As a result, 2 feature sets were identified that provided the highest classification accuracy in combination with the K-NN classifier. The 9-NN classifier was used for the first combination (2 features) and the 5-NN classifier for the second (3 features). The classification accuracy (depending on the quality index used) was as follows: overall classification accuracy – 93%, classification sensitivity – 92%, classification specificity – 96%, positive predictive value – 96% and negative predictive value – 93%. The results show that: (1) humerus CR images may be useful in the detection of bone damage caused by multiple myeloma; (2) Hellwig's method is effective for feature selection in the analysed kind of images.
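Hellwig's integral index of information capacity, used above for selection, can be sketched as follows. For a candidate subset K, each feature j contributes h_j = r0_j^2 / sum over i in K of |r_ij|, where r0_j is the correlation of feature j with the target and r_ij the correlation between features; the integral index is the sum of the h_j. The correlation values below are invented, and the exhaustive search is only practical for a handful of features, so it says nothing about how the study handled its 279 descriptors.

```python
from itertools import combinations


def hellwig_capacity(subset, target_corr, feature_corr):
    """Integral information capacity H of a feature subset."""
    return sum(
        target_corr[j] ** 2 / sum(abs(feature_corr[i][j]) for i in subset)
        for j in subset
    )


def best_hellwig_subset(target_corr, feature_corr):
    """Exhaustively search all non-empty subsets for the largest H."""
    m = len(target_corr)
    candidates = (c for size in range(1, m + 1)
                  for c in combinations(range(m), size))
    return max(candidates,
               key=lambda c: hellwig_capacity(c, target_corr, feature_corr))


# Hypothetical correlations: features 0 and 1 are strongly inter-correlated.
r0 = [0.9, 0.8, 0.1]
R = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
best = best_hellwig_subset(r0, R)
best_h = hellwig_capacity(best, r0, R)
```

In the toy data the method prefers features 0 and 2 over the pair 0 and 1, because the redundancy between 0 and 1 inflates the denominators and lowers their joint capacity.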
EN
Medical imaging technologies provide an increasing number of opportunities for disease prediction and prognosis. Specifically, imaging biomarkers can quantify entire tumor phenotypes to enhance prediction. Machine learning can be used to mine and analyze these biomarkers and to establish predictive models for clinical applications. Several studies have applied various machine learning methods to imaging biomarker-based clinical prediction of different diseases. Here we seek to evaluate different machine learning methods in pediatric posterior fossa tumor prediction. We present a machine learning based framework for analysing magnetic resonance imaging biomarkers for two kinds of pediatric posterior fossa tumors. In detail, three feature extraction methods are used to obtain 300 imaging biomarkers. Ten feature selection methods and 11 classifiers are evaluated by quantified predictive performance and stability; the importance consistency of features and the influence of the experimental factors are also analyzed. Our results demonstrate that the CFS feature selection method (accuracy: 83.85 ± 5.51%, stability: [0.84, 0.06]) and the SVM classifier (accuracy: 85.38 ± 3.47%, RSD: 4.77%) perform relatively better than the others and should be preferred. Among all the biomarkers, 17 texture features seem to be more important. Multifactor analysis results indicate that the choice of classifier contributes most to the variability in performance (37.25%). The machine learning based framework is efficient for pediatric posterior fossa tumor biomarker analysis and could provide valuable references and decision support for assisted clinical diagnosis.
EN
Early detection of breast cancer plays a crucial role in the planning and outcome of the associated treatment. The purpose of this article is threefold: (i) to investigate whether clinical features obtained from routine blood analysis, combined with anthropometric measurements, can be used to predict breast cancer with machine learning techniques; (ii) to explore the role of various machine learning components, such as feature selection, data division protocols and classification, in determining suitable biomarkers for breast cancer prediction; and (iii) to evaluate a recent database of clinical and anthropometric measurements acquired from normal individuals and individuals suffering from breast cancer. A database consisting of anthropometric and clinical attributes is used in the experiments. Various feature selection and statistical significance analysis methods are used to determine the relevance of the features. Furthermore, popular classifiers such as kernel-based support vector machine (SVM), Naïve Bayes, linear discriminant, quadratic discriminant, logistic regression, K-nearest neighbor (K-NN) and random forest were implemented and evaluated for breast cancer risk prediction using these features. The results of the feature selection techniques indicate that, among the nine features considered in this study, glucose, age and resistin are the most relevant and effective biomarkers for breast cancer prediction. Further, when these three features are used for classification, the medium K-NN classifier achieves the highest classification accuracy, 92.105%, followed by the medium Gaussian SVM with 83.684%, under the hold-out data division protocol.