Search results
Searched in keywords: cross-validation (walidacja krzyżowa)
Results found: 13
EN
This article presents an analysis of a dataset using two cross-validation methods. The RSES program was employed to identify key properties and relationships within the dataset. The results indicate the impact of certain parameters on the potential accuracy of the outcomes.
EN
This study compares two interpolation methods for local GNSS/levelling (quasi)geoid modelling. It uses raw data; no global geopotential model is involved. The methods differ in the complexity of the modelling procedure and in theoretical background: ordinary kriging/least-squares collocation with constant trend, and inverse distance weighting (IDW). The comparison itself was done through leave-one-out and random (Monte Carlo) cross-validation. Ordinary kriging and IDW performance was tested with local (using a limited number of data points) and global (using all available data) neighbourhoods, with various planar covariance function models in the case of kriging and various exponents (power parameter) in the case of IDW. For the study area, both methods assure an overall accuracy level, measured by mean absolute error, root mean square error and median absolute error, of less than 1 cm. Although IDW is much simpler, suitably selected parameters (and trend removal) can reduce the differences between the methods to a virtually negligible level (a fraction of a millimetre).
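The leave-one-out procedure and the IDW interpolator described above are easy to sketch. The snippet below is a minimal illustration, not the study's actual pipeline: the sample points, the power parameter, and the error statistics are toy stand-ins for the GNSS/levelling data.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse distance weighting prediction at (x, y) from (xi, yi, zi) samples."""
    num = den = 0.0
    for xi, yi, zi in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # exact hit: return the sample value itself
        w = 1.0 / d ** power
        num += w * zi
        den += w
    return num / den

def loo_cv(points, power=2.0):
    """Leave-one-out cross-validation: predict each point from all the others."""
    errors = []
    for i, (xi, yi, zi) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors.append(idw(rest, xi, yi, power) - zi)
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Toy height-anomaly samples (metres); in the study these would be GNSS/levelling points.
data = [(0, 0, 0.30), (1, 0, 0.32), (0, 1, 0.31), (1, 1, 0.33), (0.5, 0.5, 0.315)]
mae, rmse = loo_cv(data, power=2.0)
```

Varying `power` (and, for kriging, the covariance model) and re-running the same loop is exactly how the per-method error statistics are compared.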
EN
Antioxidant proteins are closely associated with disease control owing to their capability to eradicate excess free radicals, and accurate identification of antioxidant proteins is increasingly important because of their therapeutic significance. However, the machine learning algorithms applied so far to identify antioxidant proteins have performed inadequately, so a reliable intelligent model is indispensable for researchers. In this study, primary protein sequences are encoded using evolutionary and sequence-based numerical descriptors: evolutionary features are collected using a bigram position-specific scoring matrix, while k-spaced amino acid pairs (KSAAP) and dipeptide composition are used to extract sequential information. Furthermore, to reduce computational time and to eliminate irrelevant and noisy features, an ensemble approach based on sequential forward selection and a support vector machine (SFS-SVM) is applied to select optimal features. Finally, several classification methods of distinct nature are compared to choose a suitable operational engine for the model. In the empirical evaluation, SVM with the optimal features achieved accuracies of 97.54% and 93.71% on the training and independent datasets, respectively. The proposed model outperformed the existing computational models, reporting the highest performance. It is expected that the developed model may play a useful role in academic research as well as in proteomics and drug development. The source code and all datasets are publicly available at https://github.com/salman-khan-mrd/Antioxident_proteins.
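Sequential forward selection, as used in the SFS-SVM step above, can be sketched as a greedy loop that adds whichever feature most improves a cross-validated score. The illustration below is a minimal sketch, not the authors' pipeline: to stay self-contained it scores candidates with leave-one-out accuracy of a 1-nearest-neighbour classifier instead of an SVM, and the data are toy stand-ins for the protein descriptors.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on selected features."""
    correct = 0
    for i in range(len(X)):
        best_d, pred = None, None
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if best_d is None or d < best_d:
                best_d, pred = d, y[j]
        correct += int(pred == y[i])
    return correct / len(X)

def sfs(X, y, max_feats):
    """Sequential forward selection: greedily add the feature that raises the CV score most."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < max_feats:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        best_score, best_f = max(scored)
        if selected and best_score <= loo_accuracy(X, y, selected):
            break  # no candidate improves the score: stop early
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.15, 9], [0.9, 4], [1.0, 2], [0.85, 8]]
y = [0, 0, 0, 1, 1, 1]
selected = sfs(X, y, max_feats=2)
```

The greedy loop keeps the discriminative feature and rejects the noisy one, which is precisely the "eliminate irrelevant and noisy features" role SFS plays in the paper.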
EN
Machine learning algorithms have become popular in diabetes research, especially for glucose prediction from continuous glucose monitoring (CGM) data. We investigated the design choices in a case-based reasoning (CBR) approach to glucose prediction from CGM data. Design choices were made with regard to the distance function (city-block, Euclidean, cosine, Pearson's correlation), the number of observations, and the adaptation of the solution (average, weighted average, linear regression) used in the model, and were evaluated using five-fold cross-validation to establish the impact of each choice on the prediction error. Our best models showed a mean absolute error of 13.35 ± 3.04 mg/dL for a prediction horizon PH = 30 min, and 30.23 ± 6.50 mg/dL for PH = 60 min. The experiments were performed using the data of 20 subjects recorded in free-living conditions. The paper also addresses the problem of using small datasets to test blood glucose prediction models and assess their prediction error. We propose, for the first time, a methodology for estimating the impact of the number of subjects (i.e., dataset size) on the distribution of the model's prediction error. The proposed methodology is based on Monte Carlo cross-validation with systematic reduction of the subjects in the dataset. The methodology was used to gauge the change in the prediction error as the number of subjects in the dataset increases, and as such allows projection of the prediction error should the dataset be extended with new subjects.
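The retrieve-and-adapt cycle of a CBR glucose predictor can be sketched as: match the most recent CGM window against past windows using a distance function, then adapt the retrieved cases with a distance-weighted average of their continuations. The snippet below is a toy illustration of that cycle (Euclidean distance, weighted-average adaptation, synthetic CGM trace), not the authors' model.

```python
import math

def predict_cbr(history, window, horizon, k=3):
    """Case-based prediction: match the latest CGM window against past cases and
    adapt the solution as a distance-weighted average of their continuations."""
    query = history[-window:]
    cases = []
    for start in range(len(history) - window - horizon):
        past = history[start:start + window]
        outcome = history[start + window + horizon - 1]  # value `horizon` steps after the window
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, past)))
        cases.append((d, outcome))
    cases.sort(key=lambda c: c[0])
    top = cases[:k]                                   # k nearest cases
    weights = [1.0 / (d + 1e-9) for d, _ in top]      # closer cases weigh more
    return sum(w * o for w, (_, o) in zip(weights, top)) / sum(weights)

# Toy CGM trace (mg/dL, 5-min samples) with a repeating rise-and-fall pattern.
cgm = [100, 110, 120, 130, 125, 115, 105, 100, 110, 120, 130, 125, 115, 105,
       100, 110, 120]
pred = predict_cbr(cgm, window=3, horizon=2, k=2)
```

Swapping the distance function, `k`, or the adaptation rule in this loop corresponds directly to the design choices the paper evaluates with five-fold cross-validation.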
EN
A geoid or quasigeoid model allows the integration of satellite measurements with ground levelling measurements in valid height systems. A precise quasigeoid model has been developed for the city of Krakow. One of the goals of its construction was to provide a more detailed quasigeoid course than the one offered by the national model PL-geoid2011, which was built using only four measurement points in the area of Krakow. Because of that small number of points and their uneven distribution over the city area, it can be assumed that the national quasigeoid is determined less accurately there. This became the reason for developing a local quasigeoid model based on a larger number of evenly distributed points. The local model was based on 66 evenly distributed points (from 2.5 km to 5.0 km apart) in the study area. The modelling used height anomalies determined at these points from normal heights derived through levelling and ellipsoidal heights derived through GNSS surveys. Height anomalies from the global geopotential model EGM2008 served as the long-wavelength trend in those derived from the surveys. Analyses showed that the developed height anomaly model fits the empirical data at the level of single millimetres (mean absolute difference 0.005 m). The developed local model QuasigeoidKR2019, like the national model PL-geoid2011, is closely related to the reference and height systems in Poland; such models are used to integrate GNSS and levelling observations. A comparison of the local QuasigeoidKR2019 and the national PL-geoid2011 model was made for the reference frame PL-ETRF2000 and the height datum PL-KRON86-NH. With respect to GNSS/levelling height anomalies, the local model shows a threefold reduction in the values of the individual quartiles and of the mean absolute difference. These summary statistics clearly indicate that the accuracy of the local model for the city of Krakow is significantly higher than that of the national one.
Decision tree for modeling survival data with competing risks
EN
This work considers a decision tree for modeling survival data with competing risks. A Survival Classification and Regression Tree (SCART) technique is proposed for analysing survival data by modifying the classification and regression tree (CART) algorithm to handle censored data in both regression and classification problems. Different performance measures for regression and classification trees are proposed. Model validation is done by two different cross-validation methods. Two real-life data sets are analysed for illustration. It is found that the proposed method improves upon the existing classical method for the analysis of survival data with competing risks.
EN
The research concerns an alternative approach to the evaluation of interpolation methods for mapping small and imbalanced data sets. A basic statistical analysis of the standard cross-validation procedure is not always conclusive. In the case of the investigated data set (which is inconsistent with normal distribution), three interpolation methods have been selected as the most reliable (according to standard cross-validation). However, maps resulting from the aforementioned methods clearly differ from each other. This is the reason why a comprehensive statistical analysis of the studied data is a necessity. We propose an alternative approach that evaluates a broadened scope of parameters describing the data distribution. The general idea of the methodology is to compare not only the standard deviation of the estimator but also three additional parameters to make the final assessment much more accurate. The analysis has been carried out with the use of Golden Software Surfer. It provides a wide range of interpolation methods and numerous adjustable parameters.
EN
The paper presents the results of a transformation between two height systems, Kronstadt'60 and Kronstadt'86, within the area of Krakow's district; the latter system is nowadays a part of the National Spatial Reference System in Poland. The transformation between the two height systems was carried out using polynomial regression, a method well known and frequently applied in geodesy. Despite its popularity, it is rather seldom tested more broadly with respect to the optimal degree of the polynomial function, goodness of fit, and predictive capability. In this study, several statistical tests, measures and techniques helpful in analysing a polynomial transformation function (and not only) have been used.
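One standard technique for choosing the optimal polynomial degree is cross-validation of the fit. The snippet below is an illustrative sketch, not the paper's procedure: it fits polynomials via the normal equations and scores each candidate degree with k-fold cross-validated RMSE on synthetic height-difference data (the true relation is quadratic plus noise).

```python
import random

def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via the normal equations (fine for low degrees)."""
    n = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[col])]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef

def cv_rmse(xs, ys, deg, folds=5):
    """k-fold cross-validated RMSE for a polynomial of the given degree."""
    idx = list(range(len(xs)))
    errs = []
    for f in range(folds):
        test = idx[f::folds]
        train = [i for i in idx if i % folds != f]
        c = polyfit([xs[i] for i in train], [ys[i] for i in train], deg)
        for i in test:
            pred = sum(cj * xs[i] ** j for j, cj in enumerate(c))
            errs.append(pred - ys[i])
    return (sum(e * e for e in errs) / len(errs)) ** 0.5

# Synthetic height-difference data: a truly quadratic trend plus measurement noise.
random.seed(1)
xs = [i / 10 for i in range(40)]
ys = [0.5 + 0.3 * x - 0.05 * x * x + random.gauss(0, 0.01) for x in xs]
best_deg = min(range(1, 6), key=lambda d: cv_rmse(xs, ys, d))
```

The cross-validated error penalizes both underfitting (degree too low) and overfitting (degree too high), which is exactly the trade-off behind "optimal degree" and "predictive capability" in the abstract.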
EN
Two methods are proposed for the design of artificial neural networks (ANNs) for the identification of the geometric parameters of arches. The first is the commonly applied cross-validation method, in which the minimum of an error function is sought. The second is the modern MML (Maximum Marginal Likelihood) criterion, derived from the Bayesian approach. In the paper, the design of an ANN amounts to searching for the optimal number of neurons H in the hidden layer; this is illustrated on six numerical examples. To compare the two methods, cascade networks were analysed in which the input vector always consisted of the first six natural frequencies plus, at each step of the cascade, the arch geometric parameters obtained from previously trained networks; the output was always a single geometric parameter. The obtained results confirm that the MML criterion can be used instead of cross-validation, applied directly to the whole data set without repeated splits. This conclusion is of practical value, since it permits the design of ANNs without formulating a test set of patterns.
EN
Two known approaches to complexity selection are considered: n-fold cross-validation and structural risk minimization. In either approach, a discrepancy is possible between the indicated optimal complexity (the minimum of a generalization error estimate or bound) and the genuine minimum of the unknown true risk. In the paper, this problem is posed in a novel quantitative way. We state and prove theorems demonstrating how one can calculate pessimistic probabilities of a discrepancy between these minima for given conditions of an experiment. The probabilities are calculated in terms of all relevant constants: the sample size, the number of cross-validation folds, the capacity of the set of approximating functions, and bounds on this set. We report experiments carried out to validate the results.
EN
The cross-validation method is commonly applied in the design of artificial neural networks (ANNs). In this paper, design refers to finding an optimal value of the regularization parameter or of the number of neurons in the hidden layer. Cross-validation relies on locating the minimum of the validation error curve, since the training error curve decreases monotonically with these parameters. To change the design criterion, the marginal likelihood curve, taken from the Bayesian approach, can be used instead; the corresponding formula is briefly discussed. In the MML (Maximum Marginal Likelihood) criterion, the maximum of the log marginal likelihood over the training set of size L is computed. The criterion, applied to find optimal values of the design parameters, is illustrated on two numerical examples. The obtained results lead to the conclusion that the MML criterion can be used instead of the cross-validation method. This conclusion is of practical value, especially for small data sets, since it permits the design of ANNs without formulating a validation set of patterns.
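The idea behind the MML criterion can be illustrated on the simplest model with a closed-form marginal likelihood: Bayesian linear regression with a Gaussian prior. The sketch below selects the regularization (prior precision) parameter alpha by maximizing the log evidence on the whole training set, with no validation split. The two-basis-function model, the fixed noise precision beta, and the data are illustrative assumptions, not the paper's network.

```python
import math, random

def log_evidence(X, y, alpha, beta):
    """Log marginal likelihood of Bayesian linear regression with prior w ~ N(0, alpha^-1 I)
    and noise precision beta; with 2 basis functions, 2x2 algebra suffices."""
    N, M = len(y), 2
    # A = alpha*I + beta * X^T X  (2x2)
    s00 = sum(r[0] * r[0] for r in X)
    s01 = sum(r[0] * r[1] for r in X)
    s11 = sum(r[1] * r[1] for r in X)
    A = [[alpha + beta * s00, beta * s01], [beta * s01, alpha + beta * s11]]
    detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    b0 = beta * sum(r[0] * t for r, t in zip(X, y))
    b1 = beta * sum(r[1] * t for r, t in zip(X, y))
    m0 = (A[1][1] * b0 - A[0][1] * b1) / detA   # posterior mean m = A^-1 b
    m1 = (A[0][0] * b1 - A[1][0] * b0) / detA
    sse = sum((t - m0 * r[0] - m1 * r[1]) ** 2 for r, t in zip(X, y))
    return 0.5 * (M * math.log(alpha) + N * math.log(beta)
                  - beta * sse - alpha * (m0 * m0 + m1 * m1)
                  - math.log(detA) - N * math.log(2 * math.pi))

random.seed(0)
X = [[1.0, x / 10] for x in range(30)]                      # bias + linear basis
y = [2.0 + 0.5 * r[1] + random.gauss(0, 0.1) for r in X]    # noise std 0.1 -> beta = 100
alphas = [10 ** k for k in range(-4, 5)]
best_alpha = max(alphas, key=lambda a: log_evidence(X, y, a, beta=100.0))
```

Unlike the training error, the evidence curve has an interior maximum over the regularization parameter, which is why the whole data set can be used with no validation subset.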
EN
In this paper, we propose a numerical algorithm for filtering and robust signal differentiation. The numerical procedure is based on the solution of a simplified linear optimization problem. A compromise between smoothing and fidelity to the measured data is achieved by computing an optimal regularization parameter that minimizes the Generalized Cross-Validation (GCV) criterion. Simulation results are given to highlight the effectiveness of the proposed procedure.
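The GCV selection rule can be sketched for a linear smoother f = (I + lam*D^T D)^-1 y with a second-difference penalty, where GCV(lam) = (RSS/n) / (1 - tr(H)/n)^2. The snippet below is a toy illustration of this rule, not the paper's algorithm; the noisy signal and the lambda grid are made-up examples.

```python
import math, random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (copies its inputs)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def gcv(y, lam):
    """GCV score of the smoother f = (I + lam*D'D)^-1 y, D = second-difference operator."""
    n = len(y)
    M = [[float(i == j) for j in range(n)] for i in range(n)]  # M = I + lam*D'D
    for k in range(n - 2):
        d = [0.0] * n
        d[k], d[k + 1], d[k + 2] = 1.0, -2.0, 1.0              # row k of D
        for i in range(n):
            for j in range(n):
                M[i][j] += lam * d[i] * d[j]
    f = solve(M, list(y))
    # trace of the hat matrix H = M^-1, via one solve per unit vector
    trH = sum(solve(M, [float(i == j) for j in range(n)])[i] for i in range(n))
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return (rss / n) / ((1 - trH / n) ** 2)

random.seed(3)
y = [math.sin(i / 4) + random.gauss(0, 0.1) for i in range(25)]  # noisy signal
lams = [10 ** k for k in range(-3, 4)]
best_lam = min(lams, key=lambda l: gcv(y, l))
```

The numerator rewards fidelity to the data while the denominator penalizes effective model complexity tr(H), so minimizing GCV trades the two off without any held-out data.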