Search results
Searched in keywords: cross-validation (walidacja krzyżowa)
Results found: 13
EN
This article presents an analysis of a dataset using two cross-validation methods. The RSES program was employed to identify key properties and relationships within the dataset. The results indicate the impact of certain parameters on the potential accuracy of the outcomes.
EN
This study compares two interpolation methods for local GNSS/levelling (quasi)geoid modelling. It uses raw data; no global geopotential model is involved. The methods differ in the complexity of the modelling procedure and in theoretical background: ordinary kriging/least-squares collocation with constant trend, and inverse distance weighting (IDW). The comparison itself was done through leave-one-out and random (Monte Carlo) cross-validation. Ordinary kriging and IDW performance was tested with local (using a limited number of data points) and global (using all available data) neighbourhoods, with various planar covariance function models in the case of kriging and various exponents (power parameter) in the case of IDW. For the study area, both methods assure an overall accuracy level, measured by mean absolute error, root mean square error and median absolute error, of less than 1 cm. Although IDW is much simpler, suitably selected parameters (and trend removal) can reduce the differences between the methods to a virtually negligible level (a fraction of a millimetre).
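The leave-one-out procedure and the IDW interpolator described above are easy to sketch. The snippet below is a minimal illustration, not the study's actual pipeline: the sample points, the power parameter, and the error statistics are toy stand-ins for the GNSS/levelling data.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse distance weighting prediction at (x, y) from (xi, yi, zi) samples."""
    num = den = 0.0
    for xi, yi, zi in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # exact hit: return the sample value itself
        w = 1.0 / d ** power
        num += w * zi
        den += w
    return num / den

def loo_cv(points, power=2.0):
    """Leave-one-out cross-validation: predict each point from all the others."""
    errors = []
    for i, (xi, yi, zi) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors.append(idw(rest, xi, yi, power) - zi)
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Toy height-anomaly samples (metres); in the study these would be GNSS/levelling points.
data = [(0, 0, 0.30), (1, 0, 0.32), (0, 1, 0.31), (1, 1, 0.33), (0.5, 0.5, 0.315)]
mae, rmse = loo_cv(data, power=2.0)
```

Varying `power` (and, for kriging, the covariance model) and re-running the same loop is exactly how the per-method error statistics are compared.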
EN
Antioxidant proteins are closely associated with disease control owing to their capability to eradicate excess free radicals, and accurate identification of antioxidant proteins is increasingly important because of their therapeutic significance. However, the machine learning algorithms applied so far to identify antioxidant proteins have performed inadequately, so a reliable intelligent model is indispensable for researchers. In this study, primary protein sequences are encoded using evolutionary and sequence-based numerical descriptors: evolutionary features are collected using a bigram position-specific scoring matrix, while k-spaced amino acid pairs (KSAAP) and dipeptide composition are used to extract sequential information. Furthermore, to reduce computational time and to eliminate irrelevant and noisy features, an ensemble approach based on sequential forward selection and a support vector machine (SFS-SVM) is applied to select optimal features. Finally, several classification methods of distinct nature are compared to choose a suitable operational engine for the model. In the empirical evaluation, SVM with the optimal features achieved accuracies of 97.54% and 93.71% on the training and independent datasets, respectively. The proposed model outperformed the existing computational models, reporting the highest performance. It is expected that the developed model may play a useful role in academic research as well as in proteomics and drug development. The source code and all datasets are publicly available at https://github.com/salman-khan-mrd/Antioxident_proteins.
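Sequential forward selection, as used in the SFS-SVM step above, can be sketched as a greedy loop that adds whichever feature most improves a cross-validated score. The illustration below is a minimal sketch, not the authors' pipeline: to stay self-contained it scores candidates with leave-one-out accuracy of a 1-nearest-neighbour classifier instead of an SVM, and the data are toy stand-ins for the protein descriptors.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on selected features."""
    correct = 0
    for i in range(len(X)):
        best_d, pred = None, None
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if best_d is None or d < best_d:
                best_d, pred = d, y[j]
        correct += int(pred == y[i])
    return correct / len(X)

def sfs(X, y, max_feats):
    """Sequential forward selection: greedily add the feature that raises the CV score most."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < max_feats:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        best_score, best_f = max(scored)
        if selected and best_score <= loo_accuracy(X, y, selected):
            break  # no candidate improves the score: stop early
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.15, 9], [0.9, 4], [1.0, 2], [0.85, 8]]
y = [0, 0, 0, 1, 1, 1]
selected = sfs(X, y, max_feats=2)
```

The greedy loop keeps the discriminative feature and rejects the noisy one, which is precisely the "eliminate irrelevant and noisy features" role SFS plays in the paper.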
EN
Machine learning algorithms have become popular in diabetes research, especially for glucose prediction from continuous glucose monitoring (CGM) data. We investigated the design choices in a case-based reasoning (CBR) approach to glucose prediction from CGM data. Design choices were made with regard to the distance function (city-block, Euclidean, cosine, Pearson's correlation), the number of observations, and the adaptation of the solution (average, weighted average, linear regression) used in the model, and were evaluated using five-fold cross-validation to establish the impact of each choice on the prediction error. Our best models showed a mean absolute error of 13.35 ± 3.04 mg/dL for a prediction horizon PH = 30 min, and 30.23 ± 6.50 mg/dL for PH = 60 min. The experiments were performed using the data of 20 subjects recorded in free-living conditions. The paper also addresses the problem of using small datasets to test blood glucose prediction models and assess their prediction error. We propose, for the first time, a methodology for estimating the impact of the number of subjects (i.e., dataset size) on the distribution of the model's prediction error. The proposed methodology is based on Monte Carlo cross-validation with systematic reduction of the subjects in the dataset. The methodology was used to gauge the change in the prediction error as the number of subjects in the dataset increases, and as such allows projection of the prediction error should the dataset be extended with new subjects.
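The retrieve-and-adapt cycle of a CBR glucose predictor can be sketched as: match the most recent CGM window against past windows using a distance function, then adapt the retrieved cases with a distance-weighted average of their continuations. The snippet below is a toy illustration of that cycle (Euclidean distance, weighted-average adaptation, synthetic CGM trace), not the authors' model.

```python
import math

def predict_cbr(history, window, horizon, k=3):
    """Case-based prediction: match the latest CGM window against past cases and
    adapt the solution as a distance-weighted average of their continuations."""
    query = history[-window:]
    cases = []
    for start in range(len(history) - window - horizon):
        past = history[start:start + window]
        outcome = history[start + window + horizon - 1]  # value `horizon` steps after the window
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, past)))
        cases.append((d, outcome))
    cases.sort(key=lambda c: c[0])
    top = cases[:k]                                   # k nearest cases
    weights = [1.0 / (d + 1e-9) for d, _ in top]      # closer cases weigh more
    return sum(w * o for w, (_, o) in zip(weights, top)) / sum(weights)

# Toy CGM trace (mg/dL, 5-min samples) with a repeating rise-and-fall pattern.
cgm = [100, 110, 120, 130, 125, 115, 105, 100, 110, 120, 130, 125, 115, 105,
       100, 110, 120]
pred = predict_cbr(cgm, window=3, horizon=2, k=2)
```

Swapping the distance function, `k`, or the adaptation rule in this loop corresponds directly to the design choices the paper evaluates with five-fold cross-validation.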
EN
A geoid or quasigeoid model allows the integration of satellite measurements with ground levelling measurements in valid height systems. A precise quasigeoid model has been developed for the city of Krakow. One of the goals of its construction was to provide a more detailed quasigeoid course than the one offered by the national model PL-geoid2011, which was built using only four measurement points in the area of Krakow. Because of that small number of points and their uneven distribution over the city area, it can be assumed that the national quasigeoid is determined less accurately there. This became the reason for developing a local quasigeoid model based on a larger number of evenly distributed points. The local model was based on 66 evenly distributed points (from 2.5 km to 5.0 km apart) in the study area. The modelling used height anomalies determined at these points from normal heights derived through levelling and ellipsoidal heights derived through GNSS surveys. Height anomalies from the global geopotential model EGM2008 served as the long-wavelength trend in those derived from the surveys. Analyses showed that the developed height anomaly model fits the empirical data at the level of single millimetres (mean absolute difference 0.005 m). The developed local model QuasigeoidKR2019, like the national model PL-geoid2011, is closely related to the reference and height systems in Poland; such models are used to integrate GNSS and levelling observations. A comparison of the local QuasigeoidKR2019 and the national PL-geoid2011 model was made for the reference frame PL-ETRF2000 and the height datum PL-KRON86-NH. With respect to GNSS/levelling height anomalies, the local model shows a threefold reduction in the values of the individual quartiles and of the mean absolute difference. These summary statistics clearly indicate that the accuracy of the local model for the city of Krakow is significantly higher than that of the national one.
Decision tree for modeling survival data with competing risks
EN
This work considers a decision tree for modeling survival data with competing risks. A Survival Classification and Regression Tree (SCART) technique is proposed for analysing survival data by modifying the classification and regression tree (CART) algorithm to handle censored data in both regression and classification problems. Different performance measures for regression and classification trees are proposed. Model validation is done by two different cross-validation methods. Two real-life data sets are analysed for illustration. It is found that the proposed method improves upon the existing classical method for the analysis of survival data with competing risks.
EN
The research concerns an alternative approach to the evaluation of interpolation methods for mapping small and imbalanced data sets. A basic statistical analysis of the standard cross-validation procedure is not always conclusive. In the case of the investigated data set (which is inconsistent with normal distribution), three interpolation methods have been selected as the most reliable (according to standard cross-validation). However, maps resulting from the aforementioned methods clearly differ from each other. This is the reason why a comprehensive statistical analysis of the studied data is a necessity. We propose an alternative approach that evaluates a broadened scope of parameters describing the data distribution. The general idea of the methodology is to compare not only the standard deviation of the estimator but also three additional parameters to make the final assessment much more accurate. The analysis has been carried out with the use of Golden Software Surfer. It provides a wide range of interpolation methods and numerous adjustable parameters.
EN
The paper presents the results of a transformation between two height systems, Kronstadt'60 and Kronstadt'86, within the area of Krakow's district; the latter system is nowadays a part of the National Spatial Reference System in Poland. The transformation between the two height systems was carried out using polynomial regression, a method well known and frequently applied in geodesy. Despite its popularity, it is rather seldom tested more broadly with respect to the optimal degree of the polynomial function, goodness of fit, and predictive capability. In this study, several statistical tests, measures and techniques helpful in analysing a polynomial transformation function (and not only) have been used.
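One standard technique for choosing the optimal polynomial degree is cross-validation of the fit. The snippet below is an illustrative sketch, not the paper's procedure: it fits polynomials via the normal equations and scores each candidate degree with k-fold cross-validated RMSE on synthetic height-difference data (the true relation is quadratic plus noise).

```python
import random

def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via the normal equations (fine for low degrees)."""
    n = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[col])]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef

def cv_rmse(xs, ys, deg, folds=5):
    """k-fold cross-validated RMSE for a polynomial of the given degree."""
    idx = list(range(len(xs)))
    errs = []
    for f in range(folds):
        test = idx[f::folds]
        train = [i for i in idx if i % folds != f]
        c = polyfit([xs[i] for i in train], [ys[i] for i in train], deg)
        for i in test:
            pred = sum(cj * xs[i] ** j for j, cj in enumerate(c))
            errs.append(pred - ys[i])
    return (sum(e * e for e in errs) / len(errs)) ** 0.5

# Synthetic height-difference data: a truly quadratic trend plus measurement noise.
random.seed(1)
xs = [i / 10 for i in range(40)]
ys = [0.5 + 0.3 * x - 0.05 * x * x + random.gauss(0, 0.01) for x in xs]
best_deg = min(range(1, 6), key=lambda d: cv_rmse(xs, ys, d))
```

The cross-validated error penalizes both underfitting (degree too low) and overfitting (degree too high), which is exactly the trade-off behind "optimal degree" and "predictive capability" in the abstract.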
EN
Two methods are proposed for the design of artificial neural networks (ANNs) for the identification of the geometric parameters of arches. The first is the commonly applied cross-validation method, in which the minimum of an error function is sought. The second is the modern MML (Maximum Marginal Likelihood) criterion, derived from the Bayesian approach. In the paper, the design of an ANN amounts to searching for the optimal number of neurons H in the hidden layer; this is illustrated on six numerical examples. To compare the two methods, cascade networks were analysed in which the input vector always consisted of the first six natural frequencies plus, at each step of the cascade, the arch geometric parameters obtained from previously trained networks; the output was always a single geometric parameter. The obtained results confirm that the MML criterion can be used instead of cross-validation, applied directly to the whole data set without repeated splits. This conclusion is of practical value, since it permits the design of ANNs without formulating a test set of patterns.
EN
Two known approaches to complexity selection are considered: n-fold cross-validation and structural risk minimization. In either approach, a discrepancy is possible between the indicated optimal complexity (the minimum of a generalization error estimate or bound) and the genuine minimum of the unknown true risk. In the paper, this problem is posed in a novel quantitative way. We state and prove theorems demonstrating how one can calculate pessimistic probabilities of a discrepancy between these minima for given conditions of an experiment. The probabilities are calculated in terms of all relevant constants: the sample size, the number of cross-validation folds, the capacity of the set of approximating functions, and bounds on this set. We report experiments carried out to validate the results.
EN
The cross-validation method is commonly applied in the design of artificial neural networks (ANNs). In this paper, design refers to finding an optimal value of the regularization parameter or of the number of neurons in the hidden layer. Cross-validation relies on locating the minimum of the validation error curve, since the training error curve decreases monotonically with these parameters. To change the design criterion, the marginal likelihood curve, taken from the Bayesian approach, can be used instead; the corresponding formula is briefly discussed. In the MML (Maximum Marginal Likelihood) criterion, the maximum of the log marginal likelihood over the training set of size L is computed. The criterion, applied to find optimal values of the design parameters, is illustrated on two numerical examples. The obtained results lead to the conclusion that the MML criterion can be used instead of the cross-validation method. This conclusion is of practical value, especially for small data sets, since it permits the design of ANNs without formulating a validation set of patterns.
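The idea behind the MML criterion can be illustrated on the simplest model with a closed-form marginal likelihood: Bayesian linear regression with a Gaussian prior. The sketch below selects the regularization (prior precision) parameter alpha by maximizing the log evidence on the whole training set, with no validation split. The two-basis-function model, the fixed noise precision beta, and the data are illustrative assumptions, not the paper's network.

```python
import math, random

def log_evidence(X, y, alpha, beta):
    """Log marginal likelihood of Bayesian linear regression with prior w ~ N(0, alpha^-1 I)
    and noise precision beta; with 2 basis functions, 2x2 algebra suffices."""
    N, M = len(y), 2
    # A = alpha*I + beta * X^T X  (2x2)
    s00 = sum(r[0] * r[0] for r in X)
    s01 = sum(r[0] * r[1] for r in X)
    s11 = sum(r[1] * r[1] for r in X)
    A = [[alpha + beta * s00, beta * s01], [beta * s01, alpha + beta * s11]]
    detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    b0 = beta * sum(r[0] * t for r, t in zip(X, y))
    b1 = beta * sum(r[1] * t for r, t in zip(X, y))
    m0 = (A[1][1] * b0 - A[0][1] * b1) / detA   # posterior mean m = A^-1 b
    m1 = (A[0][0] * b1 - A[1][0] * b0) / detA
    sse = sum((t - m0 * r[0] - m1 * r[1]) ** 2 for r, t in zip(X, y))
    return 0.5 * (M * math.log(alpha) + N * math.log(beta)
                  - beta * sse - alpha * (m0 * m0 + m1 * m1)
                  - math.log(detA) - N * math.log(2 * math.pi))

random.seed(0)
X = [[1.0, x / 10] for x in range(30)]                      # bias + linear basis
y = [2.0 + 0.5 * r[1] + random.gauss(0, 0.1) for r in X]    # noise std 0.1 -> beta = 100
alphas = [10 ** k for k in range(-4, 5)]
best_alpha = max(alphas, key=lambda a: log_evidence(X, y, a, beta=100.0))
```

Unlike the training error, the evidence curve has an interior maximum over the regularization parameter, which is why the whole data set can be used with no validation subset.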
EN
In this paper, we propose a numerical algorithm for filtering and robust signal differentiation. The numerical procedure is based on the solution of a simplified linear optimization problem. A compromise between smoothing and fidelity to the measured data is achieved by computing an optimal regularization parameter that minimizes the Generalized Cross-Validation (GCV) criterion. Simulation results are given to highlight the effectiveness of the proposed procedure.
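The GCV selection rule can be sketched for a linear smoother f = (I + lam*D^T D)^-1 y with a second-difference penalty, where GCV(lam) = (RSS/n) / (1 - tr(H)/n)^2. The snippet below is a toy illustration of this rule, not the paper's algorithm; the noisy signal and the lambda grid are made-up examples.

```python
import math, random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (copies its inputs)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def gcv(y, lam):
    """GCV score of the smoother f = (I + lam*D'D)^-1 y, D = second-difference operator."""
    n = len(y)
    M = [[float(i == j) for j in range(n)] for i in range(n)]  # M = I + lam*D'D
    for k in range(n - 2):
        d = [0.0] * n
        d[k], d[k + 1], d[k + 2] = 1.0, -2.0, 1.0              # row k of D
        for i in range(n):
            for j in range(n):
                M[i][j] += lam * d[i] * d[j]
    f = solve(M, list(y))
    # trace of the hat matrix H = M^-1, via one solve per unit vector
    trH = sum(solve(M, [float(i == j) for j in range(n)])[i] for i in range(n))
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return (rss / n) / ((1 - trH / n) ** 2)

random.seed(3)
y = [math.sin(i / 4) + random.gauss(0, 0.1) for i in range(25)]  # noisy signal
lams = [10 ** k for k in range(-3, 4)]
best_lam = min(lams, key=lambda l: gcv(y, l))
```

The numerator rewards fidelity to the data while the denominator penalizes effective model complexity tr(H), so minimizing GCV trades the two off without any held-out data.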