Search results - BazTech

1

Outlier detection in EEG signals

Duraj Agnieszka, Chomątek Łukasz

Przegląd Elektrotechniczny

|

2023

|

R. 99, nr 1

237--240

EN

This paper addresses the detection of outliers in EEG signals, which facilitates diagnostic decisions based on the EEG examination. We used two methods to detect outliers: the support vector machine and the k-nearest neighbors method. The experiments were performed on a publicly available dataset containing EEG test results for 500 patients. The results showed that these methods achieve an outlier detection efficiency of 93%.
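
As an illustration of the k nearest neighbors approach named in the abstract above, the following minimal sketch scores each sample by its mean distance to its k nearest neighbours. It is not the authors' implementation: the function name `knn_outlier_scores`, the toy 2-D feature vectors, and the choice of mean distance as the score are assumptions made for illustration.

```python
import math

def knn_outlier_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical feature vectors (for example, two summary features per recording).
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05), (5.0, 5.0)]
scores = knn_outlier_scores(data, k=2)

# The sample with the largest score is the strongest outlier candidate.
most_outlying = max(range(len(data)), key=scores.__getitem__)
```

A larger score means the sample lies farther from its neighbours; a threshold on the score would turn it into an outlier decision.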

2

A comparative study for outlier detection methods in high dimensional text data

Park Cheong Hee

Journal of Artificial Intelligence and Soft Computing Research

|

2023

|

Vol. 13, No. 1

5--17

EN

Outlier detection aims to find data samples that differ significantly from the other samples. Various outlier detection methods have been proposed and have been shown to detect anomalies in many practical problems. However, in high dimensional data, conventional outlier detection methods often behave unexpectedly due to a phenomenon called the curse of dimensionality. In this paper, we compare and analyze outlier detection performance in various experimental settings, focusing on text data with dimensions typically in the tens of thousands. Experimental setups were simulated to compare the performance of outlier detection methods in unsupervised versus semi-supervised mode and on uni-modal versus multi-modal data distributions. The performance of outlier detection methods based on dimension reduction is compared, and a discussion on using k-NN distance in high dimensional data is also provided. Analysis through experimental comparison in various environments can provide insights into the application of outlier detection methods in high dimensional data.

3

Anomaly pattern detection in streaming data based on the transformation to multiple binary-valued data streams

Kim Taegong, Park Cheong Hee

Journal of Artificial Intelligence and Soft Computing Research

|

2022

|

Vol. 12, No. 1

19--27

EN

Anomaly pattern detection in a data stream aims to detect the time point at which outliers begin to occur abnormally. Recently, a method for anomaly pattern detection was proposed that combines binary classification of outliers with statistical tests on the resulting stream of binary labels (normal or outlier). It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, because that method relies on binary classification of outliers, most well-known outlier detection methods, which output real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method for data streams that transforms real-valued outlier scores into multiple binary-valued data streams. The proposed method is tested on artificial and real data sets with three outlier detection methods: Isolation Forest (IF), autoencoder-based outlier detection, and Local Outlier Factor (LOF). The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
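
The transformation from real-valued scores to several binary streams can be pictured with the short sketch below. The abstract does not say how the streams are built, so thresholding the scores at several cut-off values is an assumption, and the function name `to_binary_streams`, the scores, and the thresholds are all hypothetical.

```python
def to_binary_streams(scores, thresholds):
    """Map one stream of outlier scores to one 0/1 stream per threshold."""
    return {th: [1 if s > th else 0 for s in scores] for th in thresholds}

# Hypothetical outlier scores arriving over time (higher = more outlying).
scores = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]

# Assumed cut-off values; each one yields its own binary label stream.
streams = to_binary_streams(scores, thresholds=[0.5, 0.75, 0.88])
```

Each binary stream could then be monitored with a statistical test for a change in the rate of ones, which is the role the abstract assigns to the binary label stream.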

4

Improving coronary heart disease prediction by outlier elimination

Riyaz Lubna, Butt Muheet Ahmed, Zaman Majid

Applied Computer Science

|

2022

|

Vol. 18, no 1

70--88

EN

Heart disease is the leading cause of death globally. According to a survey conducted by the World Health Organization, almost 18 million people die of heart diseases (cardiovascular diseases) every year, so there is a need for a system for early detection and prevention of heart disease. Detection of heart disease depends mostly on large volumes of complex pathological and clinical data, and researchers and medical professionals are therefore showing keen interest in accurate prediction of heart disease. Heart disease is a general term for a large number of medical conditions related to the heart, one of which is coronary heart disease (CHD). Coronary heart disease is caused by the accumulation of plaque on the artery walls. In this paper, various base and ensemble machine learning classifiers are applied to a heart disease dataset for efficient prediction of coronary heart disease. The base classifiers include k-nearest neighbor, multilayer perceptron, multinomial naive Bayes, logistic regression, decision tree, random forest and support vector machine classifiers. The ensemble classifiers include majority voting, weighted average, bagging and boosting classifiers. The dataset used in this study comes from the Framingham Heart Study, a long-term, ongoing cardiovascular study of people from Framingham, Massachusetts, USA. To evaluate the performance of the classifiers, evaluation metrics including accuracy, precision, recall and F1 score are used. According to our results, the best accuracy was achieved by the logistic regression, random forest, majority voting, weighted average and bagging classifiers, with the weighted average ensemble classifier achieving the highest accuracy among them.

5

Modelling volatility of time series data containing outliers observations with ARCH effect

Duraj Agnieszka, Ludwicka Magdalena

Przegląd Elektrotechniczny

|

2019

|

R. 95, nr 1

37--40

EN

The subject of this work is a comparative analysis of selected models used to describe the volatility of time series containing outliers. The paper focuses on the dynamic properties of time series, generally on the heterogeneity of conditional variance over time. It describes common approaches to detecting outliers and to modelling and forecasting time series. Based on the research of R. F. Engle, T. B. Bollerslev, and J. Caiadoin, selected ARIMA, ARCH and GARCH models were examined. Attention was paid to the ARCH effect in time series and its impact on modelling the volatility of financial time series that contain outliers. The studies showed that a typical feature of financial time series is so-called grouped variance (volatility clustering). Consequently, ARIMA models were insufficient for forecasting, whereas ARCH and GARCH models showed good statistical properties for modelling time series data.
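
To make the interaction between outliers and the ARCH effect concrete, the sketch below evaluates the standard ARCH(1) conditional variance recursion, sigma_t^2 = a0 + a1 * e_(t-1)^2, on a short residual series. The coefficients, the residual values, and the function name `arch1_conditional_variance` are assumptions chosen for illustration and are not taken from the paper.

```python
def arch1_conditional_variance(residuals, a0, a1):
    """For each residual e, return the next-period variance a0 + a1 * e**2."""
    return [a0 + a1 * e * e for e in residuals]

# Hypothetical residuals; the third value plays the role of an outlier.
residuals = [0.1, -0.2, 3.0, 0.1]

# Assumed ARCH(1) coefficients (a0 > 0, 0 <= a1 < 1).
variances = arch1_conditional_variance(residuals, a0=0.2, a1=0.5)
```

In this toy series the single large residual raises the following period's conditional variance from about 0.2 to 4.7, which shows how one outlier can dominate a volatility estimate.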

6

Wykrywanie anomalii bazujące na wskazanych przykładach [Anomaly detection based on indicated examples]

Kwiatkowski W.

Przegląd Teleinformatyczny

|

2018

|

T. 6, Nr 1-2 (46)

3--21

EN

The paper considers the problem of anomaly detection based on recorded observations of system behavior. The problem is formulated as the recognition of normal and anomalous behavior patterns. Both types of patterns are specified by indicating appropriate examples. A peculiarity of this task is that the number of examples is usually far smaller than the dimension of the observation vectors. Two anomaly detection methods are presented, both based on projecting the observations onto the subspaces of examples. The first method uses the distance of the observation vector from the subspace of examples. The second method transfers the pattern recognition problem into the subspace of examples.
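
The first method, distance of an observation from the subspace spanned by the examples, can be sketched as follows. This is an illustrative reading of the abstract rather than the author's algorithm: the Gram-Schmidt construction, the function names, and the 4-dimensional toy vectors are assumptions.

```python
import math

def orthonormal_basis(examples, tol=1e-12):
    """Build an orthonormal basis of span(examples) by Gram-Schmidt."""
    basis = []
    for v in examples:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip examples that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

def distance_to_subspace(x, basis):
    """Norm of the part of x that is orthogonal to the subspace."""
    residual = list(x)
    for b in basis:
        coeff = sum(xi * bi for xi, bi in zip(x, b))
        residual = [ri - coeff * bi for ri, bi in zip(residual, b)]
    return math.sqrt(sum(ri * ri for ri in residual))

# Two hypothetical "normal" examples in a 4-dimensional observation space,
# so the number of examples is smaller than the dimension.
normal_examples = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
basis = orthonormal_basis(normal_examples)

d_typical = distance_to_subspace([2.0, 3.0, 0.0, 0.0], basis)
d_unusual = distance_to_subspace([1.0, 1.0, 3.0, 4.0], basis)
```

An observation lying in the span of the normal examples has distance zero, while the second observation has a large orthogonal component and would be a candidate anomaly under a distance threshold.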

7

Outlier detection in ocean wave measurements by using unsupervised data mining methods

Mahmoodi K., Ghassemi H.

Polish Maritime Research

|

2018

|

nr 1

44--50

EN

Outliers are exceptional objects in a data set that are considerably inconsistent with the expected normal condition. An outlier in wave measurements may be due to experimental and configuration errors, technical defects in equipment, variability in the measurement conditions, or rare or unknown conditions such as tsunamis and windstorms. To improve the accuracy and reliability of an ocean wave model, or to extract important and valuable information from collected wave data, detecting outlying observations in wave measurements is very important. In this study, three typical outlier detection algorithms, Box-plot (BP), Local Distance-based Outlier Factor (LDOF), and Local Outlier Factor (LOF), are used to detect outliers in significant wave height (Hs) records. The historical wave data are taken from the National Data Buoy Center (NDBC). Data points identified as outliers by at least two methods are then presented and discussed. Finally, Hs prediction is modelled with and without the outliers by using regression trees (RTs).
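
The box-plot rule and the "at least two methods" consensus described above can be sketched in a few lines. The Hs values, the 1.5 x IQR whisker, and the flag sets attributed to the LDOF and LOF detectors are hypothetical; only the box-plot detector is actually computed here.

```python
from collections import Counter
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Indices of values outside [Q1 - w*IQR, Q3 + w*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}

def consensus(flag_sets, min_votes=2):
    """Indices flagged by at least min_votes of the detectors."""
    votes = Counter(i for flags in flag_sets for i in flags)
    return sorted(i for i, n in votes.items() if n >= min_votes)

# Hypothetical significant wave height records in metres.
hs = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1, 6.5]
bp_flags = boxplot_outliers(hs)

# Assumed outputs of two other detectors, given here as plain index sets.
ldof_flags = {2, 7}
lof_flags = {4, 7}

agreed = consensus([bp_flags, ldof_flags, lof_flags], min_votes=2)
```

Requiring agreement between detectors trades some sensitivity for fewer false alarms, since a point flagged by a single method is not reported.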

8

Outlier mining using the DBSCAN algorithm

Nowak-Brzezińska A., Xięski T.

Journal of Applied Computer Science

|

2017

|

Vol. 25, nr 2

53--68

EN

This paper introduces an approach to outlier mining in the context of a real-world dataset containing information about the operation of mobile transceivers. The goal of the paper is to analyze how different similarity measures and multiple values of the input parameters of the density-based clustering algorithm influence the number of outliers discovered during the mining process. The results of the experiments are presented in section 4 in order to discuss the significance of the analyzed parameters.
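
The effect described above, where the distance measure and the input parameters change how many points DBSCAN leaves as noise, can be reproduced with a compact textbook-style implementation. This sketch is not the authors' code; the toy points, the parameter values, and the helper `manhattan` are assumptions.

```python
import math

def dbscan(points, eps, min_pts, dist=math.dist):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # A point counts as its own neighbour, as in the usual definition.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # noise, unless later reached as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]

labels = dbscan(points, eps=1.5, min_pts=3)
outliers_euclid_3 = labels.count(-1)
outliers_euclid_4 = dbscan(points, eps=1.5, min_pts=4).count(-1)
outliers_manhattan_4 = dbscan(points, eps=1.5, min_pts=4, dist=manhattan).count(-1)
```

On this toy data the number of noise points grows from one to four when `min_pts` is raised from 3 to 4, and to all eight points when the Manhattan distance is used with `min_pts=4`.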

9

Outlier mining in rule-based knowledge bases

Nowak-Brzezińska A.

Journal of Applied Computer Science

|

2017

|

Vol. 25, nr 2

7--27

EN

This paper introduces an approach to outlier mining in the context of rule-based knowledge bases. Rules in knowledge bases are a very specific type of data representation and it is necessary to analyze them carefully, especially when they differ from each other. The goal of the paper is to analyze the influence of using different similarity measures and clustering methods on the number of outliers discovered during the mining process. The results of the experiments are presented in Section 6 in order to discuss the significance of the analyzed parameters.

10

Outlier Detection by Interaction with Domain Experts

Krasuski A., Wasilewski P.

Fundamenta Informaticae

|

2013

|

Vol. 127, nr 1-4

529--544

EN

We present a method for improving the detection of outlying Fire Service reports based on domain knowledge and dialogue with Fire & Rescue domain experts. An outlying report is considered to be an element that differs significantly from the remaining data. We follow the position of Professor Andrzej Skowron that effective algorithms in data mining and knowledge discovery in big data should incorporate interaction with domain experts and/or be domain oriented. Outliers are defined and searched for on the basis of domain knowledge and dialogue with experts. We face the problem of reducing high data dimensionality without losing the specificity and real complexity of reported incidents. We solve this problem by introducing a knowledge-based generalization level that mediates between the analyzed data and the experts' domain knowledge. In our approach we use Formal Concept Analysis methods both for generating the appropriate categories from data and as tools supporting communication with domain experts. We conducted two experiments on finding two types of outliers, in which outlier detection was supported by domain experts.

11

Exploratory data analysis for outlier detection in bioequivalence studies

Mogoş B.

Biocybernetics and Biomedical Engineering

|

2013

|

Vol. 33, no. 3

164--170

EN

Exploratory Data Analysis techniques are recognized as useful tools in outlier detection through visual representations. One limitation of this direction is the lack of studies concerning the reliability of the visual interpretation. In this paper we propose a method that combines an Exploratory Data Analysis technique, Andrews curves, with a statistical approach which can be applied to automatically classify the data. Using a simulation study we show that the results provided by the Andrews curves approach are markedly superior to the estimates distance test (the best proposed method for detecting outliers revealed in the literature) for the crossover bioequivalence design.
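
For readers unfamiliar with Andrews curves, the sketch below evaluates the standard definition, f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., for one observation. It shows only the curve itself; how the paper combines the curves with a statistical classification step is not stated in the abstract, and the function name and sample vector are assumptions.

```python
import math

def andrews_curve(x, t):
    """Value at t of the Andrews curve of the observation x."""
    value = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2       # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += xi * math.sin(harmonic * t)
        else:
            value += xi * math.cos(harmonic * t)
    return value

# Hypothetical 3-dimensional observation evaluated on a grid over [-pi, pi].
x = [1.0, 2.0, 3.0]
grid = [-math.pi + k * (2 * math.pi / 100) for k in range(101)]
curve = [andrews_curve(x, t) for t in grid]
```

Each observation becomes one curve over the interval from -pi to pi, and an observation whose curve departs from the bundle formed by the others is a visual outlier candidate.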

12

Comparison of outlier detection methods in biomedical data

Chromiński K., Tkacz M.

Journal of Medical Informatics & Technologies

|

2010

|

Vol. 16

89--94

EN

In this paper the use of outlier detection methods is discussed. This analysis is an introduction to the use of various methods of outlier detection in medical diagnoses (screening). The authors investigated the usefulness of selected outlier detection methods in the context of detection sensitivity, speed performance analysis and the difficulty of automating the performance analysis by using the test methods for outlier detection.

13

Mining Outliers in Correlated Subspaces for High Dimensional Data Sets

Leng J., Hong T-P.

Fundamenta Informaticae

|

2010

|

Vol. 98, nr 1

71--86

EN

Outlier detection in high dimensional data sets is a challenging data mining task. Mining outliers in subspaces seems to be a promising solution, because outliers may be embedded in some interesting subspaces. Searching for all possible subspaces can lead to the problem called "the curse of dimensionality". Due to the existence of many irrelevant dimensions in high dimensional data sets, it is of paramount importance to eliminate the irrelevant or unimportant dimensions and identify interesting subspaces with strong correlation. Normally, the correlation among dimensions can be determined by traditional feature selection techniques or subspace-based clustering methods. The dimension-growth subspace clustering techniques can find interesting subspaces in relatively lower dimension spaces, while dimension-reduction approaches try to group interesting subspaces with larger dimensions. This paper aims to investigate the possibility of detecting outliers in correlated subspaces. We present a novel approach by identifying outliers in the correlated subspaces. The degree of correlation among dimensions is measured in terms of the mean squared residue. In doing so, we employ a dimension-reduction method to find the correlated subspaces. Based on the correlated subspaces obtained, we introduce another criterion called "shape factor" to rank most important subspaces in the projected subspaces. Finally, outliers are distinguished from most important subspaces by using classical outlier detection techniques. Empirical studies show that the proposed approach can identify outliers effectively in high dimensional data sets.
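
The abstract measures correlation among dimensions with the mean squared residue. A minimal sketch of that score is given below, assuming the usual biclustering definition in which each entry is compared with its row mean, column mean, and the overall mean; whether the paper uses exactly this form is an assumption, as are the function name and the toy matrices.

```python
def mean_squared_residue(matrix):
    """Mean of (a_ij - row_mean_i - col_mean_j + overall_mean)**2."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)

# Rows are samples, columns are the dimensions of a candidate subspace.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
incoherent = [[1.0, 2.0], [2.0, 1.0]]

msr_coherent = mean_squared_residue(coherent)
msr_incoherent = mean_squared_residue(incoherent)
```

A score near zero indicates that the selected dimensions vary together across the samples, so a low mean squared residue marks a strongly correlated subspace.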

14

Pre-processing of the industrial data for data mining and modelling – application to the copper flash smelting process

Stanisławczyk A., Kusiak J.

Computer Methods in Materials Science

|

2009

|

Vol. 9, No. 3

369--373

EN

The paper presents a methodology for pre-processing industrial data (measurements) collected from the copper flash smelting process (Kusiak, 2009). The application of data filtering and cleaning methods for exploratory data analysis and modelling is discussed. The influence of appropriate data preparation on the quality of the developed process model is also presented.