Search results - BazTech

1

Fast attack detection method for imbalanced data in industrial cyber-physical systems

Huang Meng, Li Tao, Li Beibei, Zhang Nian, Huang Hanyuan

Journal of Artificial Intelligence and Soft Computing Research

|

2023

|

Vol. 13, No. 4

229--245

EN

Integrating industrial cyber-physical systems (ICPSs) with modern information technologies (5G, artificial intelligence, and big data analytics) has led to the development of industrial intelligence, but it has also increased the cybersecurity vulnerability of such systems. Traditional network intrusion detection methods for ICPSs are limited in identifying minority attack categories and suffer from high time complexity. To address these issues, this paper proposes a network intrusion detection scheme that combines an information-theoretic hybrid feature selection method, which reduces data dimensionality, with the ALLKNN-LightGBM intrusion detection framework. Experimental results on three industrial datasets demonstrate that the proposed method outperforms four mainstream machine learning methods and other advanced intrusion detection techniques in accuracy, F-score, and run-time complexity.

2

Using information on class interrelations to improve classification of multiclass imbalanced data: A new resampling algorithm

Janicka Małgorzata, Lango Mateusz, Stefanowski Jerzy

International Journal of Applied Mathematics and Computer Science

|

2019

|

Vol. 29, no. 4

769--781

EN

The relations between multiple imbalanced classes can be handled with a specialized approach that evaluates the difficulty type of each example by analyzing the class distribution in its neighborhood, additionally exploiting information about the similarity of neighboring classes. In this paper, we demonstrate that such an approach can be implemented as a data preprocessing technique and that it can improve the performance of various classifiers on multiclass imbalanced datasets. This led us to introduce a new resampling algorithm, called Similarity Oversampling and Undersampling Preprocessing (SOUP), which resamples examples according to their difficulty. Its experimental evaluation on real and artificial datasets has shown that it is competitive with the most popular decomposition ensembles and better than specialized preprocessing techniques for multi-imbalanced problems.

3

Tackling the Problem of Class Imbalance in Multi-class Sentiment Classification: An Experimental Study

Lango Mateusz

Foundations of Computing and Decision Sciences

|

2019

|

Vol. 44, No. 2

151--178

EN

Sentiment classification is an important task that has gained extensive attention both in academia and in industry. Many issues related to this task, such as the handling of negation or of sarcastic utterances, were analyzed and addressed in previous works. However, the issue of class imbalance, which often compromises the prediction capabilities of learning algorithms, has scarcely been studied. In this work, we aim to bridge the gap between imbalanced learning and sentiment analysis. An experimental study including twelve imbalanced learning preprocessing methods, four feature representations, and a dozen datasets is carried out to analyze the usefulness of imbalanced learning methods for sentiment classification. Moreover, the data difficulty factors commonly studied in imbalanced learning are investigated on sentiment corpora to evaluate the impact of class imbalance.

4

CCR: A combined cleaning and resampling algorithm for imbalanced data classification

Koziarski M., Woźniak M.

International Journal of Applied Mathematics and Computer Science

|

2017

|

Vol. 27, no. 4

727--736

EN

Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance may be observed in most real datasets, affecting the performance of classification algorithms. In particular, high levels of imbalance cause serious difficulties, often requiring the use of specially designed methods. In such cases the most important issue is often to properly detect minority examples, but at the same time the performance on the majority class cannot be neglected. In this paper we describe a novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task. The proposed method combines cleaning the decision border around minority objects with guided synthetic oversampling. Results of the conducted experimental study indicate that the proposed algorithm usually outperforms the conventional oversampling approaches, especially when the detection of minority examples is considered.

5

Difficulty factors and preprocessing in imbalanced data sets: an experimental study on artificial data

Wojciechowski S., Wilk S.

Foundations of Computing and Decision Sciences

|

2017

|

Vol. 42, No. 2

149--176

EN

In this paper we describe the results of an experimental study in which we checked the impact of various difficulty factors in imbalanced data sets on the performance of selected classifiers applied alone or combined with several preprocessing methods. In the study we used artificial data sets in order to systematically check factors such as dimensionality, class imbalance ratio, or the distribution of specific types of examples (safe, borderline, rare and outliers) in the minority class. The results revealed that the latter factor was the most critical one and that it exacerbated other factors (in particular class imbalance). The best classification performance was demonstrated by non-symbolic classifiers, in particular by k-NN classifiers (with 1 or 3 neighbors, 1NN and 3NN respectively) and by SVM. Moreover, they benefited from different preprocessing methods: SVM and 1NN worked best with undersampling, while oversampling was more beneficial for 3NN.
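The four example types named in the abstract (safe, borderline, rare, outlier) can be illustrated with a small neighborhood-based labelling sketch. The abstract does not say how the types are assigned; the rule below (count same-class examples among the 5 nearest neighbors, then apply fixed thresholds) is an assumption borrowed from a convention common in the imbalanced-learning literature, and the function names are hypothetical.

```python
import math

K = 5  # assumed neighborhood size; the abstract does not specify one


def example_type(same_class_neighbors):
    """Map the number of same-class neighbors (out of K = 5) to a type.

    Thresholds are an assumed convention, not taken from the abstract.
    """
    if same_class_neighbors >= 4:
        return "safe"
    if same_class_neighbors >= 2:
        return "borderline"
    if same_class_neighbors == 1:
        return "rare"
    return "outlier"


def label_minority_examples(X, y, minority_label):
    """Return {index: type} for every minority-class example in (X, y)."""
    types = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Euclidean distances from example i to every other example.
        distances = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        nearest = [j for _, j in distances[:K]]
        same = sum(1 for j in nearest if y[j] == minority_label)
        types[i] = example_type(same)
    return types


# Toy data: a compact minority cluster, a majority cluster,
# and one minority point lying inside the majority cluster.
X = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.5, 0),              # minority
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (12, 12),  # majority
    (10.4, 10.6),                                                     # minority
]
y = ["min"] * 6 + ["maj"] * 6 + ["min"]

print(label_minority_examples(X, y, "min"))
```

With labels like these, the share of each type in the minority class can be computed and compared across data sets, which is the kind of factor the study varies.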

6

Imbalanced data analysis in the preliminary diagnosis of bladder cancer [Analiza danych niezrównoważonych we wstępnej diagnostyce raka pęcherza moczowego]

Piotrowska E., Stanisławski W.

Pomiary Automatyka Kontrola

|

2012

|

Vol. 58, No. 8

737-740

PL

The article presents the results of a study on the classification of imbalanced data in microscope images of cytological specimens. Classification used supervised learning algorithms such as the naive Bayes classifier, discriminant analysis, and decision trees, as well as a classification algorithm proposed by the authors that combines rough sets with the k-nearest neighbors method. The analysis used the Rough Sets Analysis Toolbox (RSA Toolbox), a toolbox for the MATLAB environment developed by the authors. The microscope images were obtained in the process of bladder cancer diagnosis by examining suitably prepared urine specimens with the FISH method.

EN

The paper describes the results of imbalanced data classification based on microscope images. The images were acquired in the process of bladder cancer diagnosis using the FISH method. The research focused on the effectiveness of initial cancer diagnosis using specimen radiation in a DAPI channel and supervised learning methods. The analyzed data set contains about 23,000 objects described by 212 morphometric features. Each object was assigned to one of two classes: normal cells or cancer cells. The assignment of objects to classes was made by an expert. Only 640 cancer cells were identified in the analyzed data. Most learning algorithms assume balance between classes. The class imbalance problem causes difficulties at the learning stage and reduces predictive ability. Therefore, classifier evaluation was performed using the G-mean and F-value measures. The authors defined an additional measure, FMaxSen = sen^2 · spe, which is the product of the squared sensitivity and the specificity. Squaring the sensitivity emphasizes its importance and allows searching for the classifier with the maximum specificity at the maximum sensitivity. The analysis presented in the paper was performed with the Rough Sets Analysis Toolbox (RSA Toolbox) for MATLAB, implemented by the authors. The main part of the RSA Toolbox contains a module that supports rough set theory processing. Another part (the RSAm module) is a wrapper for the proposed rough classification functions and for others implemented in MATLAB, such as NaiveBayes, Discriminant Analysis, and Decision Tree. RSAm makes it possible to use cross-validation to measure classification accuracy. RSAm also contains feature reduction algorithms (correlation-based feature selection, sequential feature selection, principal component analysis) as well as discretization algorithms (EWD, CAIM, CACC). An important part of the RSA Toolbox is the implementation of distributed computations using the MATLAB Parallel Computing Toolbox and Distributed Computing Server.
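The evaluation measures named in the abstract can be sketched in a few lines. This is an illustration only, not the RSA Toolbox code: the function names are hypothetical, G-mean is taken in its usual form as the square root of sensitivity times specificity, the F-value is assumed to be the F1 score (the abstract does not give its weighting), and FMaxSen follows the abstract's description as squared sensitivity times specificity. The confusion-matrix counts in the usage lines are made up.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    return sen, spe


def g_mean(sen, spe):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sen * spe)


def f_value(tp, fp, fn, beta=1.0):
    """F-measure; beta = 1 (F1) is an assumption, not stated in the abstract."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


def f_max_sen(sen, spe):
    """FMaxSen as described in the abstract: sen^2 * spe."""
    return sen ** 2 * spe


# Hypothetical counts for a classifier on an imbalanced two-class problem
# (positive class = cancer cells); these are not results from the paper.
tp, fn, tn, fp = 8, 2, 45, 5
sen, spe = sensitivity_specificity(tp, fn, tn, fp)
print("G-mean: ", g_mean(sen, spe))
print("F-value:", f_value(tp, fp, fn))
print("FMaxSen:", f_max_sen(sen, spe))
```

Because sensitivity enters squared, a classifier that misses minority (cancer) cells is penalized more heavily by FMaxSen than by the plain product of sensitivity and specificity.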

7

Selective example selection in the construction of classifiers from imbalanced data [Selektywny wybór przykładów w konstrukcji klasyfikatorów z niezrównoważonych danych]

Stefanowski J.

Pomiary Automatyka Kontrola

|

2006

|

Vol. 52, No. 6 bis

65-67

PL

The article discusses problems of automatically constructing classifiers in the form of sets of decision rules from imbalanced data, in which the class of objects of particular interest contains far fewer examples than the other classes. To improve the recognition of minority-class examples, it proposes selectively choosing examples from the majority class before the rule induction phase. The approach is evaluated in comparative experiments against other methods.

EN

This paper concerns problems of automatically learning rule-based classifiers from imbalanced data, where the minority class, which is of primary importance, is underrepresented in comparison to the majority classes. To improve recognition of the minority class, we present a new approach in which rule induction is combined with a selective filtering phase that removes noisy and borderline majority-class examples from the input data. This approach is evaluated in a comparative experimental study.