Search results - BazTech

1

Methods of strong reduction and edition of a reference set for the nearest neighbour rule

Raniszewski M.

Zeszyty Naukowe. Elektryka / Politechnika Łódzka | 2010 | z. 122 | 37-46

EN

The article summarises a doctoral dissertation proposing new methods of reference set reduction and editing for the Nearest Neighbour Rule (NN). The presented methods are designed to accelerate NN and to improve its classification quality. The algorithms use the concept of object representativeness. The results obtained were compared with those of well-known and popular reduction and editing procedures.

PL (translated)

The article presents the theses and main results of a doctoral dissertation on new methods of reducing and editing the reference set for the nearest neighbour (NN) rule. The presented methods aim to speed up the NN rule and to improve its classification quality. Most of the presented algorithms use the concept of object representativeness. Their results were compared with those of other popular reduction and editing algorithms.

2

Klasyfikacja danych algorytmy redukcji i edycji zbiorów wykorzystujące miarę reprezentatywności [Data classification: set reduction and editing algorithms using the representativeness measure]

Raniszewski M.

Zeszyty Naukowe. Elektryka / Politechnika Łódzka | 2010 | z. 121 | 463-486

PL (translated)

Data classification is decision making based on the information that the data carry (the so-called data features). Correct and fast classification depends on proper preparation of the data set as well as on the choice of a suitable classification algorithm. One such algorithm is the popular nearest neighbour (NN) algorithm. Its advantages are simplicity, intuitiveness and a wide range of applications. Its drawbacks are large memory requirements and a drop in speed for huge data sets. Reduction algorithms remove a considerable part of the elements from the data set, which significantly speeds up the NN algorithm, while keeping those elements on the basis of which the data can still be classified with satisfactory quality. Editing algorithms clean the data set of redundant and erroneous elements. The article presents a reduction algorithm and an editing algorithm for data sets, both using the representativeness measure. Tests were carried out on several data sets of various sizes that are well known in the literature. The results obtained are promising. They are compared with the results of other popular reduction and editing algorithms.

EN

In data classification, decisions are made on the basis of data features. Correct and fast classification depends on proper preparation of the data set and on the selection of a suitable classification algorithm. One such algorithm is the popular Nearest Neighbor Rule (NN). Its advantages are simplicity, intuitiveness and a wide range of applications. Its disadvantages are large memory requirements and a decrease in speed for large data sets. Reduction algorithms remove much of the data, which significantly speeds up NN, while keeping the data on the basis of which decisions can still be made with acceptable classification quality. Editing algorithms remove redundant and atypical data from a data set. In this paper new reduction and editing algorithms, both using the representativeness measure, are presented. Tests were performed on several data sets of different sizes that are well known in the literature. The results are promising. They were compared with the results of other popular reduction and editing procedures.
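
As a rough illustration of why a smaller reference set speeds up NN, the sketch below implements a plain 1-NN rule in Python. It is not the paper's algorithm: the function name `nn_classify`, the toy data and the hand-picked reduced set are assumptions made only for this example.

```python
import math

def nn_classify(reference, query):
    """Return the label of the reference sample closest to `query`.

    `reference` is a list of (feature_tuple, label) pairs. Every query
    scans the whole reference set, so the cost grows with its size.
    """
    nearest = min(reference, key=lambda sample: math.dist(sample[0], query))
    return nearest[1]

# Toy two-class reference set (illustrative values only).
reference = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
]

# A hand-picked reduced set: one sample per class (an assumption, not
# the output of any reduction algorithm from the paper).
reduced = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]

query = (0.2, 0.1)
print(nn_classify(reference, query), nn_classify(reduced, query))
```

In this toy case the reduced set gives the same decision with half as many distance computations; whether that holds on real data depends on how the reduced set is chosen.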

3

Fast reduction of large dataset for nearest neighbor classifier

Raniszewski M.

Journal of Medical Informatics & Technologies | 2010 | Vol. 16 | 111-116

EN

Accurate and fast classification of large data sets obtained from medical images is very important. Proper processing of the images (data) makes it possible to construct a classifier that supports the work of doctors and can solve many medical problems. Unfortunately, Nearest Neighbor classifiers become inefficient and slow for large datasets. Dataset reduction is one of the most popular solutions to this problem, but for a large dataset the reduction phase itself takes a long time. A simple way to overcome this is to divide the dataset into smaller subsets. In this paper five different methods of dividing a large dataset are considered. The resulting subsets are reduced using an algorithm based on the representativeness measure. The reduced subsets are then combined to form the reduced dataset. The experiments were performed on a large (almost 82 000 samples) two-class dataset derived from ultrasound images of certain 3D objects found in the human body.
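
The divide, reduce and combine scheme described in this abstract can be sketched as follows. The abstract does not specify the five division methods or the representativeness-based reducer, so this sketch assumes a random split and uses a Hart-style condensation as a stand-in reducer; all function names are hypothetical.

```python
import math
import random

def nn_label(reference, x):
    """Label of the reference sample nearest to x (1-NN rule)."""
    return min(reference, key=lambda s: math.dist(s[0], x))[1]

def condense(samples):
    """Stand-in reducer (Hart-style condensation), NOT the paper's method.

    Keeps adding samples that the current subset misclassifies until
    every sample is classified correctly or has been kept.
    """
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(samples):
            if i in keep:
                continue
            if nn_label([samples[j] for j in keep], x) != y:
                keep.append(i)
                changed = True
    return [samples[j] for j in keep]

def reduce_by_division(samples, n_parts, reducer=condense, seed=0):
    """Split randomly into n_parts subsets, reduce each, combine results."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    reduced = []
    for part in parts:
        if part:
            reduced.extend(reducer(part))
    return reduced

# Two well-separated synthetic clusters (illustrative data only).
data = [((i * 0.1, j * 0.1), "a") for i in range(5) for j in range(5)]
data += [((2 + i * 0.1, 2 + j * 0.1), "b") for i in range(5) for j in range(5)]

reduced = reduce_by_division(data, n_parts=5)
print(len(data), len(reduced))
```

Each subset is reduced independently, so the quadratic cost of the reducer applies to the subset size rather than to the full dataset.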

4

Bubble algorithm for the reduction of reference

Sierszeń A.

Journal of Medical Informatics & Technologies | 2010 | Vol. 16 | 117-123

5

The edited nearest neighbor rule based on the reduced reference set and the consistency criterion

Raniszewski M.

Biocybernetics and Biomedical Engineering | 2010 | Vol. 30, no. 1 | 31-40

EN

In this paper a new editing procedure for the Nearest Neighbor Rule (NN) is presented. The representativeness measure is introduced and used to choose the most representative samples of the classes. These samples constitute a reduced reference set. An edited reference set is then created from all the training set samples (including samples from the reduced set) that are correctly classified by the NN rule operating with the reduced set. The performance of the presented method is evaluated and compared with five other well-known editing techniques on five medical datasets.
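
The editing step described above (keep every training sample that the NN rule, operating with the reduced set, classifies correctly) can be sketched as below. The abstract does not define the representativeness measure, so the reduced set is simply passed in as given; the function names and the toy data are assumptions.

```python
import math

def nn_label(reference, x):
    """Label of the reference sample nearest to x (1-NN rule)."""
    return min(reference, key=lambda s: math.dist(s[0], x))[1]

def edit_reference_set(training, reduced):
    """Keep the training samples that 1-NN over `reduced` gets right."""
    return [(x, y) for x, y in training if nn_label(reduced, x) == y]

# Toy training set; the last sample sits inside class "a" but is labelled "b".
training = [
    ((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.2), "b"),
    ((0.1, 0.1), "b"),
]

# Assumed reduced set: one representative per class, chosen by hand here.
reduced = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]

edited = edit_reference_set(training, reduced)
print(len(training), len(edited))
```

In this toy case the atypical sample is dropped and the samples of the reduced set stay in the edited set, which matches the behaviour the abstract describes.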

6

Network behavior-analysis systems with the use of learning set and decision rules based on distance

Sierszeń A., Sturgulewski Ł., Dubel M., Marciniak T., Wójciński A.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2010 | T. 14, z. 3/1 | 383-394

EN

Network Behavior Analysis is the ability to identify traffic patterns that do not occur during normal operation of a network. In other words, it is an attempt to identify irregularities in a network that goes beyond simple settings for exceeding the parameters of a given traffic type. In the article the authors present a concept of using Pattern Recognition, i.e. a learning set and decision rules based on a neighbor rule.

PL (translated)

Network behavioral analysis is the ability to identify traffic patterns that do not occur during normal network operation. In other words, it is an attempt to identify irregularities in the network that goes beyond simple settings for exceeding the parameters of a given traffic type. The article presents the idea of using pattern recognition, i.e. a learning set and decision rules based on the neighborhood principle.

7

Reduction of reference set with the method of cutting hyperplanes

Sierszeń A.

Journal of Medical Informatics & Technologies | 2009 | Vol. 13 | 215-220

EN

Reduction of this type may help to solve one of the greatest problems in pattern recognition, i.e. the compromise between the time needed to make a decision and its correctness. In the analysis of biomedical data, classification time is less important than the certainty that the classification is correct, i.e. that its reliability is accepted by the algorithm's operator. It is usually possible to reduce the number of wrong decisions by using a more complex recognition algorithm, at the cost of longer classification time. However, with a large quantity of data, this time may be considerably reduced by condensing the set. The condensation presented in this article is incremental, i.e. formation of the condensed reference set begins from a set containing one element, and in each step the size of the set is increased by one object. The algorithm divides the feature space with hyperplanes determined by pairs of mutually furthest points. Each hyperplane is orthogonal to the segment linking such a pair and passes through its midpoint.
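
A single cutting step of the kind described above can be sketched as follows. This is only the geometric part: the abstract does not say how the condensed set is built from the resulting cells, so that is left out. The sketch assumes the mutually furthest pair is the globally most distant pair found by brute force, and all function names are hypothetical.

```python
import math
from itertools import combinations

def furthest_pair(points):
    """Brute-force search for the two most distant points (O(n^2))."""
    return max(combinations(points, 2), key=lambda pq: math.dist(pq[0], pq[1]))

def cutting_hyperplane(p, q):
    """Hyperplane orthogonal to segment pq through its midpoint.

    Returned as (normal, offset): x lies on the plane when
    dot(normal, x) == offset.
    """
    normal = tuple(b - a for a, b in zip(p, q))
    midpoint = tuple((a + b) / 2 for a, b in zip(p, q))
    offset = sum(n * m for n, m in zip(normal, midpoint))
    return normal, offset

def split(points, normal, offset):
    """Partition points by the side of the hyperplane they fall on."""
    near_p, near_q = [], []
    for x in points:
        value = sum(n * xi for n, xi in zip(normal, x)) - offset
        (near_p if value < 0 else near_q).append(x)
    return near_p, near_q

points = [(0.0, 0.0), (1.0, 0.5), (4.0, 0.0), (3.0, 1.0), (2.0, 3.0)]
p, q = furthest_pair(points)
normal, offset = cutting_hyperplane(p, q)
near_p, near_q = split(points, normal, offset)
print(p, q, near_p, near_q)
```

Points lying exactly on the hyperplane are assigned to the second group here; that tie-breaking choice is arbitrary.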

8

Some Symmetry Based Classifiers

Saha S., Bandyopadhyay S.

Fundamenta Informaticae | 2009 | Vol. 90, nr 1-2 | 107-123

EN

In this paper, a novel point symmetry based pattern classifier (PSC) is proposed. A recently developed point symmetry based distance is utilized to determine the amount of point symmetry of a particular test pattern with respect to a class prototype. Kd-tree based nearest neighbor search is used for reducing the complexity of point symmetry distance computation. The proposed point symmetry based classifier is well-suited for classifying data sets having point symmetric classes, irrespective of any convexity, overlap or size. In order to classify data sets having the line symmetry property, a line symmetry based classifier (LSC) along the lines of PSC is thereafter proposed in this paper. To measure the total amount of line symmetry of a particular point in a class, a new definition of line symmetry based distance is also provided. The proposed LSC preserves the advantages of PSC. The performance of PSC and LSC is demonstrated in classifying fourteen artificial and real-life data sets of varying complexities. For the purpose of comparison, the k-NN classifier and the well-known support vector machine (SVM) based classifiers are executed on the data sets used here for the experiments. A statistical analysis (ANOVA) is also performed to compare the performance of these classification techniques.

9

Reduction of large reference sets with modified Chang's algorithm

Sierszeń A.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2009 | T. 13, z. 3/1 | 1009-1019

EN

The advantage of Chang's algorithm is a considerable reduction of the reference set. Its drawback is its relatively low speed. The modification proposed by the author of this article aims to accelerate the computations by replacing a larger number of objects, not only a pair of them, with one object. For any object in the reference set, it is possible to determine all objects from the same class that are closer to it than any object from a different class. This group of objects can be replaced by a single artificial object.

PL (translated)

The advantage of Chang's algorithm is a considerable reduction of the reference set. Its drawback is a relatively low speed. The modification proposed by the author of this article aims to speed up the computations by replacing not just a pair of objects but a larger number of objects with a single object. For every object in the reference set, one can determine all objects of the same class that lie closer to it than any object of another class. Such a group of objects can be replaced by a single artificial object.
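
The grouping step described in this entry can be sketched as below: for one object, collect the same-class objects that are closer to it than its nearest object of another class, then replace the group with one artificial object. The abstract does not say how the artificial object is computed or in what order objects are processed; this sketch assumes the group centroid and handles a single object only. Function names are hypothetical.

```python
import math

def same_class_group(samples, index):
    """Indices of same-class samples closer to samples[index] than the
    nearest sample of any other class (the sample itself is included)."""
    x, y = samples[index]
    radius = min(
        (math.dist(x, z) for z, label in samples if label != y),
        default=math.inf,  # no other class present: everything qualifies
    )
    return [
        i for i, (z, label) in enumerate(samples)
        if label == y and math.dist(x, z) < radius
    ]

def merge_group(samples, group):
    """Replace a group by its centroid (an assumption of this sketch)."""
    dim = len(samples[group[0]][0])
    centroid = tuple(
        sum(samples[i][0][d] for i in group) / len(group) for d in range(dim)
    )
    return centroid, samples[group[0]][1]

samples = [
    ((0.0, 0.0), "a"), ((2.0, 0.0), "a"), ((1.0, 1.5), "a"),
    ((6.0, 0.0), "a"), ((3.0, 0.0), "b"),
]

group = same_class_group(samples, 0)
artificial = merge_group(samples, group)
print(group, artificial)
```

Here three class "a" objects collapse into one, while the distant fourth stays outside the group because the class "b" object is closer to the starting point.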

10

Cascade algorithm for the reference set size reduction

Sierszeń A.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2009 | T. 13, z. 3/1 | 995-1008

EN

Two algorithms for reference set condensation, one based on finding the mutually furthest points and the other a modification of Chang's algorithm, are of the incremental and eliminative type respectively, i.e. the size of the condensed set increases or decreases with each subsequent iteration. The combination of both types of condensation, i.e. the cascade condensation algorithm, is more effective than either of these algorithms executed separately.

PL (translated)

Two algorithms for condensing the reference set, one based on finding mutually furthest points and the other a modification of Chang's algorithm, are incremental and eliminative in nature respectively, i.e. with each successive iteration the size of the condensed reference set grows or shrinks. The combination of both types of condensation, i.e. the cascade condensation algorithm, proved more effective than either algorithm working on its own.