Search results - BazTech

1

Klasyfikacja danych: algorytmy redukcji i edycji zbiorów wykorzystujące miarę reprezentatywności [Data classification: data set reduction and editing algorithms using the representative measure]

Raniszewski M.

Zeszyty Naukowe. Elektryka / Politechnika Łódzka | 2010 | z. 121 | 463-486

EN

Data classification means making decisions based on the information the data carry (the data features). Correct and fast classification depends on proper preparation of the data set and on the choice of a suitable classification algorithm. One such algorithm is the popular Nearest Neighbor rule (NN). Its advantages are simplicity, intuitiveness and a wide range of applications. Its disadvantages are large memory requirements and a loss of speed on very large data sets. Reduction algorithms remove a large part of the data set, which significantly speeds up NN, while keeping the samples that still allow classification of acceptable quality. Editing algorithms clean the data set of redundant and erroneous (atypical) samples. This paper presents a new reduction algorithm and a new editing algorithm, both using the representative measure. Tests were performed on several data sets of different sizes that are well known in the literature. The results are promising; they are compared with the results of other popular reduction and editing algorithms.
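The memory and speed cost of the NN rule mentioned in this abstract can be seen in a minimal sketch. It is an illustration only, not the paper's code: the function name `nearest_neighbor_label`, the toy data and the choice of Euclidean distance are all assumptions.

```python
import math

def nearest_neighbor_label(reference_set, query):
    """Return the label of the reference sample closest to `query`.

    `reference_set` is a list of (feature_vector, label) pairs. Every query
    scans the whole list, so both memory use and classification time grow
    with the size of the reference set.
    """
    best_label, best_dist = None, math.inf
    for features, label in reference_set:
        dist = math.dist(features, query)  # Euclidean distance (an assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy two-class reference set (hypothetical data).
reference_set = [
    ((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.2), "b"),
]
print(nearest_neighbor_label(reference_set, (0.1, 0.0)))
print(nearest_neighbor_label(reference_set, (1.0, 0.9)))
```

A reduction algorithm would shrink `reference_set` before it is used this way, so that each query scans fewer samples.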

2

Fast reduction of large dataset for nearest neighbor classifier

Raniszewski M.

Journal of Medical Informatics & Technologies | 2010 | Vol. 16 | 111-116

EN

Accurate and fast classification of large data sets obtained from medical images is very important. Proper processing of the images (data) makes it possible to construct a classifier that supports the work of doctors and can solve many medical problems. Unfortunately, Nearest Neighbor classifiers become inefficient and slow for large datasets. Dataset reduction is one of the most popular solutions to this problem, but for a large dataset the reduction phase itself takes a long time. A simple way to overcome this is to divide the dataset into smaller subsets. In this paper five different methods of large dataset division are considered. The resulting subsets are reduced using an algorithm based on the representative measure, and the reduced subsets are combined to form the reduced dataset. The experiments were performed on a large (almost 82 000 samples) two-class dataset derived from ultrasound images of certain 3D objects found in a human body.
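The divide, reduce and combine scheme described in this abstract can be sketched as follows. The abstract does not say which five division methods were studied or how the representative-measure reduction works, so everything here is an assumption: the division into consecutive chunks, and the hypothetical stand-in reducer `keep_first_per_class`, which only marks where a real reduction algorithm would be plugged in.

```python
def split_into_chunks(dataset, chunk_size):
    """Divide the dataset into consecutive subsets (one possible division)."""
    return [dataset[i:i + chunk_size] for i in range(0, len(dataset), chunk_size)]

def reduce_by_division(dataset, chunk_size, reduce_subset):
    """Reduce each subset independently and concatenate the results."""
    reduced = []
    for subset in split_into_chunks(dataset, chunk_size):
        reduced.extend(reduce_subset(subset))
    return reduced

def keep_first_per_class(subset):
    """Hypothetical stand-in reducer: keep the first sample of each class.

    This is NOT the representative-measure algorithm from the paper.
    """
    seen = set()
    kept = []
    for features, label in subset:
        if label not in seen:
            seen.add(label)
            kept.append((features, label))
    return kept

# Toy two-class dataset of 10 samples (hypothetical data).
dataset = [((i, i % 3), "pos" if i % 2 else "neg") for i in range(10)]
reduced = reduce_by_division(dataset, chunk_size=4, reduce_subset=keep_first_per_class)
print(len(dataset), "->", len(reduced))
```

Because each subset is reduced on its own, the reducer only ever works on `chunk_size` samples at a time, which is the point of the division.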

3

Double sort algorithm resulting in reference set of the desired size

Raniszewski M.

Biocybernetics and Biomedical Engineering | 2008 | Vol. 28, no. 4 | 43-50

EN

An algorithm for obtaining a reduced reference set that does not exceed a desired size is presented. It consists of a double sort of the samples of the original reference set. The first sort key of a sample x is the number of samples from the same class for which x is the nearest neighbour; the second is the mutual distance measure proposed by Gowda and Krishna. Five medical datasets are used to compare the proposed procedure with the RMHC-P algorithm introduced by Skalak and with the Gowda and Krishna algorithm, which are known as the most effective ones.
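The first sort key described in this abstract can be illustrated with a short sketch. It follows one reading of the abstract: each sample's nearest neighbour is searched for in the whole reference set, and it counts towards x only when it shares x's class. Euclidean distance, the function name `first_sort_key` and the toy data are assumptions; the second key (the Gowda and Krishna measure) and the final selection step are not shown because the abstract does not specify them.

```python
import math

def first_sort_key(samples):
    """For each sample x, count the same-class samples whose nearest
    neighbour is x. `samples` is a list of (feature_vector, label) pairs
    with at least two elements.
    """
    counts = [0] * len(samples)
    for i, (xi, label_i) in enumerate(samples):
        # Index of the nearest neighbour of sample i, excluding itself.
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(xi, samples[j][0]),
        )
        if samples[nearest][1] == label_i:
            counts[nearest] += 1
    return counts

# Toy one-dimensional reference set (hypothetical data).
samples = [((0.0,), "a"), ((1.0,), "a"), ((1.5,), "a"), ((4.0,), "b"), ((4.2,), "b")]
print(first_sort_key(samples))
```

In this toy set the sample at 1.0 gets the highest key, since it is the nearest neighbour of two other samples of class "a".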

4

Nowe metody selekcji cech i redukcji zbiorów odniesienia dla klasyfikatora typu 1-NN [New methods of feature selection and reference set reduction for the 1-NN classifier]

Kośla P., Raniszewski M.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2008 | T. 12, z. 3 | 805-820

EN

New methods of reference set minimization for the 1-NN classifier are presented, namely feature selection and reference set reduction. For feature selection, a method based on analysing dependences between features is proposed; for reference set reduction, a sequential algorithm that uses double sorting of points is used. The order in which the two procedures should be applied was also determined by analysing their effect on classification quality and on the degree of data reduction. Both the new methods and well-known ones, namely the forward feature selection procedure, the Gowda and Krishna algorithm and the RMHC algorithm introduced by Skalak, were tested on seven real and artificial datasets.