Klasyfikacja danych algorytmy redukcji i edycji zbiorów wykorzystujące miarę reprezentatywności

Raniszewski, M.

Article - details

Article title

Klasyfikacja danych algorytmy redukcji i edycji zbiorów wykorzystujące miarę reprezentatywności

Authors

Raniszewski M.

Selected full texts from this journal

http://cybra.lodz.pl/dlibra/collectiondescription?dirids=8

Identifiers

Title variants

Data classification data set reduction and editing algorithms using the representative measure

Publication languages

Abstract

Data classification means making decisions on the basis of the information that the data carry (the so-called data features). Correct and fast classification depends both on proper preparation of the data set and on the choice of a suitable classification algorithm. One such algorithm is the popular nearest neighbor (NN) rule. Its advantages are simplicity, intuitiveness, and a wide range of applications. Its disadvantages are large memory requirements and a drop in speed on very large data sets. Reduction algorithms remove a large part of the elements from a data set, which significantly speeds up NN, while retaining those elements from which the data can still be classified with acceptable quality. Editing algorithms clean a data set of redundant and erroneous (atypical) elements. This paper presents a new reduction algorithm and a new editing algorithm for data sets, both using the representative measure. Tests were performed on several data sets of different sizes that are well known in the literature. The results are promising. They are compared with the results of other popular reduction and editing algorithms.
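As an illustration of the NN rule and of what a reduction algorithm does, the following Python sketch implements the 1-NN rule and Hart's condensed nearest neighbor (CNN) reduction, which is listed among the keywords. It is not the representative-measure algorithm of the paper, which the abstract does not specify. The function names `nn_classify` and `condense`, the Euclidean distance, and the toy data are assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nn_classify(reference, query):
    """Return the label of the reference element closest to `query` (1-NN rule)."""
    _, label = min(reference, key=lambda item: dist(item[0], query))
    return label


def condense(data):
    """Hart's CNN reduction: keep a subset that still classifies `data` with 1-NN.

    `data` is a list of (point, label) pairs, where each point is a tuple of floats.
    """
    if not data:
        return []
    store = [data[0]]  # start from a single element
    changed = True
    while changed:  # repeat passes until no element is added
        changed = False
        for point, label in data:
            # Skip elements already kept; add every element the store misclassifies.
            if (point, label) not in store and nn_classify(store, point) != label:
                store.append((point, label))
                changed = True
    return store


# Toy data (hypothetical): two well-separated classes in the plane.
data = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((1.0, 1.0), "b"), ((1.1, 1.0), "b"), ((1.0, 1.1), "b"),
]
reduced = condense(data)
```

On this toy set the reduced set keeps one element per class, and classifying every original element against `reduced` with `nn_classify` still returns its own label. The speed-up mentioned in the abstract comes from the fact that each query is then compared with the reduced set only.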

Keywords

pattern classification; pattern recognition; data classification; k-nearest neighbor algorithm; nearest neighbour rule; NNR; reduction algorithms; CNN; Condensed Nearest Neighbor; editing algorithms; representative measure

Publisher

Wydawnictwo Politechniki Łódzkiej

Journal

Zeszyty Naukowe. Elektryka / Politechnika Łódzka

Year

2010

Volume

z. 121

Pages

463-486

Physical description

Bibliography: 19 items.

Contributors

author

Raniszewski M.

Wydział Elektrotechniki, Elektroniki, Informatyki i Automatyki Politechniki Łódzkiej

Bibliography

[1] Duda R.O., Hart P.E., Stork D.G.: Pattern Classification - Second Edition. John Wiley & Sons, Inc., 2001.
[2] Theodoridis S., Koutroumbas K.: Pattern Recognition - Third Edition. Academic Press - Elsevier, USA, 2006.
[3] Stąpor K.: Automatyczna klasyfikacja obiektów. EXIT, Warszawa, 2005.
[4] Hart P.E.: The condensed nearest neighbor rule. IEEE Transactions on Information Theory, vol. IT-14, 3, 1968, pp. 515-516.
[5] Gowda K.C., Krishna G.: The condensed nearest neighbor rule using the concept of mutual nearest neighborhood. IEEE Transactions on Information Theory, vol. IT-25, 4, 1979, pp. 488-490.
[6] Tomek I.: Two modifications of CNN. IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-6, No. 11, 1976, pp. 769-772.
[7] Dasarathy B.V.: Minimal consistent set (MCS) identification for optimal nearest neighbor decision systems design. IEEE Transactions on Systems, Man, and Cybernetics 24(3), 1994, pp. 511-517.
[8] Skalak D.B.: Prototype and feature selection by sampling and random mutation hill climbing algorithms. 11th International Conference on Machine Learning, New Brunswick, NJ, USA, 1994, pp. 293-301.
[9] Kuncheva L.I., Bezdek J.C.: Nearest prototype classification: clustering, genetic algorithms, or random search? IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 28(1), 1998, pp. 160-164.
[10] Cerveron V., Ferri F.J.: Another move towards the minimum consistent subset: A tabu search approach to the condensed nearest neighbor rule. IEEE Trans. on Systems, Man and Cybernetics, Part B: Cybernetics, Vol. 31(3), 2001, pp. 408-413.
[11] Wilson D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man and Cybernetics, Vol. 2, 1972, pp. 408-421.
[12] Tomek I.: An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-6, No. 6, 1976, pp. 448-452.
[13] Devijver P.A., Kittler J.: On the edited nearest neighbor rule. Proc. 5th Internat. Conf. on Pattern Recognition, 1980, pp. 72-80.
[14] Kuncheva L.I.: Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognition Letters, Vol. 16, 1995, pp. 809-814.
[15] Raniszewski M.: The Edited Nearest Neighbor Rule Based on the Reduced Reference Set and the Consistency Criterion. Biocybernetics and Biomedical Engineering, Vol. 30(1), 2010, pp. 31-40.
[16] Xi X., Keogh E.J., Shelton C., Wei L., Ratanamahatana C.A.: Fast Time Series Classification Using Numerosity Reduction. ICML, 2006, pp. 1033-1040.
[17] Frank A., Asuncion A.: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2010.
[18] The ELENA Project Real Databases [http://www.dice.ucl.ac.be/neural-nets/Research/Projects/ELENA/databases/REAL/].
[19] Kuncheva L.I.: Fitness functions in editing k-NN reference set by genetic algorithms. Pattern Recognition, Vol. 30, No. 6, 1997, pp. 1041-1049.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-LOD1-0030-0041