Search results - BazTech

1

Compact and hash based variants of the suffix array

Grabowski S., Raniszewski M.

Bulletin of the Polish Academy of Sciences. Technical Sciences | 2017 | Vol. 65, no. 4 | 407-418

EN

Full-text indexing aims at building a data structure over a given text that can efficiently find arbitrary text patterns while, if possible, requiring little space. We propose two suffix-array-inspired full-text indexes. One, called SA-hash, augments the suffix array with a hash table to speed up pattern searches thanks to a significantly narrowed search interval before the binary search phase. The other, called FBCSA, is a compact data structure, similar to Mäkinen’s compact suffix array (MakCSA), but working on fixed-size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2–3 times faster in pattern searches (counts) than the standard suffix array, at the price of 0.2n–1.1n bytes of extra space, where n is the text length. FBCSA, in one of the presented variants, reduces the suffix array size by a factor of about 1.5–2 while staying close to it in search times, and it is faster than its competitors known from the literature, MakCSA and LCSA.
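The idea of narrowing the binary search interval with a hash table can be illustrated with a minimal sketch. It is not the authors' implementation: the naive suffix array construction, the use of a Python dict as the hash table, and the names `build_prefix_table`, `count` and the prefix length `k` are assumptions made only for this example.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_prefix_table(text, sa, k):
    # Hypothetical helper: map every k-symbol prefix to the half-open range
    # [lo, hi) of suffix array positions whose suffixes start with it.
    table = {}
    for rank, pos in enumerate(sa):
        prefix = text[pos:pos + k]
        if len(prefix) < k:
            continue  # suffix shorter than k cannot match a pattern of length >= k
        lo, _ = table.get(prefix, (rank, rank))
        table[prefix] = (lo, rank + 1)
    return table


def count(text, sa, table, k, pattern):
    # Count occurrences of pattern; the table narrows the interval first.
    if len(pattern) >= k:
        interval = table.get(pattern[:k])
        if interval is None:
            return 0
        lo, hi = interval
    else:
        lo, hi = 0, len(sa)  # short patterns fall back to the full array
    m = len(pattern)
    # Lower bound: first suffix whose m-symbol prefix is >= pattern.
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            left = mid + 1
        else:
            right = mid
    start = left
    # Upper bound: first suffix whose m-symbol prefix is > pattern.
    right = hi
    while left < right:
        mid = (left + right) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            left = mid + 1
        else:
            right = mid
    return left - start


text = "abracadabra"
sa = build_suffix_array(text)
table = build_prefix_table(text, sa, 2)
print(count(text, sa, table, 2, "abra"))
```

The extra memory held by `table` corresponds to the space overhead the abstract mentions, although the actual figures depend on the hash table layout used in the paper, which this sketch does not reproduce.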

2

Methods of strong reduction and edition of a reference set for the nearest neighbour rule

Raniszewski M.

Zeszyty Naukowe. Elektryka / Politechnika Łódzka | 2010 | no. 122 | 37-46

EN

The article summarises a doctoral dissertation proposing new methods of reference set reduction and editing for the Nearest Neighbour Rule (NN). The presented methods are designed to accelerate NN and to improve its classification quality. The algorithms use the concept of object representativeness. The results obtained were compared with those of well-known and popular reduction and editing procedures.

PL (translated)

The article presents the theses and main results of a doctoral dissertation on new methods of reference set reduction and editing for the nearest neighbour (NN) rule. The presented methods aim to speed up the NN rule and improve its classification quality. Most of the presented algorithms use the notion of object representativeness. Their results were compared with those of other popular reduction and editing algorithms.

3

Klasyfikacja danych algorytmy redukcji i edycji zbiorów wykorzystujące miarę reprezentatywności [Data classification: set reduction and editing algorithms using a representativeness measure]

Raniszewski M.

Zeszyty Naukowe. Elektryka / Politechnika Łódzka | 2010 | no. 121 | 463-486

PL (translated)

Data classification is decision making based on the information the data carry (the so-called data features). Correct and fast classification depends on proper preparation of the data set as well as on the choice of a suitable classification algorithm. One such algorithm is the popular nearest neighbour (NN) algorithm. Its advantages are simplicity, intuitiveness and a wide range of applications. Its drawbacks are large memory requirements and a drop in speed for very large data sets. Reduction algorithms remove a considerable part of the elements from the data set, which significantly speeds up the NN algorithm, while retaining those on the basis of which data can still be classified with satisfactory quality. Editing algorithms clean the data set of redundant and erroneous elements. The article presents a reduction algorithm and an editing algorithm for data sets, both using a representativeness measure. Tests were carried out on several data sets of various sizes that are well known in the literature. The results obtained are promising. They are compared with the results of other popular reduction and editing algorithms.

EN

In data classification, decisions are made on the basis of data features. Proper and fast classification depends on the preparation of the data set and the selection of a suitable classification algorithm. One such algorithm is the popular Nearest Neighbor Rule (NN). Its advantages are simplicity, intuitiveness and a wide range of applications. Its disadvantages are large memory requirements and a decrease in speed for large data sets. Reduction algorithms remove much of the data, which significantly speeds up NN, while retaining the data on which decisions can still be based with acceptable classification quality. Editing algorithms remove redundant and atypical data from a data set. In this paper new reduction and editing algorithms, both using the representativeness measure, are presented. Tests were performed on several data sets of different sizes that are well known in the literature. The results are promising. They were compared with the results of other popular reduction and editing procedures.

4

Fast reduction of large dataset for nearest neighbor classifier

Raniszewski M.

Journal of Medical Informatics & Technologies | 2010 | Vol. 16 | 111-116

EN

Accurate and fast classification of large data sets obtained from medical images is very important. Proper processing of images (data) makes it possible to construct a classifier that supports the work of doctors and can solve many medical problems. Unfortunately, Nearest Neighbor classifiers become inefficient and slow for large datasets. Dataset reduction is one of the most popular solutions to this problem, but for a large dataset the reduction phase itself takes a long time. A simple way to overcome this is to divide the dataset into smaller subsets. In this paper five different methods of large dataset division are considered. The resulting subsets are reduced using an algorithm based on a representativeness measure. The reduced subsets are combined to form the reduced dataset. The experiments were performed on a large (almost 82 000 samples) two-class dataset derived from ultrasound images of certain 3D objects found in a human body.
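The divide, reduce and combine scheme described in this abstract might be sketched as follows. The abstract does not say which five division methods were studied or how the reduction algorithm works, so the interleaved split in `split_interleaved` and the placeholder `keep_first` reduction are purely hypothetical stand-ins.

```python
def split_interleaved(dataset, n_parts):
    # Hypothetical division method: sample i goes to subset i mod n_parts.
    return [list(dataset[i::n_parts]) for i in range(n_parts)]


def reduce_by_parts(dataset, n_parts, reduce_fn):
    # Reduce every subset independently, then concatenate the results.
    reduced = []
    for part in split_interleaved(dataset, n_parts):
        reduced.extend(reduce_fn(part))
    return reduced


def keep_first(part):
    # Placeholder reduction that keeps one sample per subset; a real
    # reduction algorithm would be plugged in here instead.
    return part[:1]


samples = list(range(10))
print(reduce_by_parts(samples, 3, keep_first))
```

Because each subset is reduced on its own, the cost of the reduction phase depends on the subset size rather than on the size of the whole dataset, which is the motivation given in the abstract.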

5

The edited nearest neighbor rule based on the reduced reference set and the consistency criterion

Raniszewski M.

Biocybernetics and Biomedical Engineering | 2010 | Vol. 30, no. 1 | 31-40

EN

In this paper a new editing procedure for the Nearest Neighbor Rule (NN) is presented. The representativeness measure is introduced and used to choose the most representative samples of the classes. These samples constitute a reduced reference set. An edited reference set is created from all the training set samples (including samples from the reduced set) that are correctly classified by the NN rule operating with the reduced set. The performance of the presented method is evaluated and compared with five other well-known editing techniques on five medical datasets.
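The editing step described here (keep the training samples that the NN rule, operating with the reduced set, classifies correctly) can be sketched as below. The reduced set is taken as a given input, because the abstract does not define the representativeness measure used to build it; the function names and the Euclidean distance are assumptions of this example.

```python
import math


def nearest_label(point, reference):
    # 1-NN rule: label of the reference sample closest to point
    # (Euclidean distance is an assumption of this sketch).
    return min(reference, key=lambda sample: math.dist(point, sample[0]))[1]


def edit_with_reduced_set(training, reduced):
    # Keep only training samples classified correctly by 1-NN on the reduced set.
    return [(p, label) for p, label in training
            if nearest_label(p, reduced) == label]


training = [
    ((0.0, 0.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"),
    ((0.5, 0.2), "b"),  # lies inside the "a" cluster
]
reduced = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]  # hypothetical reduced set
edited = edit_with_reduced_set(training, reduced)
print(len(edited))
```

In this toy data the sample placed inside the opposite class is misclassified by the reduced set and is therefore dropped from the edited reference set.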

6

Double sort algorithm resulting in reference set of the desired size

Raniszewski M.

Biocybernetics and Biomedical Engineering | 2008 | Vol. 28, no. 4 | 43-50

EN

An algorithm for obtaining a reduced reference set that does not exceed the desired size is presented. It consists in double sorting of the original reference set samples. The first sort key of a sample x is the number of samples from the same class for which x is the nearest neighbour, while the second is the mutual distance measure proposed by Gowda and Krishna. Five medical datasets are used to compare the proposed procedure with the RMHC-P algorithm introduced by Skalak and with the Gowda and Krishna algorithm, which are known as the most effective ones.

7

Nowe metody selekcji cech i redukcji zbiorów odniesienia dla klasyfikatora typu 1-NN [New methods of feature selection and reference set reduction for the 1-NN classifier]

Kośla P., Raniszewski M.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2008 | Vol. 12, no. 3 | 805-820

PL (translated)

The article presents new methods of minimising the reference set for the 1-NN classifier, namely feature selection and reference set reduction. For feature selection, a method based on examining dependencies between features is proposed, and for reference set reduction a sequential algorithm using double sorting of points is used. The order in which these procedures should be applied was also determined by analysing their effect on classification quality and the degree of data reduction. Both the new methods and well-known ones, such as the forward feature selection procedure, the Gowda-Krishna algorithm and the RMHC algorithm proposed by Skalak, were tested on seven real and artificial data sets.

EN

Reference set minimization methods for the 1-NN classifier are proposed. A combination of a feature selection procedure, based on analysis of dependences between features, and a reference set reduction algorithm that uses double point sorting is introduced. The proposed approach to reference set size reduction was compared with the well-known forward feature selection, the Gowda and Krishna algorithm and the RMHC algorithm introduced by Skalak. The computational experiments were performed using seven real and artificial datasets.

8

Stratna kompresja obrazu z wykorzystaniem aproksymacji liniowej [Lossy image compression using linear approximation]

Raniszewski M.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2006 | Vol. 10, no. 3 | 455-470

PL (translated)

The article presents a lossy image compression algorithm using linear approximation. Compression results for sample bitmaps are discussed. Conclusions are also drawn about the usefulness of the algorithm for certain kinds of images.

EN

In this paper a lossy image compression algorithm is presented. The algorithm uses linear approximation. The article discusses the compression results for example bitmaps. Conclusions on the usefulness of the algorithm for certain kinds of pictures are also discussed.