
Results found: 9

Search results
Searched for:
in keywords: Levenshtein distance
EN
The article presents a system for testing the independence of solutions to algorithmic problems submitted by students in a student programming competition. First, the context is discussed, along with the resulting need to organize programming competitions. Then, an algorithm is proposed for measuring the mutual similarity of the source codes of programs submitted in a programming competition. Because the implemented algorithm was used in practice, the article also presents examples of its use in detecting source-code plagiarism in two programming competitions conducted as part of classes on Algorithms and Numerical Methods. Finally, the effectiveness of the solutions used in the work is discussed.
2
Similarity detection based on document matrix model and edit distance algorithm
EN
This paper presents a new algorithm with an objective of analyzing the similarity measure between two text documents. Specifically, the main idea of the implemented method is based on the structure of the so-called “edit distance matrix” (similarity matrix). Elements of this matrix are filled with a formula based on Levenshtein distances between sequences of sentences. The Levenshtein distance algorithm (LDA) is used as a replacement for various implementations of stemming or lemmatization methods. Additionally, the proposed algorithm is fast, precise, and may be implemented for analyzing very large documents (e.g., books, diploma works, newspapers, etc.). Moreover, it seems to be versatile for the most common European languages such as Polish, English, German, French and Russian. The presented tool is intended for all employees and students of the university to detect the level of similarity regarding analyzed documents. Results obtained in the paper were confirmed in the tests shown in the article.
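The matrix idea in the abstract above can be illustrated with a minimal sketch. It assumes each document is already split into sentences and that each matrix element is a length-normalized Levenshtein similarity between two sentences; the paper's actual formula is not given in the abstract, so the function names and the normalization here are illustrative assumptions only.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(s, t):
    """Assumed normalization: 1.0 for identical sentences, 0.0 for fully different."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def similarity_matrix(doc_a, doc_b):
    """Element [i][j] compares sentence i of doc_a with sentence j of doc_b."""
    return [[similarity(s, t) for t in doc_b] for s in doc_a]


doc_a = ["the cat sat", "hello world"]
doc_b = ["the cat sat", "hallo world"]
matrix = similarity_matrix(doc_a, doc_b)
```

In this sketch a row whose maximum is close to 1.0 marks a sentence of the first document that has a near copy in the second one.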
3
The evaluation of text string matching algorithms as an aid to image search
EN
The main goal of this paper is to analyse intelligent text string matching methods (like fuzzy sets and relations) and evaluate their usefulness for image search. The present study examines the ability of different algorithms to handle multi-word and multi-sentence queries. Eight different similarity measures (N-gram, Levenshtein distance, Jaro coefficient, Dice coefficient, Overlap coefficient, Euclidean distance, Cosine similarity and Jaccard similarity) are employed to analyse the algorithms in terms of time complexity and accuracy of results. The outcomes are used to develop a hierarchy of methods, illustrating their usefulness to image search. The search response time increases significantly in the case of data sets containing several thousand images. The findings indicate that the analysed algorithms do not fulfil the response-time requirements of professional applications. Due to its limitations, the proposed system should be considered only as an illustration of a novel solution with further development perspectives. The use of Polish as the language of experiments affects the accuracy of measures. This limitation seems to be easy to overcome in the case of languages with simpler grammar rules (e.g. English).
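Three of the set-based measures named in the abstract above (Jaccard, Dice and Overlap) can be sketched on character bigrams. This is only an illustration of the textbook definitions; the choice of bigrams, the lowercasing and the function names are assumptions, not details taken from the paper.

```python
def ngrams(text, n=2):
    """Set of character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """|A and B| / |A or B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a, b):
    """2 * |A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a, b):
    """|A and B| / min(|A|, |B|); defined as 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


x = ngrams("night")   # {"ni", "ig", "gh", "ht"}
y = ngrams("nacht")   # {"na", "ac", "ch", "ht"}
scores = (jaccard(x, y), dice(x, y), overlap(x, y))
```

All three measures ignore character order beyond the n-gram window, which is one reason they behave differently from an edit distance on the same pair of strings.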
EN
This article focuses on the semantic tagging of content in terms of sentiment, which may often lead to ambiguities between the primary sense of a word and its meaning in a particular expression. To address this issue, a specially modified Levenshtein distance algorithm with suffix mitigation was used to measure the similarity of words. Sentence sentiment classification was based on a fuzzy logic approach and a fuzzy classifier. The presented method was tested experimentally on the sentiment analysis of selected sentences in Polish. Limitations of the presented method and possible improvements are discussed.
PL (translated)
The article focuses on the semantic tagging of text content in terms of sentiment, which can often lead to ambiguity between the primary connotation of a word and its meaning in a given utterance. To address this problem, a specially modified Levenshtein distance algorithm that mitigates the significance of a word's inflectional ending was used to measure word similarity. Sentence sentiment classification was based on fuzzy logic and a fuzzy classifier approach. The presented method was verified experimentally through sentiment analysis of selected sentences in Polish. The limitations of the presented method and possible improvements are also discussed.
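The abstract does not say how the Levenshtein algorithm was modified, so the following is only a hypothetical sketch of what suffix mitigation could look like: edits that touch the last few characters of a word cost less than edits elsewhere. The function name and the parameters `suffix_len` and `suffix_cost` are assumptions made for illustration.

```python
def suffix_weighted_distance(a, b, suffix_len=3, suffix_cost=0.5):
    """Edit distance in which edits inside a word's ending are cheaper.

    Hypothetical weighting: a character within the last `suffix_len`
    positions of its word costs `suffix_cost` to edit, any other costs 1.0.
    """
    def weight(word, idx):
        return suffix_cost if idx >= len(word) - suffix_len else 1.0

    # First row: cost of inserting each prefix of b into an empty string.
    prev = [0.0]
    for j in range(len(b)):
        prev.append(prev[-1] + weight(b, j))

    for i in range(len(a)):
        curr = [prev[0] + weight(a, i)]
        for j in range(len(b)):
            if a[i] == b[j]:
                substitution = prev[j]
            else:
                # A substitution is as expensive as its costlier character.
                substitution = prev[j] + max(weight(a, i), weight(b, j))
            curr.append(min(prev[j + 1] + weight(a, i),  # delete a[i]
                            curr[j] + weight(b, j),      # insert b[j]
                            substitution))
        prev = curr
    return prev[-1]


ending_change = suffix_weighted_distance("kotem", "kotom")  # differs in the ending
stem_change = suffix_weighted_distance("kotem", "botem")    # differs in the stem
```

With these assumed weights, two forms that differ only in the ending end up closer to each other than two words that differ in the stem.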
5
Parallelization of the Levenshtein distance algorithm
EN
This paper presents a method for parallelizing the Levenshtein distance algorithm for very large strings. The proposed approach was implemented in .NET Framework 4.0, with threads managed through the System.Threading.Tasks namespace. The algorithms developed in this study were tested on a high-performance machine using Xamarin Mono (for Linux RedHat/Fedora OS). The computational results demonstrate a high level of efficiency of the proposed parallelization procedure.
PL (translated)
The article presents a method for parallelizing the Levenshtein edit distance algorithm intended for very large text strings. The proposed solution was implemented on the .NET Framework 4.0 platform using the methods available in the System.Threading.Tasks namespace. The algorithms were tested on a high-performance computer using Xamarin Mono tools (for the Linux RedHat/Fedora OS). The results show significantly increased computational performance for the solutions presented in the article.
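The abstract does not describe the decomposition used in the paper. One well-known way to expose parallelism in the Levenshtein recurrence is the anti-diagonal (wavefront) order: every cell with the same i + j depends only on the two previous anti-diagonals, so all cells of one anti-diagonal could be computed concurrently. The sketch below, written in Python rather than .NET, runs sequentially and only demonstrates that ordering; the function name is an assumption.

```python
def levenshtein_wavefront(a, b):
    """Levenshtein distance computed anti-diagonal by anti-diagonal."""
    n, m = len(a), len(b)
    # dist[i][j] is the distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j

    # Interior cells on anti-diagonal d satisfy i + j == d with i, j >= 1.
    for d in range(2, n + m + 1):
        first_row = max(1, d - m)
        last_row = min(n, d - 1)
        # The iterations of this inner loop are independent of each other,
        # so a parallel implementation could split them across workers.
        for i in range(first_row, last_row + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[n][m]


result = levenshtein_wavefront("kitten", "sitting")
```

For very large strings a practical version would keep only the last three anti-diagonals in memory instead of the full table; that refinement is omitted here to keep the ordering easy to see.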
PL (translated)
The article describes a mechanism for identifying and classifying content, based on a term-weighting method that relies on inverse document frequency, term frequency, and Levenshtein distance. The proposed mechanism was implemented in a program that analyzes the topics and descriptions of diploma theses in order to select supervisors and reviewers automatically.
EN
This paper presents a mechanism for the identification and classification of content, based on a term-weighting method that uses inverse document frequency analysis and the Levenshtein distance technique. The proposed mechanism is applied to the analysis of the topics and descriptions of selected diploma theses, for the automatic selection of supervisors and reviewers.
EN
As information technology evolves, data volumes grow exponentially in all areas, and the quality of these huge amounts of data becomes a core problem. Data cleaning is an effective technology for solving data quality problems. This paper focuses on duplicate data cleaning techniques. It studies data quality at the architectural level, instance-level problems, multi-source and single-source problems, an application platform for cleaning duplicated records, and the evaluation criteria. In these studies, an improved detection method combines a fuzzy clustering algorithm with the Levenshtein distance for data cleaning. It can accurately and quickly detect and remove duplicate raw data. The paper covers the detection process for similar duplicate records, the design of the main system framework, the implementation of the system's function modules, and an analysis of the results. The precision and recall rates are higher than those of several other data cleaning methods. These comparisons confirm the validity of the method. The experimental results show that the proposed method is effective in data detection and cleaning.
PL (translated)
The article proposes new data cleaning methods that take into account the number of instances, multiple sources, duplicate records, and other evaluation criteria. The improved detection method uses a fuzzy clustering algorithm combined with the Levenshtein distance. In this way, duplicate data rows are detected and removed quickly.
EN
This study examines the effectiveness of normalized Levenshtein metrics in the process of recognition of handwritten signatures. Three methods of normalization of the Levenshtein metric were taken into consideration. In addition, it was determined, which signature features are most important during their comparisons with the use of the aforementioned metric. The following signature features were examined: coordinates of signature points, pen pressure in successive points, and different types of pen speed. The influence of individual parameters of the Levenshtein algorithm on the obtained results was also determined, and the best method of normalization was selected.
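The abstract does not name the three normalization methods that were compared. As an illustration only, the sketch below shows three commonly used ways of scaling a Levenshtein distance into the range 0 to 1; whether any of them matches the study is an assumption. The sequences stand in for sampled signature features such as pen pressure, and all names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of samples)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def norm_by_max(a, b):
    """Distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def norm_by_sum(a, b):
    """Distance divided by the sum of both lengths."""
    total = len(a) + len(b)
    return 0.0 if total == 0 else levenshtein(a, b) / total


def norm_with_distance(a, b):
    """Twice the distance divided by (sum of lengths + distance)."""
    d = levenshtein(a, b)
    total = len(a) + len(b) + d
    return 0.0 if total == 0 else 2 * d / total


reference = [3, 5, 5, 7, 2]   # hypothetical quantized pen-pressure samples
candidate = [3, 5, 6, 7]
values = (norm_by_max(reference, candidate),
          norm_by_sum(reference, candidate),
          norm_with_distance(reference, candidate))
```

The raw distance for this pair is 2, and the three variants scale it differently, which is why the choice of normalization can change which signatures fall under a fixed acceptance threshold.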
PL (translated)
Commonly used methods of checking the spelling of words are based on the Levenshtein distance. To work, these methods require an inflectional dictionary of the language in which the checked words are written. Because these methods were originally created for English, they are not optimal for processing texts in Polish. This article presents the characteristic features of Polish that affect the construction of a spellchecker, and proposes an adaptation of the Levenshtein distance method that takes these specific features into account. The new algorithm shows a qualitative improvement in correcting texts written in Polish.
EN
Today's widely used spellchecking methods are based on Levenshtein distance algorithms. The spellchecking process also requires an inflectional dictionary of the language. These methods are not optimal for spellchecking texts written in Polish, because they were invented for, and optimized for, English texts. This article describes the characteristics of the Polish language that affect spellchecking, and proposes a spellchecker implementation based on Levenshtein distance that exploits these characteristics to improve the spellchecking of Polish texts.