Search results - BazTech

1

Alignment for rooted labeled caterpillars

Ukita Yoshiyuki, Yoshino Takuya, Hirata Kouichi

Annals of Computer Science and Information Systems | 2019 | Vol. 19 | pp. 19--25

EN

A rooted labeled caterpillar (caterpillar, for short) is a rooted labeled tree that becomes a rooted path once all of its leaves are removed. In this paper, we design an algorithm to compute the alignment distance between caterpillars in O(h^2 λ^3) time under the general cost function and in O(h^2 λ) time under the unit cost function, where h is the maximum height and λ is the maximum number of leaves in the caterpillars.

2

Niewiarowski A.

Czasopismo Techniczne. Nauki Podstawowe | 2016 | Y. 113, iss. 1-NP | pp. 159--173

EN

This paper proposes a method of comparing short texts using the Levenshtein distance algorithm and a thesaurus to analyse the terms contained in the texts, instead of the popular methods that rely on a glossary of grammatical variations. The tested texts contain a variety of nouns and verbs together with grammatical or orthographical mistakes. The similarity of such texts is estimated with the proposed algorithm. The described technique is compared with the cosine, Dice, and Jaccard distances built on the term frequency method. The proposal is competitive with well-known stemming and lemmatization algorithms.

PL (in English translation)

The article proposes a method for comparing short text fragments based on the Levenshtein distance algorithm and a thesaurus. The compared texts contain inflected terms as well as deliberate spelling and grammatical errors. The described mechanism is set against popular text comparison methods such as the cosine, Dice, and Jaccard distances, for which the vector values are computed with the term frequency method. Using a thesaurus in the mechanism is an alternative to the well-known stemming and lemmatization algorithms in text data analysis.
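The measures named in this record can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the thesaurus step is omitted, and the function names, whitespace tokenization, and the set-based form of the Dice and Jaccard distances are assumptions made only for illustration.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute; unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def cosine_distance(x: str, y: str) -> float:
    """1 - cosine similarity of the term-frequency vectors of two texts."""
    tx, ty = Counter(x.split()), Counter(y.split())
    dot = sum(tx[t] * ty[t] for t in tx)
    norm = sqrt(sum(v * v for v in tx.values())) * sqrt(sum(v * v for v in ty.values()))
    return 1.0 - dot / norm if norm else 1.0


def jaccard_distance(x: str, y: str) -> float:
    """1 - |X ∩ Y| / |X ∪ Y| over the sets of terms (assumed set-based form)."""
    sx, sy = set(x.split()), set(y.split())
    union = sx | sy
    return 1.0 - len(sx & sy) / len(union) if union else 0.0


def dice_distance(x: str, y: str) -> float:
    """1 - 2|X ∩ Y| / (|X| + |Y|) over the sets of terms (assumed set-based form)."""
    sx, sy = set(x.split()), set(y.split())
    total = len(sx) + len(sy)
    return 1.0 - 2 * len(sx & sy) / total if total else 0.0
```

The term-level measures treat a misspelled word as a completely different term, whereas the character-level Levenshtein distance still registers it as close; this is the gap the abstract's method appears to target.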

3

Parallelization of the Levenshtein distance algorithm

Niewiarowski A., Stanuszek M.

Czasopismo Techniczne. Nauki Podstawowe | 2014 | Y. 111, iss. 3-NP | pp. 109--122

EN

This paper presents a method for parallelizing the Levenshtein distance algorithm when it is applied to very large strings. The proposed approach was implemented in .NET Framework 4.0, with threads managed through the System.Threading.Task namespace library. The algorithms developed in this study were tested on a high-performance machine using Xamarin Mono (for Linux RedHat/Fedora OS). The computational results demonstrate a high level of efficiency of the proposed parallelization procedure.

PL (in English translation)

The article presents a method for parallelizing the Levenshtein edit distance algorithm, intended for very large text strings. The proposed solution was implemented on the .NET Framework 4.0 platform using the methods available in the System.Threading.Task namespace. The algorithms were tested on a high-performance computer using Xamarin Mono tools (for the Linux RedHat/Fedora OS). The results show significantly increased computational performance for the solutions presented in the article.
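The abstract does not say how the computation is split across threads. One common way to expose parallelism in the Levenshtein table is the anti-diagonal (wavefront) order, sketched below in Python as an assumption rather than as the authors' method. The function name `levenshtein_wavefront` and the `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def levenshtein_wavefront(a: str, b: str, workers: int = 4) -> int:
    """Edit distance computed one anti-diagonal (i + j = k) at a time."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j

    def cell(pos):
        # A cell reads only from anti-diagonals k-1 and k-2, which are complete
        # by the time diagonal k is processed, so cells on one diagonal are
        # independent of each other.
        i, j = pos
        return min(d[i - 1][j] + 1,
                   d[i][j - 1] + 1,
                   d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(2, n + m + 1):
            lo, hi = max(1, k - m), min(n, k - 1)
            cells = [(i, k - i) for i in range(lo, hi + 1)]
            values = list(pool.map(cell, cells))  # independent work items
            for (i, j), v in zip(cells, values):
                d[i][j] = v
    return d[n][m]
```

In standard CPython the threads here will not give a speed-up because of the global interpreter lock; the sketch only shows which cells can be computed concurrently. A practical version would also hand each worker a block of cells rather than a single cell.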

4

Automatyczne sprawdzanie poprawności pisowni w języku polskim oparte na odległości Levenshteina [Automatic spell checking for Polish based on the Levenshtein distance]

Dorosz K.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2008 | Vol. 12, iss. 1 | pp. 29--40

PL (in English translation)

Commonly used methods for checking the spelling of words rely on the Levenshtein distance. To work, these methods require an inflectional dictionary of the language in which the checked words are written. Because they were originally created for English, they are not optimal for processing texts in Polish. This article presents the characteristic features of Polish that affect the construction of a spellchecker, and proposes an adaptation of the Levenshtein distance method that takes these specific features into account. The new algorithm shows a qualitative improvement in correcting texts written in Polish.

EN

Today's widely used spellchecking methods are based on Levenshtein distance algorithms. They also require an inflectional dictionary of the language being checked. Because these methods were invented for, and optimized for, English texts, they are not optimal for spellchecking texts written in Polish. This article describes the characteristics of Polish that affect spellchecker design and proposes a spellchecker implementation based on the Levenshtein distance that exploits these characteristics to improve the spellchecking of Polish texts.
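The abstract does not specify which features of Polish the adapted distance uses. As one hypothetical illustration, the Python sketch below makes a substitution between a letter and its diacritic variant cheaper than an arbitrary substitution. The pair table, the 0.5 cost, and the names `weighted_levenshtein` and `suggest` are assumptions, not taken from the paper.

```python
# Assumed letter/diacritic pairs for Polish; treated as "near" substitutions.
DIACRITIC_PAIRS = {
    ("a", "ą"), ("c", "ć"), ("e", "ę"), ("l", "ł"), ("n", "ń"),
    ("o", "ó"), ("s", "ś"), ("z", "ź"), ("z", "ż"),
}


def sub_cost(x: str, y: str) -> float:
    """Substitution cost: 0 for equal letters, 0.5 for a diacritic slip, else 1."""
    if x == y:
        return 0.0
    if (x, y) in DIACRITIC_PAIRS or (y, x) in DIACRITIC_PAIRS:
        return 0.5  # assumed weight, chosen only for illustration
    return 1.0


def weighted_levenshtein(a: str, b: str) -> float:
    """Edit distance with unit insert/delete costs and weighted substitutions."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1.0,
                           cur[j - 1] + 1.0,
                           prev[j - 1] + sub_cost(ca, cb)))
        prev = cur
    return prev[-1]


def suggest(word: str, dictionary: list[str], max_cost: float = 2.0) -> list[str]:
    """Dictionary words within max_cost of word, cheapest first."""
    scored = sorted((weighted_levenshtein(word, w), w) for w in dictionary)
    return [w for cost, w in scored if cost <= max_cost]
```

With these assumed weights, `suggest("zolty", ["żółty", "złoty", "żółw"])` ranks "żółty" (three diacritic slips, cost 1.5) ahead of "złoty" (cost 2.0) and drops "żółw" (cost 3.5), whereas the unit-cost distance would prefer "złoty".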

5

Hirschberg's algorithm for approximate matching

Drozdek A.

Computer Science | 2002 | Vol. 4 | pp. 91--100

EN

The Hirschberg algorithm was devised to solve the longest common subsequence problem. The paper discusses how to adapt the algorithm to solve the string matching problem in linear space, determining the edit distance between two strings and their alignment.

PL (in English translation)

Hirschberg's algorithm was introduced to solve the longest common subsequence problem. This article presents a way of adapting the algorithm to solve the pattern matching problem under linear memory requirements, in order to find the edit distance between two texts and their alignment.
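The divide-and-conquer idea behind this record can be sketched in Python as follows. This is a textbook-style illustration with unit edit costs, not the paper's own formulation. The names `last_row`, `hirschberg`, and `alignment_cost` are hypothetical, and the inputs are assumed not to contain the gap character "-".

```python
def last_row(a: str, b: str) -> list[int]:
    """Last row of the edit-distance table for a vs. b, keeping two rows only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev


def hirschberg(a: str, b: str) -> tuple[str, str]:
    """Optimal alignment of a and b as two equal-length strings ('-' marks a gap)."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Pair the single character with an equal character of b if one exists,
        # otherwise with b[0]; every other character of b faces a gap.
        j = max(b.find(a), 0)
        return "-" * j + a + "-" * (len(b) - j - 1), b
    mid = len(a) // 2
    left = last_row(a[:mid], b)                # costs of a[:mid] vs. prefixes of b
    right = last_row(a[mid:][::-1], b[::-1])   # costs of a[mid:] vs. suffixes of b
    split = min(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    a1, b1 = hirschberg(a[:mid], b[:split])
    a2, b2 = hirschberg(a[mid:], b[split:])
    return a1 + a2, b1 + b2


def alignment_cost(x: str, y: str) -> int:
    """Number of columns that are a gap or a mismatch."""
    return sum(cx != cy for cx, cy in zip(x, y))
```

For example, `alignment_cost(*hirschberg("kitten", "sitting"))` equals the edit distance 3, while each call to `last_row` stores only two rows of the table, so the working memory stays linear in the input length.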