The evaluation of text string matching algorithms as an aid to image search

Ochelska-Mierzejewska, J.

Selected full texts from this journal

https://eczasopisma.p.lodz.pl/JACS/issue/archive

Abstract

The main goal of this paper is to analyse intelligent text string matching methods (like fuzzy sets and relations) and evaluate their usefulness for image search. The present study examines the ability of different algorithms to handle multi-word and multi-sentence queries. Eight different similarity measures (N-gram, Levenshtein distance, Jaro coefficient, Dice coefficient, Overlap coefficient, Euclidean distance, Cosine similarity and Jaccard similarity) are employed to analyse the algorithms in terms of time complexity and accuracy of results. The outcomes are used to develop a hierarchy of methods, illustrating their usefulness to image search. The search response time increases significantly in the case of data sets containing several thousand images. The findings indicate that the analysed algorithms do not fulfil the response-time requirements of professional applications. Due to its limitations, the proposed system should be considered only as an illustration of a novel solution with further development perspectives. The use of Polish as the language of experiments affects the accuracy of measures. This limitation seems to be easy to overcome in the case of languages with simpler grammar rules (e.g. English).
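Several of the measures named in the abstract can be illustrated with a short sketch. This record does not give the paper's own definitions, so the code below uses common textbook forms and makes its own assumptions: strings are lowercased and split into character bigrams, and the function names (`bigrams`, `levenshtein`, `jaccard`, `dice`, `overlap`, `cosine`) are illustrative, not taken from the paper.

```python
from collections import Counter
from math import sqrt


def bigrams(text: str) -> list[str]:
    """Character bigrams of a lowercased string (an assumed tokenisation)."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """|X ∩ Y| / |X ∪ Y| over bigram sets."""
    x, y = set(bigrams(a)), set(bigrams(b))
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def dice(a: str, b: str) -> float:
    """2|X ∩ Y| / (|X| + |Y|) over bigram sets."""
    x, y = set(bigrams(a)), set(bigrams(b))
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 1.0


def overlap(a: str, b: str) -> float:
    """|X ∩ Y| / min(|X|, |Y|) over bigram sets."""
    x, y = set(bigrams(a)), set(bigrams(b))
    smaller = min(len(x), len(y))
    return len(x & y) / smaller if smaller else 0.0


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between bigram count vectors."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[g] * y[g] for g in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Two inflected forms of one Polish noun, chosen only as an illustration.
    for name, fn in [("levenshtein", levenshtein), ("jaccard", jaccard),
                     ("dice", dice), ("overlap", overlap), ("cosine", cosine)]:
        print(name, fn("zamek", "zamki"))
```

Under these assumptions, inflected word forms share only part of their bigrams, which is one plausible reason why a highly inflected language such as Polish could lower the scores of set-based measures.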

Keywords

text comparison; N-gram; Levenshtein distance; Jaro coefficient; Dice's coefficient; Overlap coefficient; Euclidean distance; cosine similarity; Jaccard similarity

Keywords (Polish)

porównywanie tekstu; N-gram; odległość Levenshteina; współczynnik Jaro; współczynnik Dice'a; odległość euklidesowa; miara kosinusowa; miara Jaccarda

Publisher

Wydawnictwo Politechniki Łódzkiej

Journal

Journal of Applied Computer Science

Year

2018

Volume

Vol. 26, no. 1

Pages

33-62

Physical description

Bibliography: 9 items, 1 figure, charts.

Contributors

author

Ochelska-Mierzejewska J.

joanna.ochelska-mierzejewska@p.lodz.pl

Lodz University of Technology, Institute of Information Technology, ul. Wólczańska 215, 90-924 Łódź, Poland

Bibliography

[1] Ng, C. W., Inexact Pattern Matching Algorithms via Automata, http://cmgm.stanford.edu/biochem218/Projects.html, Tech. rep., Stanford, 2008.
[2] Zadeh, L., Fuzzy Sets, Information and Control, Vol. 8, 1965, pp. 338-353.
[3] Zadeh, L., The concept of a linguistic variable and its application to approximate reasoning (I), Information Sciences, Vol. 8, 1975, pp. 199-249.
[4] Ochelska, J., Szczepaniak, P., and Niewiadomski, A., Automatic Summarization of Standarized Textual Databases Interpreted in Terms of Intuitionistic Fuzzy Sets, In: Soft Computing: Tools, Techniques and Application, edited by P. Grzegorzewski, M. Krawczak, and S. Zadrozny, The Academic Press EXIT, 2004, pp. 204-216.
[5] Niewiadomski, A., Intuicjonistyczne zbiory rozmyte w komputerowym określaniu podobieństwa dokumentów tekstowych, Ph.D. thesis, Instytut Badań Systemowych PAN, 2001.
[6] Winkler, W. and Thibaudeau, Y., An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, http://www.census.gov/srd/papers/pdf/rr91-9.pdf, 2004.
[7] Porter, E. and Winkler, W., Approximate String Comparison and its Effect on an Advanced Record Linkage System, http://www.fcsm.gov/workingpapers/porter-winkler.pdf, 2003.
[8] Cohen, W., Ravikumar, P., and Fienberg, S., A Comparison of String Metrics for Matching Names and Records, http://www.cs.cmu.edu/wcohen/postscript/kdd-2003-match-ws.pdf, 2002.
[9] Navarro, G., A Guided Tour to Approximate String Matching, ACM Computing Surveys, Vol. 33, No. 1, 2001.

Notes

Record prepared under agreement 509/P-DUN/2018, financed from Ministry of Science and Higher Education (MNiSW) funds allocated to science dissemination activities (2019).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-86d4333f-a845-42e4-b645-82a95ae83cad