PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

On-line Approximate String Matching in Natural Language

Autorzy
Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
We consider approximate pattern matching in natural language text. We use the words of the text as the alphabet, instead of the characters as in traditional string matching approaches. Hence our pattern consists of a sequence of words. >From the algorithmic point of view this has several advantages: (i) the number of words is much less than the number of characters, which in effect means shorter text (less possible matching positions); (ii) the pattern effectively becomes shorter, so bit-parallel techniques become more applicable; (iii) the alphabet size becomes much larger, so the probability that two symbols (in this case, words) match is reduced. We extend several known approximate string matching algorithms for this scenario, allowing \kW insertions, deletions or substitutions of symbols (natural language words). We further extend the algorithms to allow \kC errors inside the pattern symbols (words) as well. The two error thresholds \kW and \kC can be applied simultaneously and independently. Hence we have in effect two alphabets, and perform approximate matching in both levels. >From the application point of view the advantage is that the method is flexible, allowing simple solutions to problems that are hard to solve with traditional approaches. Finally, we extend the algorithms to handle multiple patterns at the same time. Depending on the search parameters, we obtain algorithms that run in linear or sublinear time and that perform the optimal number of word comparisons on average, We conclude with experimental results showing that the methods work well in practice.
Wydawca
Rocznik
Strony
453--466
Opis fizyczny
wykr., bibliogr. 20 poz.
Twórcy
Bibliografia
  • [1] Baeza-Yates, R., Navarro, G.: Faster Approximate String Matching, Algorithmica, 23(2), 1999, 127-158.
  • [2] Baeza-Yates, R., Navarro, G.: New and Faster Filters for Multiple Approximate String Matching, Random Structures and Algorithms (RSA), 20, 2002, 23-49.
  • [3] Chang, W., Marr, T.: Approximate String Matching with Local Similarity, Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (M. Crochemore, D. Gusfield, Eds.), number 807 in Lecture Notes in Computer Science, Springer-Verlag, Berlin, Asilomar, CA, 1994.
  • [4] Crochemore, M., Czumaj, A., Ga¸sieniec, L., Jarominek, S., Lecroq, T., Plandowski,W., Rytter,W.: Speeding up two string matching algorithms, Algorithmica, 12(4/5), 1994, 247-267.
  • [5] Fredriksson, K.: Metric Indexes for Approximate String Matching in a Dictionary, Proceedings of the 11th International Symposium on String Processing and Information Retrieval (SPIRE'2004), LNCS 3246, Springer-Verlag, 2004.
  • [6] Fredriksson, K., Navarro, G.: Average-Optimal Single and Multiple Approximate String Matching, ACM Journal of Experimental Algorithmics (JEA), 9(1.4), 2004.
  • [7] Heaps, H. S.: Information retrieval: theoretical and computational aspects, Academic Press, New York, NY, 1978.
  • [8] Hyyrö, H.: Explaining and Extending the Bit-parallel Approximate String Matching Algorithm of Myers, Technical Report A-2001-10, Department of Computer and Information Sciences, University of Tampere, Tampere, Finland, 2001.
  • [9] Hyyrö, H., Fredriksson, K., Navarro, G.: Increased Bit-Parallelism for Approximate String Matching, Proc. 3rd Workshop on Efficient and Experimental Algorithms (WEA'04), LNCS 3059, 2004.
  • [10] Hyyrö, H., Navarro, G.: Bit-parallel Witnesses and their Applications to Approximate String Matching, Algorithmica, 41(3), 2005, 203-231.
  • [11] Moura, E., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and FlexibleWord Searching on Compressed Text, ACM Transactions on Information Systems (TOIS), 18(2), 2000, 113-139.
  • [12] Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming, J. Assoc. Comput. Mach., 46(3), 1999, 395-415.
  • [13] Navarro, G.: Approximate Text Searching, Ph.D. Thesis, Department of Computer Science, University of Chile, December 1998.
  • [14] Navarro, G., Raffinot, M.: A Bit-parallel approach to Suffix Automata: Fast Extended String Matching, Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching (CPM'98), LNCS v. 1448, Springer-Verlag, 1998.
  • [15] Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings - Practical on-line search algorithms for texts and biological sequences, Cambridge University Press, 2002, ISBN 0-521-81307-7. 280 pages.
  • [16] Sellers, P. H.: The theory and computation of evolutionary distances: Pattern recognition, J. Algorithms, 1(4), 1980, 359-373.
  • [17] Shang, H., Merrett, T. H.: Tries for Approximate String Matching, Knowledge and Data Engineering, 8(4), 1996, 540-547.
  • [18] Ukkonen, E.: Algorithms for approximate string matching, Inf. Control, 64(1-3), 1985, 100-118.
  • [19] Yao, A. C.: The complexity of pattern matching for a random string, SIAM J. Comput., 8(3), 1979, 368-387.
  • [20] Zipf, G.: Human Behaviour and the Principle of Least Effort, Addison-Wesley, 1949.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS2-0010-0081
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.