PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Occurrence and Substring Heuristics for [ro]-Matching

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
We consider a version of pattern matching useful in processing large musical data: d-matching, which consists in finding matches which are d-approximate in the sense of the distance measured as maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a-b|. We also consider (d,g)-matching, where g is a bound on the total sum of the differences. We first consider ``occurrence heuristics'' by adapting exact string matching algorithms to the two notions of approximate string matching. The resulting algorithms are efficient in practice. Then we consider ``substring heuristics''. We present d-matching algorithms fast on the average providing that the pattern is ``non-flat'' and the alphabet interval is large. The pattern is ``flat'' if its structure does not vary substantially. The algorithms, named d-BM1, d-BM2 and d-BM3 can be thought as members of the generalized Boyer-Moore family of algorithms. The algorithms are fast on average. This is the first paper on the subject, previously only ``occurrence heuristics'' have been considered. Our substring heuristics are much stronger and refer to larger parts of texts (not only to single positions). We use d-versions of suffix tries and subword graphs. Surprisingly, in the context of d-matching subword graphs appear to be superior compared with compact suffix trees.
Wydawca
Rocznik
Strony
1--21
Opis fizyczny
bibliogr. 22 poz.
Twórcy
autor
autor
autor
  • LIFAR-ABISS, Faculté des Sciences et Techniques, Université de Rouen, 76821 Mont-Saint-Aignan CEDEX, France, mac@univ-mlv.fr
Bibliografia
  • [1] Aho, A. V., Corasick, M. J.: Efficient string matching: an aid to bibliographic search, Communications of the ACM, 18(6), 1975, 333-340.
  • [2] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J., Basic local alignment search tool, Journal of Molecular Biology, 215(3), 1990, 403-410.
  • [3] Baeza-Yates, R., Gonnet, G.: A new approach to text searching, Communications of the ACM, 35(10), 1992, 74-82.
  • [4] Boyer, R. S., Moore, J. S.: A fast string searching algorithm, Communications of the ACM, 20(10), 1977, 762-772.
  • [5] Cambouropoulos, E., Crawford, T., Iliopoulos, C. S.: Pattern Processing in Melodic Sequences: Challenges, Caveats and Prospects, Proc. of the AISB’99 Convention (Artificial Intelligence and Simulation of Behaviour), Edinburgh, U.K., 1999, 42-47.
  • [6] Cambouropoulos, E., Crochemore, M., Iliopoulos, C. S., Mouchard, L., Pinzon, Y. J.: Algorithms for computing approximate repetitions in musical sequences, Proc. 10th Australasian Workshop On Combinatorial Algorithms (R. Raman, J. Simpson, Eds.), Perth, WA, Australia, 1999, 129-144.
  • [7] Charras, C., Lecroq, T., Pehoushek, J. D.: A very fast string matching algorithm for small alphabets and long patterns, Proc. 9th Annual Symposium on Combinatorial Pattern Matching (M. Farach-Colton, Ed.), Piscataway, NJ, LNCS 1448, Springer-Verlag, Berlin, 1998, 55-64.
  • [8] Crawford, T., Iliopoulos, C. S., Raman, R.: String Matching Techniques for Musical Similarity and Melodic Recognition, Computing in Musicology, 11, 1998, 73-100.
  • [9] Crochemore, M., Czumaj, A., Ga¸sieniec, L., Jarominek, S., Lecroq, T., Plandowski, W., Rytter, W.: Speeding-up two string matching algorithms, Algorithmica, 12(4/5), 1994, 247-267.
  • [10] Crochemore, M., Iliopoulos, C. S., Lecroq, T., Pinzon, Y. J.: Approximate string matching in musical sequences, Proc. Prague Stringology Conference’01 (M. Bal´ık, M. Sim´anek, Eds.), Prague, Tcheque Republic, 2001, Annual Report DC-2001-06, 26-36.
  • [11] Fischetti, V. A., Landau, G. M., Schmidt, J. P., Sellers, P. H.: Identifying periodic occurrences of a template with applications to protein structure, Proc. 3rd Annual Symposium on Combinatorial Pattern Matching, LNCS 644, 1992, 111-120.
  • [12] Fredkin, E.: Trie memory, Communications of the ACM, 3(9), 1960, 490-499.
  • [13] Hume, A., Sunday, D. M.: Fast string searching, Software-Practice and Experience, 21(11), 1991, 1221-1248.
  • [14] Karlin, S., Morris, M., Ghandour, G., Leung, M.-Y.: Efficient algorithms for molecular sequences analysis, Proc. Natl. Acad. Sci. USA, 85(3), 1988, 841-845.
  • [15] McGettrick, P.: MIDIMatch: Musical Pattern Matching in Real Time, MSc Dissertation, York University, U.K., 1997.
  • [16] Milosavljevic, A., Jurka, J.: Discovering simple DNA sequences by the algorithmic significance method, Computer Applications in Biosciences, 9(4), 1993, 407-411.
  • [17] Pevzner, P. A., Feldman, W.: Gray Code Masks for DNA Sequencing by Hybridization, Genomics, 23, 1993, 233-235.
  • [18] Rolland, P.-Y., Ganascia, J.-G.: Musical Pattern Extraction and Similarity Assessment, Readings in Music and Artificial Intelligence (E. R. Miranda, Ed.), Harwood Academic Publishers, 2000, 115-144.
  • [19] Schmidt, J. P.: All shortest paths in weighted grid graphs and its application to finding all approximate repeats in strings, SIAM Journal on Computing, 27(4), 1998, 972-992.
  • [20] Skiena, S. S., Sundaram, G.: Reconstructing strings from substrings, Journal of Computational Biology, 2, 1995, 333-353.
  • [21] Sunday, D. M.: A very fast substring search algorithm, Communications of the ACM, 33(8), 1990, 132-142.
  • [22] Wu, S., Manber, U.: Fast text searching allowing errors, Communications of the ACM, 35(10), 1992, 83-91.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS2-0004-0121
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.