Occurrence and Substring Heuristics for [ro]-Matching

Crochemore, M.; Iliopoulos, C.S.; Lecroq, T.; Pinzon, Y.J.; Plandowski, W.; Rytter, W.

Artykuł - szczegóły

Tytuł artykułu

Occurrence and Substring Heuristics for [ro]-Matching

Autorzy

Crochemore M. , Iliopoulos C.S. , Lecroq T. , Pinzon Y.J. , Plandowski W. , Rytter W.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

We consider a version of pattern matching useful in processing large musical data: d-matching, which consists in finding matches which are d-approximate in the sense of the distance measured as maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a-b|. We also consider (d,g)-matching, where g is a bound on the total sum of the differences. We first consider ``occurrence heuristics'' by adapting exact string matching algorithms to the two notions of approximate string matching. The resulting algorithms are efficient in practice. Then we consider ``substring heuristics''. We present d-matching algorithms fast on the average providing that the pattern is ``non-flat'' and the alphabet interval is large. The pattern is ``flat'' if its structure does not vary substantially. The algorithms, named d-BM1, d-BM2 and d-BM3 can be thought as members of the generalized Boyer-Moore family of algorithms. The algorithms are fast on average. This is the first paper on the subject, previously only ``occurrence heuristics'' have been considered. Our substring heuristics are much stronger and refer to larger parts of texts (not only to single positions). We use d-versions of suffix tries and subword graphs. Surprisingly, in the context of d-matching subword graphs appear to be superior compared with compact suffix trees.

Słowa kluczowe

string algorithm approximate string matching dynamic programming computer-assisted music analysis

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2003

Tom

Vol. 56, nr 1,2

Strony

1--21

Opis fizyczny

bibliogr. 22 poz.

Twórcy

autor

Crochemore M.

autor

Iliopoulos C.S.

autor

Lecroq T.

autor

Pinzon Y.J.

autor

Plandowski W.

autor

Rytter W.

LIFAR-ABISS, Faculté des Sciences et Techniques, Université de Rouen, 76821 Mont-Saint-Aignan CEDEX, France, mac@univ-mlv.fr

Bibliografia

[1] Aho, A. V., Corasick, M. J.: Efficient string matching: an aid to bibliographic search, Communications of the ACM, 18(6), 1975, 333-340.
[2] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J., Basic local alignment search tool, Journal of Molecular Biology, 215(3), 1990, 403-410.
[3] Baeza-Yates, R., Gonnet, G.: A new approach to text searching, Communications of the ACM, 35(10), 1992, 74-82.
[4] Boyer, R. S., Moore, J. S.: A fast string searching algorithm, Communications of the ACM, 20(10), 1977, 762-772.
[5] Cambouropoulos, E., Crawford, T., Iliopoulos, C. S.: Pattern Processing in Melodic Sequences: Challenges, Caveats and Prospects, Proc. of the AISB’99 Convention (Artificial Intelligence and Simulation of Behaviour), Edinburgh, U.K., 1999, 42-47.
[6] Cambouropoulos, E., Crochemore, M., Iliopoulos, C. S., Mouchard, L., Pinzon, Y. J.: Algorithms for computing approximate repetitions in musical sequences, Proc. 10th Australasian Workshop On Combinatorial Algorithms (R. Raman, J. Simpson, Eds.), Perth, WA, Australia, 1999, 129-144.
[7] Charras, C., Lecroq, T., Pehoushek, J. D.: A very fast string matching algorithm for small alphabets and long patterns, Proc. 9th Annual Symposium on Combinatorial Pattern Matching (M. Farach-Colton, Ed.), Piscataway, NJ, LNCS 1448, Springer-Verlag, Berlin, 1998, 55-64.
[8] Crawford, T., Iliopoulos, C. S., Raman, R.: String Matching Techniques for Musical Similarity and Melodic Recognition, Computing in Musicology, 11, 1998, 73-100.
[9] Crochemore, M., Czumaj, A., Ga¸sieniec, L., Jarominek, S., Lecroq, T., Plandowski, W., Rytter, W.: Speeding-up two string matching algorithms, Algorithmica, 12(4/5), 1994, 247-267.
[10] Crochemore, M., Iliopoulos, C. S., Lecroq, T., Pinzon, Y. J.: Approximate string matching in musical sequences, Proc. Prague Stringology Conference’01 (M. Bal´ık, M. Sim´anek, Eds.), Prague, Tcheque Republic, 2001, Annual Report DC-2001-06, 26-36.
[11] Fischetti, V. A., Landau, G. M., Schmidt, J. P., Sellers, P. H.: Identifying periodic occurrences of a template with applications to protein structure, Proc. 3rd Annual Symposium on Combinatorial Pattern Matching, LNCS 644, 1992, 111-120.
[12] Fredkin, E.: Trie memory, Communications of the ACM, 3(9), 1960, 490-499.
[13] Hume, A., Sunday, D. M.: Fast string searching, Software-Practice and Experience, 21(11), 1991, 1221-1248.
[14] Karlin, S., Morris, M., Ghandour, G., Leung, M.-Y.: Efficient algorithms for molecular sequences analysis, Proc. Natl. Acad. Sci. USA, 85(3), 1988, 841-845.
[15] McGettrick, P.: MIDIMatch: Musical Pattern Matching in Real Time, MSc Dissertation, York University, U.K., 1997.
[16] Milosavljevic, A., Jurka, J.: Discovering simple DNA sequences by the algorithmic significance method, Computer Applications in Biosciences, 9(4), 1993, 407-411.
[17] Pevzner, P. A., Feldman, W.: Gray Code Masks for DNA Sequencing by Hybridization, Genomics, 23, 1993, 233-235.
[18] Rolland, P.-Y., Ganascia, J.-G.: Musical Pattern Extraction and Similarity Assessment, Readings in Music and Artificial Intelligence (E. R. Miranda, Ed.), Harwood Academic Publishers, 2000, 115-144.
[19] Schmidt, J. P.: All shortest paths in weighted grid graphs and its application to finding all approximate repeats in strings, SIAM Journal on Computing, 27(4), 1998, 972-992.
[20] Skiena, S. S., Sundaram, G.: Reconstructing strings from substrings, Journal of Computational Biology, 2, 1995, 333-353.
[21] Sunday, D. M.: A very fast substring search algorithm, Communications of the ACM, 33(8), 1990, 132-142.
[22] Wu, S., Manber, U.: Fast text searching allowing errors, Communications of the ACM, 35(10), 1992, 83-91.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS2-0004-0121