We consider a version of pattern matching useful in processing large musical data: d-matching, which consists in finding matches which are d-approximate in the sense of the distance measured as maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a-b|. We also consider (d,g)-matching, where g is a bound on the total sum of the differences. We first consider ``occurrence heuristics'' by adapting exact string matching algorithms to the two notions of approximate string matching. The resulting algorithms are efficient in practice. Then we consider ``substring heuristics''. We present d-matching algorithms fast on the average providing that the pattern is ``non-flat'' and the alphabet interval is large. The pattern is ``flat'' if its structure does not vary substantially. The algorithms, named d-BM1, d-BM2 and d-BM3 can be thought as members of the generalized Boyer-Moore family of algorithms. The algorithms are fast on average. This is the first paper on the subject, previously only ``occurrence heuristics'' have been considered. Our substring heuristics are much stronger and refer to larger parts of texts (not only to single positions). We use d-versions of suffix tries and subword graphs. Surprisingly, in the context of d-matching subword graphs appear to be superior compared with compact suffix trees.
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.