Regexpcount, a symbolic package for counting problems on regular expressions and words

Nicodeme, P.

Artykuł - szczegóły

Tytuł artykułu

Regexpcount, a symbolic package for counting problems on regular expressions and words

Autorzy

Nicodeme P.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

In previous work [10], we considered algorithms related to the statistics of matches with words and regular expressions in texts generated by Bernoulli or Markov sources. In this work these algorithms are extended for two purposes: to determine the statistics of simultaneous counting of different motifs, and to compute the waiting time for the first match with a motif in a model which may be constrained. This extension also handles matches with errors. The package is fully implemented and gives access to high and low level commands. We also consider an example corresponding to a practical biological problem: getting the statistics for the number of matches of words of size 8 in a genome (a Markovian sequence), knowing that an (overrepresented DNA protecting) pattern named Chi occurs a given number of times.

Słowa kluczowe

marked automata generating functions computer algebra

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2003

Tom

Vol. 56, nr 1,2

Strony

71--88

Opis fizyczny

bibliogr. 19 poz.

Twórcy

autor

Nicodeme P.

LIX-Ecole polytechniguq, 911128 Palaiseau cedex, France, nicodeme@lix.polytechnique.fr

Bibliografia

[1] Bender, E. A., Kochman, F.: The Distribution of Subword Counts is Usually Normal, European Journal of Combinatorics, 14, 1993, 265-275.
[2] Berry, G., Sethi, R.: From regular expressions to deterministic automata, Theoretical Computer Science, 48, 1986, 117-126.
[3] Chomsky, N., Sch¨utzenberger, M. P.: The algebraic theory of context-free languages, Computer Programming and Formal Languages,, 1963, 118-161, P. Braffort and D. Hirschberg, eds, North Holland.
[4] Greene, D. H., Knuth, D. E.: Mathematics for the Analysis of Algorithms, Birkh¨auser, 1981.
[5] Hopcroft, J. E.: An # $%& # Algorithm for Minimizing States in a Finite Automaton, Theory of Machines and Computation, Kohavi ed., Academic Press, 1971.
[6] Kelley, D.: Automata and Formal Languages, an Introduction, Prentice Hall, 1995.
[7] Kozen, D. C.: Automata and Computability, Springer Verlag. Undergraduate texts in Computer Science, 1997.
[8] Nicod`eme, P.: Regexpcount, a symbolic package for counting problems on regular expressions and words, Proceedings of the German Conference on Bioinformatics GCB00, Heidelberg, 2000.
[9] Nicod`eme, P.: Fast Approximate Motif Statistics, J. Comp. Biol., 8(3), 2001, 235-248.
[10] Nicod`eme, P., Salvy, B., Flajolet, P.: Motif Statistics, Theoretical Computer Science, 287(2), 2002, 593-618, Extended version of an article published in the proceedings of 7th Annual European Symposium on Algorithms ESA’99, Prague, July 1999.
[11] Pevzner, P. A., Borodovski, M. Y., Mironov, A. A.: Linguistic of nucleotide sequences: The significance of deviation from mean statistical characteristics and prediction of the frequencies of occurrence of words, J. Biomol. Struct. Dyn, 6, 1989, 1013-1026.
[12] Prum, B., Rodolphe, F., de Turckheim, E.: Finding words with unexpected frequencies in deoxyribonucleic acid sequences, J. R. statist. Soc. B, 57(1), 1995, 205-220.
[13] R´egnier, M.: A unified approach to words statistics, Second Annual International Conference on Computational Molecular Biology,, ACM Press, New-York, 1998.
[14] R´egnier, M.: A Unified Approach to Word Occurrences Probabilities, Discrete Applied Mathematics, 104(1), 2000, 259-280, Special issue on Computational Biology.
[15] R´egnier, M., Szpankowski, W.: On Pattern Frequency Occurrences in a Markovian Sequence, Algorithmica, 22(4), 1998, 631-649.
[16] Reinert, G., Schbath, S.: Compound Poisson Approximations for Occurrences of Multiple Words in Markov Chains, J. Comp. Biol., 5(2), 1998, 223-253.
[17] Sewell, R. F., Durbin, R.: Method for calculation of probability of matching a bounded regular expression in a random data string, J. Comp. Biol., 2(1), 1995, 25-31.
[18] Sinha, S., Tompa, M.: A Statistical Method for Finding Transcription Factor Binding Sites, Eigth International Conference on Intelligent Systems for Molecular Biology, AAAI Press, 2000.
[19] Szpankowski, W.: Average Case Analysis of Algorithms on Sequences, Wiley-Interscience, 2001

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS2-0004-0124