PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Regexpcount, a symbolic package for counting problems on regular expressions and words

Autorzy
Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
In previous work [10], we considered algorithms related to the statistics of matches with words and regular expressions in texts generated by Bernoulli or Markov sources. In this work these algorithms are extended for two purposes: to determine the statistics of simultaneous counting of different motifs, and to compute the waiting time for the first match with a motif in a model which may be constrained. This extension also handles matches with errors. The package is fully implemented and gives access to high and low level commands. We also consider an example corresponding to a practical biological problem: getting the statistics for the number of matches of words of size 8 in a genome (a Markovian sequence), knowing that an (overrepresented DNA protecting) pattern named Chi occurs a given number of times.
Wydawca
Rocznik
Strony
71--88
Opis fizyczny
bibliogr. 19 poz.
Twórcy
autor
Bibliografia
  • [1] Bender, E. A., Kochman, F.: The Distribution of Subword Counts is Usually Normal, European Journal of Combinatorics, 14, 1993, 265-275.
  • [2] Berry, G., Sethi, R.: From regular expressions to deterministic automata, Theoretical Computer Science, 48, 1986, 117-126.
  • [3] Chomsky, N., Sch¨utzenberger, M. P.: The algebraic theory of context-free languages, Computer Programming and Formal Languages,, 1963, 118-161, P. Braffort and D. Hirschberg, eds, North Holland.
  • [4] Greene, D. H., Knuth, D. E.: Mathematics for the Analysis of Algorithms, Birkh¨auser, 1981.
  • [5] Hopcroft, J. E.: An # $%& # Algorithm for Minimizing States in a Finite Automaton, Theory of Machines and Computation, Kohavi ed., Academic Press, 1971.
  • [6] Kelley, D.: Automata and Formal Languages, an Introduction, Prentice Hall, 1995.
  • [7] Kozen, D. C.: Automata and Computability, Springer Verlag. Undergraduate texts in Computer Science, 1997.
  • [8] Nicod`eme, P.: Regexpcount, a symbolic package for counting problems on regular expressions and words, Proceedings of the German Conference on Bioinformatics GCB00, Heidelberg, 2000.
  • [9] Nicod`eme, P.: Fast Approximate Motif Statistics, J. Comp. Biol., 8(3), 2001, 235-248.
  • [10] Nicod`eme, P., Salvy, B., Flajolet, P.: Motif Statistics, Theoretical Computer Science, 287(2), 2002, 593-618, Extended version of an article published in the proceedings of 7th Annual European Symposium on Algorithms ESA’99, Prague, July 1999.
  • [11] Pevzner, P. A., Borodovski, M. Y., Mironov, A. A.: Linguistic of nucleotide sequences: The significance of deviation from mean statistical characteristics and prediction of the frequencies of occurrence of words, J. Biomol. Struct. Dyn, 6, 1989, 1013-1026.
  • [12] Prum, B., Rodolphe, F., de Turckheim, E.: Finding words with unexpected frequencies in deoxyribonucleic acid sequences, J. R. statist. Soc. B, 57(1), 1995, 205-220.
  • [13] R´egnier, M.: A unified approach to words statistics, Second Annual International Conference on Computational Molecular Biology,, ACM Press, New-York, 1998.
  • [14] R´egnier, M.: A Unified Approach to Word Occurrences Probabilities, Discrete Applied Mathematics, 104(1), 2000, 259-280, Special issue on Computational Biology.
  • [15] R´egnier, M., Szpankowski, W.: On Pattern Frequency Occurrences in a Markovian Sequence, Algorithmica, 22(4), 1998, 631-649.
  • [16] Reinert, G., Schbath, S.: Compound Poisson Approximations for Occurrences of Multiple Words in Markov Chains, J. Comp. Biol., 5(2), 1998, 223-253.
  • [17] Sewell, R. F., Durbin, R.: Method for calculation of probability of matching a bounded regular expression in a random data string, J. Comp. Biol., 2(1), 1995, 25-31.
  • [18] Sinha, S., Tompa, M.: A Statistical Method for Finding Transcription Factor Binding Sites, Eigth International Conference on Intelligent Systems for Molecular Biology, AAAI Press, 2000.
  • [19] Szpankowski, W.: Average Case Analysis of Algorithms on Sequences, Wiley-Interscience, 2001
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS2-0004-0124
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.