PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus pattern matching or identification of repeated patterns, in biological weighted sequences is a very important procedure in the translation of gene expression and regulation. We present time and space efficient algorithms for constructing the weighted suffix tree and some applications of the proposed data structure to problems taken from the Molecular Biology area such as pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences and computation of covers.
Wydawca
Rocznik
Strony
259--277
Opis fizyczny
wykr., bibliogr. 36 poz.
Twórcy
autor
autor
autor
  • Department of Computer Science, King's College London, London WC2R 2LS.England, csi@dcs.kcl.ac.uk
Bibliografia
  • [1] Apostolico, A., Farach, M., Iliopoulos, C.: Optimal superprimitivity testing for strings, Information Processing Letters, 39, 1991, 17-20.
  • [2] Apostolico, A., Preparata, F.: Optimal off-line detection of repetitions in a string, Theoretical Computer Science, 22, 1983, 297-315.
  • [3] Arikat, F., Bakalis, A., Christodoulakis, M., Iliopoulos, C., Mouchard, L.: Experimental Results in Pattern Matching on Weighted Sequences, ICNAAM 2005, 2005.
  • [4] Brodal, G., Lyngso, R., Pedersen, C. S., Stoye, J.: Finding Maximal Pairs with Bounded Gap, 10th Symposium on Combinatorial Pattern Matching (CPM99), 1999.
  • [5] Celera: The Genome Sequence of Drosophila melanogaster, Science, 287, 2000, 2185-2195.
  • [6] Celera: The Sequence of the Human Genome, Science, 291, 2001, 1304-1351.
  • [7] Christodoulakis, M., Iliopoulos, C., Mouchard, L., Tsichlas, K.: Pattern Matching on Weighted Sequences, Algorithms and Computational Methods for Biochemical and Evolutionary Networks, 2004.
  • [8] Crochemore, M.: An Optimal Algorithm for Computing the Repetitions in a Word, Information Processing Letters, 12, 1981, 244-250.
  • [9] Grillo, G., Licciuli, F., Liuni, S., Sbisa, E., Pesole, G.: PatSearch: a program for the detection of patterns and structural motifs in nucleotide sequences, Nucleic Acids Res., 31, 2003, 3608-3612.
  • [10] Gupta, M., Liu, J.: Discovery of Conserved Sequence Patterns using a Stochastic Dictionary Model, J. Am. Stat. Assoc., 98, 2003, 55-66.
  • [11] Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
  • [12] Hughes, J., Estep, P., Tavazoie, S., Church, G.: Computational Identification of cis-regulatory Elements Associated with Groups of Functionally Related Genes in sacharomyces cerevisae, J. Mol. Biol., 296, 2000, 1205-1214.
  • [13] Iliopoulos, C., Makris, C., Panagis, I., Perdikuri, K., Theodoridis, E., Tsakalidis, A.: Computing the Repetitions in aWeighted Sequence usingWeighted Suffix Trees, European Conference on Computational Biology (ECCB 2003), Posters' Track, 2003.
  • [14] Iliopoulos, C., Mouchard, L., Perdikuri, K., Tsakalidis, A.: Computing the repetitions in a weighted sequence, Prague Stringology Conference (PSC 2003), 2003.
  • [15] Iliopoulos, C. S., Makris, C., Panagis, Y., Perdikuri, K., Theodoridis, E., Tsakalidis, A.: Efficient Algorithms For Handling Molecular Weighted Sequences, 3rd IFIP Conference on Theoretical Computer Science - IFIP-TCS04 (J.-J. Lévy, E. W. Mayr, J. C. Mitchell, Eds.), Kluwer, 2004.
  • [16] Iliopoulos, C. S., Perdikuri, K., Theodoridis, E., Tsakalidis, A. K., Tsichlas, K.: Motif Extraction from Weighted Sequences., String Processing and Information Retrieval, 11th International Conference (SPIRE), 2004.
  • [17] Jensen, L., Knudsen, S.: Automatic Discovery of Regulatory Patterns in Promoter Regions Based on Whole Cell Expression Data and Functional Annotation, Bioinformatics, 16, 2000, 326-333.
  • [18] Keles, S., van der Laan, M., Dudoit, S., Xing, B., Eisen, M.: Supervised Detection of Regulatory Motifs in DNA Sequences, Statistical Applications in Genetics and Molecular Biology, 2, 2003.
  • [19] Kolpakov, R., Kucherov, G.: Finding maximal repetitions in a word in linear time, FOCS99, 1999.
  • [20] Kurtz, S., Schleiermacher, C.: REPuter: fast computation of maximal repeats in complete genomes, Bioinformatics, 15, 1999, 426-427.
  • [21] Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment, Science, 262, 1993, 208-214.
  • [22] Li, H., Rhodius, V., Gross, C., Siggia, E.: Identification of the binding sites of regulatory proteins in bacterial genomes, Genetics, 99, 2002, 11772-11777.
  • [23] Liu, Y., Wei, L., Batzoglou, S., Brutlag, D., Liu, J., Liu, X.: A Suite of Web-Based Programs to Search for Transcriptional Regulatory Motifs, Nucleic Acids Research, 32, 2004.
  • [24] Marsan, L., Sagot, M.: Extracting Structured Motifs Using a Suffix Tree- Algorithms and Application to Promoter Concensus Identification, RECOMB, 2000.
  • [25] McCreight, E.: A space-economical suffix tree construction, Journal of the ACM, 23, 1976, 262-272.
  • [26] McGuire, A., Hughes, J., Church, G.: Conservation of DNA Regulatory Motifs and Discovery of New Motifs in Microbial Genomes, Nucleic Acids Research, 10, 2000, 744-757.
  • [27] Ron, D., Singer, Y., Tishby, N.: The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length, Machine Learning, 25, 1996, 117-149.
  • [28] Roth, D., Hughes, J., Esterp, P., Church, G.: Finding DNA Regulatory Motifs within Unaligned Noncoding Sequence Clustered by whole Genome MRNA Quantitation, Nature Biotechnology, 16, 1998, 939-945.
  • [29] Stoye, J., Gusfield, D.: Simple and flexible detection of contiguous repeats using a suffix tree, 9th Symposium on Combinatorial Pattern Matching, 1998.
  • [30] Sun, Z., Yang, J., Deogun, J.: MISAE: A New Approach for Regulatory Motif Extraction, Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004), 2004.
  • [31] Tadesse, M., Vannucci, M., Lio, P.: Identification of DNA Regulatory Motifs Using Bayesian Variable Selection Suite, Bioinformatics, 20, 2004, 2553-2561.
  • [32] Thompson,W., Lawrence, C.: Gibbs Recursive Sampler: Finding Transcription Factor Binding Sites, Nucleic Acids Research, 31, 2003, 3580-3585.
  • [33] Tompa, M., Sinha, S.: A Statistical Method for Finding Transcription Factor Binding Sites, Proceedings Int. Conf. Intell. Syst. Mol. Biol., 2000.
  • [34] Tsunoda, T., Fukagawa, M., Takagi, T.: Time and memory efficient algorithm for extracting palindromic and repetitive subsequences, Pacific Symposium on Biocomputing, 1999.
  • [35] Ukkonen, E.: On-line construction of suffix trees, Algorithmica, 14, 1995, 249-260.
  • [36] Wang, H., Perng, C., Fan, W., Park, S., Yu, P.: IndexingWeighted-Sequences in Large Databases., International Conference on Data Engineering (ICDE), 2003.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS2-0010-0038
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.