The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications

Iliopoulos, C.S.; Makris, C.; Panagis, J.; Perdikuri, K.; Theodoridis, E.; Tsakalidis, A.

Artykuł - szczegóły

Tytuł artykułu

The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications

Autorzy

Iliopoulos C.S. , Makris C. , Panagis J. , Perdikuri K. , Theodoridis E. , Tsakalidis A.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus pattern matching or identification of repeated patterns, in biological weighted sequences is a very important procedure in the translation of gene expression and regulation. We present time and space efficient algorithms for constructing the weighted suffix tree and some applications of the proposed data structure to problems taken from the Molecular Biology area such as pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences and computation of covers.

Słowa kluczowe

molecular weighted sequences suffix tree pattern matching identifications of repetitions covers

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2006

Tom

Vol. 71, nr 2,3

Strony

259--277

Opis fizyczny

wykr., bibliogr. 36 poz.

Twórcy

autor

Iliopoulos C.S.

autor

Makris C.

autor

Panagis J.

autor

Perdikuri K.

autor

Theodoridis E.

autor

Tsakalidis A.

Department of Computer Science, King's College London, London WC2R 2LS.England, csi@dcs.kcl.ac.uk

Bibliografia

[1] Apostolico, A., Farach, M., Iliopoulos, C.: Optimal superprimitivity testing for strings, Information Processing Letters, 39, 1991, 17-20.
[2] Apostolico, A., Preparata, F.: Optimal off-line detection of repetitions in a string, Theoretical Computer Science, 22, 1983, 297-315.
[3] Arikat, F., Bakalis, A., Christodoulakis, M., Iliopoulos, C., Mouchard, L.: Experimental Results in Pattern Matching on Weighted Sequences, ICNAAM 2005, 2005.
[4] Brodal, G., Lyngso, R., Pedersen, C. S., Stoye, J.: Finding Maximal Pairs with Bounded Gap, 10th Symposium on Combinatorial Pattern Matching (CPM99), 1999.
[5] Celera: The Genome Sequence of Drosophila melanogaster, Science, 287, 2000, 2185-2195.
[6] Celera: The Sequence of the Human Genome, Science, 291, 2001, 1304-1351.
[7] Christodoulakis, M., Iliopoulos, C., Mouchard, L., Tsichlas, K.: Pattern Matching on Weighted Sequences, Algorithms and Computational Methods for Biochemical and Evolutionary Networks, 2004.
[8] Crochemore, M.: An Optimal Algorithm for Computing the Repetitions in a Word, Information Processing Letters, 12, 1981, 244-250.
[9] Grillo, G., Licciuli, F., Liuni, S., Sbisa, E., Pesole, G.: PatSearch: a program for the detection of patterns and structural motifs in nucleotide sequences, Nucleic Acids Res., 31, 2003, 3608-3612.
[10] Gupta, M., Liu, J.: Discovery of Conserved Sequence Patterns using a Stochastic Dictionary Model, J. Am. Stat. Assoc., 98, 2003, 55-66.
[11] Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
[12] Hughes, J., Estep, P., Tavazoie, S., Church, G.: Computational Identification of cis-regulatory Elements Associated with Groups of Functionally Related Genes in sacharomyces cerevisae, J. Mol. Biol., 296, 2000, 1205-1214.
[13] Iliopoulos, C., Makris, C., Panagis, I., Perdikuri, K., Theodoridis, E., Tsakalidis, A.: Computing the Repetitions in aWeighted Sequence usingWeighted Suffix Trees, European Conference on Computational Biology (ECCB 2003), Posters' Track, 2003.
[14] Iliopoulos, C., Mouchard, L., Perdikuri, K., Tsakalidis, A.: Computing the repetitions in a weighted sequence, Prague Stringology Conference (PSC 2003), 2003.
[15] Iliopoulos, C. S., Makris, C., Panagis, Y., Perdikuri, K., Theodoridis, E., Tsakalidis, A.: Efficient Algorithms For Handling Molecular Weighted Sequences, 3rd IFIP Conference on Theoretical Computer Science - IFIP-TCS04 (J.-J. Lévy, E. W. Mayr, J. C. Mitchell, Eds.), Kluwer, 2004.
[16] Iliopoulos, C. S., Perdikuri, K., Theodoridis, E., Tsakalidis, A. K., Tsichlas, K.: Motif Extraction from Weighted Sequences., String Processing and Information Retrieval, 11th International Conference (SPIRE), 2004.
[17] Jensen, L., Knudsen, S.: Automatic Discovery of Regulatory Patterns in Promoter Regions Based on Whole Cell Expression Data and Functional Annotation, Bioinformatics, 16, 2000, 326-333.
[18] Keles, S., van der Laan, M., Dudoit, S., Xing, B., Eisen, M.: Supervised Detection of Regulatory Motifs in DNA Sequences, Statistical Applications in Genetics and Molecular Biology, 2, 2003.
[19] Kolpakov, R., Kucherov, G.: Finding maximal repetitions in a word in linear time, FOCS99, 1999.
[20] Kurtz, S., Schleiermacher, C.: REPuter: fast computation of maximal repeats in complete genomes, Bioinformatics, 15, 1999, 426-427.
[21] Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment, Science, 262, 1993, 208-214.
[22] Li, H., Rhodius, V., Gross, C., Siggia, E.: Identification of the binding sites of regulatory proteins in bacterial genomes, Genetics, 99, 2002, 11772-11777.
[23] Liu, Y., Wei, L., Batzoglou, S., Brutlag, D., Liu, J., Liu, X.: A Suite of Web-Based Programs to Search for Transcriptional Regulatory Motifs, Nucleic Acids Research, 32, 2004.
[24] Marsan, L., Sagot, M.: Extracting Structured Motifs Using a Suffix Tree- Algorithms and Application to Promoter Concensus Identification, RECOMB, 2000.
[25] McCreight, E.: A space-economical suffix tree construction, Journal of the ACM, 23, 1976, 262-272.
[26] McGuire, A., Hughes, J., Church, G.: Conservation of DNA Regulatory Motifs and Discovery of New Motifs in Microbial Genomes, Nucleic Acids Research, 10, 2000, 744-757.
[27] Ron, D., Singer, Y., Tishby, N.: The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length, Machine Learning, 25, 1996, 117-149.
[28] Roth, D., Hughes, J., Esterp, P., Church, G.: Finding DNA Regulatory Motifs within Unaligned Noncoding Sequence Clustered by whole Genome MRNA Quantitation, Nature Biotechnology, 16, 1998, 939-945.
[29] Stoye, J., Gusfield, D.: Simple and flexible detection of contiguous repeats using a suffix tree, 9th Symposium on Combinatorial Pattern Matching, 1998.
[30] Sun, Z., Yang, J., Deogun, J.: MISAE: A New Approach for Regulatory Motif Extraction, Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004), 2004.
[31] Tadesse, M., Vannucci, M., Lio, P.: Identification of DNA Regulatory Motifs Using Bayesian Variable Selection Suite, Bioinformatics, 20, 2004, 2553-2561.
[32] Thompson,W., Lawrence, C.: Gibbs Recursive Sampler: Finding Transcription Factor Binding Sites, Nucleic Acids Research, 31, 2003, 3580-3585.
[33] Tompa, M., Sinha, S.: A Statistical Method for Finding Transcription Factor Binding Sites, Proceedings Int. Conf. Intell. Syst. Mol. Biol., 2000.
[34] Tsunoda, T., Fukagawa, M., Takagi, T.: Time and memory efficient algorithm for extracting palindromic and repetitive subsequences, Pacific Symposium on Biocomputing, 1999.
[35] Ukkonen, E.: On-line construction of suffix trees, Algorithmica, 14, 1995, 249-260.
[36] Wang, H., Perng, C., Fan, W., Park, S., Yu, P.: IndexingWeighted-Sequences in Large Databases., International Conference on Data Engineering (ICDE), 2003.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS2-0010-0038