PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

On Computing Average Common Substring Over Run Length Encoded Sequences

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
The Average Common Substring (ACS) is a popular alignment-free distance measure for phylogeny reconstruction. The ACS of a sequence X[1; x] w.r.t. another sequence Y[1; y] is ACS(X;Y) =[formula] The lcp(., .) of two input sequences is the length of their longest common prefix. The ACS can be computed in O(n) space and time, where n = x + y is the input size. The compressed string matching is the study of string matching problems with the following twist: the input data is in a compressed format and the underling task must be performed with little or no decompression. In this paper, we revisit the ACS problem under this paradigm where the input sequences are given in their run-length encoded format. We present an algorithm to compute ACS(X,Y) in O(N log N) time using O(N) space, where N is the total length of sequences after run-length encoding.
Słowa kluczowe
Wydawca
Rocznik
Strony
267--273
Opis fizyczny
Bibliogr. 19 poz.
Twórcy
autor
  • Department of Computer Science, University of Central Florida, Orlando, USA
autor
  • Department of Computer Science, University of Central Florida, Orlando, USA
autor
  • School of Computational Science & Engineering, Georgia Institute of Technology, Atlanta, USA
  • Department of Computer Science, University of Central Florida, Orlando, USA
Bibliografia
  • [1] Burstein, D., Ulitsky, I., Tuller, T., Chor, B.: Information theoretic approaches to whole genome phylogenies, Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2005. doi:10.1007/11415770_22.
  • [2] Aluru, S., Apostolico, A., Thankachan, S. V.: Efficient alignment free sequence comparison with bounded mismatches, Proceedings of the 19th Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2015. doi:10.1007/978-3-319-16706-0_1.
  • [3] Apostolico, A., Guerra, C., Landau, G. M., Pizzi, C.: Sequence similarity measures based on bounded hamming distance, Theoretical Computer Science, 638, 2016, 76-90. doi:10.1016/j.tcs.2016.01.023.
  • [4] Leimeister, C.-A., Morgenstern, B.: kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison, Bioinformatics, 30(14), 2014, 2000-2008. doi:10.1093/bioinformatics/btu331.
  • [5] Manzini, G.: Longest Common Prefix with Mismatches, Proceedings of the 22nd International Symposium on String Processing and Information Retrieval (SPIRE), Springer, 2015. doi:10.1007/978-3-319-23826-5_29.
  • [6] Thankachan, S. V., Apostolico, A., Aluru, S.: A Provably Efficient Algorithm for the k-Mismatch Average Common Substring Problem, Journal of Computational Biology, 23(6), 2016, 472-482. doi:10.1089/cmb.2015.0235.
  • [7] Thankachan, S. V., Chockalingam, S. P., Liu, Y., Apostolico, A., Aluru, S.: ALFRED: a practical method for alignment-free distance computation, Journal of Computational Biology, 23(6), 2016, 452-460. doi:10.1089/cmb.2015.0217.
  • [8] Thankachan, S. V., Chockalingam, S. P., Liu, Y., Krishnan, A., Aluru, S.: A greedy alignment-free distance estimator for phylogenetic inference, Proceedings of 5th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), 2015. doi:10.1109/ICCABS.2015.7344711.
  • [9] Thankachan, S. V., Aluru, C., Chockalingam, S. P., Aluru, S.: Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis, Proceedings of the 922nd Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2018. doi:10.1007/978-3-319-89929-9_14.
  • [10] Apostolico, A.: Maximal words in sequence comparisons based on subword composition, in: Algorithms and Applications, Springer, 2010, 34-44. doi:10.1007/978-3-642-12476-1_2.
  • [11] Bonham-Carter, O., Steele, J., Bastola, D.: Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis, Briefings in bioinformatics, 2013, bbt052. doi:10.1093/bib/bbt052.
  • [12] Chang, G., Wang, T.: Phylogenetic analysis of protein sequences based on distribution of length about common substring, The Protein Journal, 30(3), 2011, 167-172. doi:10.1007/s10930-011-9318-0.
  • [13] Comin, M., Verzotto, D.: Alignment-free phylogeny of whole genomes using underlying subwords, Algorithms for Molecular Biology, 7(1), 2012, 1. doi:10.1186/1748-7188-7-34.
  • [14] Domazet-Lošo, M., Haubold, B.: Efficient estimation of pairwise distances between genomes, Bioinformatics, 25(24), 2009, 3221-3227. doi:10.1093/bioinformatics/btp590.
  • [15] Guyon, F., Brochier-Armanet, C., Guénoche, A.: Comparison of alignment free string distances for complete genome phylogeny, Advances in Data Analysis and Classification, 3(2), 2009, 95-108. doi:10.1007/s11634-009-0041-z.
  • [16] Weiner, P.: Linear pattern matching algorithms, Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory (SWAT), 1973. doi:10.1109/SWAT.1973.13.
  • [17] McCreight, E. M.: A space-economical suffix tree construction algorithm, Journal of the ACM (JACM), 23(2), 1976, 262-272. doi:10.1145/321941.321946.
  • [18] Farach, M.: Optimal Suffix Tree Construction with Large Alphabets, 38th Annual Symposium on Foundations of Computer Science, FOCS ’97, Miami Beach, Florida, USA, October 19-22, 1997, 1997. doi:10.1109/SFCS.1997.646102.
  • [19] Bender, M. A., Farach-Colton, M.: The LCA Problem Revisited, LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, 2000. doi:10.1007/10719839_9.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-6fbc1b0e-1dc6-4dc3-aa76-2e9b0bca3d1a
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.