PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Comparing Degenerate Strings

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
Uncertain sequences are compact representations of sets of similar strings. They highlight common segments by collapsing them, and explicitly represent varying segments by listing all possible options. A generalized degenerate string (GD string) is a type of uncertain sequence. Formally, a GD string Ŝ is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length ki but this length can vary between different sets. We denote by W the sum of these lengths k0, k1, . . . , k n-1. Our main result is an 𝒪(N + M)-time algorithm for deciding whether two GD strings of total sizes N and M, respectively, over an integer alphabet, have a non-empty intersection. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in linear space. We then apply our string comparison tool to devise a simple algorithm for computing all palindromes in Ŝ in 𝒪(min{W , n 2}N )-time. We complement this upper bound by showing a similar conditional lower bound for computing maximal palindromes in Ŝ. We also show that a result, which is essentially the same as our string comparison linear-time algorithm, can be obtained by employing an automata-based approach.
Wydawca
Rocznik
Strony
41--58
Opis fizyczny
Bibliogr. 37 poz., rys., tab.
Twórcy
autor
  • Department of Informatics, King's College London, UK
  • Department of Informatics, King's College London, UK
  • DISCo, University of Milano - Bicocca, Italy
  • Department of Computer Science, University of Pisa, Italy
  • Department of Informatics, King's College London, UK
  • Department of Computer Science, University of Pisa, Italy
  • Life Sciences and Health research group, CWI Amsterdam, the Netherlands
  • Department of Computer Science, University of Pisa, Italy
Bibliografia
  • [1] Soldano H, Viari A, Champesme M. Searching for flexible repeated patterns using a non-transitive similarity relation. Pattern Recognition Letters, 1995. 16(3):233-246. doi:10.1016/0167-8655(94)00095-K.
  • [2] Pisanti N, Soldano H, Carpentier M. Incremental Inference of Relational Motifs with a Degenerate Alphabet. In: 16th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 3537 of Springer LNCS. 2005 pp. 229-240. doi:10.1007/11496656_20.
  • [3] Pisanti N, Soldano H, Carpentier M, Pothier J. A Relational Extension of the Notion of Motifs: Application to the Common 3D Protein Substructures Searching Problem. Journal of Computational Biology, 2009. 16(12):1635-1660. doi:10.1089/cmb.2008.0019.
  • [4] Abrahamson K. Generalized String Matching. SIAM Journal of Computing, 1987. 16(6):1039-1051. doi:10.1137/0216067.
  • [5] Crochemore M, Iliopoulos CS, Kociumaka T, Radoszewski J, Rytter W, Walen T. Covering problems for partial words and for indeterminate strings. Theoretical Computer Science, 2017. 698:25-39. doi:10.1016/j.tcs.2017.05.026.
  • [6] Iliopoulos CS, Radoszewski J. Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties. In: 27th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 54 of LIPIcs. 2016 pp. 8:1-8:12. doi:10.4230/LIPIcs.CPM.2016.8.
  • [7] Consortium TCPG. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 2016. pp. 1-18. doi:10.1093/bib/bbw089.
  • [8] Iliopoulos CS, Kundu R, Pissis SP. Efficient Pattern Matching in Elastic-Degenerate Texts. In: 11th International Conference on Language and Automata Theory and Applications (LATA), volume 10168 of Springer LNCS. 2017 pp. 131-142. doi:10.1007/978-3-319-53733-7_9.
  • [9] Myers EW. Approximate Matching of Network Expressions with Spacers. Journal of Computational Biology, 1996. 3(1):33-51. doi:10.1089/cmb.1996.3.33.
  • [10] Grossi R, Iliopoulos CS, Liu C, Pisanti N, Pissis SP, Retha A, Rosone G, Vayani F, Versari L. On-line pattern matching on similar texts. In: 28th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 78 of LIPIcs. 2017 pp. 9:1-9:14. doi:10.4230/LIPIcs.CPM.2017.9.
  • [11] Aoyama K, Nakashima Y, Inenaga S, Bannai H, Takeda M, et al. Faster online elastic degenerate string matching. In: 29th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 105 of LIPIcs. 2018 pp. 9:1-9:10. doi:10.4230/LIPIcs.CPM.2018.9.
  • [12] Pissis SP, Retha A. Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line. In: 17th International Symposium on Experimental Algorithms (SEA), volume 103 of LIPIcs. 2018 pp. 16:1-16:14. doi:10.4230/LIPIcs.SEA.2018.16.
  • [13] Bernardini G, Gawrychowski P, Pisanti N, Pissis SP, Rosone G. Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication. In: 46th International Colloquium on Automata, Languages, and Programming (ICALP), volume 132 of LIPIcs. 2019 pp. 21:1-21:15. doi:10.4230/LIPIcs.ICALP.2019.21.
  • [14] Bernardini G, Pisanti N, Pissis SP, Rosone G. Pattern Matching on Elastic-Degenerate Text with Errors. In: 24th International Symposium on String Processing and Information Retrieval (SPIRE), volume 10508 of Springer LNCS. 2017 pp. 74-90. doi:10.1007/978-3-319-67428-5_7.
  • [15] Bernardini G, Pisanti N, Pissis S, Rosone G. Approximate pattern matching on elastic-degenerate text. Theoretical Computer Science, 2020. 812:109-122. doi:10.1016/j.tcs.2019.08.012.
  • [16] Alzamel M, Ayad L, Bernardini G, Grossi R, Iliopoulos C, Pisanti N, Pissis S, Rosone G. Degenerate string comparison and applications. In: 18th International Workshop on Algorithms in Bioinformatics (WABI), volume 113 of LIPIcs. 2018 pp. 21:1-21:14. doi:10.4230/LIPIcs.WABI.2018.21.
  • [17] Manacher G. A New Linear-Time “On-Line” Algorithm for Finding the Smallest Initial Palindrome of a String. Journal of the ACM, 1975. 22(3):346-351. doi:10.1145/321892.321896.
  • [18] Apostolico A, Breslauer D, Galil Z. Parallel detection of all palindromes in a string. Theoretical Computer Science, 1995. 141(1):163-173. doi:10.1016/0304-3975(94)00083-U.
  • [19] Gusfield D. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, NY, USA, 1997. ISBN: 0 521 58519 8.
  • [20] Lipton RJ. On The Intersection of Finite Automata, pp. 145-148. Springer US, Boston, MA, 2010. doi:10.1007/978-1-4419-7155-5_31.
  • [21] Impagliazzo R, Paturi R. On the Complexity of k-SAT. Journal of Computer and System Science, 2001. 62(2):367-375. doi:10.1006/jcss.2000.1727.
  • [22] Impagliazzo R, Paturi R, Zane F. Which Problems Have Strongly Exponential Complexity? Journal of Computer and System Science, 2001. 63(4):512-530. doi:10.1006/jcss.2001.1774.
  • [23] Farach M. Optimal suffix tree construction with large alphabets. In: 38th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 1997 pp. 137-143. doi:10.1109/SFCS.1997.646102.
  • [24] Bender MA, Farach-Colton M. The LCA Problem Revisited. In: 9th Latin American Symposium (LATIN), volume 1776 of LNCS. Springer Berlin Heidelberg, 2000 pp. 88-94. doi:10.1007/10719839_9.
  • [25] Alatabbi A, Iliopoulos CS, Rahman MS. Maximal Palindromic Factorization. In: Prague Stringology Conference (PSC). 2013 pp. 70-77. url: http://www.stringology.org/event/2013/p07.html.
  • [26] Fici G, Gagie T, Kärkkäinen J, Kempa D. A Subquadratic Algorithm for Minimum Palindromic Factorization. Journal of Discrete Algorithms, 2014. 28:41-48. doi:10.1016/j.jda.2014.08.001.
  • [27] Rubinchik M, Shur AM. EERTREE: An Efficient Data Structure for Processing Palindromes in Strings. In: 27th International Workshop on Combinatorial Algorithms (IWOCA), volume 9538 of Springer LNCS, pp. 321-333. 2015. doi:10.1007/978-3-319-29516-9_27.
  • [28] Borozdin K, Kosolobov D, Rubinchik M, Shur AM. Palindromic Length in Linear Time. In: 28th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 78 of LIPIcs. 2017 pp. 23:1-23:12. doi:10.4230/LIPIcs.CPM.2017.23.
  • [29] Alzamel M, Gao J, Iliopoulos CS, Liu C, Pissis SP. Efficient Computation of Palindromes in Sequences with Uncertainties. In: 18th Engineering Applications of Neural Networks (EANN), volume 744 of Springer CCIS, pp. 620-629. 2017. doi:10.1007/978-3-319-65172-9_52.
  • [30] Adamczyk M, Alzamel M, Charalampopoulos P, Iliopoulos CS, Radoszewski J. Palindromic Decompositions with Gaps and Errors. In: The 12th International Computer Science Symposium in Russia (CSR), volume 10304 of Springer LNCS. 2017 pp. 48-61. doi:10.1007/978-3-319-58747-9_7.
  • [31] I T, Sugimoto S, Inenaga S, Bannai H, Takeda M. Computing Palindromic Factorizations and Palindromic Covers On-line. In: 25th Symposium on Combinatorial Pattern Matching (CPM), volume 8486 of Springer LNCS. 2014 pp. 150-161. doi:10.1007/978-3-319-07566-2_16.
  • [32] Gally JA, Edelman GM. The Genetic Control of Immunoglobulin Synthesis. Annual Review of Genetics, 1972. 6(1):1-46. doi:10.1146/annurev.ge.06.120172.000245.
  • [33] Schuh RT. Major Patterns in Vertebrate Evolution. Systematic Biology, 1978. 27(2):172. doi:10.2307/2412989.
  • [34] IUPAC-IUB Commission on Biochemical Nomenclature. Abbreviations and symbols for nucleic acids, polynucleotides, and their constituents. Biochemistry, 1970. 9(20):4022-4027. doi:10.1111/j.1432-1033.1970.tb00995.x.
  • [35] Wuilmart C, Urbain J, Givol D. On the location of palindromes in immunoglobulin genes. Proceedings of the National Academy of Sciences of the United States of America, 1977. 74(6):2526-2530. doi:10.1073/pnas.74.6.2526.
  • [36] Gao J, Impagliazzo R. Orthogonal Vectors is hard for first-order properties on sparse graphs. Electronic Colloquium on Computational Complexity (ECCC), 2016. 23:53. url: http://eccc.hpiweb.de/report/2016/053.
  • [37] Williams R. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci, 2005. 348(2-3):357-365. doi:10.1016/j.tcs.2005.09.023.
Uwagi
Opracowanie rekordu ze środków MNiSW, umowa Nr 461252 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2020).
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-61ee7295-e936-48bf-86a7-6ff43512d253
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.