Tytuł artykułu
Autorzy
Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
Abstrakty
We study a central problem of string processing: the compact representation of a string by its frequently-occurring substrings. In this paper we propose an effective, easily-computed form of quasi-periodicity in strings, the frequency cover; that is, the longest of those repeating substrings u of w, u > 1, that occurs the maximum number of times in w. The advantage of this generalization is that it is not only applicable to all strings but also that it is the only generalized notion of cover yet proposed, which can be computed efficiently in linear time and space. We describe a simple data structure called the repeating substring frequency array (RSF array) for the string w, which we show can be constructed in O(n) time and O(n) space, where w = n. We then use RSF to compute all the frequency covers of w in linear time and space. Our research also allows us to give an alternate algorithm to compute all non-extendible repeating substrings in w, also in O(n) time and space.
Słowa kluczowe
Wydawca
Czasopismo
Rocznik
Tom
Strony
275--289
Opis fizyczny
Bibliogr. 16 poz., rys., tab.
Twórcy
autor
- Department of Computing and Software, McMaster University, Canada
autor
- Department of Computing and Software, McMaster University, Canada
- School of Engineering & Information Technology, Murdoch University, Perth, WA, Australia
Bibliografia
- [1] Apostolico A, Ehrenfeucht A. Efficient Detection of Quasi-periodicities in Strings, Technical Report 90.5, The Leonadro Fibonacci Institute, Trento, Italy, 1990.
- [2] Apostolico A, Ehrenfeucht A. Efficient Detection of Quasiperiodicities in Strings, Theoretical Computer Science, 1993.119(2):247-265. URL https://doi.org/10.1016/0304-3975(93)90159-Q.
- [3] Flouri T, Iliopoulos CS, Kociumaka T, Pissis SP, Puglisi SJ, Smyth WF, Tyczyński W. Enhanced string covering, Theoretical Computer Science, 2013506:102-114. URL https://doi.org/10.1016/j.tcs.2013.08.013.
- [4] Apostolico A, Farach M, Iliopoulos C. Optimal superprimitivity testing for strings, Inform. Process. Lett., 1991.39(1):17-20. URL https://doi.org/10.1016/0020-0190(91)90056-N.
- [5] Breslauer D. An on-line string superprimitivity test, Information Processing Letters, 1992;46(6):345-347. URL https://doi.org/10.1016/0020-0190(92)90111-8.
- [6] Moore D, Smyth WF. An optimal algorithm to compute all the covers, Information Processing Letters, 1994;50(5):239-246. doi:10.1016/0020-0190(94)00045-X.
- [7] Li Y, Smyth WF. Computing the cover array in linear time, Algorithmica, 2002;32(1):95-106. doi:10.1007/s00453-001-0062-2.
- [8] Alatabbi A, Islam ASMS, Rahman MS, Simpson J, Smyth WF. Enhanced Covers of Regular & Indeterminate Strings using Prefix Tables, Journal of Automata, Languages & Combinatorics, 2016;21(3):131-147. doi:10.25596/jalc-2016-131.
- [9] Kociumaka T, Radoszewski J, Rytter W, Pissis SP, Waleń T. Fast Algorithm for Partial Covers in Words, Algorithmica, 2015;73(1):217-233. doi:10.1007/s00453-014-9915-3.
- [10] Franek F, Islam ASMS, Rahman MS, Smyth WF. Algorithms to Compute the Lyndon Array, Proceedings of the Prague Stringology Conference 2016, 2016. ISBN-978-80-01-05996-8.
- [11] Goto K, Bannai H. Simpler and faster Lempel-Ziv factorization, Data Compression Conference, 2013 pp. 133-142. doi:10.1109/DCC.2013.21.
- [12] Karlin S, Ghandour G, Ost F, Tavare S, Korn LJ. New approaches for computer analysis of nucleic acid sequences, Proc. Natl. Acad. Sci. USA, 1983;80(18):5660-5664.
- [13] Franek F, Simpson R, Smyth WF. The maximum number of runs in a string, Proceeding of 14th Australian Workshop on Combinatorial Algorithms, 2003 pp. 26-35.
- [14] Gusfield D. Algorithms on Strings, Trees, and Sequences: Computer Science And Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8(hc).
- [15] Smyth B. Computing Patterns in Strings, Pearson/Addison-Wesley, 2003. ISBN 9780201398397.
- [16] Puglisi SJ, Smyth WF, Yusufu M. Fast, optimal algorithms for computing all the repeats in a string, Mathematics in Computer Science, 2010;3(4):373-389. doi:10.1007/s11786-010-0033-6.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-d0ad8dc8-cc62-4080-b39d-c72af1ee69fc