Minimum Unique Substrings and Maximum Repeats

Ilie, L.; Smyth, W.F.

Artykuł - szczegóły

Tytuł artykułu

Minimum Unique Substrings and Maximum Repeats

Autorzy

Ilie L. , Smyth W.F.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Unique substrings appear scattered in the stringology literature and have important applications in bioinformatics. In this paper we initiate a study of minimum unique substrings in a given string; that is, substrings that occur exactly once while all their substrings are repeats. We discover a strong duality between minimum unique substrings and maximum repeats which, in particular, allows fast computation of one from the other. We give several optimal algorithms, some of which are very simple and efficient. Their combinatorial properties are investigated and a number of open problems are proposed.

Słowa kluczowe

unique substrings maximum repeats

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2011

Tom

Vol. 110, nr 1-4

Strony

183--195

Opis fizyczny

Bibliogr. 28 poz.

Twórcy

autor

Ilie L.

autor

Smyth W.F.

Department of Computer Science, University ofWestern Ontario, London ON N6A 5B7, Canada, ilie@csd.uwo.ca

Bibliografia

[1] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch, Replacing suffix trees with enhanced suffix arrays, J. Discrete Algorithms 2 (2004) 53-86.
[2] A. Apostolico and F. P. Preparata, Optimal off-line detection of repetitions in a string, Theoret. Comput. Sci. 22 (1983) 297-315.
[3] A. Bakalis, C. S. Iliopoulos, C. Makris, S. Sioutas, E. Theodoridis, A. K. Tsakalidis and K. Tsichlas, Locating maximum multirepeats in multiple strings under various constraints, The Computer Journal 50-2 (2007) 178-185.
[4] G. S. Brodal, R. B. Lyngsø, C. N. S. Pedersen and J. Stoye, Finding maximum pairs with bounded gap, J. Discrete Algorithms 1 (2000) 77-103.
[5] J. Craig Venter et al., The Sequence of the Human Genome, Science 291 (5507) (2001) 1304-1351.
[6] N. G. de Bruijn, A combinatorial problem, Proc. Konin. Neder. Akad. Wet. 49 (1946) 758-764.
[7] M. Crochemore, An optimal algorithm for computing all the repetitions in a word, Inform. Process. Lett. 12-5 (1981) 244-248.
[8] M. Crochemore, C. S. Iliopoulos, M. Mohamed and M.-F. Sagot, Longest repeats with a block of k don't cares, Theoret. Comput. Sci. 362-1 (2006) 248-254.
[9] F. Franek, W. F. Smyth and Y. Tang, Computing all repeats using suffix arrays, J. Automata, Languages and Combinatorics 8-4 (2003) 579-591.
[10] S. Gräf, F. G. G. Nielsen, S. Kurtz, M. A. Huynen, E. Birney, H. Stunnenberg and P. Flicek, Optimized design and assessment of whole genome tiling arrays, Bioinformatics 23-13 (2007) i195-i204.
[11] D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press (1997) 534 pp.
[12] B. Haubold, N. Pierstorff, F. Möller and T. Wiehe, Genome comparison without alignment using shortest unique substrings, BMC Bioinformatics 6 (2005) 123-133.
[13] C. S. Iliopoulos,W. F. Smyth and M. Yusufu, Faster algorithms for computing maximum multirepeats in multiple sequences, Fundamenta Informaticae 97-3 (2009) 311-320.
[14] T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park, Linear-time longest-common-prefix computation in suffix arrays and its applications, Proc. 12th Annual Symp. Combinatorial Pattern Matching, LNCS 2089 (2001) 181-192.
[15] E. S. Lander et al., Initial sequencing and analysis of the human genome, Nature 409 (2001) 860-921.
[16] A. Lefebvre and T. Lecroq, A heuristic for computing repeats with a factor oracle: application to biological sequences, Internat. J. Comput. Math. 79-12 (2002) 1303-1315.
[17] M. G.Main and R. J. Lorentz, An O(n log n) algorithmfor finding all repetitions in a string, J. Algorithms 5 (1984) 422-432.
[18] U. Manber and G. W. Myers, Suffix arrays: a new method for on-line string searches, Proc. First Annual ACM-SIAM Symp. Discrete Algs. (1990) 319-327.
[19] K. Narisawa, S. Inenaga, H. Bannai andM. Takeda, Efficient computation of substring equivalence classes with suffix arrays, Proc. 18th Annual Symp. Combinatorial Pattern Matching, LNCS 4580 (2007) 340-351.
[20] G. Nong, S. Zhang and W. H. Chan, Linear time suffix array construction using D-critical substrings, Proc. 20th Annual Symp. Combinatorial Pattern Matching, LNCS 5577 (2009) 54-67.
[21] S. J. Puglisi, W. F. Smyth and A. H. Turpin, A taxonomy of suffix array construction algorithms, ACM Computing Surveys 39-2 (2007) 1-31.
[22] S. J. Puglisi, W. F. Smyth and M. Yusufu, Fast, practical algorithms for computing all the repeats in a string, Math. Comput. Sci. 3 (2010) 373-389.
[23] S. J. Puglisi and A. H. Turpin, Space-time tradeoffs for longest-common-prefix array computation, Proc. 19th Internat. Symp. Algs. and Computation, LNCS 5369 (2008) 124-135.
[24] S. Rahmann, Fast large scale oligonucleotide selection using the longest common factor approach, J. Bioinformatics and Computational Biology 1-2 (2003) 343-361.
[25] B. Smyth, Computing Patterns in Strings, Pearson Addison-Wesley (2003) 423 pp.
[26] A. Thue, ¨Uber unendliche zeichenreihen, Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiana 7 (1906) 1-22.
[27] P. Weiner, Linear pattern matching algorithms, Proc. 14th Annual IEEE Symp. Switching and Automata Theory (1973) 1-11.
[28] J. Zheng, T. J. Close, T. Jiang and S. Lonardi, Efficient selection of unique and popular oligos for large EST databases Proc. 14th Annual Symp. Combinatorial Pattern Matching, LNCS 2676 (2003) 273-283.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS8-0020-0073