Article title

Assembly of repetitive regions using next-generation sequencing data

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
High read depth can be used to assemble short sequence repeats. Existing genome assemblers fail in repetitive regions that are longer than the average read. I propose a new algorithm for DNA assembly that uses the relative frequency of reads to properly reconstruct repetitive sequences. A mathematical model for error-free input data gives the upper limits on the accuracy of the results as a function of read coverage. For high coverage, the estimation error depends linearly on the repetitive sequence length and is inversely proportional to the sequencing coverage. The model shows that the smaller the de Bruijn graph dimension, the more accurate the assembly of long repetitive regions. The algorithm requires high read depth, which next-generation sequencers provide, and it could use existing data. Tests on error-free reads, generated in silico from several model genomes, showed properly reconstructed repetitive sequences in cases where existing assemblers fail. The C++ sources, the Python scripts and the additional data are available at http://dnaasm.sourceforge.net.
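The idea of using relative read frequency to recover repeat copy numbers can be illustrated with a minimal Python sketch. It is not the published algorithm or the dnaasm code: the function names, the toy genome, and the choice of the median k-mer count as the single-copy baseline are all assumptions made for illustration. It also assumes error-free reads with perfectly uniform coverage of a circular genome, which real data will not provide.

```python
from collections import Counter
from statistics import median


def count_kmers(sequences, k):
    """Count every k-mer (a de Bruijn graph edge of dimension k) in the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def tile_reads_circular(genome, read_len):
    """Idealised sequencing (assumption): one error-free read starting at
    every position of a circular genome, so coverage is perfectly uniform."""
    extended = genome + genome[:read_len - 1]
    return [extended[i:i + read_len] for i in range(len(genome))]


def estimate_multiplicity(kmer_counts):
    """Estimate how many times each k-mer occurs in the genome.

    Assumption: most distinct k-mers are single-copy, so the median
    count approximates the coverage of one genomic copy.
    """
    single_copy = median(kmer_counts.values())
    return {kmer: round(c / single_copy) for kmer, c in kmer_counts.items()}


# Toy genome (hypothetical): a 7 bp unit repeated three times in tandem,
# giving a 21 bp repeat that is longer than the 10 bp reads.
LEFT, UNIT, RIGHT = "CCGACGTCAGCTAGC", "GATTACA", "TGCGCACTGAGTCCA"
genome = LEFT + UNIT * 3 + RIGHT

reads = tile_reads_circular(genome, read_len=10)
estimated = estimate_multiplicity(count_kmers(reads, k=6))
print(estimated["GATTAC"])  # copy-number estimate for a k-mer inside the repeat
```

In this idealised setting the count of each k-mer is an exact multiple of the single-copy coverage, so the rounding step recovers the copy number exactly. With randomly sampled reads the counts fluctuate around those multiples, which is where the dependence of the estimation error on coverage described in the abstract comes in.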
Contributors
author
  • Electronic Systems Institute, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-eb7055f1-9260-4ac8-9cbc-c70da803c45e