Article title
Identifiers
Title variants
Publication languages
Abstracts
Mapping reads to a reference genome is one of the most essential problems in modern computational biology. The most popular algorithms for this problem are based on the Burrows-Wheeler transform and the FM-index. However, these approaches have difficulty with highly mutated sequences because they allow only a limited number of mutations. G-MAPSEQ is a novel hybrid algorithm that combines two methods: alignment-free sequence comparison and ultrafast sequence alignment. The former is a fast heuristic that uses k-mer characteristics of nucleotide sequences to find potential mapping positions. The latter is a very fast GPU implementation of sequence alignment that verifies the correctness of these positions. The source code of G-MAPSEQ, along with other bioinformatics software, is available at http://gpualign.cs.put.poznan.pl.
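The two-stage idea described in the abstract (a k-mer filter that proposes candidate positions, followed by an alignment that verifies them) can be illustrated with a minimal CPU-only sketch. This is not the G-MAPSEQ implementation: the function names, the shared-k-mer threshold `min_shared`, the scoring values, and the use of a plain Needleman-Wunsch score in place of the GPU alignment are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def candidate_positions(read, reference, k=3, min_shared=4):
    """Alignment-free filter: keep reference windows sharing enough k-mers with the read.

    min_shared is a hypothetical threshold chosen for this sketch.
    """
    read_kmers = kmer_counts(read, k)
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window_kmers = kmer_counts(reference[pos:pos + len(read)], k)
        # Multiset intersection counts k-mers common to both sequences.
        shared = sum((read_kmers & window_kmers).values())
        if shared >= min_shared:
            hits.append(pos)
    return hits


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score with a linear gap penalty (CPU stand-in for the GPU step)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def map_read(read, reference, k=3, min_shared=4, min_score=0):
    """Return (position, score) pairs for candidates that pass alignment verification."""
    results = []
    for pos in candidate_positions(read, reference, k, min_shared):
        score = global_alignment_score(read, reference[pos:pos + len(read)])
        if score >= min_score:
            results.append((pos, score))
    return results
```

Calling `map_read(read, reference)` returns every window that survives both stages. Because the filter compares k-mer content rather than requiring exact seed matches, a read carrying a substitution can still reach the verification step, which is the situation the abstract highlights for highly mutated sequences.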
Year
Volume
Pages
123–142
Physical description
Bibliography: 22 items, figures, tables.
Authors
Author
- Institute of Computing Science, Poznan University of Technology, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poland
- European Center for Bioinformatics and Genomics, Poland
Author
- Institute of Computing Science, Poznan University of Technology, Poland
- European Center for Bioinformatics and Genomics, Poland
Author
- Institute of Computing Science, Poznan University of Technology, Poland
- European Center for Bioinformatics and Genomics, Poland
- Poznan Supercomputing and Networking Center, Poland
Author
- Institute of Computing Science, Poznan University of Technology, Poland
- European Center for Bioinformatics and Genomics, Poland
Author
- Institute of Computing Science, Poznan University of Technology, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poland
- European Center for Bioinformatics and Genomics, Poland
Bibliography
- [1] Blazewicz J., Frohmberg W., Kierzynka M., Pesch E., Wojciechowski P., Protein alignment algorithms with an efficient backtracking routine on multiple GPUs, BMC Bioinformatics, 12, 181, 2011.
- [2] Blazewicz J., Frohmberg W., Kierzynka M., Wojciechowski P., G-MSA – A GPU-based, fast and accurate algorithm for multiple sequence alignment, J. Parallel. Distr. Com., 73, 1, 2013, 32–41.
- [3] Ferragina P., Manzini G., Opportunistic Data Structures with Applications, Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000.
- [4] Fiannaca A., La Rosa M., Rizzo R., Urso A., A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network, Artificial Intelligence in Medicine, 64, 3, 2015, 173–184.
- [5] Fonseca N.A., Rung J., Brazma A., Marioni J.C., Tools for mapping high throughput sequencing data, Bioinformatics, 28, 24, 2012, 3169–3177.
- [6] Frohmberg W., Kierzynka M., Blazewicz J., Gawron P., Wojciechowski P., G-DNA – a highly efficient multi-GPU/MPI tool for aligning nucleotide reads, Bulletin of the Polish Academy of Sciences Technical Sciences, 61, 4, 2013, 989–992.
- [7] Holtgrewe M., Mason – a read simulator for second generation sequencing data, Technical Report Institut für Mathematik und Informatik, Freie Universität Berlin, TR-B-10-06, 2010.
- [8] Holtgrewe M., Emde A.-K., Weese D., Reinert K., A Novel And Well-Defined Benchmarking Method For Second Generation Read Mapping, BMC Bioinformatics, 12, 210, 2011.
- [9] Kierzynka M., GPU-accelerated graph construction for the whole genome assembly, PhD thesis, Poznan University of Technology, Poznan, Poland, 2014.
- [10] Kuksa P., Pavlovic V., Efficient alignment-free DNA barcode analytics, BMC Bioinformatics, 10, 14, 2009, 1–18.
- [11] Langmead B., Salzberg S.L., Fast gapped-read alignment with Bowtie 2, Nat Methods, 9, 4, 2012, 357–359.
- [12] Langmead B., Trapnell C., Pop M., Salzberg S.L., Ultrafast and memory efficient alignment of short DNA sequences to the human genome, Genome Biology, 10, 3, 2009, 1–10.
- [13] Liu Y., Schröder J., Schmidt B., Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data, Bioinformatics, 29, 3, 2013, 308–315.
- [14] Needleman S.B., Wunsch C.D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol., 48, 3, 1970, 443–453.
- [15] Polychronopoulos D., Weitschek E., Dimitrieva S., Bucher P., Felici G., Almirantis Y., Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers, Genomics, 104, 2, 2014, 79–86.
- [16] Reinert G., Chew D., Sun F., Waterman M.S., Alignment-free sequence comparison (I): statistics and power, Journal of Computational Biology, 16, 12, 2009, 1615–1634.
- [17] Vinga S., Almeida J., Alignment-free sequence comparison – a review, Bioinformatics, 19, 4, 2003, 513–523.
- [18] Wan L., Reinert G., Sun F., Waterman M.S., Alignment-free sequence comparison (II): theoretical power of comparison statistics, Journal of Computational Biology, 17, 11, 2010, 1467–1490.
- [19] Weese D., Emde A.-K., Rausch T., Döring A., Reinert K., RazerS – fast read mapping with sensitivity control, Genome Research, 19, 2009, 1646–1654.
- [20] Weese D., Holtgrewe M., Reinert K., RazerS 3: faster, fully sensitive read mapping, Bioinformatics, 28, 20, 2012, 2592–2599.
- [21] Weitschek E., Cunial F., Felici G., LAF: Logic Alignment Free and its application to bacterial genomes classification, BioData Mining, 8, 1, 2015.
- [22] Weitschek E., Santoni D., Fiscon G., De Cola M.C., Bertolazzi P., Felici G., Next generation sequencing reads comparison with an alignment-free distance, BMC Research Notes, 7, 1, 2014, 1–13.
Notes
Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-31194866-63dd-467b-b605-50c2ef008e79