G-MAPSEQ – a new method for mapping reads to a reference genome

Wojciechowski, P.; Frohmberg, W.; Kierzynka, M.; Zurkowski, P.; Blazewicz, J.

doi:10.1515/fcds-2016-0007

Abstract

Mapping reads to a reference genome is one of the most essential problems in modern computational biology. The most popular algorithms for this problem are based on the Burrows-Wheeler transform and the FM-index. However, these approaches have difficulty with highly mutated sequences because they allow only a limited number of mutations. G-MAPSEQ is a novel hybrid algorithm that combines two methods: alignment-free sequence comparison and ultrafast sequence alignment. The former is a fast heuristic that uses k-mer characteristics of nucleotide sequences to find potential mapping positions. The latter is a very fast GPU implementation of sequence alignment used to verify the correctness of these mapping positions. The source code of G-MAPSEQ, along with other bioinformatics software, is available at http://gpualign.cs.put.poznan.pl.
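The two-stage idea described in the abstract (a k-mer heuristic proposes candidate positions, then an alignment step verifies them) can be illustrated with a minimal CPU-only sketch. This is not the G-MAPSEQ implementation: the function names, the voting scheme, the `min_votes` and `max_errors` parameters, and the use of plain edit distance in place of GPU alignment are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k, min_votes=2):
    """Alignment-free stage (hypothetical): every k-mer shared by the read
    and the reference votes for the read start position it implies."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    return [start for start, count in votes.most_common() if count >= min_votes]


def edit_distance(a, b):
    """Dynamic-programming edit distance; a CPU stand-in for the GPU
    alignment step, chosen here only for brevity."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def map_read(read, reference, index, k, max_errors):
    """Verification stage: keep the candidate with the fewest errors,
    or return None if no candidate is within max_errors."""
    best = None
    for start in candidate_positions(read, index, k):
        start = max(start, 0)  # votes near the left edge can imply a negative start
        window = reference[start:start + len(read)]
        dist = edit_distance(read, window)
        if dist <= max_errors and (best is None or dist < best[1]):
            best = (start, dist)
    return best


if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATAGGCTTACGATCGATCGGATTACAGCTAG"
    read = "CCGATCGGCTTACGATCGAT"  # reference[10:30] with one substitution
    index = build_kmer_index(reference, 4)
    print(map_read(read, reference, index, k=4, max_errors=2))
```

Because candidates are found by k-mer voting rather than exact backward search, the number of tolerated mutations in this sketch is limited only by `max_errors` and by whether enough unmutated k-mers remain to reach `min_votes`.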

Keywords

computational biology; next generation sequencing; parallel computing; reads mapping

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2016

Volume

Vol. 41, No. 2

Pages

123–142

Physical description

Bibliography: 22 items, figures, tables.

Authors

author

Wojciechowski P.

Pawel.Wojciechowski@cs.put.poznan.pl

Institute of Computing Science, Poznan University of Technology, Poland
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poland
European Center for Bioinformatics and Genomics, Poland

author

Frohmberg W.

Institute of Computing Science, Poznan University of Technology, Poland
European Center for Bioinformatics and Genomics, Poland

author

Kierzynka M.

Institute of Computing Science, Poznan University of Technology, Poland
European Center for Bioinformatics and Genomics, Poland
Poznan Supercomputing and Networking Center, Poland

author

Zurkowski P.

Institute of Computing Science, Poznan University of Technology, Poland
European Center for Bioinformatics and Genomics, Poland

author

Blazewicz J.

Institute of Computing Science, Poznan University of Technology, Poland
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poland
European Center for Bioinformatics and Genomics, Poland

References

[1] Blazewicz J., Frohmberg W., Kierzynka M., Pesch E., Wojciechowski P., Protein alignment algorithms with an efficient backtracking routine on multiple GPUs, BMC Bioinformatics, 12, 181, 2011.
[2] Blazewicz J., Frohmberg W., Kierzynka M., Wojciechowski P., G-MSA – A GPU-based, fast and accurate algorithm for multiple sequence alignment, J. Parallel. Distr. Com., 73, 1, 2013, 32–41.
[3] Ferragina P., Manzini G., Opportunistic Data Structures with Applications, Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000.
[4] Fiannaca A., La Rosa M., Rizzo R., Urso A., A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network, Artificial Intelligence in Medicine, 64, 3, 2015, 173–184.
[5] Fonseca N.A., Rung J., Brazma A., Marioni J.C., Tools for mapping high throughput sequencing data, Bioinformatics, 28, 24, 2012, 3169–3177.
[6] Frohmberg W., Kierzynka M., Blazewicz J., Gawron P., Wojciechowski P., G-DNA – a highly efficient multi-GPU/MPI tool for aligning nucleotide reads, Bulletin of the Polish Academy of Sciences Technical Sciences, 61, 4, 2013, 989–992.
[7] Holtgrewe M., Mason – a read simulator for second generation sequencing data, Technical Report Institut für Mathematik und Informatik, Freie Universität Berlin, TR-B-10-06, 2010.
[8] Holtgrewe M., Emde A.-K., Weese D., Reinert K., A Novel And Well-Defined Benchmarking Method For Second Generation Read Mapping, BMC Bioinformatics, 12, 210, 2011.
[9] Kierzynka M., GPU-accelerated graph construction for the whole genome assembly, PhD thesis, Poznan University of Technology, Poznan, Poland, 2014.
[10] Kuksa P., Pavlovic V., Efficient alignment-free DNA barcode analytics, BMC Bioinformatics, 10, 14, 2009, 1–18.
[11] Langmead B., Salzberg S.L., Fast gapped-read alignment with Bowtie 2, Nat Methods, 9, 4, 2012, 357–359.
[12] Langmead B., Trapnell C., Pop M., Salzberg S.L., Ultrafast and memory efficient alignment of short DNA sequences to the human genome, Genome Biology, 10, 3, 2009, 1–10.
[13] Liu Y., Schröder J., Schmidt B., Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data, Bioinformatics, 29, 3, 2013, 308–315.
[14] Needleman S.B., Wunsch C.D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol., 48, 3, 1970, 443–453.
[15] Polychronopoulos D., Weitschek E., Dimitrieva S., Bucher P., Felici G., Almirantis Y., Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers, Genomics, 104, 2, 2014, 79–86.
[16] Reinert G., Chew D., Sun F., Waterman M.S., Alignment-free sequence comparison (I): statistics and power, Journal of Computational Biology, 16, 12, 2009, 1615–1634.
[17] Vinga S., Almeida J., Alignment-free sequence comparison – a review, Bioinformatics, 19, 4, 2003, 513–523.
[18] Wan L., Reinert G., Sun F., Waterman M.S., Alignment-free sequence comparison (II): theoretical power of comparison statistics, Journal of Computational Biology, 17, 11, 2010, 1467–1490.
[19] Weese D., Emde A.-K., Rausch T., Döring A., Reinert K., RazerS – fast read mapping with sensitivity control, Genome Research, 19, 2009, 1646–1654.
[20] Weese D., Holtgrewe M., Reinert K., RazerS 3: faster, fully sensitive read mapping, Bioinformatics, 28, 20, 2012, 2592–2599.
[21] Weitschek E., Cunial F., Felici G., LAF: Logic Alignment Free and its application to bacterial genomes classification, BioData Mining, 8, 1, 2015.
[22] Weitschek E., Santoni D., Fiscon G., De Cola M.C., Bertolazzi P., Felici G., Next generation sequencing reads comparison with an alignment-free distance, BMC Research Notes, 7, 1, 2014, 1–13.

Notes

Prepared with funding from the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-31194866-63dd-467b-b605-50c2ef008e79