Title variants
Publication languages
Abstracts
Implementing a large genomic project is a demanding task, not least from the computer science point of view. Besides collecting and sequencing many genome samples, it involves processing huge amounts of data at every stage of their production and analysis. Efficient transfer and storage of the data are also important issues. Throughout such a project, work standards must be maintained and the quality of the results controlled, which can be difficult when part of the work is carried out externally. Here, we describe our experience with such data quality analysis at several levels: from the obvious check of the quality of the results obtained, through examining the consistency of the data at various stages of their processing, to verifying, as far as possible, their compatibility with the data describing the sample.
Keywords
Year
Volume
Pages
423-436
Physical description
Bibliography: 24 items, figures.
Authors
author
- Institute of Computing Science, Poznan University of Technology, Poland
- Laboratory of Genomics, Institute of Bioorganic Chemistry, Polish Academy of Sciences
author
- Institute of Computing Science, Poznan University of Technology, Poland
author
- Institute of Computing Science, Poznan University of Technology, Poland
- Laboratory of Genomics, Institute of Bioorganic Chemistry, Polish Academy of Sciences
author
- Institute of Computing Science, Poznan University of Technology, Poland
- Laboratory of Genomics, Institute of Bioorganic Chemistry, Polish Academy of Sciences
Bibliography
- [1] Bai H., Guo X., Zhang D., et al. The genome of a Mongolian individual reveals the genetic imprints of Mongolians on modern human populations. Genome Biology and Evolution, 6(12):3122-3136, 2014.
- [2] Brittain H., Scott R., and Thomas E. The rise of the genome and personalised medicine. Clinical Medicine, 17(6):545-551, 2017.
- [3] Caulfield M., Davies J., Dennys M., et al. National genomic research library, 2020.
- [4] Chan T., Golub G., and Leveque R. Algorithms for computing the sample variance: Analysis and recommendations. The American Statistician, 37(3):242-247, 1983.
- [5] Chen S., Zhou Y., Chen Y., et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17):i884-i890, 2018.
- [6] Cho Y., Kim H., Kim H., et al. An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes. Nature Communications, 7:13637, 2016.
- [7] Cibulskis K., McKenna A., Fennell T., et al. ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics, 27(18):2601-2602, 2011.
- [8] The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature, 526(7571):68-74, 2015.
- [9] Danecek P., Bonfield J., Liddle J., et al. Twelve years of SAMtools and BCFtools. GigaScience, 10(2), 2021.
- [10] Durbin R., Altshuler D., Abecasis G., et al. A map of human genome variation from population-scale sequencing. Nature, 467(7319):1061-1073, 2010.
- [11] Fiévet A., Bernard V., Tenreiro H., et al. ART-DeCo: easy tool for detection and characterization of cross-contamination of DNA samples in diagnostic next-generation sequencing analysis. European Journal of Human Genetics, 27(5), 2019.
- [12] Fiorito G., Di Gaetano C., Guarrera S., et al. The Italian genome reflects the history of Europe and the Mediterranean basin. European Journal of Human Genetics, 24(7):1056-1062, 2016.
- [13] Guo J., Wu Y., Zhu Z., et al. Global genetic differentiation of complex traits shaped by natural selection in humans. Nature Communications, 9(1):1865, 2018.
- [14] Hehir-Kwa J., Marschall T., Kloosterman W., et al. A high-quality human reference panel reveals the complexity and distribution of genomic structural variants. Nature Communications, 7:12989, 2016.
- [15] Kehr B., Helgadottir A., and Melsted P. Diversity in non-repetitive human sequences not found in the reference genome. Nature Genetics, 49(4):588-593, 2017.
- [16] Li Q., Tian S., Yan B., et al. Building a Chinese pan-genome of 486 individuals. Communications Biology, 4(1):1016, 2021.
- [17] McDermott U. Next-generation sequencing and empowering personalised cancer medicine. Drug Discovery Today, 20(12):1470-1475, 2015.
- [18] Nagasaki M., Yasuda J., Katsuoka F., et al. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nature Communications, 6(1):8018, 2015.
- [19] Takayama J., Tadaka S., Yano K., et al. Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference. Nature Communications, 12(1):226, 2021.
- [20] Tishkoff S. and Kidd K. Implications of biogeography of human populations for ’race’ and medicine. Nature Genetics, 36(11):S21-S27, 2004.
- [21] Van der Auwera G. and O’Connor B. Genomics in the cloud: using Docker, GATK, and WDL in Terra. O’Reilly Media, Sebastopol, CA, first edition, 2020.
- [22] Welford B. Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3):419-420, 1962.
- [23] Zhao S., Agafonov O., Azab A., et al. Accuracy and efficiency of germline variant calling pipelines for human genome data. Scientific Reports, 10(1):20222, 2020.
- [24] Zimani A., Peterlin B., and Kovanda A. Increasing genomic literacy through national genomic projects. Frontiers in Genetics, 12:693253, 2021.
Notes
PL
The research was carried out under grant no. POIR.04.02.00-30-A004/16.
Record prepared with funds from MNiSW, agreement no. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2021).
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-50157436-b62d-46d4-b0c9-3a4e3299f56b