EA-MOSGWA : a tool for identifying associated SNPs in Genome Wide Association Studies

Gola, A.; Bogdan, M.; Frommlet, F.

Article - details

Article title

EA-MOSGWA : a tool for identifying associated SNPs in Genome Wide Association Studies

Authors

Gola A., Bogdan M., Frommlet F.

Title variants

EA-MOSGWA : narzędzie do identyfikacji przyczynowych SNPów w badaniach asocjacyjnych całego genomu

Abstracts

This paper presents the current stage of development of EA-MOSGWA, a tool for identifying causal genes in Genome Wide Association Studies (GWAS). The main goal of GWAS is to identify chromosomal regions which are associated with a particular disease (e.g. diabetes, cancer) or with some quantitative trait (e.g. height or blood pressure). To this end, hundreds of thousands of Single Nucleotide Polymorphisms (SNPs) are genotyped. One is then interested in identifying as many SNPs as possible which are associated with the trait in question, while at the same time minimizing the number of false detections. The software package MOSGWA detects SNPs via variable selection using the criterion mBIC2, a modified version of the Schwarz Bayesian Information Criterion. MOSGWA tries to minimize mBIC2 using stepwise selection methods, whereas EA-MOSGWA applies advanced evolutionary algorithms to achieve the same goal. We present results from an extensive simulation study in which we compare the performance of EA-MOSGWA under different parameter settings. We also consider using a clustering procedure to relax the multiple testing correction in mBIC2. Finally, we compare results from EA-MOSGWA with the original stepwise search from MOSGWA, and show that the newly proposed algorithm has good properties in terms of minimizing the mBIC2 criterion, as well as in minimizing the misclassification rate of detected SNPs.

This article presents the current state of development of the EA-MOSGWA program, a tool for identifying causal genes in Genome Wide Association Studies (GWAS). The main goal of these studies is to determine the chromosomal regions that are associated with the occurrence of genetic diseases (e.g. diabetes, cancer) or that influence a given trait (e.g. height or blood pressure). They amount to examining many thousands of Single Nucleotide Polymorphisms (SNPs) and linking them (single SNPs or groups of SNPs) with clinical cases and with measurable traits. The key issue is to identify as many causal SNPs as possible while minimizing false discoveries. The MOSGWA program detects SNPs through variable selection using the mBIC2 criterion, a modified version of the Schwarz Bayesian Information Criterion. MOSGWA tries to minimize mBIC2 using a stepwise selection method, whereas EA-MOSGWA uses a modified version of an evolutionary algorithm for this purpose. In the article we present the results of an extensive simulation study in which we compare the performance of EA-MOSGWA under different parameter settings. We also consider clustering of SNPs in order to relax the multiple testing correction in the mBIC2 method. Finally, we compare the results obtained by EA-MOSGWA with those of the stepwise method used in MOSGWA, to show that the proposed method has good properties in minimizing the mBIC2 criterion and in minimizing the false detection rate.
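
The mBIC2 criterion named in the abstracts can be illustrated with a minimal sketch. The abstracts do not state the formula, so the form used here is an assumption: the one commonly given for linear regression, n·log(RSS/n) + k·log(n) + 2k·log(p/4) − 2·log(k!). The function name `mbic2`, the parameter `expected_causal` and the sample numbers are hypothetical and are not taken from MOSGWA or EA-MOSGWA.

```python
import math


def mbic2(rss, n, k, p, expected_causal=4.0):
    """Assumed mBIC2 score of a linear model (smaller is better).

    rss: residual sum of squares of the fitted model
    n:   number of individuals
    k:   number of SNPs included in the model
    p:   total number of candidate SNPs
    expected_causal: prior guess of the number of causal SNPs
                     (assumption: 4, as in the p/4 term)
    """
    if rss <= 0 or n <= 0 or p <= 0 or k < 0:
        raise ValueError("rss, n, p must be positive and k non-negative")
    fit = n * math.log(rss / n)                  # -2 log-likelihood up to a constant
    bic_penalty = k * math.log(n)                # classical BIC term
    multiplicity = 2.0 * k * math.log(p / expected_causal)  # multiple testing term
    relaxation = 2.0 * math.lgamma(k + 1)        # 2*log(k!), the mBIC2 relaxation
    return fit + bic_penalty + multiplicity - relaxation


# Hypothetical numbers: 1000 individuals, 300000 genotyped SNPs.
null_score = mbic2(rss=1000.0, n=1000, k=0, p=300000)
strong_snp = mbic2(rss=950.0, n=1000, k=1, p=300000)
weak_snp = mbic2(rss=990.0, n=1000, k=1, p=300000)
```

With these made-up values, the model containing the strong SNP scores below the null model and would be preferred, while the model containing the weak SNP scores above it and would be rejected. A search procedure, whether stepwise or evolutionary, would compare many candidate models by such a score.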

Keywords

evolutionary algorithm; Genome Wide Association; linear regression

Publisher

Instytut Informatyki Teoretycznej i Stosowanej Polskiej Akademii Nauk

Journal

Theoretical and Applied Informatics

Year

2013

Volume

Vol. 25, No. 3-4

Pages

251-262

Physical description

Bibliography: 10 items, figures.

Contributors

author

Gola A.

Department of Mathematics and Computer Science, Jan Długosz University in Częstochowa, Poland

author

Bogdan M.

Department of Mathematics and Computer Science, Wrocław University of Technology, Poland

author

Frommlet F.

Department of Medical Statistics, Medical University Vienna, Austria

References

[1] F Begum, D Ghosh, G C Tseng, and E Feingold. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res., 40(9):3777-3784, 2012.
[2] F Frommlet. Tag SNP selection based on clustering according to dominant sets found using replicator dynamics. Adv. in Data Anal. and Classif., 4:65-83, 2010.
[3] F Frommlet, I Ljubic, H Arnardottir, and M Bogdan. QTL mapping using a memetic algorithm with modifications of BIC as fitness function. Statistical Applications in Genetics and Molecular Biology, 11(4): Article 2, 2012.
[4] F Frommlet, F Ruhaltinger, P Twarog, and M Bogdan. Modified versions of Bayesian information criterion for genome-wide association studies. CSDA, 56:1038-1051, 2012.
[5] D E Goldberg. Algorytmy genetyczne i ich zastosowania. Wydawnictwa Naukowo-Techniczne, Warszawa, 2003.
[6] Y Guan and M Stephens. Bayesian variable selection regression for genome-wide association studies, and other large-scale problems. Ann. Appl. Stat., 5:1780-1815, 2011.
[7] Q He and D Lin. A variable selection method for genome-wide association studies. Bioinformatics, 27:1-8, 2011.
[8] Zb Michalewicz. Algorytmy genetyczne + struktury danych = programy ewolucyjne. Wydawnictwa Naukowo-Techniczne, Warszawa, 2004.
[9] M R Nelson, K Bryc, K S King, A Indap, A R Boyko, J Novembre, L P Briley, Y Maruyama, D M Waterworth, G Waeber, P Vollenweider, J R Oksenberg, S L Hauser, H A Stirnadel, J S Kooner, J C Chambers, B Jones, V Mooser, C D Bustamante, A D Roses, D K Burns, M G Ehm, and E H Lai. The population reference sample, POPRES: a resource for population, disease, and pharmacological genetics research. Am. J. Hum. Genet., 83:347-358, 2008.
[10] T T Wu, Y F Chen, T Hastie, E Sobel, and K Lange. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics, 25:714-721, 2009.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-1a3bd858-cb05-4fd8-a740-3bf724ebfdc7