Machine learning for the identification of the DNA variations for diseases diagnosis

Martyna, J.

Article details

Article title

Machine learning for the identification of the DNA variations for diseases diagnosis

Authors

Martyna J.

Identifiers

Title variants

Uczenie maszynowe dla identyfikacji zmian DNA do diagnozowania choroby

Publication languages

Abstracts

In this paper we give an overview of basic computational haplotype analysis, including pairwise association with the use of clustering, and tagged SNP prediction (using Bayesian networks). Moreover, we present several machine learning methods for exploring the association between human genetic variations and diseases. These methods include clustering SNPs based on similarity measures and selecting one SNP per cluster, support vector machines, etc. The presented machine learning methods can help to generate plausible hypotheses for classification systems.

This paper presents basic machine learning methods for haplotype selection, including pairwise association using clustering and prediction, tagged SNPs (Single Nucleotide Polymorphisms), support vector machines (SVM), etc. These methods are applied to disease prediction. They can also help to generate plausible hypotheses for disease classification systems.
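The idea of clustering SNPs by a similarity measure and keeping one SNP per cluster can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes phased haplotypes coded as 0/1 alleles, uses the r^2 linkage disequilibrium measure as the similarity, and applies a simple greedy binning in the spirit of reference 5. The function names and the 0.8 threshold are illustrative assumptions.

```python
from itertools import combinations


def r_squared(a, b):
    """r^2 between two biallelic SNPs, each a list of 0/1 alleles per haplotype."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # monomorphic SNP: r^2 is undefined, treat as uninformative
    pab = sum(x & y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def select_tag_snps(haplotypes, threshold=0.8):
    """Greedy binning (assumed scheme): returns a list of (tag, members) pairs."""
    m = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(m)]
    # Each SNP is linked to itself and to every SNP with r^2 >= threshold.
    linked = {j: {j} for j in range(m)}
    for i, j in combinations(range(m), 2):
        if r_squared(cols[i], cols[j]) >= threshold:
            linked[i].add(j)
            linked[j].add(i)
    untagged = set(range(m))
    bins = []
    while untagged:
        # Pick the SNP that covers the most untagged SNPs; ties go to the lowest index.
        tag = max(sorted(untagged), key=lambda j: len(linked[j] & untagged))
        members = linked[tag] & untagged
        bins.append((tag, sorted(members)))
        untagged -= members
    return bins
```

Called on a list of haplotype rows, `select_tag_snps` returns one representative SNP index per bin together with the SNP indices it tags. The selected tags could then serve as input features for a classifier such as a support vector machine.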

Keywords

haplotype; computational haplotype analysis; SNP selection

Publisher

Wydawnictwo Politechniki Śląskiej

Journal

Studia Informatica

Year

2011

Volume

Vol. 32, No. 3B

Pages

103-118

Physical description

Bibliography: 33 items

Contributors

author

Martyna J.

Jagiellonian University, Institute of Computer Science, ul. Prof. S. Łojasiewicza 6, 30-348 Kraków, Poland, martyna@softlab.ii.uj.edu.pl

Bibliography

1. Ao S. I., Yip K., Ng M., Cheung D., Fong P., Melhado L., Sham P. C.: CLUSTAG: Hierarchical Clustering and Graph Methods for Selecting Tag SNPs. Bioinformatics, Vol. 21, 2005, p. 1735-1736.
2. Bafna V., Halldórsson B. V., Schwartz R., Clark A. G., Istrail S.: Haplotypes and Informative SNP Selection Algorithms: Don't Block out Information, [in:] Proc. of the Seventh Int. Conf. on Computational Molecular Biology, 2003, p. 19-26.
3. Boser B. E., Guyon I. M., Vapnik V.: A Training Algorithm for Optimal Margin Classifiers. Fifth Annual Workshop on the Computational Learning Theory, ACM, 1992.
4. Byng M. C., Whittaker J. C., Cuthbert A. P., Mathew C. G., Lewis C. M.: SNP Subset Selection for Genetic Association Studies. Annals of Human Genetics, Vol. 67, 2003, p. 543-556.
5. Carlson C. S., Eberle M. A., Rieder M. J., Yi Q., Kruglyak L., Nickerson D. A.: Selecting a Maximally Informative Set of Single-nucleotide Polymorphisms for Association Analyses Using Linkage Disequilibrium. American Journal of Human Genetics, Vol. 74, 2004, p. 106-120.
6. Cho J. H., Lee D., Park J. H., Lee I. B.: New Gene Selection Method for Classification of Cancer Subtypes Considering Within-Class Variation. FEBS Letters, Vol. 551, 2003, p. 3-7.
7. Cho J. H., Lee D., Park J. H., Lee I. B.: Gene Selection and Classification from Microarray Data Using Kernel Machine. FEBS Letters, Vol. 571, 2004, p. 93-98.
8. Daly M., Rioux J., Schaffner S., Hudson T., Lander E.: High-Resolution Haplotype Structure in the Human Genome. Nature Genetics, Vol. 29, 2001, p. 229-232.
9. Deb K., Reddy A. R.: Reliable Classification of Two-Class Cancer Using Evolutionary Algorithms. Biosystems, Vol. 72, 2003, p. 111-129.
10. Dempster A. P., Laird N. M., Rubin D. B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Vol. 39, No. 1, 1977, p. 1-38.
11. Deutsch J.: Evolutionary Algorithms for Finding Optimal Gene Sets in Microarray Prediction. Bioinformatics, Vol. 19, No. 1, 2003, p. 45-52.
12. Devlin B., Risch N.: A Comparison of Linkage Disequilibrium Measures for Fine Scale Mapping. Genomics, Vol. 29, 1995, p. 311-322.
13. Ding K., Zhou K., Zhang J., Knight J., Zhang X., Shen Y.: The Effect of Haplotype-Block Definitions on Inference of Haplotype-block Structure and htSNPs Selection. Molecular Biology and Evolution, Vol. 22, No. 1, 2005, p. 148-159.
14. Gusfield D., Orzack S. H.: Haplotype Inference. CRC Handbook in Bioinformatics, CRC Press, Boca Raton, 2005, p. 1-25.
15. Hastings W. K.: Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, Vol. 57, 1970, p. 97-109.
16. Hedrick P. W.: Genetics of Populations. Jones and Bartlett Publishers, Sudbury 2004.
17. Huang H. L., Chang F. L.: ESVM: Evolutionary Support Vector Machine for Automatic Feature Selection and Classification of Microarray Data. Biosystems, Vol. 90, 2007, p. 516-528.
18. Jensen F.: Bayesian Networks and Decision Graphs. Springer-Verlag, New York, Berlin Heidelberg 1997.
19. Jorde L. B.: Linkage Disequilibrium and the Search for Complex Disease Genes. Genome Research, Vol. 10, 2000, p. 1435-1444.
20. Keerthi S. S., Lin C. J.: Asymptotic Behaviour of Support Vector Machines with Gaussian Kernel. Neural Computation, Vol. 15, No. 7, 2003, p. 1667.
21. Lee K. E., Sha N., Dougherty E. R., Vannucci M., Mallick B. K.: Gene Selection: A Bayesian Variable Selection Approach. Bioinformatics, Vol. 19, No. 1, 2003, p. 90-97.
22. Lee P. H.: Computational Haplotype Analysis: An Overview of Computational Methods in Genetic Variation Study. Technical Report 2006-512, Queen's University, 2006.
23. Lee P. H., Shatkay H.: BNTagger: Improved Tagging SNP Selection Using Bayesian Networks. The 14th Annual Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), 2006.
24. Lee Y., Lee C. K.: Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data. Bioinformatics, Vol. 19, No. 9, 2003, p. 1132-1139.
25. Metropolis N., Rosenbluth A. W., Rosenbluth M. N., Teller A. H., Teller E.: Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, Vol. 21, 1953, p. 1087-1091.
26. Nothnagel M.: The Definition of Multilocus Haplotype Blocks and Common Diseases. Ph.D. Thesis, University of Berlin, 2004.
27. Phuong T. M., Liu Z., Altman R. B.: Choosing SNPs Using Feature Selection. Proc. of the IEEE Computational Systems Bioinformatics Conference, 2005, p. 301-309.
28. Schulze T. G., Zhang K., Chen Y., Akula N., Sun F., McMahon F. J.: Defining Haplotype Blocks and Tag Single-nucleotide Polymorphisms in the Human Genome. Human Molecular Genetics, Vol. 13, No. 3, 2004, p. 335-342.
29. Sherry S. T., Ward M. H., Kholodov M., Baker J., Phan L., Smigielski E. M., Sirotkin K.: dbSNP: the NCBI Database of Genetic Variation. Nucleic Acids Research, Vol. 29, 2001, p. 308-311.
30. Vapnik V.: Statistical Learning Theory. John Wiley and Sons, New York 1998.
31. Waddell M., Page D., Zhan F., Barlogie B., Shaughnessy J. Jr.: Predicting Cancer Susceptibility from Single-nucleotide Polymorphism Data: a Case Study in Multiple Myeloma. Proc. of BIOKDD '05, Chicago, August 2005.
32. Wu X., Luke A., Rieder M., Lee K., Toth E. J., Nickerson D., Zhu X., Kan D., Cooper R. S.: An Association Study of Angiotensinogen Polymorphisms with Serum Level and Hypertension in an African-American Population. Journal of Hypertension, Vol. 21, No. 10, 2003, p. 1847-1852.
33. Yoonkyung L., Cheol-Koo L.: Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data, Bioinformatics, Vol. 19, No. 9, 2003, p. 1132.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSL2-0025-0086