Ensemble of data mining methods for gene ranking

Wiliński, A.; Osowski, S.

Content

Full texts:

httpbpasts_czasopisma_pan_plimagesdatabpastswydaniano3september201209ensembleofdataminingmethodsforgeneranking.pdf

Abstract

The paper presents an ensemble of data mining methods for discovering the most important genes and gene sequences, generated by gene expression arrays, that are responsible for the recognition of a particular type of cancer. The analyzed methods include the correlation of a feature with the class, statistical hypothesis testing, the Fisher measure of discrimination, and a linear Support Vector Machine (SVM) used to characterize the discrimination ability of the features. In the first step of ranking, each method is applied individually, and the genes most often selected in cross-validation on the available data set are chosen. In the next step, the results of the different selection methods are combined, and the genes appearing most frequently in the selected sets are chosen once again. On this basis the final ranking of the genes is formed. The most important genes form the input to the SVM classifier responsible for the final recognition of tumor versus non-tumor data. The correctness of the proposed ranking procedure is checked in several ways. The first maps the distribution of the selected genes onto the two-coordinate system formed by the two most important principal components of the PCA transformation and applies cluster quality measures. The second presents the results graphically, showing the gene expressions as pixel intensities for the available data. The final confirmation of the quality of the proposed ranking method is the classification of cancer cases versus non-cancer (normal) cases, performed using a Gaussian kernel SVM. The results obtained with the selected genes for recognizing prostate cancer cases from normal cases confirm good accuracy. The presented methodology is of potential use for practical application in bioinformatics.
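The two-step voting scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only two of the four criteria (a common form of the Fisher measure, |m1 - m2| / (s1 + s2), and the absolute Pearson correlation with the class label), replaces the paper's cross-validation with simple interleaved folds, and omits the SVM stages. All function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def fisher_score(values, labels):
    """Assumed form of the Fisher measure for one gene: |m1 - m2| / (s1 + s2)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def abs_correlation(values, labels):
    """Absolute Pearson correlation between one gene and the class label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (y - ml) for v, y in zip(values, labels))
    sv = sum((v - mv) ** 2 for v in values) ** 0.5
    sl = sum((y - ml) ** 2 for y in labels) ** 0.5
    return abs(cov) / (sv * sl + 1e-12)

def top_genes(X, y, score, k):
    """Indices of the k highest-scoring genes; X holds one sample per row."""
    n_genes = len(X[0])
    scores = [score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]

def stable_genes(X, y, score, k, n_folds):
    """Step 1: genes most often selected when each fold is left out in turn."""
    votes = Counter()
    for f in range(n_folds):
        keep = [i for i in range(len(X)) if i % n_folds != f]
        votes.update(top_genes([X[i] for i in keep], [y[i] for i in keep], score, k))
    return [g for g, _ in votes.most_common(k)]

def ensemble_ranking(X, y, scores, k=2, n_folds=3):
    """Step 2: rank genes by how many selection methods chose them."""
    votes = Counter()
    for score in scores:
        votes.update(stable_genes(X, y, score, k, n_folds))
    return [g for g, _ in votes.most_common()]

# Toy data (invented): 6 samples x 4 genes; genes 0 and 2 separate the classes.
X = [
    [1.0, 5.0, 2.0, 3.1],
    [9.0, 5.2, 4.0, 3.0],
    [1.2, 4.8, 2.2, 2.9],
    [8.8, 5.1, 4.1, 3.2],
    [0.9, 5.3, 1.9, 3.0],
    [9.1, 4.9, 3.8, 3.1],
]
y = [0, 1, 0, 1, 0, 1]  # 1 = tumor, 0 = normal

ranking = ensemble_ranking(X, y, [fisher_score, abs_correlation])
print(ranking)
```

On this toy set the two discriminative genes (indices 0 and 2) receive votes from both criteria. Every fold must contain samples of both classes, otherwise the per-class means are undefined; a real study would use stratified cross-validation and pass the top-ranked genes on to a classifier.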

Keywords

gene expression array; feature selection; gene ranking methods; classification; SVM

Publisher

Polska Akademia Nauk, Wydział IV Nauk Technicznych

Journal

Bulletin of the Polish Academy of Sciences. Technical Sciences

Year

2012

Volume

Vol. 60, no. 3

Pages

461-470

Physical description

Bibliography: 16 items, figures, tables.

Authors

author

Wiliński A.

author

Osowski S.

University of Life Sciences, Faculty of Applied Informatics and Mathematics, 159 Nowoursynowska St., 02-776 Warszawa, Poland

YADDA identifier

bwmeta1.element.baztech-article-BPG8-0096-0009