Fusion of feature selection methods in gene recognition

Gil, Fabian; Osowski, Stanislaw

doi:10.24425/bpasts.2021.136748

Article - details

Article title

Fusion of feature selection methods in gene recognition

Authors

Gil Fabian , Osowski Stanislaw

Identifiers

DOI

10.24425/bpasts.2021.136748

Abstract

The paper presents an approach that fuses different feature selection methods in pattern recognition problems. The following methods are examined: neighborhood component analysis, the Fisher discriminant criterion, the ReliefF method, stepwise fit, the Kolmogorov-Smirnov criterion, the T2-test, the Kruskal-Wallis test, feature correlation with class, and SVM recursive feature elimination. Their sensitivity to noisy data and the repeatability of the most important features they select are studied. Based on this study, the best selection methods are chosen and applied to select the most important genes and gene sequences in gene expression microarray datasets for prostate and ovarian cancers. The results of their fusion are presented and discussed. The small set of genes selected in this way can be treated as cancer biomarkers.
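The abstract does not say how the individual rankings are combined, so the following is only a minimal sketch under stated assumptions: three of the listed criteria (Fisher discriminant, correlation with class, and a two-sample t statistic standing in for the T2-test) are implemented with NumPy, and fusion is assumed to be a simple vote over each method's top-k features. The function names, the synthetic data, and the `top_k` and `min_votes` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher discriminant criterion per feature for two classes (labels 0/1)."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)

def class_correlation(X, y):
    """Absolute Pearson correlation of each feature with the class label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def welch_t(X, y):
    """Absolute two-sample (Welch) t statistic per feature."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def fuse_by_votes(X, y, scorers, top_k, min_votes):
    """Assumed fusion rule: keep features that reach the top_k of at least min_votes scorers."""
    votes = np.zeros(X.shape[1], dtype=int)
    for scorer in scorers:
        top = np.argsort(scorer(X, y))[::-1][:top_k]  # indices of the top_k highest scores
        votes[top] += 1
    return np.flatnonzero(votes >= min_votes), votes

# Synthetic stand-in for a microarray: 60 samples, 200 "genes", 3 of them informative.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
informative = np.array([3, 17, 42])
X[:, informative] += 3.0 * y[:, None]  # shift the informative genes in class 1

scorers = [fisher_score, class_correlation, welch_t]
selected, votes = fuse_by_votes(X, y, scorers, top_k=10, min_votes=3)
print("genes selected by all three methods:", selected)
```

Requiring agreement among all scorers (`min_votes=3`) favours repeatability over recall; lowering `min_votes` admits genes that only some criteria rank highly.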

Keywords

diagnostic features; selection methods; genes; recognition; biomarkers

Publisher

Polska Akademia Nauk, Wydział IV Nauk Technicznych

Journal

Bulletin of the Polish Academy of Sciences. Technical Sciences

Year

2021

Volume

Vol. 69, no. 3

Pages

art. no. e136748

Physical description

Bibliography: 24 items, figures, tables.

Contributors

author

Gil Fabian

Warsaw University of Technology, Pl. Politechniki 1, 00-661 Warsaw, Poland

author

Osowski Stanislaw

sto@iem.pw.edu.pl

Warsaw University of Technology, Pl. Politechniki 1, 00-661 Warsaw, Poland
Military University of Technology, ul. gen. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland

References

[1] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection”, J. Mach. Learn. Res. 3, 1158–1182 (2003).
[2] I. Guyon, A.J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using SVM”, Mach. Learn. 46, 389‒422 (2003).
[3] P.N. Tan, M. Steinbach, and V. Kumar, Introduction to data mining, Boston, Pearson Education Inc., 2006.
[4] H. Chen, Y. Zhang, and I. Gutman, “A kernel-based clustering method for gene selection with gene expression data”, J. Biomed. Inform. 62, 12‒20 (2016).
[5] P. Das, A. Roychowdhury, S. Das, S. Roychoudhury, and S. Tripathy, “sigFeature: novel significant feature selection method for classification of gene expression data using support vector machine and t statistic”, Front. Genet. 11, 247 (2020), doi: 10.3389/fgene.2020.00247.
[6] A. Wiliński and S. Osowski, “Ensemble of data mining methods for gene ranking”, Bull. Pol. Acad. Sci. Tech. Sci. 60, 461‒471 (2012).
[7] H. Mitsubayashi, S. Aso, T. Nagashima, and Y. Okada, “Accurate and robust gene selection for disease classification using simple statistics”, Biomed. Inform. 391, 68–71 (2008).
[8] J. Xu, Y. Wang, K. Xu, and T. Zhang, “Feature genes selection using fuzzy rough uncertainty metric for tumour diagnosis”, Comput. Math. Method Med. 2019, 6705648 (2019), doi: 10.1155/2019/6705648.
[9] B. Lyu and A. Haque, “Deep learning based tumour type classification using gene expression data”, bioRxiv, p. 364323 (2018), doi: 10.1101/364323.
[10] F. Yang, “Robust feature selection for microarray data based on multi criterion fusion”, IEEE Trans. Comput. Biol. Bioinf. 8(4), 1080–1092 (2011).
[11] M. Muszyński and S. Osowski, “Data mining methods for gene selection on the basis of gene expression arrays”, Int. J. Appl. Math. Comput. Sci. 24(3), 657‒668 (2014).
[12] T. Latkowski and S. Osowski, “Data mining for feature selection in gene expression autism data”, Expert Syst. Appl. 42(2), 864‒872 (2015).
[13] Matlab user manual. Natick (USA): MathWorks, 2020.
[14] P. Sprent and N.C. Smeeton, Applied Nonparametric Statistical Methods. Boca Raton, Chapman & Hall/CRC, 2007.
[15] R.O. Duda, P.E. Hart, and P. Stork, Pattern Classification and Scene Analysis, New York: Wiley, 2003.
[16] Exxact. [Online]. https://blog.exxactcorp.com/scikitlearn-vs-mlrfor-machine-learning/
[17] Tutorialspoint. [Online]. https://www.tutorialspoint.com/weka/weka_feature_selection.htm
[18] R. Robnik-Sikonja and I. Kononenko, “Theoretical and empirical analysis of Relief”, Mach. Learn. 53, 23‒69 (2003).
[19] W. Yang, K. Wang, and W. Zuo, “Neighborhood Component Feature Selection for High-Dimensional Data”, J. Comput. 7(1), 161‒168 (2012).
[20] L. Breiman, “Random forests”, Mach. Learn. 45, 5–32 (2001).
[21] NCBI database. [Online]. http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS4431, (2011).
[22] http://discover1.mc.vanderbilt.edu/discover/public/mcsvm/
[23] http://sdmc.lit.org.sg/GEDatasets/Datasets.html
[24] F. Gil and S. Osowski, “Feature selection methods in gene recognition problem”, in Proc. on-line Conference Computational Methods in Electrical Engineering, 2020, pp. 1‒4.

Notes

Record prepared with funds from MNiSW (Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2021).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-fcb41302-90ad-45a7-9852-5f4df0768dbb