Identifiers
Title variants
Publication languages
Abstracts
The paper presents an approach that fuses different feature selection methods in pattern recognition problems. The following methods are examined: neighborhood component analysis, the Fisher discriminant criterion, the ReliefF method, stepwise fit, the Kolmogorov-Smirnov criterion, the T2-test, the Kruskal-Wallis test, feature correlation with the class, and SVM recursive feature elimination. The sensitivity of the methods to noisy data and the repeatability of the most important features they select are studied. Based on this study, the best selection methods are chosen and applied to select the most important genes and gene sequences in gene expression microarray datasets of prostate and ovarian cancer. The results of their fusion are presented and discussed. The small set of genes selected in this way can be treated as cancer biomarkers.
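The fusion idea can be illustrated with a minimal sketch. It assumes a two-class problem (labels 0 and 1, e.g. cancer vs. normal tissue), samples stored as rows of `X` and genes as columns, and only three of the filter criteria named above: the Fisher discriminant criterion, the two-sample Kolmogorov-Smirnov statistic, and absolute correlation with the class. The fusion rule used here (each method votes for its top-`k` features, ties broken by the sum of ranks) is an illustrative assumption, not necessarily the rule used in the paper, and the helper names and toy data are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, pvariance


def split_by_class(x, y):
    """Return the values of one feature for class 0 and for class 1."""
    a = [v for v, c in zip(x, y) if c == 0]
    b = [v for v, c in zip(x, y) if c == 1]
    return a, b


def fisher_score(x, y):
    """Fisher criterion (m0 - m1)^2 / (s0^2 + s1^2) for a single feature."""
    a, b = split_by_class(x, y)
    num = (mean(a) - mean(b)) ** 2
    den = pvariance(a) + pvariance(b)
    if den == 0:
        return math.inf if num > 0 else 0.0
    return num / den


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between class ECDFs."""
    a, b = split_by_class(x, y)
    a.sort()
    b.sort()
    d = 0.0
    for t in set(a) | set(b):
        # ECDF at t = fraction of the class sample that is <= t
        d = max(d, abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b)))
    return d


def class_correlation(x, y):
    """Absolute Pearson correlation between a feature and the 0/1 class label."""
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def fuse(X, y, k, methods):
    """Hypothetical fusion: count top-k votes per feature, break ties by rank sum."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    votes = [0] * n_features
    rank_sum = [0] * n_features
    for method in methods:
        scores = [method(col, y) for col in columns]
        # Order features from most to least discriminative for this method
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
            if rank < k:
                votes[j] += 1
    return sorted(range(n_features), key=lambda j: (-votes[j], rank_sum[j]))


# Hypothetical toy data: 6 samples x 3 features (genes), two classes
X = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.5],
    [0.9, 4.0, 1.8],
    [3.0, 4.5, 2.9],
    [3.1, 3.5, 2.2],
    [2.9, 4.2, 3.1],
]
y = [0, 0, 0, 1, 1, 1]
METHODS = [fisher_score, ks_statistic, class_correlation]

print(fuse(X, y, k=2, methods=METHODS))  # [0, 2, 1]
```

On the toy data, feature 0 separates the classes perfectly, feature 2 only partly, and feature 1 behaves like noise, so each criterion ranks them 0, 2, 1 and the fused list is `[0, 2, 1]`. Counting votes over top-`k` lists favours features that several methods agree on, which matches the intuition of keeping only a small consensus set as candidate biomarkers.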
Year
Volume
Pages
Art. no. e136748
Physical description
Bibliography: 24 items, figures, tables
Authors
author
- Warsaw University of Technology, Pl. Politechniki 1, 00-661 Warsaw, Poland
author
- Warsaw University of Technology, Pl. Politechniki 1, 00-661 Warsaw, Poland
- Military University of Technology, ul. gen. Sylwestra Kaliskiego 2, 00-908 Warsaw, Poland
Bibliography
- [1] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection”, J. Mach. Learn. Res. 3, 1157–1182 (2003).
- [2] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using SVM”, Mach. Learn. 46, 389–422 (2002).
- [3] P.N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Boston: Pearson Education Inc., 2006.
- [4] H. Chen, Y. Zhang, and I. Gutman, “A kernel-based clustering method for gene selection with gene expression data”, J. Biomed. Inform. 62, 12–20 (2016).
- [5] P. Das, A. Roychowdhury, S. Das, S. Roychoudhury, and S. Tripathy, “sigFeature: novel significant feature selection method for classification of gene expression data using support vector machine and t statistic”, Front. Genet. 11, 247 (2020), doi: 10.3389/fgene.2020.00247.
- [6] A. Wiliński and S. Osowski, “Ensemble of data mining methods for gene ranking”, Bull. Pol. Acad. Sci. Tech. Sci. 60, 461–471 (2012).
- [7] H. Mitsubayashi, S. Aso, T. Nagashima, and Y. Okada, “Accurate and robust gene selection for disease classification using simple statistics”, Biomed. Inform. 391, 68–71 (2008).
- [8] J. Xu, Y. Wang, K. Xu, and T. Zhang, “Feature genes selection using fuzzy rough uncertainty metric for tumour diagnosis”, Comput. Math. Methods Med. 2019, 6705648 (2019), doi: 10.1155/2019/6705648.
- [9] B. Lyu and A. Haque, “Deep learning based tumour type classification using gene expression data”, bioRxiv, p. 364323 (2018), doi: 10.1101/364323.
- [10] F. Yang, “Robust feature selection for microarray data based on multi criterion fusion”, IEEE Trans. Comput. Biol. Bioinf. 8(4), 1080–1092 (2011).
- [11] M. Muszyński and S. Osowski, “Data mining methods for gene selection on the basis of gene expression arrays”, Int. J. Appl. Math. Comput. Sci. 24(3), 657–668 (2014).
- [12] T. Latkowski and S. Osowski, “Data mining for feature selection in gene expression autism data”, Expert Syst. Appl. 42(2), 864–872 (2015).
- [13] Matlab User Manual, Natick (USA): MathWorks, 2020.
- [14] P. Sprent and N.C. Smeeton, Applied Nonparametric Statistical Methods, Boca Raton: Chapman & Hall/CRC, 2007.
- [15] R.O. Duda, P.E. Hart, and P. Stork, Pattern Classification and Scene Analysis, New York: Wiley, 2003.
- [16] Exxact. [Online]. https://blog.exxactcorp.com/scikitlearn-vs-mlrfor-machine-learning/
- [17] Tutorialspoint. [Online]. https://www.tutorialspoint.com/weka/weka_feature_selection.htm
- [18] R. Robnik-Sikonja and I. Kononenko, “Theoretical and empirical analysis of Relief”, Mach. Learn. 53, 23–69 (2003).
- [19] W. Yang, K. Wang, and W. Zuo, “Neighborhood Component Feature Selection for High-Dimensional Data”, J. Comput. 7(1), 161–168 (2012).
- [20] L. Breiman, “Random forests”, Mach. Learn. 45, 5–32 (2001).
- [21] NCBI database. [Online]. http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS4431, (2011).
- [22] [Online]. http://discover1.mc.vanderbilt.edu/discover/public/mcsvm/
- [23] [Online]. http://sdmc.lit.org.sg/GEDatasets/Datasets.html
- [24] F. Gil and S. Osowski, “Feature selection methods in gene recognition problem”, in Proc. On-line Conference Computational Methods in Electrical Engineering, 2020, pp. 1–4.
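Notes
Record prepared with funding from the MNiSW (Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module "Popularyzacja nauki i promocja sportu" (Popularisation of Science and Promotion of Sport), 2021.
Document type
Bibliography
YADDA identifier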
bwmeta1.element.baztech-fcb41302-90ad-45a7-9852-5f4df0768dbb