Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
Microarray analysis is widely used for cancer diagnosis and classification. However, of the large number of genes in microarray data, only a small fraction is useful for building a highly reliable model. This raises two major challenges. The first is identifying, among thousands of genes, the significant ones that improve the resulting model. The second is selecting a gene subset with minimal dependence on the particular samples in the dataset, a property termed the stability of the selected sets. Various approaches have been presented in previous work. In this study, we propose a new gene selection algorithm based on the previously proposed phase diagram method. Ridge logistic regression is used to estimate, for each gene, the probability that it belongs to a set of stable genes with high classification capability. To address stability, a method is proposed for making the final choice among the selected sets. The B632+ error estimation method is applied to evaluate model performance. The proposed method was applied to four cancer datasets, and the results were compared with those of other validation methods. The comparison shows that the selected genes are superior in terms of the number of genes, degree of stability and classification accuracy.
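The abstract names two evaluation ingredients, the stability of selected gene sets and the B632+ error estimate, without defining them. As a hedged sketch only, assuming stability is scored with Kuncheva's consistency index [18] and that "B632+" denotes Efron and Tibshirani's .632+ bootstrap estimator, both quantities can be computed as below. The record does not say which stability measure or estimator details the paper actually uses, and every function name here is hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations

# --- Stability of selected gene sets (ASSUMED measure: Kuncheva's consistency index) ---

def kuncheva_index(a, b, n_features):
    """Consistency of two equal-size gene subsets drawn from n_features genes.

    Returns 1.0 for identical subsets; values near 0 mean overlap no better than chance.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("index is undefined for k == 0 or k == n_features")
    r = len(a & b)  # number of genes shared by both subsets
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_stability(subsets, n_features):
    """Average pairwise consistency over subsets selected on different resamples."""
    pairs = list(combinations(subsets, 2))
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


# --- .632+ bootstrap error (ASSUMED meaning of "B632+") ---

def no_information_rate(y_true, y_pred):
    """gamma = sum_k p_k * (1 - q_k): 0-1 error if labels and predictions were independent."""
    n = len(y_true)
    if len(y_pred) != n:
        raise ValueError("y_true and y_pred must have the same length")
    p, q = Counter(y_true), Counter(y_pred)
    return sum((p[c] / n) * (1 - q[c] / n) for c in p)


def loo_bootstrap_error(oob_losses):
    """oob_losses[i] holds 0/1 losses for sample i from bootstrap fits that excluded it."""
    per_sample = [sum(losses) / len(losses) for losses in oob_losses if losses]
    return sum(per_sample) / len(per_sample)


def err_632_plus(train_err, loo_err, gamma):
    """Combine resubstitution and leave-one-out bootstrap errors with the .632+ weight."""
    err1 = min(loo_err, gamma)  # cap the out-of-bag error at the no-information rate
    if err1 > train_err and gamma > train_err:
        r = (err1 - train_err) / (gamma - train_err)  # relative overfitting rate in [0, 1]
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)  # weight rises from 0.632 toward 1 as overfitting grows
    return (1 - w) * train_err + w * err1
```

For example, with a resubstitution error of 0, a leave-one-out bootstrap error of 0.2 and a no-information rate of 0.5, the relative overfitting rate is 0.4 and `err_632_plus` returns about 0.148. Because the weight approaches 1 as overfitting increases, a model that fits its training data perfectly is judged mostly by its out-of-bag error, which is why the estimator suits small-sample, high-dimensional microarray data.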
Keywords
Publisher
Journal
Year
Volume
Pages
965–976
Physical description
Bibliography: 27 items, figures, tables, charts
Contributors
author
- Electrical Engineering Faculty, Najafabad Branch, Islamic Azad University, Najafabad, Iran
author
- Electrical Engineering Faculty, Najafabad Branch, Islamic Azad University, Najafabad, Iran; Digital Processing and Machine Vision Research Center, Najafabad Branch, Islamic Azad University, Najafabad, Iran
Bibliography
- [1] Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286(5439):531.
- [2] van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530.
- [3] Chang C-F, Wai K-M, Patterton HG. Calculating the statistical significance of physical clusters of co-regulated genes in the genome: the role of chromatin in domain-wide gene regulation. Nucleic Acids Res 2004;32(5):1798–807.
- [4] Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn 2002;46(1):389–422.
- [5] Zhu J, Hastie T. Classification of gene microarrays by penalized logistic regression. Biostatistics 2004;5(3):427–43.
- [6] Blazadonakis ME, Zervakis M, Kounelakis M, Biganzoli E, Lama N, editors. Support vector machines and neural networks as marker selectors for cancer gene analysis. 2006 3rd International IEEE Conference on Intelligent Systems; 2006 Sep 4–6.
- [7] Liu Y, editor. Cancer identification based on DNA microarray data. Berlin Heidelberg: Springer; 2007.
- [8] Mahmoodian H. Predicting the continuous values of breast cancer relapse time by type-2 fuzzy logic system. Australasian Phys Eng Sci Med 2012;35(2):193–204.
- [9] Cai R, Hao Z, Yang X, Wen W. An efficient gene selection algorithm based on mutual information. Neurocomputing 2009;72(4):991–9.
- [10] Liu X, Krishnan A, Mondry A. An entropy-based gene selection method for cancer classification using microarray data. BMC Bioinform 2005;6(1):76.
- [11] Liang Y, Liu C, Luan X-Z, Leung K-S, Chan T-M, Xu Z-B, et al. Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification. BMC Bioinform 2013;14(1):198.
- [12] Cawley GC, Talbot NLC. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization. Bioinformatics 2006;22(19):2348–55.
- [13] Liu Z, Jiang F, Tian G, Wang S, Sato F, Meltzer SJ, et al. Sparse logistic regression with Lp penalty for biomarker identification. Stat Appl Genet Mol Biol 2007;6(1).
- [14] Algamal ZY, Lee MH. Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification. Expert Syst Appl 2015;42(23):9326–32.
- [15] Li H-D, Xu Q-S, Liang Y-Z. A phase diagram for gene selection and disease classification. Chemometr Intell Lab Syst 2017;167:208–13.
- [16] Frank IE, Friedman JH. A statistical view of some chemometrics regression tools. Technometrics 1993;35(2):109–35.
- [17] Kalousis A, Prados J, Hilario M. Stability of feature selection algorithms: a study on high-dimensional spaces. Knowledge Inform Syst 2007;12(1):95–116.
- [18] Kuncheva LI, editor. A stability index for feature selection. Anaheim, CA, USA: ACTA Press; 2007.
- [19] Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, et al. Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes. Bioinformatics 2009;25(13):1662–8.
- [20] Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci U S A 1999;96(12):6745.
- [21] Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RCT, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 2002;8:68.
- [22] Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002;1(2):203–9.
- [23] Zhang L, Zhou W, Wang B, Zhang Z, Li F. Applying 1-norm SVM with squared loss to gene selection for cancer classification. Appl Intell 2018;48(7):1878–90.
- [24] Chen Y, Nguyen DV. Identification of relevant genes from microarray experiments based on partial least squares weights: application to cancer genomics. In: Pham T, editor. Computational biology: issues and applications in oncology. New York, NY: Springer; 2010. p. 1–17.
- [25] Yu B, Zhang Y. The analysis of colon cancer gene expression profiles and the extraction of informative genes; 2013, 1097 p.
- [26] Bø T, Jonassen I. New feature subset selection procedures for classification of expression profiles. Genome Biol 2002;3(4):research0017.
- [27] Saha S, Seal DB, Ghosh A, Dey KN. A novel gene ranking method using Wilcoxon rank sum test and genetic algorithm. Int J Bioinform Res Appl 2016;12(3):263–79.
Notes
PL
Record prepared with funds from the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science", module: Popularization of Science and Promotion of Sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-171f6ab2-2c36-46d6-8d46-2a384acf1e15