Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
Microarray data play a critical role in cancer classification. However, because the number of samples is very small relative to the intrinsically high dimensionality of the data, most approaches fail to achieve accurate classification with a small subset of genes. Feature selection techniques reduce the dimensionality of the problem and thus the computational cost of microarray data classification. Previous studies have also shown that feature extraction methods can improve classification performance. In this paper, we propose a three-stage ensemble scheme for cancer diagnosis and classification. First, a hybrid filter-based feature selection method that combines a modified Bayesian logistic regression (BLogReg), the t-test and the Fisher ratio is applied to select genes. Second, the selected genes are mapped by the proposed PSO-dICA method, a particle swarm optimization (PSO) based modification of discriminant independent component analysis (dICA). Finally, the mapped features are classified with a support vector machine (SVM) classifier. To demonstrate the effectiveness of the proposed method, several widely used microarray datasets, namely Colon, Lung cancer, DLBCL, SRBCT, Leukemia-ALL and Prostate Tumor, are used. Experimental results show the efficiency and effectiveness of the proposed method.
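The first stage ranks genes with filter criteria that measure how well each gene separates the classes. The Python sketch below combines two of the named criteria, the t-test and the Fisher ratio, by averaging each gene's rank and keeping the top k genes. It is an illustration under stated assumptions, not the authors' method: it assumes a two-class problem with at least two samples per class, uses Welch's t-statistic, and fuses the criteria by plain rank averaging. The modified BLogReg criterion and the paper's actual fusion rule are not described in this record, so they are omitted, and all function names are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def split_by_class(values: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split one gene's expression values into class-0 and class-1 groups."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return a, b


def t_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """Absolute Welch t-statistic between the two classes (needs >= 2 samples per class)."""
    a, b = split_by_class(values, labels)
    diff = abs(mean(a) - mean(b))
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        # Zero within-class spread: perfect separation if the means differ.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def fisher_ratio(values: Sequence[float], labels: Sequence[int]) -> float:
    """Fisher ratio (m0 - m1)^2 / (s0^2 + s1^2) for one gene."""
    a, b = split_by_class(values, labels)
    diff_sq = (mean(a) - mean(b)) ** 2
    denom = variance(a) + variance(b)
    if denom == 0:
        return float("inf") if diff_sq > 0 else 0.0
    return diff_sq / denom


def rank_positions(scores: Sequence[float]) -> List[int]:
    """Position of each gene (0 = best) when genes are sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda g: -scores[g])
    positions = [0] * len(scores)
    for pos, g in enumerate(order):
        positions[g] = pos
    return positions


def hybrid_filter(X: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Return indices of the k genes with the best average rank over both criteria.

    X[i][g] is the expression of gene g in sample i; labels[i] is 0 or 1.
    """
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    t_pos = rank_positions([t_score(c, labels) for c in columns])
    f_pos = rank_positions([fisher_ratio(c, labels) for c in columns])
    avg_rank = [(t_pos[g] + f_pos[g]) / 2 for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: avg_rank[g])[:k]


# Toy data: 4 samples x 3 genes; gene 0 separates the classes best, gene 1 is noise.
X = [
    [1.0, 2.0, 1.0],
    [1.2, 3.0, 2.0],
    [5.0, 2.5, 2.5],
    [5.3, 2.6, 3.5],
]
labels = [0, 0, 1, 1]
print(hybrid_filter(X, labels, k=2))  # → [0, 2]
```

Rank averaging is used here because the two criteria live on different scales (an unbounded t-statistic versus a variance ratio), and ranks put them on a common footing; the published method may weight or fuse its three criteria differently.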
Publisher
Journal
Year
Volume
Pages
521–529
Physical description
Bibliography: 57 items, figures, tables
Contributors
author
- Young Researchers and Elite Club, Mashhad Branch, Islamic Azad University, Mashhad, Iran
author
- Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
Bibliography
- [1] Liu KH, Li B, Zhang J, Du J-X. Ensemble component selection for improving ICA based microarray data prediction models. Pattern Recogn 2009;42(7):1274–83.
- [2] Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 2005;27(8):1226–38.
- [3] Kar S, Das Sharma K, Maitra M. Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique. Expert Syst Appl 2015;42(1):612–27.
- [4] Li S, Wu X, Tan M. Gene selection using hybrid particle swarm optimization and genetic algorithm. Soft Comput 2008;12:1039–48.
- [5] Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn 2002;46(1–3):389–422.
- [6] Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res 2003;3(1):1157–82.
- [7] Gan JQ, Hasan BAS, Tsui CSL. A filter-dominating hybrid sequential forward floating search method for feature subset selection in high-dimensional space. Int J Mach Learn Cybern 2012;3(4):1–8.
- [8] Hall M. Correlation-based feature selection for machine learning (PhD thesis), Citeseer; 1999.
- [9] Yu L, Liu H. Feature selection for high-dimensional data: a fast correlation-based filter solution. Proceedings of the 20th International Conference on Machine Learning (ICML); 2003. p. 856–63.
- [10] Hall M, Smith L. Practical feature subset selection for machine learning. Comput Sci 1998;98:181–91.
- [11] Kira K, Rendell L. The feature selection problem: traditional methods and a new algorithm. Proceedings of the National Conference on Artificial Intelligence; 1992. p. 129–34.
- [12] Inza I, Sierra B, Blanco R, Larranaga P. Gene selection by sequential search wrapper approaches in microarray cancer class prediction. J Intell Fuzzy Syst 2002;12(1):25–33.
- [13] Sharma A, Imoto S, Miyano S. Top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 2012;9(3):754–64.
- [14] Chuang L-Y, Chang H-W, Tu C-J, Yang C-H. Improved binary PSO for feature selection using gene expression data. Comput Biol Chem 2008;32(1):29–38.
- [15] Wang G, Song Q, Xu B, Zhou Y. Selecting feature subset for high dimensional data via the propositional foil rules. Pattern Recogn 2013;46(1):199–214.
- [16] Canul-Reich J, Hall L, Goldgof D, Korecki J, Eschrich S. Iterative feature perturbation as a gene selector for microarray data. Int J Pattern Recogn Artif Intell 2012;26(5):1–25.
- [17] Bolon-Canedo V, Sanchez-Marono N, Alonso-Betanzos A. An ensemble of filters and classifiers for microarray data classification. Pattern Recogn 2012;45(1):531–9.
- [18] Mundra PA, Rajapakse JC. SVM-RFE with mRMR filter for gene selection. IEEE Trans NanoBiosci 2010;9(1):31–7.
- [19] Lee C, Leu Y. A novel hybrid feature selection method for microarray data analysis. Appl Soft Comput 2011;11(1):208–13.
- [20] Leung Y, Hung Y. A multiple-filter-multiple-wrapper approach to gene selection and microarray data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2010;7(1):108–17.
- [21] Zhao W, Wang G, Wang HB, Chen HL, Dong H, Zha ZD. A novel framework for gene selection. Int J Adv Comput Technol 2011;3(3):184–91.
- [22] Chen LF, Su CT, Chen KH, Wang PC. Particle swarm optimization for feature selection with application in obstructive sleep apnea diagnosis. Neural Comput Appl 2011;21(8):2087–96.
- [23] Kar S, Das Sharma K, Maitra M. Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique. Expert Syst Appl 2015;42(1):612–27.
- [24] Chen KH, Wang KJ, Wang KM, Angelia MA. Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data. Appl Soft Comput 2014;24:773–80.
- [25] Shen Q, Mei Z, Ye BX. Simultaneous genes and training samples selection by modified particle swarm optimization for gene expression data classification. Comput Biol Med 2009;39:646–9.
- [26] Alshamlan HM, Badr GH, Alohali YA. Genetic Bee Colony (GBC) algorithm: a new gene selection method for microarray cancer classification. Comput Biol Chem 2015;56:49–60.
- [27] Wang SL, Li X, Zhang S, Gui J, Huang DS. Tumor classification by combining PNN classifier ensemble with neighborhood rough set based gene reduction. Comput Biol Med 2010;40:179–89.
- [28] Pinto da Costa JF, Alonso H, Roque L. A weighted principal component analysis and its application to gene expression data. IEEE/ACM Trans Comput Biol Bioinform 2011;8(1):246–52.
- [29] Hyvärinen A. Survey on independent component analysis. Neural Comput Surveys 1999;2:94–128.
- [30] Hyvärinen A, Karhunen J, Oja E. Independent component analysis. New York: John Wiley; 2001.
- [31] Hyvärinen A, Oja E. Independent component analysis: algorithms and applications. Neural Netw 2000;13:411–30.
- [32] Lotfi E, Keshavarz A. Gene expression microarray classification using PCA–BEL. Comput Biol Med 2014;54:180–7.
- [33] Liu KH, Li B, Wu QQ, Zhang J, Du J, Liu GY. Microarray data classification based on ensemble independent component selection. Comput Biol Med 2009;39(11):953–60.
- [34] Fan L, Poh KL, Zhou P. A sequential feature extraction approach for naïve Bayes classification of microarray data. Expert Syst Appl 2009;36:9919–23.
- [35] Li B, Zheng CH, Huang DS, Zhang L, Han K. Gene expression data classification using locally linear discriminant embedding. Comput Biol Med 2010;40:802–10.
- [36] Long F, He J, Ye X, Zhuang Z, Li B. Discriminant independent component analysis as a subspace representation. J Electron (China) 2006;23(1):103–6.
- [37] Dhir CS, Lee SY. Discriminant independent component analysis. IEEE Trans Neural Netw 2011;22(6):845–57.
- [38] Dhir CS, Lee J, Lee SY. Extraction of independent discriminant features for data with asymmetric distribution. Knowl Inf Syst 2011;30(2):359–75.
- [39] Mollaee M, Moattar MH, Seyyed Mahdavi SJ. Feature extraction based on modified discriminant independent component analysis via particle swarm optimization. Int J Softw Engin Appl 2014;8(12):91–102.
- [40] Kennedy J, Eberhart RC. Particle swarm optimization. Proceedings of the IEEE International Conference on Neural Networks, vol. IV; 1995. p. 1942–8.
- [41] Kennedy J, Eberhart RC. A discrete binary version of the particle swarm algorithm. Proceedings of the 1997 Conference on Systems, Man, and Cybernetics. IEEE Service Center; 1997. p. 4104–9.
- [42] Zhang Y, Zhang Y. Fault detection of non-Gaussian processes based on modified independent component analysis. Chem Eng Sci 2010;65(16):4630–9.
- [43] Cawley GC, Talbot NLC. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization. Bioinformatics 2006;22(19):2348–55.
- [44] Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 1999;96(12):6745–50.
- [45] Khan J, Wei JS, Ringnér M, Saal LH, Ladanyi M, Westermann F, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001;7(6):673–9.
- [46] Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002;1(2):203–9.
- [47] Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 2002;8(1):68–74.
- [48] Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 2002;30(1):41–7.
- [49] Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 2001;98(24):13790–5.
- [50] Yassi M, Moattar MH. Robust and stable feature selection by integrating ranking methods and wrapper technique in genetic data classification. Biochem Biophys Res Commun 2014;446:850–6.
- [51] Burges CJC. A tutorial on support vector machines for pattern recognition. Boston: Kluwer Academic Publishers; 1999.
- [52] Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley; 2001.
- [53] Schölkopf B, Smola A, Müller KR. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 1998;10:1299–319.
- [54] Tipping ME, Bishop CM. Probabilistic principal component analysis. J R Stat Soc: Ser B (Statistical Methodology) 1999;61:611–22.
- [55] van der Maaten LJ, Postma EO, van den Herik HJ. Dimensionality reduction: a comparative review. J Mach Learn Res 2009;10:66–71.
- [56] Lawrence ND. The Gaussian process latent variable model. Technical report. University of Sheffield; 2006.
- [57] Li C-G, Guo J. Supervised Isomap with explicit mapping. Proceedings of the First International Conference on Innovative Computing, Information and Control (ICICIC'06); 2006. p. 345–8.
Notes
PL
Prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for activities that disseminate science.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-6e4a0fc4-63d1-46b7-9c49-7a4909b0bcbb