Classification methods for high-dimensional genetic data

Kalina, J.

doi:10.1016/j.bbe.2013.09.007

Article details

Article title

Classification methods for high-dimensional genetic data

Authors

Kalina J.

Identifiers

DOI

10.1016/j.bbe.2013.09.007

Abstract

Standard methods of multivariate statistics fail in the analysis of high-dimensional data. This paper gives an overview of recent classification methods proposed for the analysis of high-dimensional data, especially in the context of molecular genetics. We discuss methods from both biostatistics and data mining that rest on various theoretical backgrounds, explain their principles, and compare their advantages and limitations. We also cover dimension reduction methods tailor-made for classification analysis, as well as classification methods that reduce the dimension of the computation intrinsically. A common feature of numerous classification methods is the shrinkage estimation principle, which has recently received intensive attention in high-dimensional applications.
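As an illustration of why standard multivariate methods break down and what shrinkage estimation does about it, the following minimal sketch builds a sample covariance matrix from fewer observations than variables and then shrinks it toward a scaled identity matrix. It is not taken from the paper: the function name `shrinkage_covariance`, the simulated data, the identity target, and the hand-picked shrinkage intensity `lam = 0.2` are all assumptions made for this example.

```python
import numpy as np


def shrinkage_covariance(X, lam):
    """Shrink the sample covariance of X toward a scaled identity target.

    X   : array of shape (n, p), rows are observations (hypothetical data).
    lam : shrinkage intensity in [0, 1]; fixed by hand in this sketch.
    """
    S = np.cov(X, rowvar=False)               # p x p sample covariance
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)    # identity scaled to the mean variance
    return (1.0 - lam) * S + lam * target     # convex combination of S and target


# Simulated "high-dimensional" setting: far more variables (p) than samples (n).
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)
S_shrunk = shrinkage_covariance(X, lam=0.2)

# The sample covariance has rank at most n - 1, so it cannot be inverted.
rank_S = np.linalg.matrix_rank(S)

# The shrunken estimate has strictly positive eigenvalues and can be inverted,
# which is what classifiers such as linear discriminant analysis need.
min_eig = np.linalg.eigvalsh(S_shrunk).min()

print("rank of sample covariance:", rank_S, "of", p)
print("smallest eigenvalue after shrinkage:", min_eig)
```

With `n < p` the sample covariance matrix is singular, so any method that needs its inverse cannot be applied directly. Mixing it with a well-conditioned target restores invertibility at the cost of some bias. How the intensity should be chosen is left open here.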

Keywords

multivariate statistics; classification analysis; shrinkage estimation; dimension reduction; data mining

Publisher

Nałęcz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences
Elsevier

Journal

Biocybernetics and Biomedical Engineering

Year

2014

Volume

Vol. 34, no. 1

Pages

10–18

Physical description

Bibliography: 48 items.

Contributors

author

Kalina J.

kalina@cs.cas.cz

Institute of Computer Science of the Academy of Sciences of the Czech Republic, Department of Medical Informatics and Biostatistics, Prague, Czech Republic

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-0c6ae7a3-b4a7-4796-945d-6c193f7b8da9