Control and Cybernetics

Article title

The Monte Carlo feature selection and interdependency discovery is unbiased

Authors: Dramiński, M., Kierczak, M., Nowak-Brzezińska, A., Koronacki, J., Komorowski, J.
Publication language: EN
EN We show that the Monte Carlo feature selection algorithm for supervised classification proposed by Dramiński et al. (2008) is not biased towards features with many categories (levels or values). The algorithm, later extended to include the functionality of discovering interdependencies between features, is surprisingly simple, has been successfully applied to many biological data sets and to transactional data of commercial origin, and has never revealed any bias of the type mentioned; nevertheless, its alleged unbiasedness required closer scrutiny, which is provided here. Admittedly, the algorithm does reveal some bias coming from another source, but this bias is negligible. Hence our final claim is that the algorithm is practically unbiased and the results it provides can be considered fully reliable.
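The general idea behind Monte Carlo feature selection is to repeatedly draw random subsets of features, train classifiers on random train/test splits restricted to each subset, and aggregate held-out performance into a per-feature importance score. The sketch below illustrates only that sampling-and-scoring scheme in a heavily simplified form: it uses a one-feature threshold stump instead of the full decision trees and the weighted relative-importance measure of the actual algorithm, and the function name and all parameters are hypothetical.

```python
import numpy as np

def mcfs_importance(X, y, n_subsets=200, subset_frac=0.3, n_splits=3, seed=0):
    """Toy Monte Carlo feature-selection scores (illustrative sketch only).

    Draws random feature subsets, fits a one-feature threshold stump on a
    random training part, and credits every feature of the subset with the
    stump's held-out accuracy.  The real MCFS algorithm of Draminski et al.
    builds full decision trees and uses a weighted relative-importance
    measure; this simplification only demonstrates the resampling scheme.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(subset_frac * d))      # features drawn per subset
    cut = 2 * n // 3                      # 2/3 train, 1/3 test split

    def stump_acc(col, labels, thr):
        # accuracy of the rule "col > thr", allowing the flipped rule
        hit = ((col > thr).astype(int) == labels).mean()
        return max(hit, 1.0 - hit)

    score = np.zeros(d)
    count = np.zeros(d)
    for _ in range(n_subsets):
        feats = rng.choice(d, size=m, replace=False)
        for _ in range(n_splits):
            idx = rng.permutation(n)
            tr, te = idx[:cut], idx[cut:]
            # choose the best single-feature stump on the training part
            accs = [stump_acc(X[tr, f], y[tr], X[tr, f].mean()) for f in feats]
            best = feats[int(np.argmax(accs))]
            # credit all subset members with its held-out accuracy
            acc = stump_acc(X[te, best], y[te], X[tr, best].mean())
            score[feats] += acc
            count[feats] += 1
    return score / np.maximum(count, 1)
```

On synthetic data where a single feature determines the class, that feature accumulates consistently high held-out accuracy and dominates the ranking, while noise features hover near chance level; the article's question of bias concerns whether such scores systematically favour features with many categories.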
Keywords
EN supervised classification, feature selection, feature interactions, high-dimensional problems, applications to genomic and proteomic data
Publisher: Systems Research Institute, Polish Academy of Sciences
Journal: Control and Cybernetics
Year: 2011
Volume: Vol. 40, no. 2
Pages: 199--211
Physical description: Bibliography 18 items, charts
author: Dramiński, M.
author: Kierczak, M.
author: Nowak-Brzezińska, A.
author: Koronacki, J.
author: Komorowski, J.
  • Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Archer, K.J. and Kimes, R.V. (2008) Empirical Characterization of Random Forest Variable Importance Measures. Comp Stat & Data Anal, 52(4), 2249-2260.
Breiman, L. and Cutler, A. (2008) Random Forests - Classification/Clustering Manual.
Chrysostomou, K., Chen, S.Y. and Liu, X. (2008) Combining Multiple Classifiers for Wrapper Feature Selection. Int. J. Data Mining, Modelling and Management, 1, 91-102.
Diaz-Uriarte, R. and de Andres, S.A. (2006) Gene Selection and Classification of Microarray Data Using Random Forest. BMC Bioinformatics, 7(3), doi:10.1186/1471-2105-7-3.
Dramiński, M., Rada-Iglesias, A., Enroth, S., Wadelius, C., Koronacki, J. and Komorowski, J. (2008) Monte Carlo Feature Selection for Supervised Classification. Bioinformatics, 24(1), 110-117.
Dramiński, M., Kierczak, M., Koronacki, J., Komorowski, J. (2010) Monte Carlo feature selection and interdependency discovery in supervised classification. In: J. Koronacki, Z.W. Ras, S.T. Wierzchon, J. Kacprzyk, eds., Advances in Machine Learning, vol. II, Springer, 371-385.
Dudoit, S. and Fridlyand, J. (2003) Classification in Microarray Experiments. In: T. Speed, ed., Statistical Analysis of Gene Expression Microarray Data, Chapman & Hall/CRC, 93-158.
Hothorn, T., Hornik, K. and Zeileis, A. (2006) Unbiased Recursive Partitioning: A Conditional Inference Framework. J. Computational and Graphical Statistics, 15, 651-674.
Kierczak, M., Ginalski, K., Dramiński, M., Koronacki, J., Rudnicki, W. and Komorowski, J. (2009) A Rough Set-Based Model of HIV-1 Reverse Transcriptase Resistome. Bioinformatics and Biology Insights, 3, 109-127.
Kierczak, M., Dramiński, M., Koronacki, J. and Komorowski, J. (2010) Computational analysis of molecular interaction networks underlying change of HIV-1 resistance to selected reverse transcriptase inhibitors. Bioinformatics and Biology Insights, 4, 137-146.
Kierczak, M. (2009) From Physicochemical Properties to Interdependency Networks: A Monte Carlo Approach to Modeling HIV-1 Resistome and Post-translational Modifications. PhD Thesis, Uppsala University.
Li, Y., Campbell, C. and Tipping, M. (2002) Bayesian Automatic Relevance Determination Algorithms for Classifying Gene Expression Data. Bioinformatics, 18(10), 1332-1339.
Lu, C., Devos, A., Suykens, J.A.K., Arus, C. and Van Huffel, S. (2007) Bagging Linear Sparse Bayesian Learning Models for Variable Selection in Cancer Diagnosis. IEEE Trans Inf Technol Biomed, 11, 338-347.
Saeys, Y., Inza, I. and Larrañaga, P. (2007) A Review of Feature Selection Techniques in Bioinformatics. Bioinformatics, 23 (19), 2507-2517.
Strobl, C., Boulesteix, A.-L., Zeileis, A. and Hothorn, T. (2007) Bias in Random Forest Variable Importance Measures: Illustrations, Sources, and a Solution. BMC Bioinformatics, 8(25), doi:10.1186/1471-2105-8-25.
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T. and Zeileis, A. (2008) Conditional Variable Importance for Random Forests. BMC Bioinformatics, 9(307), doi:10.1186/1471-2105-9-307.
Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002) Diagnosis of multiple cancer types by nearest shrunken centroids of gene expressions. Proc Natl Acad Sci USA, 99, 6567-6572.
Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2003) Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays. Statistical Science, 18, 104-117.
Collection: BazTech
YADDA identifier: bwmeta1.element.baztech-article-BATC-0007-0082