Article title

Feature selection of protein structural classification using SVM classifier

Publication languages
EN
Abstracts
EN
Recursive feature elimination (RFE), the cross-validation coefficient (CV), and the classification accuracy on test data are applied as feature selection criteria in order to find relevant features and to analyze their influence on classifier accuracy. The feature selection method was compared with principal component analysis (PCA) to assess the effectiveness of feature reduction. A support vector machine (SVM) classifier with a radial basis function (RBF) kernel, tuned by grid-search model selection, is used to find the best feature set and to select and assess relevant features. The best selected feature set is then analyzed and interpreted as a source of knowledge about protein structure and about the biochemical properties of the amino acids in the protein domain sequence.
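The RFE loop mentioned in the abstract, in the SVM-RFE form of Guyon et al. [1], can be sketched as follows. This is a minimal, hypothetical plain-Python sketch, not the authors' implementation: it assumes a *linear* SVM trained by Pegasos-style sub-gradient descent (the paper itself uses an RBF-kernel SVM tuned by grid search), ranks features by the squared weight w_j², and removes one feature per iteration. The names `train_linear_svm`, `svm_rfe` and `make_toy_data`, and the parameters `lam` and `epochs`, are illustrative assumptions.

```python
import random

# Sketch only: a linear SVM stands in for the RBF-kernel SVM used in the paper.


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Train a bias-free linear SVM with Pegasos-style sub-gradient descent.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Returns the weight vector w (one weight per feature).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Gradient step on the L2 regulariser: shrink w.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient step, only for margin violators.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, **svm_params):
    """Recursive feature elimination with the w_j**2 ranking criterion.

    Returns feature indices in elimination order: the first entry was
    judged least relevant, the last entry most relevant.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        # Retrain the SVM on the surviving features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, **svm_params)
        # Drop the surviving feature with the smallest squared weight.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated


def make_toy_data(n_samples=40, n_noise=3, seed=1):
    """Hypothetical toy data: feature 0 encodes the class, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        informative = label * rng.uniform(1.5, 2.0)
        noise = [rng.uniform(-1.0, 1.0) for _ in range(n_noise)]
        X.append([informative] + noise)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    ranking = svm_rfe(X, y)
    print("elimination order (least to most relevant):", ranking)
```

Removing one feature per iteration is the simplest elimination schedule; removing several features at once is a common speed-up. In the setting described in the abstract, each surviving subset would additionally be scored by the cross-validation and test accuracy of the RBF SVM rather than by the linear weights alone.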
Authors
author
  • Silesian University of Technology, ul. Akademicka 16, 44-101 Gliwice, Poland
author
  • Silesian University of Technology, ul. Akademicka 16, 44-101 Gliwice, Poland
References
  • [1] Guyon I., Weston J., Barnhill S., Vapnik V.: Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 2002, 46, 389–422.
  • [2] Guyon I., Vapnik V., Boser B., Bottou L.: Structural Risk Minimization for Character Recognition. S.A. Solla, AT&T Bell Laboratories, Holmdel, USA 1992.
  • [3] Dutkowski J.: Exploratory data analysis. Projection methods: principal component analysis and multidimensional scaling. (in Polish). http://www.mimuw.edu.pl/~aniag/SADM/pca.pdf (last access 08.08.2012).
  • [4] Twardowski T.: Numerical methods for technical computing, Lecture VII, Eigenvalues and eigenvectors, singular values and SVD decomposition (in Polish). http://galaxy.uci.agh.edu.pl/~ttward/numer/Warto%9Cci%20i%20wektory%20w%B3asne.pdf (last access 08.08.2012).
  • [5] Kohavi R., John G.: Wrappers for Feature Subset Selection. Artificial Intelligence, December 1997, 97, 1–2, 273–324.
  • [6] Guyon I., Elisseeff A.: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 2003, 3, 1157–1182.
  • [7] Liu J., Ranka S., Kahveci T.: Classification and feature selection algorithms for multi-class CGH data. Bioinformatics 2008, July, 24, 13.
  • [8] Guyon I.: Feature selection and causal discovery fundamentals and applications. http://langtech.jrc.it/mmdss2007/htdocs/Presentations/Docs/MMDSS_Guyon.pdf (last access 08.08.2012).
  • [9] Guyon I., Elisseeff A.: An Introduction to Feature Extraction. In: Feature Extraction: Foundations and Applications. Springer 2006.
  • [10] Le Cun Y., Denker J., Solla S.: Optimal Brain Damage. AT&T Bell Laboratories, Holmdel, N. Y. 1990.
  • [11] Zhou X., Tuck D.: MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 2007.
  • [12] Abe S.: Support Vector Machines for Pattern Classification. Springer 2005.
  • [13] Levitt M., Chothia C.: Structural patterns in globular proteins. Nature 1976, June 17, 261, 5561, 552–558.
  • [14] Osuna E., Freund R., Girosi F.: Support Vector Machines: Training and Applications. A.I. Memo No. 1602, C.B.C.L. Paper No. 144, 1997.
  • [15] Vapnik V.: The Nature of Statistical Learning Theory. Second Edition, Springer 1995.
  • [16] Fradkin D., Muchnik I.: Support Vector Machines for Classification. IMACS Series in Discrete Mathematics and Theoretical Computer Science 2005.
  • [17] Hubbard T., Ailey B., Brenner S., Murzin A., Chothia C.: SCOP, Structural Classification of Proteins Database: Applications to Evaluation of the Effectiveness of Sequence Alignment Methods and Statistics of Protein Structural Data. Acta Cryst. 1998, D54, 1147–1154.
  • [18] Hubbard T., Ailey B., Brenner S., Murzin A., Chothia C.: SCOP, Structural Classification of Proteins Database. Nucleic Acids Research 1999, 27, 1.
  • [19] Murzin A., Brenner S., Hubbard T., Chothia C.: SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures. J. Mol. Biol. 1995, 247, 536–540.
  • [20] Gu F., Chen H., Ni J.: Protein structural class prediction based on an improved statistical strategy. BMC Bioinformatics 2008, 9 (Suppl 6):S5.
  • [21] Krajewski Z.: Protein structural classification based on pseudo amino acid composition using SVM classifier. Biocybernetics and Biomedical Engineering 2013, vol 33 (in print).
  • [22] Chou K.: Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid Composition. Proteins: Structure, Function, and Bioinformatics, 2001, 43, 3, 246–255.
  • [23] Chou K., Cai Y.: Predicting Protein Quaternary Structure by Pseudo Amino Acid Composition. Proteins: Structure, Function, and Genetics 2003, 53, 282–289.
  • [24] Zhang G., Li H., Gao J., Fang B.: Predicting Lipase Types by Improved Chou’s Pseudo-Amino Acid Composition. Protein and Peptide Letters 2008, 15, 10, 1132–1137.
  • [25] Chou K.: Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology. Current Proteomics, 2009, 6, 262–274.
  • [26] Chou K., Cai Y.: Prediction of protease types in a hybridization space. Biochemical and Biophysical Research Communications 2006, 339, 1015–1020.
  • [27] Chou K., Cai Y.: Predicting Subcellular Localization of Proteins by Hybridizing Functional Domain Composition and Pseudo-Amino Acid Composition. Journal of Cellular Biochemistry 2004, 91, 6, 1197–1203.
  • [28] Chou K.: Progress in Protein Structural Class Prediction and its Impact to Bioinformatics and Proteomics. Current Protein and Peptide Science October 2005, 6, 5, 423–436.
  • [29] Cai Y., Zhou G., Chou K.: Support Vector Machines for Predicting Membrane Protein Types by Using Functional Domain Composition. Biophysical Journal, 1 May 2003, 84, 3257–3263.
  • [30] Shieh M., Yang C.: Multiclass SVM-RFE for product form feature selection. Expert Systems with Applications, July-August 2008, 35, 1–2, 531–541.
  • [31] Oza N., Turner K.: Dimensionality Reduction Through Classifier Ensembles. Technical Report NASA-ARC-IC-1999-126, NASA Ames Research Center, 1999.
  • [32] Fasman G. (Ed.): Prediction of Protein Structure and the Principles of Protein Conformation. Springer, 1st ed., 1989. Chapter 9: Prevelige P., Fasman G.: Chou-Fasman Prediction of the Secondary Structure of Proteins: The Chou-Fasman-Prevelige Algorithm.
  • [33] Edholm O.: The Chou-Fasman method for predicting secondary structure. Alba Nova University Center, KTH Theoretical Physics, SE-106 91 Stockholm, Sweden.
  • [34] Singh M.: COS551 Intro. to Computational Biology. http://www.cs.princeton.edu/~mona/Lecture/sec--structure.pdf (last access 08.08.2012).
  • [35] Chothia C., Hubbard T., Brenner S., Barns H., Murzin A.: Protein Folds in the all-α and all-β classes. Annu. Rev. Biophys. Biomol. Struct. 1997, 26, 597–627.
  • [36] Murzin A., Lesk A., Chothia C.: Principles Determining the Structure of β-Sheet Barrels in Proteins. J. Mol. Biol. 1994, 236, 1369–1381.
  • [37] Muñoz V., Cronet P., Lopez-Hernandez E., Serrano L.: Analysis of the effect of local interactions on protein stability. Folding and Design, June 1996, 1, 3, 167–178.
  • [38] Muñoz V., Serrano L.: Local versus nonlocal interactions in protein folding and stability: an experimentalist’s point of view. Folding and Design, August 1996, 1, 4, R71–R77.
  • [39] The Biochemistry Questions. http://biochemistryquestions.wordpress.com/2008/10/02/secondary-structure-of-proteins/ (last access 08.08.2012).
  • [40] Alpha-Helix: Overview of Secondary Structure. http://mcdb-webarchive.mcdb.ucsb.edu/sears/biochemistry/ (last access 08.08.2012).
  • [41] Overview of Beta-Pleated Sheet Secondary Structure. http://mcdb-webarchive.mcdb.ucsb.edu/sears/biochemistry/ (last access 08.08.2012).
  • [42] Chothia C.: Polyhedra for helical proteins. Nature, 19 January 1989, 337.
  • [43] Chothia C.: Principles that determine the structure of proteins. Ann. Rev. Biochem. 1984, 53, 537–572.
  • [44] Chothia C., Levitt M., Richardson D.: Helix to Helix Packing in Proteins. J. Mol. Biol. 1981, 145, 215–250.
  • [45] Chothia C., Levitt M., Richardson D.: Structure of proteins: Packing of α-helices and pleated sheets. Proc. Natl. Acad. Sci. USA October 1977, 74, 10, 4130–4134.
  • [46] Chou K., Carlacci L.: Energetic Approach to the Folding of α/β Barrels. Proteins: Structure, Function and Genetics, 1991, 9, 280–295.
  • [47] Chothia C., Finkelstein A.: The classification and origins of protein folding patterns. Annu. Rev. Biochem. 1990, 59, 1007–1039.
  • [48] Janin J., Chothia C.: Packing of α-Helices onto β-Pleated Sheets and the Anatomy of α/β Proteins. J. Mol. Biol. 1980, 143, 95–128.
  • [49] Chothia C., Janin J.: Orthogonal Packing of β-Pleated Sheets in Proteins. Biochemistry 1982, 21, 3955–3965.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b9808e79-bace-455d-908b-6e3d0f3c68dd