Protein fold classification based on machine learning paradigm – a review

Stąpor, K.

Article details

Abstract

Protein fold recognition using machine learning-based methods is crucial in protein structure discovery, especially when traditional sequence comparison methods fail because structurally similar proteins share little sequence homology. Many machine learning-based fold classification methods have been proposed, with steadily increasing accuracy. The main aim of this article is to cover all the major results in this field.

Keywords

supervised learning algorithm; classifier; features; protein fold recognition

Publisher

Uniwersytet Jagielloński - Collegium Medicum
Index Copernicus Sp. z o.o.

Journal

Bio-Algorithms and Med-Systems

Year

2012

Volume

Vol. 8, no. 1

Pages

53-76

Physical description

Bibliography: 64 items, tables.

Contributors

author

Stąpor K.

Katarzyna.Stapor@polsl.pl

Silesian University of Technology, Institute of Computer Science, Akademicka 16, 44-100 Gliwice

Bibliography

1. Altschul S.F. et al. (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
2. Anfinsen C.B. (1973), Principles that govern the folding of protein chains. Science 181, 223-230.
3. Apweiler R., Bairoch A., Wu C.H. et al. (2004), UniProt: the universal protein knowledgebase. Nucleic Acids Res. 32 (Database issue), D115-D119.
4. Baldi P., Brunak S., Chauvin Y., Andersen C., Nielsen H. (2000), Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16, 412-424.
5. Berman H.M. et al. (2000), The Protein Data Bank. Nucleic Acids Res. 28, 235-242.
6. Bishop Ch. M. (2006), Pattern recognition and machine learning. Springer, New York.
7. Bologna G., Appel R.D. (2002), A comparison study on protein fold recognition. Proc. 9th Int. Conf. Neural Information, V.5, Singapore, 2492-2496.
8. Breiman L. (2001), Random Forests. Machine Learning 45(1), 5-32.
9. Chan H.S., Dill K. (1993), The protein folding problem. Physics Today (Feb.), 24-32.
10. Chandonia J.M. et al. (2004), The ASTRAL compendium in 2004. Nucleic Acids Res. 32, D189-D192.
11. Chen K.C., et al. (2006), Using pseudo amino acid composition and support vector machine to predict protein structural class. Journal of Theoretical Biology 243, 444-448.
12. Chen K., Kurgan L. (2007) PERES: protein fold classification by using evolutionary information and predicted secondary structure. Bioinformatics 23(21) 2843-2850.
13. Cheng J. (2005), SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res. 33, 72-76.
14. Chinnasamy A., Sung W.K., Mittal A. (2004), Protein structure and fold prediction using tree-augmented naïve Bayesian classifier. Proc. PSB, Stanford CA, World Scientific Press.
15. Chmielnicki W., Stąpor K. (2010), Protein fold recognition with combined RDA-SVM classifier. Lecture Notes on Artificial Intelligence, LNAI 6076, 162-169.
16. Chmielnicki W., Stąpor K. (2011), A hybrid discriminative/generative approach to protein fold recognition. Accepted to be published in Neurocomputing.
17. Chothia C. (1992), One thousand families for the molecular biologist. Nature 357, 543–544.
18. Chou K.C. (2009), Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Current Proteomics 6, 262-274.
19. Chou K.C. (2001) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins, 43(3), 246-255.
20. Cleary J.G., Trigg L.E. (1995), K*: an instance-based learner using an entropic distance measure. Proc. Int. Conf. Machine Learning, 108-114.
21. Crammer K., Singer Y. (2000) On the learnability and design of output codes for multiclass problems. 13th Computational Learning Theory Conference, 35-46.
22. Craven M.W., Mural R.J., Hauser L.J., Uberbacher E.C.(1995), Predicting protein folding classes without overly relying on homology. Proc. of Intelligent Systems In Molecular Biology (ISMB) 3, 98-106.
23. Cymerman I.A. et al. (2004), Computational methods for protein structure prediction and fold-recognition. In Nucleic Acids and Molecular Biology series, "Practical Bioinformatics". Editor: Bujnicki J. M. Springer-Verlag.
24. Damoulas T., Girolami M. (2008), Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection, Bioinformatics, 24(10), 1264-1270.
25. Denoeux T. (1995) A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans. Syst. Man Cybern., 25, 804-813.
26. Deschavanne P., Tuffery P. (2009), Enhanced protein fold recognition using a structural alphabet. Proteins 76, 129-137.
27. Ding C.H., Dubchak I. (2001), Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17, 349–358.
28. Dong Q., Zhou S., Guan J. (2009), A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics 25(20), 2655-2662.
29. Dubchak I., Muchnik I., Holbrook S.R., Kim S.H. (1995), Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 92, 8700-8704.
30. Friedman J.H. (1989), Regularized Discriminant Analysis. Journal of the American Statistical Association 84(405), 165-175.
31. Ghanty P., Pal N.R. (2009), Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers. IEEE Trans. on Nanobioscience 8, 100-110.
32. Ghahramani Z. (2001), An introduction to Hidden Markov Models and Bayesian Networks. Int. Journal of Pattern Recognition and Artificial Intelligence 15(1), 9-42.
33. Guo X., Gao X. (2008), A novel hierarchical ensemble classifier for protein fold recognition. Protein Engineering, Design & Selection 21(11), 659-664.
34. Huang C.D., Lin C.T., Pal N.R. (2003), Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification. IEEE Trans. on Nanobioscience 2, 221-232.
35. Jones D.T. (1999), Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202.
36. Jones D.T. et al. (1992), A new approach to protein fold recognition. Nature, 358, 86-89.
37. Konieczny L., Roterman I., Spólnik P. (2010), System Biology. The strategy of functioning of living organism (in Polish). Scientific Publishing House PWN, Warsaw.
38. Leslie C.S. et al. (2004), Mismatch string kernels for discriminative protein classification. Bioinformatics 20, 467-476.
39. Liao L., Noble W.S. (2003) Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships, J. Comput. Biology, 6, 857-868.
40. Lin K.L. et al. (2007), Feature selection and combination criteria for improving accuracy in protein structure prediction. IEEE Trans. NanoBioscience 6(2), 186-196.
41. Lingner T., Meinicke P. (2006), Remote homology detection based on oligomer distances. Bioinformatics 22, 2224-2231.
42. Lo Conte L., Ailey B., Hubbard T.J.P., Brenner S.E., Murzin A.G., Chothia C. (2000), SCOP: a structural classification of proteins database. Nucleic Acids Res. 28, 257-259.
43. Marchler-Bauer A. et al. (2007), CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res. 35, D237-D240.
44. Mittelman D. et al. (2003) Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments. Bioinformatics 19, 1531-1539.
45. Nanni L. (2006), A novel ensemble of classifiers for protein fold recognition. Neurocomputing 69, 2434-2437.
46. Okun O. (2004), Protein fold recognition with k-local hyperplane distance nearest neighbor algorithm. In: Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics 2, Pisa, Italy, 51–57.
47. Pal N.R., Chakraborty D. (2003), Some new features for protein fold prediction. Proc. Int. Conf. On Artificial Neural Networks and Neural Information Processing, 1176-1183.
48. Rangwala H., Karypis G. (2005), Profile-based direct kernels for remote homology detection and fold recognition, Bioinformatics, 21(23), 4239-4247.
49. Roterman I., Bryliński M., Konieczny L., Jurkowski W. (2007), Early-stage protein folding - in silico model. In: Recent Advances in Structural Biology. A.G. de Brevern (ed.), Research Signpost, Trivandrum, Kerala, India.
50. Roterman I., Konieczny L., Bryliński M. (2009), Late-stage folding intermediate in silico model. In: Structure-function relation in proteins. Roterman I. (ed.). Transworld Research Network T.C. 37/661(2), Fort P.O. Trivandrum, Kerala, India.
51. Saigo H. et al. (2004), Protein homology detection using string alignment kernels. Bioinformatics 20, 1682-1689.
52. Schaffer A. et al. (2001), Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 29, 2994-3005.
53. Shamim M. et al. (2007), Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics 23(24), 3320-3327.
54. Shawe-Taylor J., Cristianini N. (2004), Kernel methods for pattern analysis. Cambridge University Press.
55. Shen H.B., Chou K.C. (2006), Ensemble classifier for protein fold pattern recognition. Bioinformatics 22(14), 1717-1722.
56. Shen H.B., Chou K.C. (2009), Predicting protein fold pattern with functional domain and sequential evolution information. J. Theor. Biol. 256, 441-446.
57. Shi J. et al. (2001), FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol., 310, 243-257.
58. Stąpor K. (2011), Classification methods in computer vision (in Polish). Scientific Publishing House PWN, Warsaw.
59. Vapnik V. (1995), The Nature of Statistical Learning Theory. Springer, New York.
60. Yang Jian-Yi, Chen X. (2011), Improving taxonomy-based protein fold recognition by using global and local features. Proteins 79, 2053-2064.
61. Ying X., Dong X., Jie L. (2007), Computational methods for protein structure prediction and modelling. Vol. 2: Structure prediction, Springer, New York.
62. Ying Y., Huang K., Campbell C. (2009) Enhance protein fold recognition through novel data integration approach. BMC Bioinformatics 10, 267-287.
63. Yu L., Liu H. (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. Proc. 10th Int. Conf. Machine Learning, 856-863.
64. Zouhal L.M., Denoeux T. (1998) An evidence-theoretic kNN rule with parameter optimization. IEEE Trans. Syst. Man Cybern., 28, 263-271.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-3a522b28-4ea8-4960-ab04-08646896dcd6