Journal
Article title
Authors
Title variants
Publication languages
Abstracts
This paper presents a pattern recognition approach to classifying quantitative structure-property relationships (QSPR) of the CYP2C19 isoform. QSPR is the correlative computer modelling of the properties of chemical molecules and is widely used in cheminformatics and the pharmaceutical industry. Predicting whether a particular chemical will be metabolized by 2C19 is of primary importance to the pharmaceutical industry, but the task poses certain challenges. First, the analyzed data are characterized by significant biological noise. Second, the training set is unbalanced, with objects from the negative class outnumbering the positives four to one. The presented solution addresses both problems and additionally incorporates thorough feature selection to improve the stability of the obtained results. Strong emphasis is placed on outlier detection and proper model validation to achieve the best predictive power.
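The abstract's point about validation on an unbalanced training set can be illustrated with a small sketch. It is not the paper's method: the labels are hypothetical (20 positives and 80 negatives, matching only the stated 4:1 ratio), and the helper names `accuracy`, `balanced_accuracy`, and `stratified_folds` are introduced here for illustration. It shows that a model which always predicts the negative class reaches 80% plain accuracy but only 50% balanced accuracy, and how folds can be built so that each keeps the 4:1 ratio.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity (assumes both classes occur)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


def stratified_folds(labels, k):
    """Split indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the original class ratio."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical labels with the 4:1 negative-to-positive ratio from the abstract.
y_true = [1] * 20 + [0] * 80
# A trivial baseline that always predicts the majority (negative) class.
y_majority = [0] * len(y_true)

plain = accuracy(y_true, y_majority)
balanced = balanced_accuracy(y_true, y_majority)
folds = stratified_folds(y_true, k=5)
fold_positives = [sum(y_true[i] for i in fold) for fold in folds]
```

Here `plain` is 0.8 while `balanced` is 0.5, which is why an imbalance-aware metric and stratified splitting are a common choice for data of this kind; the record does not state which metric or splitting scheme the paper itself uses.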
Journal
Year
Volume
Issue
Pages
38-44
Physical description
Dates
published
2012-02-01
online
2011-11-24
Contributors
author
- Department of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland, bartosz.krawczyk@pwr.wroc.pl
References
- [1] http://www.simulations-plus.com/
- [2] Gasteiger J., Funatsu K., Chemoinformatics-An Important Scientific Discipline, Journal of Computational Chemistry Jpn., 2006, Vol. 5, No. 2:53–58 http://dx.doi.org/10.2477/jccj.5.53
- [3] Chawla N.V., Bowyer K.W., Hall L.O. and Kegelmeyer W.P., SMOTE: Synthetic Minority Over-sampling Technique, Journal of Artificial Intelligence Research, 2002, Volume 16:321–357
- [4] Chawla N.V., Lazarevic A., Hall L.O. and Bowyer K.W., SMOTEBoost: improving prediction of the minority class in boosting, Proceedings of the Principles of Knowledge Discovery in Databases, 2003, PKDD-2003:107–119
- [5] Han H., Wang W., and Mao B., Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning, Lecture Notes in Computer Science, 2005, vol. 3644:878–887 http://dx.doi.org/10.1007/11538059_91
- [6] Köknar-Tezel S., Latecki L.J., Improving SVM classification on imbalanced time series data sets with ghost points, Knowledge and Information Systems, 2010, DOI: 10.1007/s10115-010-0310-3
- [7] Wang B.X., Japkowicz N., Boosting Support Vector Machines for Imbalanced Data Sets, Lecture Notes in Computer Science, 2008, Volume 4994/2008:38–47 http://dx.doi.org/10.1007/978-3-540-68123-6_4
- [8] Li B.Y., Peng J., Chen Y.Q. and Jin Y.Q., Classifying Unbalanced Pattern Groups by Training Neural Network, Lecture Notes in Computer Science, 2006, Volume 3972/2006:8–13 http://dx.doi.org/10.1007/11760023_2
- [9] Zhao Z., Huang D., An evolutionary modular neural network for unbalanced pattern classifications, Evolutionary Computation, 2007, CEC 2007:1662–1669
- [10] Gasteiger J.(Editor), Handbook of Chemoinformatics - From Data to Knowledge, Wiley-VCH, 2003
- [11] Lindsay K.R., Buchanan B.G., Feigenbaum E.A., Lederberg J., Applications of Artificial Intelligence for Organic Chemistry; the Dendral Project, McGraw-Hill, New York, 1980
- [12] Brown F., Editorial Opinion: Chemoinformatics-a ten year update, Current Opinion in Drug Discovery & Development, 2005, 8(3):296–302
- [13] Aoyama, T., Suzuki, Y., Ichikawa, H., Neural networks applied to structure-activity relationships. Journal of Medicinal Chemistry, 1990, 33, 905–908 http://dx.doi.org/10.1021/jm00165a004
- [14] King, R. D., Hirst, J. D., Sternberg, M. J. E., Comparison of artificial intelligence methods for modeling pharmaceutical QSARs. Applied Artificial Intelligence, 1995, 9, 213–233 http://dx.doi.org/10.1080/08839519508945474
- [15] Liu, Y., A comparative study on feature selection methods for drug discovery. Journal of Chem. Inf. Comput. Sci., 2004, 44, 1823–1828 http://dx.doi.org/10.1021/ci049875d
- [16] Burbidge, R., Trotter, M., Buxton, B., Drug design by machine learning: support vector machines for pharmaceutical data analysis. Computers and Chemistry, 2001, 26, 5–14 http://dx.doi.org/10.1016/S0097-8485(01)00094-8
- [17] Duda R.O., Hart P.E., Stork D.G., Pattern Classification, Wiley-Interscience, 2001
- [18] Vapnik V., Statistical Learning Theory, Wiley, 1998
- [19] Williams, C. K. I., Barber, D., Bayesian classification with Gaussian Processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20, 1342–1351 http://dx.doi.org/10.1109/34.735807
- [20] Crammer, K., Singer, Y., On the algorithmic implementation of multiclass kernel-based vector machines, Journal of Machine Learning Research, 2001, 2, 265–292
- [21] Redman T. C., Data Quality. The Field Guide, Boston Digital Press, 2001
- [22] Ben-Gal I., Outlier detection, Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, 2005
- [23] Guyon I., Gunn S., Nikravesh M. and Zadeh L., Feature extraction, foundations and applications, Springer, 2006
- [24] Yu L., Liu H., Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004, 1205–1224
- [25] http://www.r-project.org/
- [26] Karatzoglou A., Smola A., Hornik K., Zeileis A., Kernlab - An S4 Package for Kernel Methods in R, Journal of Statistical Software, 2004, 11(9)
- [27] Karatzoglou A., Meyer D., Hornik K., Support Vector Machines in R, Journal of Statistical Software, 2006, 15(9)
- [28] Alpaydin, E., Combined 5 × 2 cv F Test for Comparing Supervised Classification Learning Algorithms, Neural Computation, 1998, 11:1885–1892 http://dx.doi.org/10.1162/089976699300016007
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.-psjd-doi-10_2478_s11536-011-0120-3