2012 | 7 | 1 | 38-44
Article title

Pattern recognition approach to classifying CYP 2C19 isoform

In this paper, a pattern recognition approach to quantitative structure-property relationship (QSPR) classification for the CYP2C19 isoform is presented. QSPR is correlative computer modelling of the properties of chemical molecules and is widely used in cheminformatics and the pharmaceutical industry. Predicting whether a particular chemical will be metabolized by CYP2C19 is of primary importance to the pharmaceutical industry. This task poses certain challenges. First, the analyzed data are characterized by significant biological noise. Additionally, the training set is unbalanced, with objects from the negative class outnumbering the positives four to one. The presented solution deals with these problems and additionally incorporates thorough feature selection to improve the stability of the obtained results. Strong emphasis is put on outlier detection and proper model validation to achieve the best predictive power.
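To illustrate why a four-to-one class imbalance calls for careful validation, the following minimal sketch uses hypothetical toy labels (not the paper's data) and helper functions written for this example only. It is not the authors' method. It shows that a classifier that always predicts the negative class looks good under plain accuracy but not under balanced accuracy, and that a stratified split keeps the class ratio the same in every fold.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each index to one of k folds, keeping class ratios roughly equal."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; does not reward ignoring the minority class."""
    recalls = []
    for cls in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in preds) / len(preds))
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 80 negatives (0) and 20 positives (1), a 4:1 ratio.
labels = [0] * 80 + [1] * 20
always_negative = [0] * len(labels)  # trivial majority-class predictor

plain = accuracy(labels, always_negative)
balanced = balanced_accuracy(labels, always_negative)

folds = stratified_folds(labels, 5)
fold_positives = [sum(labels[i] for i in fold) for fold in folds]

print(plain, balanced, fold_positives)
```

On these toy labels the trivial predictor reaches 80% plain accuracy while finding none of the positives, which is why an imbalance-aware metric and stratified folds are a common choice for this kind of data.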
  • Department of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland