Analysis of Compounds Activity Concept Learned by SVM Using Robust Jaccard Based Low-dimensional Embedding

Jastrzębski, S.; Czarnecki, W. M.

doi:10.4467/20838476SI.15.001.3023

Article details

Journal

Schedae Informaticae

2015 | Vol. 24 | 9--19

Article title

Analysis of Compounds Activity Concept Learned by SVM Using Robust Jaccard Based Low-dimensional Embedding

Authors

Jastrzębski, S., Czarnecki, W. M.

Selected full texts from this journal

http://www.ejournals.eu/Schedae-Informaticae/

Abstract

The Support Vector Machine (SVM) with an RBF kernel is one of the most successful models in machine-learning-based prediction of the biological activity of compounds. Unfortunately, existing datasets are highly skewed and hard to analyze. In our research we try to answer the question of how deep the activity concept modeled by SVM is. We perform the analysis using a model which embeds compounds' representations in a low-dimensional real space using near neighbour search with Jaccard similarity. As a result, we show that the concept learned by SVM is not much more complex than a slightly richer nearest neighbours search. As an additional result, we propose a classification technique based on Locally Sensitive Hashing, approximating the Jaccard similarity through the minhashing technique, which performs well on 80 tested datasets (10 proteins with 8 different representations each) while at the same time allowing fast classification and efficient online training.
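
This record does not include the authors' implementation. As an illustration of the minhashing idea the abstract refers to, here is a minimal sketch in Python. It assumes each compound is given as a non-empty set of indices of the "on" bits in its binary fingerprint. The function names, the affine hash family, and the number of hash functions are hypothetical choices made for this sketch, not details taken from the paper.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets of bit indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_functions(num_hashes, seed=0):
    """Draw random affine hashes h(x) = (a*x + b) mod p (an assumed hash family)."""
    prime = 2_147_483_647  # Mersenne prime 2^31 - 1
    rng = random.Random(seed)
    return [(rng.randrange(1, prime), rng.randrange(0, prime), prime)
            for _ in range(num_hashes)]


def minhash_signature(bits, hash_functions):
    """For each hash function, keep the minimum hash value over the set.

    Assumes `bits` is non-empty; min() raises ValueError on an empty set.
    """
    return [min((a * x + b) % p for x in bits) for a, b, p in hash_functions]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


# Two toy fingerprints sharing 4 of 8 distinct bits.
fp_a = {1, 5, 9, 12, 40, 77}
fp_b = {1, 5, 9, 13, 40, 80}

hashes = make_hash_functions(num_hashes=512)
sig_a = minhash_signature(fp_a, hashes)
sig_b = minhash_signature(fp_b, hashes)

print("exact Jaccard:    ", jaccard(fp_a, fp_b))
print("minhash estimate: ", estimate_jaccard(sig_a, sig_b))
```

The estimate works because, for an idealised random hash, the probability that two sets share the same minimum hash value equals their Jaccard similarity. Signatures are short fixed-length vectors, so they can be compared or bucketed far more cheaply than full fingerprints, which is what makes fast near neighbour lookup possible.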

Keywords

support vector machine; Locally Sensitive Hashing; Jaccard similarity

Publisher

Journal

Schedae Informaticae

Year

2015

Volume

Vol. 24

Pages

9--19

Physical description

Bibliography: 11 items, figures.

Contributors

author

Jastrzębski, S.

Faculty of Mathematics and Computer Science Jagiellonian University ul. Łojasiewicza 6, 30-348 Kraków, stanislaw.jastrzebski@uj.edu.pl

author

Czarnecki, W. M.

Faculty of Mathematics and Computer Science Jagiellonian University ul. Łojasiewicza 6, 30-348 Kraków, wojciech.czarnecki@uj.edu.pl

References

[1] Kurczab R., Smusz S., Bojarski A.J., Evaluation of different machine learning methods for ligand-based virtual screening. J. Cheminformatics, 2011, 3(S-1), p. P41.
[2] Cortes C., Vapnik V., Support-vector networks. Machine Learning, 1995, 20(3), pp. 273–297.
[3] Vapnik V., The nature of statistical learning theory. Springer, New York, 2000.
[4] Berlinet A., Thomas-Agnan C., Reproducing kernel Hilbert spaces in probability and statistics. vol. 3. Springer, 2004.
[5] Drineas P., Mahoney M.W., On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 2005, 6, pp. 2153–2175.
[6] Joachims T., Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2006, pp. 217–226.
[7] Rajaraman A., Ullman J.D., Mining of massive datasets. Cambridge University Press, 2011.
[8] Lewis D.D., Gale W.A., A sequential algorithm for training text classifiers. In: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, Springer-Verlag New York, Inc., 1994, pp. 3–12.
[9] Swamidass S.J., Chen J., Bruand J., Phung P., Ralaivola L., Baldi P., Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 2005, 21(suppl 1), pp. i359–i368.
[10] Yap C.W., PaDEL-Descriptor: An open source software to calculate molecular descriptors and fingerprints. Journal of Computational Chemistry, 2011, 32(7), pp. 1466–1474.
[11] Smusz S., Czarnecki W.M., Warszycki D., Bojarski A.J., Exploiting uncertainty measures in compounds activity prediction using support vector machines. Bioorganic & Medicinal Chemistry Letters, 2015, 25(1), pp. 100–105.

Document type

Bibliography

Identifiers

DOI

10.4467/20838476SI.15.001.3023

YADDA identifier

bwmeta1.element.baztech-001f76a2-ddac-4f21-98db-47facb05185a