Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
The selection of data representation and metric for a given data set is one of the most crucial problems in machine learning, since it affects the results of classification and clustering methods. In this paper we investigate how to combine various data representations and metrics into a single function that reflects the relationships between data set elements better than any single representation-metric pair. Our approach relies on optimizing a linear combination of selected distance measures by least-squares approximation. Applying our method to the classification and clustering of chemical compounds appears to increase the accuracy of these methods.
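To make the approach described in the abstract concrete, the sketch below illustrates one plausible reading of it: pairwise distance matrices from several representation-metric pairs are combined linearly, and the weights are fitted by least-squares approximation against a target dissimilarity. The function names, the binary same-class/different-class target, and the non-negativity clipping are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Minimal sketch (not the authors' implementation): given k pairwise distance
# matrices D_1..D_k computed on the same data set (one per representation-metric
# pair), fit weights w so that sum_i w_i * D_i approximates a target
# dissimilarity T in the least-squares sense.

def fit_combined_metric(distance_matrices, target):
    """Fit weights of a linear combination of distance matrices by least squares."""
    n = target.shape[0]
    iu = np.triu_indices(n, k=1)            # use each unordered pair once
    A = np.column_stack([D[iu] for D in distance_matrices])
    b = target[iu]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(w, 0.0, None)            # keep the combination a valid dissimilarity

def combined_distance(distance_matrices, weights):
    """Evaluate the fitted combination on the full distance matrices."""
    return sum(w * D for w, D in zip(weights, distance_matrices))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))
    labels = rng.integers(0, 2, size=30)
    # Two example representation-metric pairs: Euclidean and Manhattan distances.
    diff = X[:, None, :] - X[None, :, :]
    D_euc = np.sqrt((diff ** 2).sum(-1))
    D_man = np.abs(diff).sum(-1)
    # Illustrative target: 0 for same-class pairs, 1 for different-class pairs.
    T = (labels[:, None] != labels[None, :]).astype(float)
    w = fit_combined_metric([D_euc, D_man], T)
    D_comb = combined_distance([D_euc, D_man], w)
    print("fitted weights:", w)
```

The combined matrix D_comb can then be passed to any distance-based classifier or clustering routine (e.g. nearest-neighbour classification or k-means on precomputed distances) in place of a single representation-metric pair.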
Keywords
Journal
Year
Volume
Pages
83–92
Physical description
Bibliography: 27 items, figures
Contributors
author
- Faculty of Mathematics and Computer Science, ul. Lojasiewicza 6, 30-348 Kraków
author
- Faculty of Mathematics and Computer Science, ul. Lojasiewicza 6, 30-348 Kraków
Bibliography
- [1] Aczel A., Sounderpandian J., Complete Business Statistics. McGraw Hill, New York 2009.
- [2] Atkeson C., Moore A., Schaal S., Locally weighted learning. Artificial Intelligence Review, 1997, 11, pp. 11–73.
- [3] Bar-Hillel A., Hertz T., Shental N., Weinshall D., Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 2005, 6, pp. 937–965.
- [4] Cover T., Hart P., Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, 1967, 13, pp. 21–27.
- [5] Cox T.F., Cox M.A.A., Multidimensional Scaling. Chapman and Hall, London 1994.
- [6] Deng Z., Chuaqui C., Singh J., Knowledge-based design of target-focused libraries using protein-ligand interaction constraints. Journal of Medicinal Chemistry, 2006, 49(2), pp. 490–500.
- [7] Domeniconi C., Gunopulos D., Adaptive nearest neighbor classification using support vector machines. Advances in Neural Information Processing Systems, 2002, 14, pp. 665–672.
- [8] Geppert H., Vogt M., Bajorath J., Current Trends in Ligand-Based Virtual Screening: Molecular Representations, Data Mining Methods, New Application Areas, and Performance Evaluation. Journal of Chemical Information and Modeling, 2010, 50, pp. 205–216.
- [9] Goldberger J., Roweis S., Hinton G., Salakhutdinov R., Neighbourhood Components Analysis. Advances in Neural Information Processing Systems, 2004, 17, pp. 513–520.
- [10] Hastie T., Tibshirani R., Discriminant Adaptive Nearest Neighbor Classification. IEEE Trans. Pattern Anal. Mach. Intell., 1996, 18, pp. 607–616.
- [11] Hubert L., Arabie P., Comparing partitions. Journal of Classification, 1985, 2, pp. 193–218.
- [12] Jaakkola T.S., Haussler D., Exploiting Generative Models in Discriminative Classifiers. Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, 1999, pp. 487–493.
- [13] Kedem D., Tyree S., Weinberger K.Q., Sha F., Lanckriet G., Non-linear Metric Learning. Advances in Neural Information Processing Systems, 2012, 25, pp. 2582–2590. Available via http://books.nips.cc/papers/files/nips25/NIPS2012_1223.pdf.
- [14] Klekota J., Roth F.P., Chemical Substructures That Enrich for Biological Activity. Bioinformatics, 2008, 24(21), pp. 2518–2525.
- [15] Kohavi R., A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI ’95), 1995, pp. 1137–1143.
- [16] Lloyd S., Least Squares Quantization in PCM. IEEE Trans. Inf. Theor., 1982, 28, pp. 129–137.
- [17] Roweis S.T., Saul L.K., Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290, pp. 2323–2326.
- [18] Scholkopf B., Smola A.J., Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2001.
- [19] Shalev-Shwartz S., Singer Y., Ng A.Y., Online and Batch Learning of Pseudometrics. Proceedings of the Twenty-first International Conference on Machine Learning (ICML ’04), 2004, pp. 743–750.
- [20] Shental N., Hertz T., Weinshall D., Pavel M., Adjustment Learning and Relevant Component Analysis. Proceedings of the 7th European Conference on Computer Vision-Part IV (ECCV ’02), 2002, pp. 776–792.
- [21] Śmieja M., Warszycki D., Tabor J., Bojarski A.J., Asymmetric Clustering Index in a Case Study of 5-HT1A Receptor Ligands. PLoS ONE, 9(7): e102069, doi:10.1371/journal.pone.0102069, 2014.
- [22] Sneath P.H.A., The Application of Computers to Taxonomy. J. Gen. Microbiol., 1957, 17, pp. 201–226.
- [23] Takeda H., Farsiu S., Milanfar P., Robust kernel regression for restoration and reconstruction of images from sparse noisy data. IEEE International Conference on Image Processing, 2006, pp. 1257–1260.
- [24] Xing E.P., Ng A.Y., Jordan M.I., Russell S., Distance Metric Learning, With Application To Clustering With Side-Information. Advances in Neural Information Processing Systems, 2003, 15, pp. 505–512.
- [25] Warszycki D., Mordalski S., Kristiansen K., Kafel R., Sylte I., Chilmonczyk Z., Bojarski A.J., A Linear Combination of Pharmacophore Hypotheses as a New Tool in Search of New Active Compounds – An Application for 5-HT1A Receptor Ligands. PLoS ONE, 8(12): e84510, doi:10.1371/journal.pone.0084510, 2013.
- [26] Weinberger K.Q., Saul L.K., Distance Metric Learning for Large Margin Nearest Neighbor Classification. J. Mach. Learn. Res., 2009, 10, pp. 207–244.
- [27] Weinberger K.Q., Saul L.K., Fast solvers and efficient implementations for distance metric learning. ACM International Conference Proceeding Series, 2008, 307, pp. 1160–1167.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-13fb7a2a-1681-43c9-9803-7400dbdede32