Article title

Predicting pairwise relations with neural similarity encoders

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Matrix factorization is at the heart of many machine learning algorithms, for example, dimensionality reduction (e.g. kernel PCA) or recommender systems relying on collaborative filtering. Understanding a singular value decomposition (SVD) of a matrix as a neural network optimization problem enables us to decompose large matrices efficiently while dealing naturally with missing values in the given matrix. But most importantly, it allows us to learn the connection between data points’ feature vectors and the matrix containing information about their pairwise relations. In this paper we introduce a novel neural network architecture termed similarity encoder (SimEc), which is designed to simultaneously factorize a given target matrix while also learning the mapping to project the data points’ feature vectors into a similarity preserving embedding space. This makes it possible to, for example, easily compute out-of-sample solutions for new data points. Additionally, we demonstrate that SimEc can preserve non-metric similarities and even predict multiple pairwise relations between data points at once.
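
To make the architecture described in the abstract more concrete, the following is a minimal sketch of a similarity encoder in Keras (the library cited as [27] below). It is not the authors' reference implementation (that is linked in [47]): the feature matrix X, the target similarity matrix S, the layer sizes and the training settings are placeholder assumptions chosen only to illustrate the structure, namely an encoder that maps feature vectors into a low-dimensional embedding, followed by a linear output layer that reconstructs each point's row of the target similarity matrix.

    # Minimal SimEc-style sketch (illustrative only): a linear encoder maps
    # d-dimensional feature vectors to an e-dimensional embedding, and a linear
    # decoder predicts each point's row of the n x n target similarity matrix.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    n, d, e = 500, 50, 2                                  # placeholder sizes
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n, d)).astype("float32")     # placeholder features
    S = (X @ X.T).astype("float32")                       # placeholder target similarities

    inputs = keras.Input(shape=(d,))
    # encoder: projection into the similarity-preserving embedding space
    embedding = layers.Dense(e, use_bias=False, name="embedding")(inputs)
    # decoder: approximates the similarities of a point to all n training points
    outputs = layers.Dense(n, use_bias=False, name="similarity_row")(embedding)

    simec = keras.Model(inputs, outputs)
    simec.compile(optimizer="adam", loss="mse")
    simec.fit(X, S, epochs=25, batch_size=32, verbose=0)

    # out-of-sample embeddings: a single forward pass through the learned encoder
    encoder = keras.Model(inputs, embedding)
    Z_new = encoder.predict(rng.standard_normal((10, d)).astype("float32"), verbose=0)

Because the mapping from features to embeddings is an explicit network, out-of-sample points are embedded by a forward pass rather than by re-factorizing the matrix. Nonlinear encoder layers, masking of missing target entries, or multiple output layers for several relation types would be natural extensions in the spirit of the abstract, but are omitted here for brevity.
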
Year
Pages
821–830
Physical description
Bibliography: 55 items, figures, charts
Authors
author
  • Machine Learning Group, Technische Universität Berlin, Berlin, Germany
  • Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
  • Max-Planck-Institut für Informatik, Saarbrücken, Germany
  • Machine Learning Group, Technische Universität Berlin, Berlin, Germany
Bibliography
  • [1] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.
  • [2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer Series in Statistics, New York, NY, USA: Springer New York Inc., 2001.
  • [3] T. Hofmann and J. M. Buhmann, “Pairwise data clustering by deterministic annealing,” IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1), 1–14, 1997.
  • [4] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
  • [5] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research 9, 2579–2605, 2008.
  • [6] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, 10 (5), 1299–1319, 1998.
  • [7] J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science 290 (5500), 2319–2323, 2000.
  • [8] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science 290 (5500), 2323–2326, 2000.
  • [9] F. Horn, “Interactive exploration and discovery of scientific publications with PubVis,” arXiv preprint arXiv:1706.08094, 2017.
  • [10] S. Mika, B. Schölkopf, A. J. Smola, K.-R. Müller, M. Scholz, and G. Rätsch, “Kernel PCA and de-noising in feature spaces,” in Advances in Neural Information Processing Systems, 536–542, 1999.
  • [11] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An introduction to kernel-based learning algorithms,” IEEE Transactions on Neural Networks 12 (2), 181–201, 2001.
  • [12] B. Schölkopf, S. Mika, C. J. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. J. Smola, “Input space versus feature space in kernel-based methods,” IEEE Transactions on Neural Networks 10 (5), 1000–1017, 1999.
  • [13] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
  • [14] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, 3111–3119, 2013.
  • [15] Z. S. Harris, “Distributional structure,” Word 10 (2–3), 146–162, 1954.
  • [16] F. Horn, “Context encoders as a simple but powerful extension of word2vec,” in Proceedings of the 2nd Workshop on Representation Learning for NLP, 10–14, Association for Computational Linguistics, 2017.
  • [17] O. Levy and Y. Goldberg, “Neural word embedding as implicit matrix factorization,” in Advances in Neural Information Processing Systems, 2177–2185, 2014.
  • [18] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,” Journal of Machine Learning Research 12, 2493–2537, 2011.
  • [19] Q.V. Le and T. Mikolov, “Distributed representations of sentences and documents,” arXiv preprint arXiv:1405.4053, 2014.
  • [20] J. Turian, L. Ratinov, and Y. Bengio, “Word representations: a simple and general method for semi-supervised learning,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 384–394, Association for Computational Linguistics, 2010.
  • [21] M. Gönen, “Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization,” Bioinformatics 28 (18), 2304–2310, 2012.
  • [22] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer 42 (8), 2009.
  • [23] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Application of dimensionality reduction in recommender system – a case study,” tech. rep., University of Minnesota, Department of Computer Science, Minneapolis, 2000.
  • [24] O. Barkan and N. Koenigstein, “Item2vec: neural item embedding for collaborative filtering,” in 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6, IEEE, 2016.
  • [25] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola, “Distributed large-scale natural graph factorization,” in Proceedings of the 22nd International Conference on World Wide Web, 37–48, ACM, 2013.
  • [26] W. L. Hamilton, R. Ying, and J. Leskovec, “Representation learning on graphs: Methods and applications,” arXiv preprint arXiv:1709.05584, 2017.
  • [27] F. Chollet et al., “Keras.” https://keras.io, 2015.
  • [28] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative filtering for implicit feedback datasets,” in Eighth IEEE International Conference on Data Mining (ICDM ’08), 263–272, IEEE, 2008.
  • [29] E. Oja, “Simplified neuron model as a principal component analyzer,” Journal of Mathematical Biology 15 (3), 267–273, 1982.
  • [30] A. Cichocki, “Neural network for singular value decomposition,” Electronics Letters 28 (8), 784–786, 1992.
  • [31] A. Cichocki and R. Unbehauen, “Neural networks for computing eigenvalues and eigenvectors,” Biological Cybernetics 68 (2), 155–164, 1992.
  • [32] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., 1995.
  • [33] A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in Advances in Neural Information Processing Systems, 1177–1184, 2007.
  • [34] M. Alber, P.-J. Kindermans, K. Schütt, K.-R. Müller, and F. Sha, “An empirical study on the properties of random bases for kernel methods,” in Advances in Neural Information Processing Systems, 2760–2771, 2017.
  • [35] Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. L. Roux, and M. Ouimet, “Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering,” in Advances in Neural Information Processing Systems, 177–184, 2004.
  • [36] J. Mao and A. K. Jain, “Artificial neural networks for feature extraction and multivariate data projection,” IEEE Transactions on Neural Networks 6 (2), 296–317, 1995.
  • [37] Y. W. Teh and S. T. Roweis, “Automatic alignment of local representations,” in Advances in Neural Information Processing Systems, 865–872, 2003.
  • [38] M. A. Carreira-Perpinán and M. Vladymyrov, “A fast, universal algorithm to learn parametric nonlinear embeddings,” in Advances in Neural Information Processing Systems, 253–261, 2015.
  • [39] L. van der Maaten, “Learning a parametric embedding by preserving local structure,” in International Conference on Artificial Intelligence and Statistics, 384–391, 2009.
  • [40] K. Bunte, M. Biehl, and B. Hammer, “A general framework for dimensionality-reducing data visualization mapping,” Neural Computation 24 (3), 771–804, 2012.
  • [41] D. Lowe and M. Tipping, “Feed-forward neural networks and topographic mappings for exploratory data analysis,” Neural Computing & Applications 4 (2), 83–95, 1996.
  • [42] M. Kampffmeyer, S. Løkse, F. M. Bianchi, R. Jenssen, and L. Livi, “Deep kernelized autoencoders,” in Scandinavian Conference on Image Analysis, 419–430, Springer, 2017.
  • [43] R. Hadsell, S. Chopra, and Y. LeCun, “Dimensionality reduction by learning an invariant mapping,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2, 1735–1742, IEEE, 2006.
  • [44] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck, “Learning deep structured semantic models for web search using clickthrough data,” in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, 2333–2338, ACM, 2013.
  • [45] L. Wu, A. Fisch, S. Chopra, K. Adams, A. Bordes, and J. Weston, “StarSpace: Embed all the things!,” arXiv preprint arXiv:1709.03856, 2017.
  • [46] J. Laub and K.-R. Müller, “Feature discovery in non-metric pairwise data,” Journal of Machine Learning Research 5, 801–818, 2004.
  • [47] F. Horn. https://github.com/cod3licious/simec/blob/master/experiments_paper.ipynb.
  • [48] J. Laub, K.-R. Müller, F.A. Wichmann, and J.H. Macke, “Inducing metric violations in human similarity judgements,” in Advances in Neural Information Processing Systems, 777–784, 2007.
  • [49] S. Bosse, D. Maniry, K.-R. Müller, T. Wiegand, and W. Samek, “Deep neural networks for no-reference and full-reference image quality assessment,” IEEE Transactions on Image Processing, 27 (1), 206–219, 2018.
  • [50] K. T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, and A. Tkatchenko, “Quantum-chemical insights from deep tensor neural networks,” Nature Communications 8, 13890, 2017.
  • [51] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, “SchNet – a deep learning architecture for molecules and materials,” The Journal of Chemical Physics 148 (24), 241722, 2018.
  • [52] L. Arras, F. Horn, G. Montavon, K.-R. Müller, and W. Samek, “‘What is relevant in a text document?’: An interpretable machine learning approach,” PLOS ONE 12 (8), e0181142, 2017.
  • [53] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PLOS ONE 10 (7), e0130140, 2015.
  • [54] P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, and S. Dähne, “PatternNet and PatternLRP – improving the interpretability of neural networks,” arXiv preprint arXiv:1705.05598, 2017.
  • [55] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller, “Explaining nonlinear classification decisions with deep Taylor decomposition,” Pattern Recognition 65, 211–222, 2017.
Remarks
Record compiled under agreement 509/P-DUN/2018 from funds of the Polish Ministry of Science and Higher Education (MNiSW) allocated to activities popularising science (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-06f0d3a1-c37f-4865-af04-140f471614b9