Exponential machines

Novikov, A.; Trofimov, M.; Oseledets, I.

doi:10.24425/bpas.2018.125926

Artykuł - szczegóły

Tytuł artykułu

Exponential machines

Autorzy

Novikov A. , Trofimov M. , Oseledets I.

Treść / Zawartość

Pełne teksty:

04_789-798_00914_Bpast.No.66-6_31.12.18_K3.pdf

Pobierz

Identyfikatory

DOI

10.24425/bpas.2018.125926

Warianty tytułu

Języki publikacji

Abstrakty

Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential machines (ExM), a predictor that models all interactions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called tensor train (TT). The tensor train format regularizes the model and lets you control the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which allows us to fit tensors with ¼ 256 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and that it works on par with high-order factorization machines on a recommender system dataset MovieLens 100 K.

Słowa kluczowe

tensor decomposition tensor train factorization machines Riemannian optimization

optymalizacja dekompozycja tensorowa tensor metryczny Riemanna

Wydawca

Polska Akademia Nauk, Wydział IV Nauk Technicznych

Czasopismo

Bulletin of the Polish Academy of Sciences. Technical Sciences

Rocznik

2018

Tom

Vol. 66, nr 6

Strony

789--797

Opis fizyczny

Bibliogr. 31 poz., rys., wykr., tab.

Twórcy

autor

Novikov A.

novikov@bayesgroup.ru

National Research University Higher School of Economics
Institute of Numerical Mathematics RAS

autor

Trofimov M.

Federal Research Center “Computer Science and Control” RAS

autor

Oseledets I.

Institute of Numerical Mathematics RAS
Skolkovo Institute of Science and Technology

Bibliografia

[1] M. Abadi et al., “Tensorflow: Large-scale machine learning on heterogeneous systems”, 2015. Software available from tensorflow.org.
[2] I. Bayer, “FASTFM: A library for factorization machines”, Journal of Machine Learning Research, 2016.
[3] M. Blondel, A. Fujino, N. Ueda, and M. Ishihata, “Higher-order factorization machines”, Advances in Neural Information Processing Systems 29 (NIPS), 2016.
[4] M. Blondel, M. Ishihata, A. Fujino, and N. Ueda, “Polynomial networks and factorization machines: New insights and efficient training algorithms”, In Advances in Neural Information Processing Systems 29 (NIPS). 2016.
[5] A. Bordes, S. Ertekin, J. Weston, and L. Bottou, “Fast kernel classifiers with online and active learning”, The Journal of Machine Learning Research, 6, 1579–1619, 2005.
[6] B.E. Boser, I.M. Guyon, and V.N. Vapnik. “A training algorithm for optimal margin classifiers”, In Proceedings of the fifth annual workshop on Computational learning theory, pages 144–152, 1992.
[7] J.D. Caroll and J.J. Chang, “Analysis of individual differences in multidimensional scaling via n-way generalization of eckartyoung decomposition”, Psychometrika, 35, 283–319, 1970.
[8] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. Mandic, “Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions”, Foundations and Trends® in Machine Learning, 9(4‒5), 249–429, 2016.
[9] A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, I. Oseledets, M. Sugiyama, and D. Mandic, “Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives”, Foundations and Trends® in Machine Learning, 9(6), 431–673, 2017.
[10] D. Dheeru and E.K. Taniskidou, “UCI machine learning repository”, 2017.
[11] F.M. Harper and A.J. Konstan, “The movielens datasets: History and context”, ACM Transactions on Interactive Intelligent Systems (TiiS), 2015.
[12] R.A. Harshman, “Foundations of the parafac procedure: models and conditions for an explanatory multimodal factor analysis”, UCLA Working Papers in Phonetics, 16, 1–84, 1970.
[13] S. Holtz, T. Rohwedder, and R. Schneider, “On manifolds of tensors of fixed tt-rank”, Numerische Mathematik, pages 701‒731, 2012.
[14] V. Khrulkov, A. Novikov, and I. Oseledets, “Expressive power of recurrent neural networks”, In International Conference on Learning Representations (ICLR), 2018.
[15] D. Kingma and J. Ba. Adam, “A method for stochastic optimization”, In International Conference on Learning Representations (ICLR), 2015.
[16] V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets, and V. Lempitsky. “Speeding-up convolutional neural networks using fine-tuned cp-decomposition. In International Conference on Learning Representations (ICLR), 2014.
[17] R. Livni, S. Shalev-Shwartz, and O. Shamir. “On the computational efficiency of training neural networks”, In Advances in Neural Information Processing Systems 27 (NIPS), 2014.
[18] C. Lubich, I. V. Oseledets, and B. Vandereycken, “Time integration of tensor trains”, SIAM Journal on Numerical Analysis, pages 917–941, 2015.
[19] G. Meyer, S. Bonnabel, and R. Sepulchre, “Regression on fixedrank positive semidefinite matrices: a Riemannian approach”, The Journal of Machine Learning Research, 593–625, 2011.
[20] A. Novikov, P. Izmailov, V. Khrulkov, M. Figurnov, and I. Oseledets, “Tensor train decomposition on tensorflow (t3f)”, arXiv preprint arXiv:1801.01928, 2018.
[21] A. Novikov, D. Podoprikhin, A. Osokin, and D. Vetrov, “Tensorizing neural networks”, In Advances in Neural Information Processing Systems 28 (NIPS). 2015.
[22] I. V. Oseledets, “Tensor-train decomposition”, SIAM J. Scientific Computing, 33(5), 2295–2317, 2011.
[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in python”, Journal of Machine Learning Research, 12, 2825–2830, 2011.
[24] S. Rendle. “Factorization machines”, In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 995–1000, 2010.
[25] U. Schollwöck, “The density-matrix renormalization group in the age of matrix product states”, Annals of Physics, 326(1), 96–192, 2011.
[26] E. Stoudenmire and D. J. Schwab, “Supervised learning with tensor networks”, In Advances in Neural Information Processing Systems 29 (NIPS). 2016.
[27] M. Tan, I.W. Tsang, L. Wang, B. Vandereycken, and S.J. Pan, “Riemannian pursuit for big matrix recovery”, In Proceedings of The 31st International Conference on Machine Learning (ICML), 2014.
[28] S. Wahls, V. Koivunen, H.V. Poor, and M. Verhaegen. “Learning multidimensional fourier series with tensor trains”, In Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on, pages 394–398. IEEE, 2014.
[29] Z. Xu and Y. Ke. “Stochastic variance reduced Riemannian eigensolver”, arXiv preprint arXiv:1605.08233, 2016.
[30] J. Yang and A. Gittens, “Tensor machines for learning targetspecific polynomial features”, arXiv preprint arXiv:1504.01697, 2015.
[31] H. Zhang, S.J. Reddi, and S. Sra. Riemannian, “SVRG: Fast stochastic optimization on riemannian manifolds”, Advances in Neural Information Processing Systems 29 (NIPS), 2016.

Uwagi

Opracowanie rekordu w ramach umowy 509/P-DUN/2018 ze środków MNiSW przeznaczonych na działalność upowszechniającą naukę (2019).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-0df3b067-aabc-4060-8afa-3fbb598d3783