Article title

Probabilistic adaptive computation time

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed adaptive computation time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of adaptive computation time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
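The abstract describes the mechanism only at a high level, so a minimal sketch may help illustrate the "concrete relaxation of discrete variables" it mentions, i.e. relaxed Bernoulli halting variables that gate how many blocks are executed. This is an illustrative assumption-laden sketch, not the authors' implementation: the function names (binary_concrete_sample, adaptive_forward), the toy blocks, and the gating scheme are hypothetical stand-ins for the paper's probabilistic model.

```python
# Illustrative sketch only: names, the toy blocks, and the gating scheme are
# assumptions for exposition, not the paper's actual model.
import numpy as np

rng = np.random.default_rng(0)


def binary_concrete_sample(logit, temperature, rng):
    """Relaxed Bernoulli ("binary concrete") sample in (0, 1).

    As temperature -> 0 the sample approaches a hard Bernoulli draw with
    probability sigmoid(logit); at moderate temperatures it stays smooth,
    so gradients can flow through it (cf. refs. [22], [23] below).
    """
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    logistic_noise = np.log(u) - np.log(1.0 - u)
    return 1.0 / (1.0 + np.exp(-(logit + logistic_noise) / temperature))


def adaptive_forward(x, blocks, halting_logits, temperature, rng):
    """Toy adaptive-depth forward pass with relaxed halting variables.

    Each block i has a latent halting variable z_i; once the running
    "keep going" weight has decayed, later blocks contribute (almost)
    nothing. expected_depth is a differentiable proxy for the amount of
    computation that a prior preferring faster computation could penalise.
    """
    keep_going = 1.0
    expected_depth = 0.0
    h = x
    for block, logit in zip(blocks, halting_logits):
        z = binary_concrete_sample(logit, temperature, rng)
        h = h + keep_going * block(h)      # residual-style update, gated
        expected_depth += keep_going       # cost of running this block
        keep_going *= (1.0 - z)            # halt with (relaxed) probability z
    return h, expected_depth


# Toy usage: three "blocks" that are just scaled identity maps.
blocks = [lambda h, w=w: 0.1 * w * h for w in (1.0, 2.0, 3.0)]
halting_logits = np.array([-1.0, 0.0, 1.0])   # would be produced by the network
out, depth = adaptive_forward(np.ones(4), blocks, halting_logits,
                              temperature=0.5, rng=rng)
print(out, depth)
```

At low temperature the samples approach hard 0/1 halting decisions, which loosely corresponds to the simple deterministic evaluation procedure mentioned in the abstract, while a penalty on expected_depth would play the role of the prior that prefers faster computation.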
Year
Pages
811--820
Physical description
Bibliography: 48 items, figures, charts, tables.
Authors
author
  • National Research University Higher School of Economics, Moscow, Russia
author
  • Luka Inc., Moscow, Russia
author
  • National Research University Higher School of Economics, Moscow, Russia
Bibliography
  • [1] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet classification with deep convolutional neural networks,” NIPS, 2012.
  • [2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
  • [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CVPR, 2015.
  • [4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CVPR, 2016.
  • [5] T.N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran, “Low-rank matrix factorization for deep neural network training with high-dimensional output targets,” ICASSP, 2013.
  • [6] M. Jaderberg, A. Vedaldi, and A. Zisserman, “Speeding up convolutional neural networks with low rank expansions,” BMVC, 2014.
  • [7] K. Neklyudov, D. Molchanov, A. Ashukha, and D.P. Vetrov, “Structured Bayesian pruning via log-normal multiplicative noise,” NIPS, 2017.
  • [8] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” ECCV, 2016.
  • [9] E. Bengio, P.-L. Bacon, J. Pineau, and D. Precup, “Conditional computation in neural networks for faster models,” ICLR Workshop, 2016.
  • [10] A. Eslami, N. Heess, T. Weber, Y. Tassa, D. Szepesvari, K. Kavukcuoglu, and G. E. Hinton, “Attend, infer, repeat: Fast scene understanding with generative models,” NIPS, 2016.
  • [11] M. McGill and P. Perona, “Deciding how to decide: Dynamic routing in artificial neural networks,” ICML, 2017.
  • [12] M. Figurnov, M. D. Collins, Y. Zhu, L. Zhang, J. Huang, D. Vetrov, and R. Salakhutdinov, “Spatially adaptive computation time for residual networks,” CVPR, 2017.
  • [13] A. Mnih and K. Gregor, “Neural variational inference and learning in belief networks,” ICML, 2014.
  • [14] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, 1992.
  • [15] Z. Li, Y. Yang, X. Liu, S. Wen, and W. Xu, “Dynamic computational time for visual attention,” ICCV, 2017.
  • [16] A. Graves, “Adaptive computation time for recurrent neural networks,” arXiv, 2016.
  • [17] M. Neumann, P. Stenetorp, and S. Riedel, “Learning to reason with adaptive computation,” NIPS Workshop on Interpretable Machine Learning in Complex Systems, 2016.
  • [18] M. Ryabinin and E. Lobacheva, “Adaptive prediction time for sequence classification,” arXiv, 2018.
  • [19] D.P. Kingma and M. Welling, “Auto-encoding variational Bayes,” ICLR, 2014.
  • [20] J. Staines and D. Barber, “Variational optimization,” arXiv, 2012.
  • [21] J. Staines and D. Barber, “Optimization by variational bounding,” ESANN, 2013.
  • [22] C.J. Maddison, A. Mnih, and Y.W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” ICLR, 2017.
  • [23] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” ICLR, 2017.
  • [24] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” NIPS, 2015.
  • [25] J. Ba, V. Mnih, and K. Kavukcuoglu, “Multiple object recognition with visual attention,” ICLR, 2015.
  • [26] J. Ba, R.R. Salakhutdinov, R. B. Grosse, and B.J. Frey, “Learning wake-sleep recurrent attention models,” NIPS, 2015.
  • [27] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R.S. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” ICML, 2015.
  • [28] M. Titsias and M. Lázaro-Gredilla, “Doubly stochastic variational bayes for non-conjugate inference,” ICML, 2014.
  • [29] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” NIPS, 2016.
  • [30] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” TPAMI, 2017.
  • [31] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Computer Science Department, University of Toronto, Tech. Rep., 2009.
  • [32] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, 1997.
  • [33] Y. Jernite, E. Grave, A. Joulin, and T. Mikolov, “Variable computation in recurrent neural networks,” ICLR, 2017.
  • [34] A.W. Yu, H. Lee, and Q.V. Le, “Learning to skim text,” ACL, 2017.
  • [35] V. Campos, B. Jou, X. Giró-i-Nieto, J. Torres, and S.-F. Chang, “Skip RNN: Learning to skip state updates in recurrent neural networks,” ICLR, 2018.
  • [36] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv, 2013.
  • [37] S. Leroux, P. Molchanov, P. Simoens, B. Dhoedt, T. Breuel, and J. Kautz, “IamNN: Iterative and adaptive mobile neural network for efficient image classification,” ICLR Workshop, 2018.
  • [38] Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, and R. Feris, “BlockDrop: Dynamic inference paths in residual networks,” CVPR, 2018.
  • [39] A. Veit and S. Belongie, “Convolutional networks with adaptive computation graphs,” arXiv, 2017.
  • [40] X. Wang, F. Yu, Z.-Y. Dou, and J.E. Gonzalez, “SkipNet: Learning dynamic routing in convolutional networks,” arXiv, 2017.
  • [41] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” JMLR, 2003.
  • [42] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, no. 6, 1990.
  • [43] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, 2014.
  • [44] D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” NIPS, 2015.
  • [45] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” ICML, 2016.
  • [46] Y. Gal and Z. Ghahramani, “A theoretically grounded application of dropout in recurrent neural networks,” NIPS, 2016.
  • [47] D. Molchanov, A. Ashukha, and D. Vetrov, “Variational dropout sparsifies deep neural networks,” ICML, 2017.
  • [48] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ICLR, 2015.
Notes
Record compiled under agreement 509/P-DUN/2018 with funds of the Ministry of Science and Higher Education (MNiSW) allocated to science-dissemination activities (2019).
Document type
YADDA identifier
bwmeta1.element.baztech-e59dd8d7-b139-4dc4-bc1a-628e3c43bfe6