Article title

Probabilistic adaptive computation time

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed adaptive computation time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of adaptive computation time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
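The abstract describes the mechanism only at a high level, so a minimal sketch may help illustrate the "concrete relaxation of discrete variables" it mentions, i.e. relaxed Bernoulli halting variables that gate how many blocks are executed. This is an illustrative assumption-laden sketch, not the authors' implementation: the function names (binary_concrete_sample, adaptive_forward), the toy blocks, and the gating scheme are hypothetical stand-ins for the paper's probabilistic model.

```python
# Illustrative sketch only: names, the toy blocks, and the gating scheme are
# assumptions for exposition, not the paper's actual model.
import numpy as np

rng = np.random.default_rng(0)


def binary_concrete_sample(logit, temperature, rng):
    """Relaxed Bernoulli ("binary concrete") sample in (0, 1).

    As temperature -> 0 the sample approaches a hard Bernoulli draw with
    probability sigmoid(logit); at moderate temperatures it stays smooth,
    so gradients can flow through it (cf. refs. [22], [23] below).
    """
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    logistic_noise = np.log(u) - np.log(1.0 - u)
    return 1.0 / (1.0 + np.exp(-(logit + logistic_noise) / temperature))


def adaptive_forward(x, blocks, halting_logits, temperature, rng):
    """Toy adaptive-depth forward pass with relaxed halting variables.

    Each block i has a latent halting variable z_i; once the running
    "keep going" weight has decayed, later blocks contribute (almost)
    nothing. expected_depth is a differentiable proxy for the amount of
    computation that a prior preferring faster computation could penalise.
    """
    keep_going = 1.0
    expected_depth = 0.0
    h = x
    for block, logit in zip(blocks, halting_logits):
        z = binary_concrete_sample(logit, temperature, rng)
        h = h + keep_going * block(h)      # residual-style update, gated
        expected_depth += keep_going       # cost of running this block
        keep_going *= (1.0 - z)            # halt with (relaxed) probability z
    return h, expected_depth


# Toy usage: three "blocks" that are just scaled identity maps.
blocks = [lambda h, w=w: 0.1 * w * h for w in (1.0, 2.0, 3.0)]
halting_logits = np.array([-1.0, 0.0, 1.0])   # would be produced by the network
out, depth = adaptive_forward(np.ones(4), blocks, halting_logits,
                              temperature=0.5, rng=rng)
print(out, depth)
```

At low temperature the samples approach hard 0/1 halting decisions, which loosely corresponds to the simple deterministic evaluation procedure mentioned in the abstract, while a penalty on expected_depth would play the role of the prior that prefers faster computation.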
Year
Pages
811--820
Physical description
Bibliography: 48 items, figures, charts, tables.
Authors
author
  • National Research University Higher School of Economics, Moscow, Russia
author
  • Luka Inc., Moscow, Russia
author
  • National Research University Higher School of Economics, Moscow, Russia
Bibliography
  • [1] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet classification with deep convolutional neural networks,” NIPS, 2012.
  • [2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
  • [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CVPR, 2015.
  • [4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CVPR, 2016.
  • [5] T.N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran, “Low-rank matrix factorization for deep neural network training with high-dimensional output targets,” ICASSP, 2013.
  • [6] M. Jaderberg, A. Vedaldi, and A. Zisserman, “Speeding up convolutional neural networks with low rank expansions,” BMVC, 2014.
  • [7] K. Neklyudov, D. Molchanov, A. Ashukha, and D.P. Vetrov, “Structured Bayesian pruning via log-normal multiplicative noise,” NIPS, 2017.
  • [8] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” ECCV, 2016.
  • [9] E. Bengio, P.-L. Bacon, J. Pineau, and D. Precup, “Conditional computation in neural networks for faster models,” ICLR Workshop, 2016.
  • [10] A. Eslami, N. Heess, T. Weber, Y. Tassa, D. Szepesvari, K. Kavukcuoglu, and G. E. Hinton, “Attend, infer, repeat: Fast scene understanding with generative models,” NIPS, 2016.
  • [11] M. McGill and P. Perona, “Deciding how to decide: Dynamic routing in artificial neural networks,” ICML, 2017.
  • [12] M. Figurnov, M. D. Collins, Y. Zhu, L. Zhang, J. Huang, D. Vetrov, and R. Salakhutdinov, “Spatially adaptive computation time for residual networks,” CVPR, 2017.
  • [13] A. Mnih and K. Gregor, “Neural variational inference and learning in belief networks,” ICML, 2014.
  • [14] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, 1992.
  • [15] Z. Li, Y. Yang, X. Liu, S. Wen, and W. Xu, “Dynamic computational time for visual attention,” ICCV, 2017.
  • [16] A. Graves, “Adaptive computation time for recurrent neural networks,” arXiv, 2016.
  • [17] M. Neumann, P. Stenetorp, and S. Riedel, “Learning to reason with adaptive computation,” NIPS Workshop on Interpretable Machine Learning in Complex Systems, 2016.
  • [18] M. Ryabinin and E. Lobacheva, “Adaptive prediction time for sequence classification,” arXiv, 2018.
  • [19] D.P. Kingma and M. Welling, “Auto-encoding variational Bayes,” ICLR, 2014.
  • [20] J. Staines and D. Barber, “Variational optimization,” arXiv, 2012.
  • [21] J. Staines and D. Barber, “Optimization by variational bounding,” ESANN, 2013.
  • [22] C.J. Maddison, A. Mnih, and Y.W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” ICLR, 2017.
  • [23] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” ICLR, 2017.
  • [24] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” NIPS, 2015.
  • [25] J. Ba, V. Mnih, and K. Kavukcuoglu, “Multiple object recognition with visual attention,” ICLR, 2015.
  • [26] J. Ba, R.R. Salakhutdinov, R. B. Grosse, and B.J. Frey, “Learning wake-sleep recurrent attention models,” NIPS, 2015.
  • [27] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R.S. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” ICML, 2015.
  • [28] M. Titsias and M. Lázaro-Gredilla, “Doubly stochastic variational bayes for non-conjugate inference,” ICML, 2014.
  • [29] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” NIPS, 2016.
  • [30] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” TPAMI, 2017.
  • [31] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Computer Science Department, University of Toronto, Tech. Rep., 2009.
  • [32] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, 1997.
  • [33] Y. Jernite, E. Grave, A. Joulin, and T. Mikolov, “Variable computation in recurrent neural networks,” ICLR, 2017.
  • [34] A.W. Yu, H. Lee, and Q.V. Le, “Learning to skim text,” ACL, 2017.
  • [35] V. Campos, B. Jou, X. Giró-i-Nieto, J. Torres, and S.-F. Chang, “Skip RNN: Learning to skip state updates in recurrent neural networks,” ICLR, 2018.
  • [36] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv, 2013.
  • [37] S. Leroux, P. Molchanov, P. Simoens, B. Dhoedt, T. Breuel, and J. Kautz, “IamNN: Iterative and adaptive mobile neural network for efficient image classification,” ICLR Workshop, 2018.
  • [38] Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, and R. Feris, “BlockDrop: Dynamic inference paths in residual networks,” CVPR, 2018.
  • [39] A. Veit and S. Belongie, “Convolutional networks with adaptive computation graphs,” arXiv, 2017.
  • [40] X. Wang, F. Yu, Z.-Y. Dou, and J.E. Gonzalez, “SkipNet: Learning dynamic routing in convolutional networks,” arXiv, 2017.
  • [41] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” JMLR, 2003.
  • [42] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, no. 6, 1990.
  • [43] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, 2014.
  • [44] D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” NIPS, 2015.
  • [45] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” ICML, 2016.
  • [46] Y. Gal and Z. Ghahramani, “A theoretically grounded application of dropout in recurrent neural networks,” NIPS, 2016.
  • [47] D. Molchanov, A. Ashukha, and D. Vetrov, “Variational dropout sparsifies deep neural networks,” ICML, 2017.
  • [48] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ICLR, 2015.
Notes
Record compiled under agreement 509/P-DUN/2018 with funds of the Ministry of Science and Higher Education (MNiSW) allocated to science-dissemination activities (2019).
Document type
YADDA identifier
bwmeta1.element.baztech-e59dd8d7-b139-4dc4-bc1a-628e3c43bfe6