Identifiers
Title variants
Languages of publication
Abstracts
We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed adaptive computation time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of adaptive computation time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
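For orientation, the sketch below illustrates the kind of binary Concrete (Gumbel-Softmax) relaxation the abstract refers to: a discrete "halting" variable per block is replaced during training by a relaxed sample in (0, 1), while a simple deterministic rule can be used at evaluation time. This is a minimal Python/NumPy sketch under assumptions made here for illustration; the names sample_relaxed_bernoulli, gate_logits, and temperature, as well as the thresholding rule, are illustrative and not taken from the paper.

# Minimal sketch (not the authors' code) of a binary Concrete /
# relaxed-Bernoulli sample, as used to train discrete "halting" gates.
import numpy as np

def sample_relaxed_bernoulli(logits, temperature=0.5, rng=np.random):
    """Draw a relaxed Bernoulli sample z in (0, 1).

    As temperature -> 0 the sample approaches a hard {0, 1} draw,
    which is why a hard rule can replace sampling after training.
    """
    u = rng.uniform(low=1e-6, high=1.0 - 1e-6, size=np.shape(logits))
    logistic_noise = np.log(u) - np.log1p(-u)        # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))

# Example: "keep computing" gates for four residual blocks.
gate_logits = np.array([2.0, 0.5, -0.5, -2.0])       # from a hypothetical halting head
soft_gates = sample_relaxed_bernoulli(gate_logits)    # used during training
hard_gates = (gate_logits > 0).astype(np.float64)     # one simple deterministic rule (an assumption here)
print(soft_gates, hard_gates)

As the temperature is annealed toward zero, the relaxed samples concentrate near 0 and 1, which is what allows the same gating head to be evaluated with a hard, deterministic decision after training.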
Year
Volume
Pages
811--820
Physical description
Bibliography: 48 items, figures, charts, tables.
Authors
author
- National Research University Higher School of Economics, Moscow, Russia
author
- Luka Inc., Moscow, Russia
author
- National Research University Higher School of Economics, Moscow, Russia
Bibliography
- [1] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet classification with deep convolutional neural networks,” NIPS, 2012.
- [2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
- [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CVPR, 2015.
- [4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CVPR, 2016.
- [5] T.N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran, “Low-rank matrix factorization for deep neural network training with high-dimensional output targets,” ICASSP, 2013.
- [6] M. Jaderberg, A. Vedaldi, and A. Zisserman, “Speeding up convolutional neural networks with low rank expansions,” BMVC, 2014.
- [7] K. Neklyudov, D. Molchanov, A. Ashukha, and D.P. Vetrov, “Structured Bayesian pruning via log-normal multiplicative noise,” NIPS, 2017.
- [8] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” ECCV, 2016.
- [9] E. Bengio, P.-L. Bacon, J. Pineau, and D. Precup, “Conditional computation in neural networks for faster models,” ICLR Workshop, 2016.
- [10] A. Eslami, N. Heess, T. Weber, Y. Tassa, D. Szepesvari, K. Kavukcuoglu, and G. E. Hinton, “Attend, infer, repeat: Fast scene understanding with generative models,” NIPS, 2016.
- [11] M. McGill and P. Perona, “Deciding how to decide: Dynamic routing in artificial neural networks,” ICML, 2017.
- [12] M. Figurnov, M. D. Collins, Y. Zhu, L. Zhang, J. Huang, D. Vetrov, and R. Salakhutdinov, “Spatially adaptive computation time for residual networks,” CVPR, 2017.
- [13] A. Mnih and K. Gregor, “Neural variational inference and learning in belief networks,” ICML, 2014.
- [14] R.J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, 1992.
- [15] Z. Li, Y. Yang, X. Liu, S. Wen, and W. Xu, “Dynamic computational time for visual attention,” ICCV, 2017.
- [16] A. Graves, “Adaptive computation time for recurrent neural networks,” arXiv, 2016.
- [17] M. Neumann, P. Stenetorp, and S. Riedel, “Learning to reason with adaptive computation,” NIPS Workshop on Interpretable Machine Learning in Complex Systems, 2016.
- [18] M. Ryabinin and E. Lobacheva, “Adaptive prediction time for sequence classification,” arXiv, 2018.
- [19] D.P. Kingma and M. Welling, “Auto-encoding variational Bayes,” ICLR, 2014.
- [20] J. Staines and D. Barber, “Variational optimization,” arXiv, 2012.
- [21] J. Staines and D. Barber, “Optimization by variational bounding,” ESANN, 2013.
- [22] C.J. Maddison, A. Mnih, and Y.W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” ICLR, 2017.
- [23] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” ICLR, 2017.
- [24] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” NIPS, 2015.
- [25] J. Ba, V. Mnih, and K. Kavukcuoglu, “Multiple object recognition with visual attention,” ICLR, 2015.
- [26] J. Ba, R.R. Salakhutdinov, R. B. Grosse, and B.J. Frey, “Learning wake-sleep recurrent attention models,” NIPS, 2015.
- [27] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R.S. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” ICML, 2015.
- [28] M. Titsias and M. Lázaro-Gredilla, “Doubly stochastic variational Bayes for non-conjugate inference,” ICML, 2014.
- [29] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” NIPS, 2016.
- [30] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” TPAMI, 2017.
- [31] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Computer Science Department, University of Toronto, Tech. Rep., 2009.
- [32] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, 1997.
- [33] Y. Jernite, E. Grave, A. Joulin, and T. Mikolov, “Variable computation in recurrent neural networks,” ICLR, 2017.
- [34] A.W. Yu, H. Lee, and Q.V. Le, “Learning to skim text,” ACL, 2017.
- [35] V. Campos, B. Jou, X. Giró-i-Nieto, J. Torres, and S.-F. Chang, “Skip RNN: Learning to skip state updates in recurrent neural networks,” ICLR, 2018.
- [36] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv, 2013.
- [37] S. Leroux, P. Molchanov, P. Simoens, B. Dhoedt, T. Breuel, and J. Kautz, “IAMNN: Iterative and adaptive mobile neural network for efficient image classification,” ICLR Workshop, 2018.
- [38] Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, and R. Feris, “BlockDrop: Dynamic inference paths in residual networks,” CVPR, 2018.
- [39] A. Veit and S. Belongie, “Convolutional networks with adaptive computation graphs,” arXiv, 2017.
- [40] X. Wang, F. Yu, Z.-Y. Dou, and J.E. Gonzalez, “SkipNet: Learning dynamic routing in convolutional networks,” arXiv, 2017.
- [41] D.M. Blei, A.Y. Ng, and M.I. Jordan, “Latent Dirichlet allocation,” JMLR, 2003.
- [42] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American society for information science, vol. 41, no. 6, 1990.
- [43] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, 2014.
- [44] D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” NIPS, 2015.
- [45] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” ICML, 2016.
- [46] Y. Gal and Z. Ghahramani, “A theoretically grounded application of dropout in recurrent neural networks,” NIPS, 2016.
- [47] D. Molchanov, A. Ashukha, and D. Vetrov, “Variational dropout sparsifies deep neural networks,” ICML, 2017.
- [48] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ICLR, 2015.
Notes
PL
Record prepared under agreement 509/P-DUN/2018 with funds from the Ministry of Science and Higher Education (MNiSW) allocated to activities promoting science (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-e59dd8d7-b139-4dc4-bc1a-628e3c43bfe6