Identifiers
Title variants
Languages of publication
Abstracts
We review recent work characterizing the classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.
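As an illustration of the separation described above (the rates below are those stated in [1] and [6]; the function g is a generic example of a compositional, hierarchically local function, not one taken from the record):

g(x_1,\dots,x_8) = h_3\bigl(h_{21}(h_{11}(x_1,x_2),\,h_{12}(x_3,x_4)),\; h_{22}(h_{13}(x_5,x_6),\,h_{14}(x_7,x_8))\bigr)

If each constituent function h depends on only two variables and has smoothness m, a deep network whose architecture mirrors this binary-tree structure can reach accuracy \epsilon with on the order of (n-1)\,\epsilon^{-2/m} units (here n = 8 inputs), whereas a generic shallow (one-hidden-layer) network approximating g as an unstructured function of all n variables needs on the order of \epsilon^{-n/m} units, i.e. exponentially more in n.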
Year
Volume
Pages
761–773
Physical description
Bibliography: 45 items, figures, graphs.
Authors
author
- Center for Brains, Minds, and Machines, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139
author
- Center for Brains, Minds, and Machines, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139
Bibliography
- [1] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao, “Theory I: Why and when can deep networks avoid the curse of dimensionality?,” tech. rep., MIT Center for Brains, Minds and Machines, 2016.
- [2] F. Anselmi, L. Rosasco, C. Tan, and T. Poggio, “Deep convolutional networks are hierarchical kernel machines,” Center for Brains, Minds and Machines (CBMM) Memo No. 35, also in arXiv, 2015.
- [3] T. Poggio, L. Rosasco, A. Shashua, N. Cohen, and F. Anselmi, “Notes on hierarchical splines, dclns and i-theory,” tech. rep., MIT Computer Science and Artificial Intelligence Laboratory, 2015.
- [4] T. Poggio, F. Anselmi, and L. Rosasco, “I-theory on depth vs width: hierarchical function composition,” CBMM memo 041, 2015.
- [5] H. Mhaskar, Q. Liao, and T. Poggio, “Learning real and boolean functions: When is deep better than shallow?,” Center for Brains, Minds and Machines (CBMM) Memo No. 45, also in arXiv, 2016.
- [6] H. Mhaskar and T. Poggio, “Deep versus shallow networks: an approximation theory perspective,” Center for Brains, Minds and Machines (CBMM) Memo No. 54, also in arXiv, 2016.
- [7] D. L. Donoho, “High-dimensional data analysis: The curses and blessings of dimensionality,” in AMS Conference on Math Challenges of the 21st Century, 2000.
- [8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, pp. 436–444, 2015.
- [9] K. Fukushima, “Neocognitron: A self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.
- [10] M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neuroscience, vol. 2, pp. 1019–1025, Nov. 1999.
- [11] H. Mhaskar, “Approximation properties of a multilayered feedforward artificial neural network,” Advances in Computational Mathematics, pp. 61–80, 1993.
- [12] C. Chui, X. Li, and H. Mhaskar, “Neural networks for localized approximation,” Mathematics of Computation, vol. 63, no. 208, pp. 607–623, 1994.
- [13] C. K. Chui, X. Li, and H. N. Mhaskar, “Limitations of the approximation capabilities of neural networks with one hidden layer,” Advances in Computational Mathematics, vol. 5, no. 1, pp. 233–243, 1996.
- [14] A. Pinkus, “Approximation theory of the mlp model in neural networks,” Acta Numerica, vol. 8, pp. 143–195, 1999.
- [15] T. Poggio and S. Smale, “The mathematics of learning: Dealing with data,” Notices of the American Mathematical Society (AMS), vol. 50, no. 5, pp. 537–544, 2003.
- [16] B.B. Moore and T. Poggio, “Representation properties of multilayer feedforward networks,” Abstracts of the First annual INNS meeting, vol. 320, p. 502, 1998.
- [17] R. Livni, S. Shalev-Shwartz, and O. Shamir, “A provably efficient algorithm for training deep networks,” CoRR, vol. abs/1304.7045, 2013.
- [18] O. Delalleau and Y. Bengio, “Shallow vs. deep sum-product networks,” in Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12‒14 December 2011, Granada, Spain., pp. 666–674, 2011.
- [19] G.F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, “On the number of linear regions of deep neural networks,” Advances in Neural Information Processing Systems, vol. 27, pp. 2924–2932, 2014.
- [20] H.N. Mhaskar, “Neural networks for localized approximation of real functions,” in Neural Networks for Signal Processing [1993] III. Proceedings of the 1993 IEEE-SP Workshop, pp. 190–196, IEEE, 1993.
- [21] N. Cohen, O. Sharir, and A. Shashua, “On the expressive power of deep learning: a tensor analysis,” CoRR, vol. abs/1509.05009, 2015.
- [22] M. Telgarsky, “Representation benefits of deep feedforward networks,” arXiv preprint arXiv:1509.08101v2, 2015.
- [23] I. Safran and O. Shamir, “Depth separation in relu networks for approximating smooth non-linear functions,” arXiv: 1610.09887v1, 2016.
- [24] H.N. Mhaskar, “Neural networks for optimal approximation of smooth and analytic functions,” Neural Computation, vol. 8, no. 1, pp. 164–177, 1996.
- [25] E. Corominas and F.S. Balaguer, “Condiciones para que una funcion infinitamente derivable sea un polinomio,” Revista matemática hispanoamericana, vol. 14, no. 1, pp. 26–43, 1954.
- [26] R.A. DeVore, R. Howard, and C.A. Micchelli, “Optimal nonlinear approximation,” Manuscripta mathematica, vol. 63, no. 4, pp. 469–478, 1989.
- [27] H.N. Mhaskar, “On the tractability of multivariate integration and approximation by neural networks,” J. Complex., vol. 20, pp. 561–590, Aug. 2004.
- [28] F. Bach, “Breaking the curse of dimensionality with convex neural networks,” arXiv:1412.8690, 2014.
- [29] D.P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
- [30] J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” J. Mach. Learn. Res., vol. 13, pp. 281–305, Feb. 2012.
- [31] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. Software available from tensorflow.org.
- [32] R. Eldan and O. Shamir, “The power of depth for feedforward neural networks,” arXiv preprint arXiv:1512.03965v4, 2016.
- [33] H. Lin and M. Tegmark, “Why does deep and cheap learning work so well?,” arXiv:1608.08225, pp. 1–14, 2016.
- [34] J.T. Hastad, Computational Limitations of Small-Depth Circuits. MIT Press, 1987.
- [35] M. Furst, J. Saxe, and M. Sipser, “Parity, circuits, and the polynomial-time hierarchy,” Math. Systems Theory, vol. 17, pp. 13–27, 1984.
- [36] N. Linial, Y. Mansour, and N. Nisan, “Constant depth circuits, Fourier transform, and learnability,” Journal of the ACM, vol. 40, no. 3, pp. 607–620, 1993.
- [37] Y. Bengio and Y. LeCun, “Scaling learning algorithms towards AI,” in Large-Scale Kernel Machines (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
- [38] Y. Mansour, “Learning boolean functions via the Fourier transform,” in Theoretical Advances in Neural Computation and Learning (V. Roychowdhury, K. Siu, and A. Orlitsky, eds.), pp. 391–424, Springer US, 1994.
- [39] S. Soatto, “Steps Towards a Theory of Visual Information: Active Perception, Signal-to-Symbol Conversion and the Interplay Between Sensing and Control,” arXiv:1110.2053, pp. 0–151, 2011.
- [40] F. Anselmi, J.Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, and T. Poggio, “Unsupervised learning of invariant representations,” Theoretical Computer Science, 2015.
- [41] F. Anselmi and T. Poggio, Visual Cortex and Deep Networks. MIT Press, 2016.
- [42] L. Grasedyck, “Hierarchical Singular Value Decomposition of Tensors,” SIAM J. Matrix Anal. Appl., vol. 31, no. 4, pp. 2029–2054, 2010.
- [43] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- [44] T. Poggio and W. Reichardt, “On the representation of multi-input systems: Computational properties of polynomial algorithms,” Biological Cybernetics, vol. 37, no. 3, pp. 167–186, 1980.
- [45] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: The MIT Press, ISBN 0-262-63022-2, 1972.
Remarks
PL
Record prepared under agreement 509/P-DUN/2018 from funds of the Ministry of Science and Higher Education (MNiSW) allocated to activities popularizing science (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-716e526e-90b7-45f8-8637-0dba92b852cc