Identifiers
Title variants
Languages of publication
Abstracts
We review recent work characterizing the classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.
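As an illustration of the separation described above (the rates below are those stated in [1] and [6]; the function g is a generic example of a compositional, hierarchically local function, not one taken from the record):

g(x_1,\dots,x_8) = h_3\bigl(h_{21}(h_{11}(x_1,x_2),\,h_{12}(x_3,x_4)),\; h_{22}(h_{13}(x_5,x_6),\,h_{14}(x_7,x_8))\bigr)

If each constituent function h depends on only two variables and has smoothness m, a deep network whose architecture mirrors this binary-tree structure can reach accuracy \epsilon with on the order of (n-1)\,\epsilon^{-2/m} units (here n = 8 inputs), whereas a generic shallow (one-hidden-layer) network approximating g as an unstructured function of all n variables needs on the order of \epsilon^{-n/m} units, i.e. exponentially more in n.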
Year
Volume
Pages
761–773
Physical description
Bibliography: 45 items, figures, graphs.
Authors
author
- Center for Brains, Minds, and Machines, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139
author
- Center for Brains, Minds, and Machines, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139
Bibliography
- [1] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao, “Theory I: Why and when can deep networks avoid the curse of dimensionality?,” tech. rep., MIT Center for Brains, Minds and Machines, 2016.
- [2] F. Anselmi, L. Rosasco, C. Tan, and T. Poggio, “Deep convolutional networks are hierarchical kernel machines,” Center for Brains, Minds and Machines (CBMM) Memo No. 35, also in arXiv, 2015.
- [3] T. Poggio, L. Rosasco, A. Shashua, N. Cohen, and F. Anselmi, “Notes on hierarchical splines, dclns and i-theory,” tech. rep., MIT Computer Science and Artificial Intelligence Laboratory, 2015.
- [4] T. Poggio, F. Anselmi, and L. Rosasco, “I-theory on depth vs width: hierarchical function composition,” CBMM memo 041, 2015.
- [5] H. Mhaskar, Q. Liao, and T. Poggio, “Learning real and boolean functions: When is deep better than shallow?,” Center for Brains, Minds and Machines (CBMM) Memo No. 45, also in arXiv, 2016.
- [6] H. Mhaskar and T. Poggio, “Deep versus shallow networks: an approximation theory perspective,” Center for Brains, Minds and Machines (CBMM) Memo No. 54, also in arXiv, 2016.
- [7] D. L. Donoho, “High-dimensional data analysis: The curses and blessings of dimensionality,” in AMS Conference on Math Challenges of the 21st Century, 2000.
- [8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, pp. 436–444, 2015.
- [9] K. Fukushima, “Neocognitron: A self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.
- [10] M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neuroscience, vol. 2, pp. 1019–1025, Nov. 1999.
- [11] H. Mhaskar, “Approximation properties of a multilayered feedforward artificial neural network,” Advances in Computational Mathematics, pp. 61–80, 1993.
- [12] C. Chui, X. Li, and H. Mhaskar, “Neural networks for localized approximation,” Mathematics of Computation, vol. 63, no. 208, pp. 607–623, 1994.
- [13] C. K. Chui, X. Li, and H. N. Mhaskar, “Limitations of the approximation capabilities of neural networks with one hidden layer,” Advances in Computational Mathematics, vol. 5, no. 1, pp. 233–243, 1996.
- [14] A. Pinkus, “Approximation theory of the mlp model in neural networks,” Acta Numerica, vol. 8, pp. 143–195, 1999.
- [15] T. Poggio and S. Smale, “The mathematics of learning: Dealing with data,” Notices of the American Mathematical Society (AMS), vol. 50, no. 5, pp. 537–544, 2003.
- [16] B.B. Moore and T. Poggio, “Representation properties of multilayer feedforward networks,” Abstracts of the First annual INNS meeting, vol. 320, p. 502, 1998.
- [17] R. Livni, S. Shalev-Shwartz, and O. Shamir, “A provably efficient algorithm for training deep networks,” CoRR, vol. abs/1304.7045, 2013.
- [18] O. Delalleau and Y. Bengio, “Shallow vs. deep sum-product networks,” in Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12‒14 December 2011, Granada, Spain., pp. 666–674, 2011.
- [19] G.F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, “On the number of linear regions of deep neural networks,” Advances in Neural Information Processing Systems, vol. 27, pp. 2924–2932, 2014.
- [20] H.N. Mhaskar, “Neural networks for localized approximation of real functions,” in Neural Networks for Signal Processing [1993] III. Proceedings of the 1993 IEEE-SP Workshop, pp. 190–196, IEEE, 1993.
- [21] N. Cohen, O. Sharir, and A. Shashua, “On the expressive power of deep learning: a tensor analysis,” CoRR, vol. abs/1509.05009, 2015.
- [22] M. Telgarsky, “Representation benefits of deep feedforward networks,” arXiv preprint arXiv:1509.08101v2, 2015.
- [23] I. Safran and O. Shamir, “Depth separation in relu networks for approximating smooth non-linear functions,” arXiv: 1610.09887v1, 2016.
- [24] H.N. Mhaskar, “Neural networks for optimal approximation of smooth and analytic functions,” Neural Computation, vol. 8, no. 1, pp. 164–177, 1996.
- [25] E. Corominas and F.S. Balaguer, “Condiciones para que una funcion infinitamente derivable sea un polinomio,” Revista matemática hispanoamericana, vol. 14, no. 1, pp. 26–43, 1954.
- [26] R.A. DeVore, R. Howard, and C.A. Micchelli, “Optimal nonlinear approximation,” Manuscripta mathematica, vol. 63, no. 4, pp. 469–478, 1989.
- [27] H.N. Mhaskar, “On the tractability of multivariate integration and approximation by neural networks,” J. Complex., vol. 20, pp. 561–590, Aug. 2004.
- [28] F. Bach, “Breaking the curse of dimensionality with convex neural networks,” arXiv:1412.8690, 2014.
- [29] D.P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
- [30] J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” J. Mach. Learn. Res., vol. 13, pp. 281–305, Feb. 2012.
- [31] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. Software available from tensorflow.org.
- [32] R. Eldan and O. Shamir, “The power of depth for feedforward neural networks,” arXiv preprint arXiv:1512.03965v4, 2016.
- [33] H. Lin and M. Tegmark, “Why does deep and cheap learning work so well?,” arXiv:1608.08225, pp. 1–14, 2016.
- [34] J.T. Hastad, Computational Limitations of Small-Depth Circuits. MIT Press, 1987.
- [35] M. Furst, J. Saxe, and M. Sipser, “Parity, circuits, and the polynomial-time hierarchy,” Math. Systems Theory, vol. 17, pp. 13–27, 1984.
- [36] N. Linial, Y. Mansour, and N. Nisan, “Constant depth circuits, Fourier transform, and learnability,” Journal of the ACM, vol. 40, no. 3, pp. 607–620, 1993.
- [37] Y. Bengio and Y. LeCun, “Scaling learning algorithms towards AI,” in Large-Scale Kernel Machines (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
- [38] Y. Mansour, “Learning boolean functions via the Fourier transform,” in Theoretical Advances in Neural Computation and Learning (V. Roychowdhury, K. Siu, and A. Orlitsky, eds.), pp. 391–424, Springer US, 1994.
- [39] S. Soatto, “Steps Towards a Theory of Visual Information: Active Perception, Signal-to-Symbol Conversion and the Interplay Between Sensing and Control,” arXiv:1110.2053, pp. 0–151, 2011.
- [40] F. Anselmi, J.Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, and T. Poggio, “Unsupervised learning of invariant representations,” Theoretical Computer Science, 2015.
- [41] F. Anselmi and T. Poggio, Visual Cortex and Deep Networks. MIT Press, 2016.
- [42] L. Grasedyck, “Hierarchical Singular Value Decomposition of Tensors,” SIAM J. Matrix Anal. Appl., vol. 31, no. 4, pp. 2029–2054, 2010.
- [43] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- [44] T. Poggio and W. Reichardt, “On the representation of multi-input systems: Computational properties of polynomial algorithms,” Biological Cybernetics, vol. 37, no. 3, pp. 167–186, 1980.
- [45] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: The MIT Press, ISBN 0-262-63022-2, 1972.
Remarks
PL
Record prepared under agreement 509/P-DUN/2018 from funds of the Ministry of Science and Higher Education (MNiSW) allocated to activities popularizing science (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-716e526e-90b7-45f8-8637-0dba92b852cc