

Article title

A Generalized Learning Approach to Deep Neural Networks

Content
Identifiers
Title variants
Languages of publication
EN
Abstracts
EN
Optimization of machine learning architectures is essential in determining the efficacy and the applicability of any neural architecture to real-world problems. In this work, a generalized Newton's method (GNM) is presented as a powerful approach to learning in deep neural networks (DNN). The technique is compared with two popular first-order approaches, namely stochastic gradient descent (SGD) and the Adam algorithm, on two popular classification tasks. The performance of the proposed approach confirms it as an attractive alternative to state-of-the-art first-order solutions. Motivated by the good results obtained with shallow DNNs, the last part of the article presents a hybrid optimization method that combines two optimization algorithms, i.e. GNM and Adam or GNM and SGD, within the layers of the neural network during the training phase. This configuration aims to benefit from the strengths of both first- and second-order algorithms. Here a convolutional neural network is considered, and the parameters of different layers are updated with different optimization algorithms. Also in this case, the hybrid approach achieves the best performance when compared with the first-order algorithms.
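To make the layer-wise hybrid scheme concrete, the following is a minimal TensorFlow/Keras sketch, not the authors' implementation: the GNM update rule is not reproduced in this record, so plain SGD stands in for the second optimizer of the pair, and the network shape, learning rates, and input dimensions (28x28 grayscale images, 10 classes) are illustrative assumptions. The sketch only shows the mechanism described in the abstract, i.e. splitting a CNN's trainable parameters by layer and applying a different optimizer to each subset within the same training step.

import tensorflow as tf

# Hypothetical layer-wise hybrid training step (illustrative only).
# Adam updates the convolutional layers; plain SGD stands in for the
# second optimizer of the pair (e.g. GNM), whose update rule is not
# given here. Shapes and learning rates are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt_conv = tf.keras.optimizers.Adam(learning_rate=1e-3)   # first-order optimizer for conv layers
opt_dense = tf.keras.optimizers.SGD(learning_rate=1e-2)   # placeholder for the second optimizer

# Split the trainable parameters by layer type, so each group is
# updated by its own optimizer.
conv_vars = [v for l in model.layers
             if isinstance(l, tf.keras.layers.Conv2D)
             for v in l.trainable_variables]
dense_vars = [v for l in model.layers
              if isinstance(l, tf.keras.layers.Dense)
              for v in l.trainable_variables]

@tf.function
def train_step(x, y):
    # One hybrid step: a single forward/backward pass, after which each
    # optimizer applies its update to its own subset of parameters.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, conv_vars + dense_vars)
    g_conv, g_dense = grads[:len(conv_vars)], grads[len(conv_vars):]
    opt_conv.apply_gradients(zip(g_conv, conv_vars))
    opt_dense.apply_gradients(zip(g_dense, dense_vars))
    return loss

Calling train_step(x_batch, y_batch) inside an ordinary epoch loop then advances both optimizers on their respective layer groups with a single gradient computation per batch.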
Keywords
Year
Volume
Pages
36-42
Physical description
Bibliography: 28 items, figures, tables
Authors
  • "Sapienza" University of Rome, Rome, Italy
  • "Sapienza" University of Rome, Rome, Italy
  • South East Technological University, Carlow, Ireland
  • "Sapienza" University of Rome, Rome, Italy
Bibliography
  • [1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep Learning", Nature, vol. 521, pp. 436-444, 2015.
  • [2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016 (http://www.deeplearningbook.org).
  • [3] Y. Bengio, Y. LeCun, and G. Hinton, "Deep Learning for AI", Communications of the ACM, vol. 64, no. 7, pp. 58-65, 2021.
  • [4] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 502 p., 1996 (ISBN: 9780198538646).
  • [5] R.D. Reed and R.J. Marks, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, MIT Press, 1999.
  • [6] R. Parisi, E.D. Di Claudio, G. Orlandi, and B.D. Rao, "Fast Adaptive Digital Equalization by Recurrent Neural Networks", IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2731-2739, 1997.
  • [7] R. Battiti, "First- and Second-order Methods for Learning: Between Steepest Descent and Newton’s Method", Neural Computation, vol. 4, no. 2, pp. 141-166, 1992.
  • [8] L. Bottou, F.E. Curtis, and J. Nocedal, "Optimization Methods for Large-scale Machine Learning", SIAM Review, vol. 60, no. 2, pp. 223-311, 2018.
  • [9] J. Nocedal and S.J. Wright, Numerical Optimization, Springer, 664 p., 2006.
  • [10] A.S. Berahas, M. Jahani, P. Richtárik, and M. Takác, "Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample", arXiv, 2019.
  • [11] D. Goldfarb, Y. Ren, and A. Bahamou, "Practical Quasi-Newton Methods for Training Deep Neural Networks", arXiv, 2020.
  • [12] A.S. Berahas, R. Bollapragada, and J. Nocedal, "An Investigation of Newton-Sketch and Subsampled Newton Methods", Optimization Methods and Software, vol. 35, no. 4, pp. 661-680, 2020.
  • [13] A.S. Berahas and M. Takác, "A Robust Multi-batch L-BFGS Method for Machine Learning", Optimization Methods and Software, vol. 35, no. 1, pp. 191-219, 2020.
  • [14] J.E. Dennis, Jr. and J.J. Moré, "Quasi-Newton Methods, Motivation and Theory", SIAM Review, vol. 19, no. 1, pp. 46-89, 1977.
  • [15] Z. Yao et al., "ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning", arXiv, 2020.
  • [16] R. Anil et al., "Scalable Second Order Optimization for Deep Learning", arXiv, 2020.
  • [17] J.D. Lee et al., "First-order Methods Almost Always Avoid Saddle Points", arXiv, 2017.
  • [18] R. Parisi, E.D. Di Claudio, G. Orlandi, and B.D. Rao, "A Generalized Learning Paradigm Exploiting the Structure of Feedforward Neural Networks", IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1450-1460, 1996.
  • [19] S. Ruder, "An Overview of Gradient Descent Optimization Algorithms", arXiv, 2016.
  • [20] H. Robbins and S. Monro, "A Stochastic Approximation Method", The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400-407, 1951.
  • [21] R. Rojas, Neural Networks. A Systematic Introduction, Springer, 504 p., 2006.
  • [22] D.E. Rumelhart and J.L. McClelland, "Learning Internal Representations by Error Propagation", in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, MIT Press, pp. 318-362, 1987 (ISBN: 9780262291408).
  • [23] D.P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", arXiv, 2014.
  • [24] "Basic Classification: Classify Images of Clothing", TensorFlow Tutorials (https://www.tensorflow.org/tutorials/keras/classification).
  • [25] Y. LeCun, C. Cortes, and C.J.C. Burges, The MNIST Database of Handwritten Digits, 2012 (http://yann.lecun.com/exdb/mnist/).
  • [26] M. Abadi et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems", arXiv, 2016.
  • [27] P. Baldi, P. Sadowski, and D. Whiteson, "Searching for Exotic Particles in High-energy Physics with Deep Learning", Nature Communications, vol. 5, art. no. 4308, 2014.
  • [28] D.-Y. Ge et al., "Design of High Accuracy Detector for MNIST Handwritten Digit Recognition Based on Convolutional Neural Network", 2019 12th International Conference on Intelligent Computation Technology and Automation (ICICTA), Xiangtan, China, 2019.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-6eba900f-1cee-46ed-a747-62524908e26f