Article title
Identifiers
Title variants
Languages of publication
Abstracts
Optimization of machine learning architectures is essential in determining the efficacy and applicability of any neural architecture to real-world problems. In this work, a generalized Newton's method (GNM) is presented as a powerful approach to learning in deep neural networks (DNNs). The technique is compared with two popular first-order approaches, stochastic gradient descent (SGD) and the Adam algorithm, on two well-known classification tasks. The performance of the proposed approach confirms it as an attractive alternative to state-of-the-art first-order solutions. Motivated by the good results obtained for shallow DNNs, the last part of the article presents a hybrid optimization method that combines two optimization algorithms, i.e. GNM with Adam or GNM with SGD, during the training phase, assigning a different algorithm to different layers of the neural network. This configuration aims to exploit the strengths of both first- and second-order algorithms. Here a convolutional neural network is considered, and the parameters of its different layers are updated with different optimization algorithms. In this case as well, the hybrid approach achieves the best performance compared with the purely first-order algorithms.
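To make the layer-wise hybrid scheme concrete, the following minimal sketch shows how two different optimizers can update disjoint groups of layers within a single training step in TensorFlow/Keras. It is not the authors' implementation: the generalized Newton's method is not a stock TensorFlow optimizer, so Adam stands in for the second-order update here, and the model architecture, learning rates, and layer split are illustrative assumptions.

```python
# Minimal sketch of layer-wise hybrid optimization (illustrative only).
# Adam stands in for the GNM update described in the paper; SGD handles
# the remaining layers. Model, learning rates, and layer split are assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.build(input_shape=(None, 28, 28, 1))

# Split the trainable variables by layer: convolutional part vs. dense part.
conv_vars = model.layers[0].trainable_variables
dense_vars = [v for layer in model.layers[2:] for v in layer.trainable_variables]

opt_conv = tf.keras.optimizers.Adam(learning_rate=1e-3)  # placeholder for the GNM update
opt_dense = tf.keras.optimizers.SGD(learning_rate=1e-2)  # first-order update

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(x, y):
    # One hybrid step: compute gradients once, then apply a different
    # optimizer to each group of variables.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, conv_vars + dense_vars)
    g_conv, g_dense = grads[:len(conv_vars)], grads[len(conv_vars):]
    opt_conv.apply_gradients(zip(g_conv, conv_vars))
    opt_dense.apply_gradients(zip(g_dense, dense_vars))
    return loss

# Smoke test on random data shaped like MNIST / Fashion-MNIST inputs.
x = tf.random.normal((32, 28, 28, 1))
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
print("loss:", float(train_step(x, y)))
```

In the paper's hybrid configuration, the GNM update would take the place of one of the two optimizers; the mechanics of splitting the trainable variables and applying per-group updates remain the same.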
Keywords
Year
Volume
Pages
36–42
Physical description
Bibliography: 28 items, figures, tables.
Contributors
author
- "Sapienza" University of Rome, Rome, Italy
author
- "Sapienza" University of Rome, Rome, Italy
author
- South East Technological University, Carlow, Ireland
author
- "Sapienza" University of Rome, Rome, Italy
Bibliography
- [1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep Learning", Nature, vol. 521, pp. 436-444, 2015.
- [2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016 (http://www.deeplearningbook.org).
- [3] Y. Bengio, Y. LeCun, and G. Hinton, "Deep Learning for AI", Communications of the ACM, vol. 64, no. 7, pp. 58-65, 2021.
- [4] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 502 p., 1996 (ISBN: 9780198538646).
- [5] R.D. Reed and R.J. Marks, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, MIT Press, 1999.
- [6] R. Parisi, E.D. Di Claudio, G. Orlandi, and B.D. Rao, "Fast Adaptive Digital Equalization by Recurrent Neural Networks", IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2731-2739, 1997.
- [7] R. Battiti, "First- and Second-order Methods for Learning: Between Steepest Descent and Newton’s Method", Neural Computation, vol. 4, no. 2, pp. 141-166, 1992.
- [8] L. Bottou, F.E. Curtis, and J. Nocedal, "Optimization Methods for Large-scale Machine Learning", SIAM Review, vol. 60, no. 2, pp. 223-311, 2018.
- [9] J. Nocedal and S.J. Wright, Numerical Optimization, Springer, 664 p., 2006.
- [10] A.S. Berahas, M. Jahani, P. Richtárik, and M. Takác, "Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample", arXiv, 2019.
- [11] D. Goldfarb, Y. Ren, and A. Bahamou, "Practical Quasi-Newton Methods for Training Deep Neural Networks", arXiv, 2020.
- [12] A.S. Berahas, R. Bollapragada, and J. Nocedal, "An Investigation of Newton-Sketch and Subsampled Newton Methods", Optimization Methods and Software, vol. 35, no. 4, pp. 661-680, 2020.
- [13] A.S. Berahas and M. Takác, "A Robust Multi-batch L-BFGS Method for Machine Learning", Optimization Methods and Software, vol. 35, no. 1, pp. 191-219, 2020.
- [14] J.E. Dennis, Jr. and J.J. Moré, "Quasi-Newton Methods, Motivation and Theory", SIAM Review, vol. 19, no. 1, pp. 46-89, 1977.
- [15] Z. Yao et al., "ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning", arXiv, 2020.
- [16] R. Anil et al., "Scalable Second Order Optimization for Deep Learning", arXiv, 2020.
- [17] J.D. Lee et al., "First-order Methods Almost Always Avoid Saddle Points", arXiv, 2017.
- [18] R. Parisi, E.D. Di Claudio, G. Orlandi, and B.D. Rao, "A Generalized Learning Paradigm Exploiting the Structure of Feedforward Neural Networks", IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1450-1460, 1996.
- [19] S. Ruder, "An Overview of Gradient Descent Optimization Algorithms", arXiv, 2016.
- [20] H. Robbins and S. Monro, "A Stochastic Approximation Method", The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400-407, 1951.
- [21] R. Rojas, Neural Networks. A Systematic Introduction, Springer, 504 p., 2006.
- [22] D.E. Rumelhart and J.L. McClelland, "Learning Internal Representations by Error Propagation", in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, MIT Press, pp. 318-362, 1987 (ISBN: 9780262291408).
- [23] D.P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", arXiv, 2014.
- [24] TensorFlow Tutorials, "Basic Classification: Classify Images of Clothing" (https://www.tensorflow.org/tutorials/keras/classification).
- [25] Y. LeCun, C. Cortes, and C.J.C. Burges, The MNIST Database of Handwritten Digits, 2012 (http://yann.lecun.com/exdb/mnist/).
- [26] M. Abadi et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems", arXiv, 2016.
- [27] P. Baldi, P. Sadowski, and D. Whiteson, "Searching for Exotic Particles in High-energy Physics with Deep Learning", Nature Communications, vol. 5, art. no. 4308, 2014.
- [28] D.-Y. Ge et al., "Design of High Accuracy Detector for MNIST Handwritten Digit Recognition Based on Convolutional Neural Network", 2019 12th International Conference on Intelligent Computation Technology and Automation (ICICTA), Xiangtan, China, 2019.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-6eba900f-1cee-46ed-a747-62524908e26f