Accelerating neural network training with FSGQR: a scalable and high-performance alternative to ADAM

Bilski, Jarosław; Kowalczyk, Bartosz; Dymova, Ludmila; Xiao, Min

doi:10.2478/jaiscr-2025-0006

Abstract

This paper introduces a significant advancement in neural network training algorithms: a Fast Scaled Givens rotations in QR decomposition (FSGQR) method based on the recursive least squares (RLS) method. The algorithm is an optimized variant of existing rotation-based training approaches, distinguished by the complete elimination of scale factors from its calculations while maintaining mathematical precision. In extensive experiments across multiple benchmarks, including complex tasks such as MNIST digit recognition and concrete strength prediction, FSGQR outperforms the widely used ADAM optimizer and other conventional training methods. The algorithm converges in fewer training epochs while maintaining or improving accuracy. In some tasks, FSGQR completed training in as few as one-fifth of the epochs required by ADAM, and it achieved higher recognition accuracy on the MNIST training set. The paper provides comprehensive mathematical foundations for the optimization, detailed implementation guidelines, and extensive empirical validation across various neural network architectures. The results conclusively demonstrate that FSGQR offers a compelling alternative to current deep learning optimization methods, particularly for applications requiring rapid training convergence without sacrificing accuracy. The algorithm's effectiveness is particularly noteworthy in feedforward neural networks with differentiable activation functions, making it a valuable tool for modern machine learning applications.
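The abstract describes FSGQR only at a high level. As background, the sketch below shows the conventional technique it builds on: a QR-decomposition-based recursive least squares (RLS) update in which each new training sample is folded into an upper-triangular factor R by Givens rotations. This is not the FSGQR algorithm. It assumes ordinary (square-root) Givens rotations, a single linear neuron instead of a multilayer network, and a forgetting factor `lam` and regulariser `delta`; the names `GivensRLS`, `givens`, `lam` and `delta` are hypothetical. Square-root-free scaled rotations in the style of Gentleman [20], and the scale-factor elimination described in the paper, are not shown.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] == [r, 0] with r >= 0."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


class GivensRLS:
    """Hypothetical QR-RLS for one linear neuron (illustrative sketch, not FSGQR).

    Maintains R (upper triangular) and z so that R @ w = z gives the
    exponentially weighted, delta-regularised least-squares weights.
    """

    def __init__(self, n, lam=1.0, delta=1e-6):
        self.n = n
        self.sqrt_lam = math.sqrt(lam)
        # Start from R = sqrt(delta) * I so R stays invertible from the first sample.
        self.R = [[math.sqrt(delta) if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.z = [0.0] * n

    def update(self, x, d):
        n = self.n
        # Apply the forgetting factor: scale the old factor and right-hand side by sqrt(lam).
        R = [[self.sqrt_lam * v for v in row] for row in self.R]
        z = [self.sqrt_lam * v for v in self.z]
        row, e = list(x), d
        # Annihilate the new row [x, d] against R, one column at a time.
        for i in range(n):
            c, s = givens(R[i][i], row[i])
            for j in range(i, n):
                rij, xj = R[i][j], row[j]
                R[i][j] = c * rij + s * xj
                row[j] = -s * rij + c * xj
            zi = z[i]
            z[i] = c * zi + s * e
            e = -s * zi + c * e  # rotated residual component of the target
        self.R, self.z = R, z

    def weights(self):
        # Solve R @ w = z by back-substitution.
        n = self.n
        w = [0.0] * n
        for i in range(n - 1, -1, -1):
            acc = self.z[i] - sum(self.R[i][j] * w[j] for j in range(i + 1, n))
            w[i] = acc / self.R[i][i]
        return w


# Usage: fit d = 3t + 0.5 from 20 noise-free samples; x = [t, 1] adds a bias input.
rls = GivensRLS(n=2, lam=1.0, delta=1e-6)
for i in range(20):
    t = i / 10
    rls.update([t, 1.0], 3.0 * t + 0.5)
w = rls.weights()
print(w)  # close to [3.0, 0.5]
```

Because each update touches only the n-by-n factor R and the vector z, the per-sample cost in this sketch is O(n²), and the rotations are orthogonal, so the normal equations are never formed explicitly. The square root inside `math.hypot` is exactly the kind of cost that scaled-rotation variants aim to avoid.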

Keywords

neural network training algorithm; QR decomposition; scaled Givens rotation; approximation; classification

Publisher

University of Social Sciences

Journal

Journal of Artificial Intelligence and Soft Computing Research

Year

2025

Volume

Vol. 15, No. 2

Pages

95–113

Physical description

Bibliography: 45 items, figures

Contributors

Author

Bilski Jarosław

jaroslaw.bilski@pcz.pl

Department of Artificial Intelligence, Częstochowa University of Technology, al. Armii Krajowej 36, 42-200 Częstochowa, Poland

https://orcid.org/0000-0003-1769-3934

Author

Kowalczyk Bartosz

Department of Artificial Intelligence, Częstochowa University of Technology, al. Armii Krajowej 36, 42-200 Częstochowa, Poland

https://orcid.org/0000-0002-7683-9051

Author

Dymova Ludmila

Information Technology Institute, SAN University, 90-113, Łódź, Poland

https://orcid.org/0000-0002-5387-9990

Author

Xiao Min

College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

https://orcid.org/0000-0002-8992-153X

References

[1] Oveis Abedinia, Nima Amjady, and Noradin Ghadimi. Solar energy forecasting based on hybrid neural network and improved meta-heuristic algorithm. Computational Intelligence, 34(1):241–260, 2018.
[2] U. Rajendra Acharya, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, and Hojjat Adeli. Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Computers in Biology and Medicine, 100:270–278, 2018.
[3] Igor Aizenberg, Dmitriy V. Paliy, Jacek M. Zurada, and Jaakko T. Astola. Blur identification by multilayer neural network based on multi-valued neurons. IEEE Transactions on Neural Networks, 19(5):883–898, 2008.
[4] E. Angelini, G. di Tollo, and A. Roli. A neural network approach for credit risk evaluation. The Quarterly Review of Economics and Finance, 48(4):733–755, 2008.
[5] Jarosław Bilski. Struktury równoległe dla jednokierunkowych i dynamicznych sieci neuronowych. Akademicka Oficyna Wydawnicza EXIT, 2013.
[6] Jarosław Bilski and Alexander I. Galushkin. A new proposition of the activation function for significant improvement of neural networks performance. In Artificial Intelligence and Soft Computing, volume 9602 of Lecture Notes in Computer Science, pages 35–45. Springer-Verlag Berlin Heidelberg, 2016.
[7] Jarosław Bilski and Leszek Rutkowski. A fast training algorithm for neural networks. IEEE Transactions on Circuits and Systems Part II, 45(6):749–753, 1998.
[8] Jarosław Bilski and Jacek Smoląg. Parallel realisation of the recurrent multi layer perceptron learning. In Artificial Intelligence and Soft Computing, volume 7267 of Lecture Notes in Artificial Intelligence, pages 12–20. Springer-Verlag Berlin Heidelberg, 2012.
[9] Jarosław Bilski and Jacek Smoląg. Parallel approach to learning of the recurrent Jordan neural network. In Artificial Intelligence and Soft Computing, volume 7895 of Lecture Notes in Artificial Intelligence, pages 32–40. Springer-Verlag Berlin Heidelberg, 2013.
[10] Jarosław Bilski and Jacek Smoląg. Parallel architectures for learning the RTRN and Elman dynamic neural network. IEEE Transactions on Parallel and Distributed Systems, 26(9):2561–2570, 2015.
[11] Jarosław Bilski, Jacek Smoląg, and Alexander I. Galushkin. The parallel approach to the conjugate gradient learning algorithm for the feed-forward neural networks. In Artificial Intelligence and Soft Computing, volume 8467 of Lecture Notes in Computer Science, pages 12–21. Springer-Verlag Berlin Heidelberg, 2014.
[12] Jarosław Bilski, Jacek Smoląg, and Jacek M. Zurada. Parallel approach to the Levenberg-Marquardt learning algorithm for feedforward neural networks. In Artificial Intelligence and Soft Computing, volume 9119 of Lecture Notes in Computer Science, pages 3–14. Springer-Verlag Berlin Heidelberg, 2015.
[13] Jarosław Bilski, Bartosz Kowalczyk, and Andrzej Cader. Modifications of the Givens training algorithm for artificial neural networks. In Leszek Rutkowski, Rafał Scherer, Marcin Korytkowski, Witold Pedrycz, Ryszard Tadeusiewicz, and Jacek M. Zurada, editors, Artificial Intelligence and Soft Computing, pages 14–28, Cham, 2019. Springer International Publishing.
[14] Jarosław Bilski, Bartosz Kowalczyk, Marek Kisiel-Dorohinicki, Agnieszka Siwocha, and Jacek Żurada. Towards a very fast feedforward multilayer neural networks training algorithm. Journal of Artificial Intelligence and Soft Computing Research, 12(3):181–195, 2022.
[15] Jarosław Bilski, Bartosz Kowalczyk, Andrzej Marjański, Michał Gandor, and Jacek Zurada. A novel fast feedforward neural networks training algorithm. Journal of Artificial Intelligence and Soft Computing Research, 11(4):287–306, 2021.
[16] Jarosław Bilski, Jacek Smoląg, Bartosz Kowalczyk, Konrad Grzanek, and Ivan Izonin. Fast computational approach to the Levenberg-Marquardt algorithm for training feedforward neural networks. Journal of Artificial Intelligence and Soft Computing Research, 12(2):45–61, 2023.
[17] W. Duch, K. Swaminathan, and J. Meller. Artificial intelligence approaches for rational drug design and discovery. Current Pharmaceutical Design, 13(14):1497–1508, 2007.
[18] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, Jul 2011.
[19] Marcin Gabryel, Eliza Kocić, Milan Kocić, Zofia Patora-Wysocka, Min Xiao, and Mirosław Pawlak. Accelerating user profiling in e-commerce using conditional GAN networks for synthetic data generation. Journal of Artificial Intelligence and Soft Computing Research, 14(4):309–319, 2024.
[20] W. Morven Gentleman. Least Squares Computations by Givens Transformations Without Square Roots. IMA Journal of Applied Mathematics, 12(3):329–336, Dec 1973.
[21] Ghosh and Reilly. Credit card fraud detection with a neural-network. In 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, volume 3, pages 621–630, Jan 1994.
[22] Wallace Givens. Computation of plane unitary rotations transforming a general matrix to triangular form. Journal of the Society for Industrial and Applied Mathematics, 6:26–50, 1958.
[23] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen. Recent advances in convolutional neural networks. Pattern Recognition, 77:354–377, 2018.
[24] Martin T. Hagan and Mohammad B. Menhaj. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5:989–993, 1994.
[25] A. Horzyk and R. Tadeusiewicz. Self-optimizing neural networks. In Fu-Liang Yin, Jun Wang, and Chengan Guo, editors, Advances in Neural Networks – ISNN 2004, pages 150–155, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
[26] Andrzej Kiełbasiński and Hubert Schwetlick. Numeryczna Algebra Liniowa: Wprowadzenie do Obliczeń Zautomatyzowanych. Wydawnictwa Naukowo-Techniczne, Warszawa, 1992.
[27] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[29] Dominik Lewy and Jacek Mańdziuk. Training CNN classifiers solely on webly data. Journal of Artificial Intelligence and Soft Computing Research, 13(1):75–92, 2023.
[30] Y. Li, R. Cui, Z. Li, and D. Xu. Neural network approximation based near-optimal motion planning with kinodynamic constraints using RRT. IEEE Transactions on Industrial Electronics, 65(11):8718–8729, Nov 2018.
[31] H. Liu, X. Mi, and Y. Li. Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network. Energy Conversion and Management, 156:498–514, 2018.
[32] M. Mazurowski, P. Habas, J. Zurada, J. Lo, J. Baker, and G. Tourassi. Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural Networks, 21:427–436, Mar 2008.
[33] Warwick Nash, Tracy Sellers, Simon Talbot, Andrew Cawthorn, and Wes Ford. Abalone. UCI Machine Learning Repository, 1995. DOI: https://doi.org/10.24432/C55C7W.
[34] Tacjana Niksa-Rynkiewicz, Piotr Stomma, Anna Witkowska, Danuta Rutkowska, Adam Słowik, Krzysztof Cpałka, Joanna Jaworek-Korjakowska, and Piotr Kolendo. An intelligent approach to short-term wind power prediction using deep neural networks. Journal of Artificial Intelligence and Soft Computing Research, 13(3):197–210, 2023.
[35] B.T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
[36] Nataliya Shakhovska, Andrii Shebeko, and Yarema Prykarpatskyy. A novel explainable AI model for medical data analysis. Journal of Artificial Intelligence and Soft Computing Research, 14(2):121–137, 2024.
[37] R. Shirin. A neural network approach for retailer risk assessment in the aftermarket industry. Benchmarking: An International Journal, 26(5):1631–1647, Jan 2019.
[38] A.K. Singh, S.K. Jha, and A.V. Muley. Candidates selection using artificial neural network technique in a pharmaceutical industry. In Siddhartha Bhattacharyya, Aboul Ella Hassanien, Deepak Gupta, Ashish Khanna, and Indrajit Pan, editors, International Conference on Innovative Computing and Communications, pages 359–366, Singapore, 2019. Springer Singapore.
[39] R. Tadeusiewicz, L. Ogiela, and M.R. Ogiela. Cognitive analysis techniques in business planning and decision support systems. In L. Rutkowski, R. Tadeusiewicz, L.A. Zadeh, and J.M. Zurada, editors, Artificial Intelligence and Soft Computing – ICAISC 2006, pages 1027–1039, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
[40] K.Y. Tam and M. Kiang. Predicting bank failures: A neural network approach. Applied Artificial Intelligence, 4(4):265–282, 1990.
[41] Paul Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Harvard University, 1974.
[42] B.M. Wilamowski. Neural network architectures and learning algorithms. IEEE Industrial Electronics Magazine, 3(4):56–63, 2009.
[43] I-Cheng Yeh. Concrete Compressive Strength. UCI Machine Learning Repository, 2007. DOI: https://doi.org/10.24432/C5PK67.
[44] Matthew D. Zeiler. Adadelta: An adaptive learning rate method, 2012.
[45] Junming Zhang, Hao Dong, Jinfeng Gao, Ruxian Yao, Gangqiang Li, and Haitao Wu. Self-organized operational neural networks for the detection of atrial fibrillation. Journal of Artificial Intelligence and Soft Computing Research, 14(1):63–75, 2024.

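Notes

Record prepared with funding from the Polish Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the programme "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II), module: Popularisation of science (2025).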

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-8d7f6797-d1f3-4c37-8266-e0349698c06a