Article title

A hybrid model of heuristic algorithm and gradient descent to optimize neural networks

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Training a neural network can be a challenging task, particularly when working with complex models and large amounts of training data, as it consumes significant time and resources. This research proposes a hybrid model that combines a population-based heuristic algorithm with traditional gradient-based techniques to enhance the training process. Instead of starting from randomly initialized weights, the proposed approach uses a dynamic population-based heuristic algorithm to identify a good initial value for the neural network weight vector. After several cycles of distributing search agents across the search domain, training continues with a gradient-based technique that starts from the best initial weight vector identified by the heuristic algorithm. Experimental analysis confirms that exploring the search domain in this way decreases the number of cycles gradient descent needs to train the network. Furthermore, a dynamic population strategy is applied during the heuristic search: search agents are added and removed based on their progress. This approach yields better results than traditional heuristic algorithms that keep the same population members throughout the search process.
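The abstract above describes a two-phase procedure: a dynamic population-based heuristic proposes an initial weight vector, and plain gradient descent then continues from the best candidate found. The Python sketch below illustrates that idea under stated assumptions only: a toy 2-8-1 network, mean squared error, a simple random-perturbation population with "birth and death" of agents, and a numerical gradient in place of backpropagation. All names, hyper-parameters, and data are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the hybrid idea (assumed toy setup, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2-D inputs, binary targets (assumed for illustration).
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

N_HIDDEN = 8
N_WEIGHTS = 2 * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1  # W1, b1, W2, b2 flattened

def unpack(w):
    """Split a flat weight vector into the layers of a tiny 2-8-1 MLP."""
    i = 0
    W1 = w[i:i + 2 * N_HIDDEN].reshape(2, N_HIDDEN); i += 2 * N_HIDDEN
    b1 = w[i:i + N_HIDDEN]; i += N_HIDDEN
    W2 = w[i:i + N_HIDDEN].reshape(N_HIDDEN, 1); i += N_HIDDEN
    b2 = w[i:i + 1]
    return W1, b1, W2, b2

def loss(w):
    """Mean squared error of the MLP with weights w (tanh hidden layer)."""
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    out = (h @ W2 + b2).ravel()
    return np.mean((out - y) ** 2)

def numerical_grad(w, eps=1e-5):
    """Central-difference gradient; a real implementation would use backprop."""
    g = np.zeros_like(w)
    for j in range(w.size):
        d = np.zeros_like(w); d[j] = eps
        g[j] = (loss(w + d) - loss(w - d)) / (2 * eps)
    return g

# Phase 1: dynamic population-based heuristic search for initial weights.
population = [rng.normal(scale=0.5, size=N_WEIGHTS) for _ in range(10)]
for cycle in range(30):
    # Locally perturb every agent; keep a move only if it improves the loss.
    for k, w in enumerate(population):
        candidate = w + rng.normal(scale=0.1, size=N_WEIGHTS)
        if loss(candidate) < loss(w):
            population[k] = candidate
    # Dynamic population: remove the worst agent, spawn a new one near the best.
    population.sort(key=loss)
    if len(population) > 4:
        population.pop()  # "death" of the weakest agent
    population.append(population[0] + rng.normal(scale=0.2, size=N_WEIGHTS))  # "birth"

w = min(population, key=loss)  # best initial weight vector found by the heuristic

# Phase 2: gradient descent starting from the heuristic's best candidate.
lr = 0.1
for step in range(200):
    w = w - lr * numerical_grad(w)

print("final training loss:", loss(w))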
Year
Pages
art. no. e147924
Physical description
Bibliography: 42 items, figures, tables
Authors
author
  • Sakarya University, Computer Engineering Department
  • Sakarya University, Information Systems Engineering Department
Bibliography
  • [1] I.H. Osman and G. Laporte, “Metaheuristics: A bibliography,” Ann. Oper. Res., vol. 63, no. 5, pp. 511–623, Oct. 1996, doi: 10.1007/BF02125421.
  • [2] X.-S. Yang, Nature-Inspired Metaheuristic Algorithms, 2nd ed. Frome: Luniver Press, 2010.
  • [3] S. Amari, “Backpropagation and stochastic gradient descent method,” Neurocomputing, vol. 5, no. 4–5, pp. 185–196, Jun. 1993, doi: 10.1016/0925-2312(93)90006-O.
  • [4] D.P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization.” arXiv, Jan. 29, 2017. Accessed: Apr. 26, 2023. [Online]. Available: http://arxiv.org/abs/1412.6980.
  • [5] Y. Nesterov, “Implementable tensor methods in unconstrained convex optimization,” Math. Program., vol. 186, no. 1–2, pp. 157–183, Mar. 2021, doi: 10.1007/s10107-019-01449-1.
  • [6] T. Dozat, “Incorporating Nesterov Momentum into Adam,” 2016.
  • [7] J. Duchi, E. Hazan, and Y. Singer, “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization,” J. Mach. Learn. Res., vol. 12, pp. 2121–2159, 2011.
  • [8] M.D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method.” arXiv, Dec. 22, 2012. Accessed: Apr. 26, 2023. [Online]. Available: http://arxiv.org/abs/1212.5701.
  • [9] P. Liashchynskyi and P. Liashchynskyi, “Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS.” arXiv, Dec. 12, 2019. Accessed: Apr. 26, 2023. [Online]. Available: http://arxiv.org/abs/1912.06059.
  • [10] J.S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for Hyper-Parameter Optimization,” in Advances in Neural Information Processing Systems, 2011, vol. 24.
  • [11] J. Bergstra and Y. Bengio, “Random Search for Hyper-Parameter Optimization,” J. Mach. Learn. Res., vol. 13, pp. 281–305, 2012.
  • [12] T. Domhan, J.T. Springenberg, and F. Hutter, “Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves,” in Proceedings of the 24th International Conference on Artificial Intelligence, 2015, pp. 3460–3468.
  • [13] I. Loshchilov and F. Hutter, “SGDR: Stochastic Gradient Descent with Warm Restarts.” arXiv, May 03, 2017. Accessed: Apr. 26, 2023. [Online]. Available: http://arxiv.org/abs/1608.03983.
  • [14] J. Rasley, Y. He, F. Yan, O. Ruwase, and R. Fonseca, “HyperDrive: exploring hyperparameters with POP scheduling,” in Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas Nevada: ACM, Dec. 2017, pp. 1–13, doi: 10.1145/3135974.3135994.
  • [15] A. Gyorgy and L. Kocsis, “Efficient Multi-Start Strategies for Local Search Algorithms,” arXiv, Jan. 16, 2014. [Online]. Available: https://arxiv.org/abs/1401.3894.
  • [16] P. Koch, O. Golovidov, S. Gardner, B. Wujek, J. Griffin, and Y. Xu, “Autotune: A Derivative-free Optimization Framework for Hyperparameter Tuning,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, United Kingdom: ACM, Jul. 2018, pp. 443–452, doi: 10.1145/3219819.3219837.
  • [17] A. Aly, G. Guadagni, and J.B. Dugan, “Derivative-Free Optimization of Neural Networks using Local Search,” in 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York City, NY, USA: IEEE, Oct. 2019, pp. 0293–0299, doi: 10.1109/UEMCON47517.2019.8993007.
  • [18] L.M. Rios and N.V. Sahinidis, “Derivative-free optimization: a review of algorithms and comparison of software implementations,” J. Glob. Optim., vol. 56, no. 3, pp. 1247–1293, Jul. 2013, doi: 10.1007/s10898-012-9951-y.
  • [19] N.H. Kadhim and Q. Mosa, “Review Optimized Artificial Neural Network by Meta-Heuristic Algorithm and its Applications,” J. Al-Qadisiyah Comput. Sci. Math., vol. 13, no. 3, 2021, doi: 10.29304/jqcm.2021.13.3.825.
  • [20] Z. Tian and S. Fong, “Survey of Meta-Heuristic Algorithms for Deep Learning Training,” in Optimization Algorithms – Methods and Applications, InTech, 2016, doi: 10.5772/63785.
  • [21] R. Mohapatra, S. Saha, C.A.C. Coello, A. Bhattacharya, S.S. Dhavala, and S. Saha, “AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with Swarm Intelligence,” arXiv, May 2020, [Online]. Available: http://arxiv.org/abs/2006.09875.
  • [22] M. Kaminski, “Neural Network Training Using Particle Swarm Optimization – a Case Study,” in 2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR), Aug. 2019, pp. 115–120, doi: 10.1109/MMAR.2019.8864679.
  • [23] K.H. Lai, Z. Zainuddin, and P. Ong, “A study on the performance comparison of metaheuristic algorithms on the learning of neural networks,” in AIP Conference Proceedings, American Institute of Physics Inc., Aug. 2017, doi: 10.1063/1.4995871.
  • [24] M. Jaderberg et al., “Population Based Training of Neural Networks,” arXiv, Nov. 2017, [Online]. Available: http://arxiv.org/abs/1711.09846.
  • [25] S.R. Young, D.C. Rose, T.P. Karnowski, S.H. Lim, and R.M. Patton, “Optimizing deep learning hyper-parameters through an evolutionary algorithm,” in Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, Austin, USA, Nov. 2015, doi: 10.1145/2834892.2834896.
  • [26] M. Mavrovouniotis and S. Yang, “Training neural networks with ant colony optimization algorithms for pattern classification,” Soft Comput., vol. 19, no. 6, pp. 1511–1522, Jun. 2015, doi: 10.1007/s00500-014-1334-5.
  • [27] X. Cui, W. Zhang, Z. Tüske, and M. Picheny, “Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks,” arXiv, Oct. 2018, [Online]. Available: http://arxiv.org/abs/1810.06773.
  • [28] F.P. Such, V. Madhavan, E. Conti, J. Lehman, K.O. Stanley, and J. Clune, “Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning,” arXiv, Dec. 2017, [Online]. Available: http://arxiv.org/abs/1712.06567.
  • [29] G. Morse and K.O. Stanley, “Simple evolutionary optimization can rival stochastic gradient descent in neural networks,” in Proceedings of the 2016 Genetic and Evolutionary Computation Conference GECCO 2016, Jul. 2016, pp. 477–484, doi: 10.1145/2908812.2908916.
  • [30] A. Khan, R. Shah, M. Imran, A. Khan, J.I. Bangash, and K. Shah, “An alternative approach to neural network training based on hybrid bio meta-heuristic algorithm,” J. Ambient Intell. Humaniz. Comput., vol. 10, no. 10, pp. 3821–3830, Oct. 2019, doi: 10.1007/s12652-019-01373-4.
  • [31] R. Poli, J. Kennedy, and T.M. Blackwell, “Particle swarm optimization,” Swarm Intell., vol. 1, no. 1, pp. 33–57, 2007.
  • [32] M. Dorigo and L.M. Gambardella, “Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem,” 1997. [Online]. Available: http://iridia.ulb.ac.be/dorigo/dorigo.html, http://www.idsia.ch/~luca.
  • [33] X.-S. Yang and S. Deb, “Cuckoo Search via Levy Flights,” arXiv, Mar. 2010, [Online]. Available: http://arxiv.org/abs/1003.1594.
  • [34] N.F. Johari, A.M. Zain, N.H. Mustaffa, and A. Udin, “Firefly algorithm for optimization problem,” in Applied Mechanics and Materials, 2013, pp. 512–517, doi: 10.4028/www.scientific.net/AMM.421.512.
  • [35] D. Karaboga, “An Idea Based on Honey Bee Swarm for Numerical Optimization”, Technical Report, Erciyes University, 2005.
  • [36] D. Połap and M. Woźniak, “Polar bear optimization algorithm: Meta-heuristic with fast population movement and dynamic birth and death mechanism,” Symmetry, vol. 9, no. 10, p. 203, Oct. 2017, doi: 10.3390/sym9100203.
  • [37] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv, Sep. 2016, [Online]. Available: http://arxiv.org/abs/1609.04747.
  • [38] A. Mirkhan and N. Celebi, “Binary Representation of Polar Bear Algorithm for Feature Selection,” Comput. Syst. Sci. Eng., vol. 43, no. 2, pp. 767–783, 2022, doi: 10.32604/csse.2022.023249.
  • [39] M. Aghili Nasr, M. Zangian, M. Abbasi, and A. Zolfaghari, “Neutronic and thermal-hydraulic aspects of loading pattern optimization during the first cycle of VVER-1000 reactor using Polar Bear Optimization method,” Ann. Nucl. Energy, vol. 133, pp. 538–548, Nov. 2019, doi: 10.1016/j.anucene.2019.06.042.
  • [40] V.K. Ojha, A. Abraham, and V. Snášel, “Metaheuristic design of feedforward neural networks: A review of two decades of research,” Eng. Appl. Artif. Intell., vol. 60, pp. 97–116, Apr. 2017, doi: 10.1016/j.engappai.2017.01.013.
  • [41] D.A. Simovici and C. Djeraba, Mathematical Tools for Data Mining. Springer, 2008.
  • [42] R. Livni, S. Shalev-Shwartz, and O. Shamir, “On the Computational Efficiency of Training Neural Networks,” in Advances in Neural Information Processing Systems, 2014, vol. 27.
Notes
Record developed with funds from the Ministry of Science and Higher Education (MNiSW), agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: popularisation of science and promotion of sport (2024).
Document type
YADDA identifier
bwmeta1.element.baztech-82ac81f3-0f1e-4641-bb7f-770cddf01831