Analysis of the possibilities for using machine learning algorithms in the Unity environment

Litwynenko, Karina; Plechawska-Wójcik, Małgorzata

doi:10.35784/jcsi.2680

Article title

Analiza możliwości wykorzystania algorytmów uczenia maszynowego w środowisku Unity

Authors

Litwynenko Karina, Plechawska-Wójcik Małgorzata

Identifiers

DOI

10.35784/jcsi.2680

Title variants

Analysis of the possibilities for using machine learning algorithms in the Unity environment

Publication languages

Abstract

Reinforcement learning algorithms are growing in popularity, and their development depends on tools that make it possible to study them. This paper examines the applicability of machine learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The aim of the study was to compare two algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The study also examined whether learning results can be improved by combining these algorithms with the imitation learning method Generative Adversarial Imitation Learning (GAIL). The results showed that PPO can perform better in simple environments with delayed (non-immediate) rewards, and that additionally using GAIL can improve learning effectiveness.

Keywords

reinforcement learning; imitation learning; Unity

Publisher

Wydawnictwo Politechniki Lubelskiej

Journal

Journal of Computer Sciences Institute

Year

2021

Volume

Vol. 20

Pages

197–204

Physical description

Bibliography: 16 items, figures, tables.

References

1. A. Juliani, V. P. Berges, E. Vckay, Y. Gao, H. Henry, M. Mattar, D. Lange, Unity: A general platform for intelligent agents, arXiv preprint arXiv:1809.02627v2 (2020).
2. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
3. T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research, 80 (2018) 1861–1870.
4. J. Ho, S. Ermon, Generative adversarial imitation learning. Advances in neural information processing systems, (2016) 4565–4573.
5. A. Hussein, M. M. Gaber, E. Elyan, C. Jayne, Imitation Learning: A Survey of Learning Methods. ACM Computing Surveys (CSUR), 50(2) (2017) 1–35, https://doi.org/10.1145/3054912.
6. R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction. Second edition. The MIT Press (2018).
7. J. Schulman, S. Levine, P. Abbeel, M. Jordan, P. Moritz, Trust region policy optimization. In International conference on machine learning (2015) 1889–1897.
8. M. Urmanov, M. Alimanova, A. Nurkey, Training Unity Machine Learning Agents using reinforcement learning method. In 2019 15th International Conference on Electronics, Computer and Computation (ICECCO), (2019) 1–4, https://doi.org/10.1109/ICECCO48375.2019.9043194.
9. M. Pleines, F. Zimmer, V. Berges, Action Spaces in Deep Reinforcement Learning to Mimic Human Input Devices, 2019 IEEE Conference on Games (CoG), (2019) 1–8 https://dx.doi.org/10.1109/CIG.2019.8848080.
10. V. Mnih, K. Kavukcuoglu, D. Silver et al., Human-level control through deep reinforcement learning. Nature, 518(7540) (2015) 529–533.
11. M. G. Bellemare, Y. Naddaf, J. Veness, M. Bowling, The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47 (2013) 253–279.
12. A. P. Badia, B. Piot, S. Kapturowski, P. Sprechmann, A. Vitvitskyi, D. Guo, C. Blundell, Agent57: Outperforming the Atari Human Benchmark, International Conference on Machine Learning (2020) 507–517.
13. A. Defazio, T. Graepel, A comparison of learning algorithms on the arcade learning environment. arXiv preprint arXiv:1410.8620 (2014).
14. G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016).
15. A. Tavakoli, F. Pardo, P. Kormushev, Action branching architectures for deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. No. 1. (2018).
16. ML-Agents Toolkit documentation: description and recommended value ranges of training hyperparameters, https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md, [accessed 04.05.2021].

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-9b841f18-8bbe-44fd-b10e-dad5c41b0e43