Metrics for assessing generalization of deep reinforcement learning in parameterized environments

Aleksandrowicz, Maciej; Jaworek-Korjakowska, Joanna

doi:10.2478/jaiscr-2024-0003

Artykuł - szczegóły

Tytuł artykułu

Metrics for assessing generalization of deep reinforcement learning in parameterized environments

Autorzy

Aleksandrowicz Maciej , Jaworek-Korjakowska Joanna

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.2478/jaiscr-2024-0003

Warianty tytułu

Języki publikacji

Abstrakty

In this work, a study focusing on proposing generalization metrics for Deep Reinforcement Learning (DRL) algorithms was performed. The experiments were conducted in DeepMind Control (DMC) benchmark suite with parameterized environments. The performance of three DRL algorithms in selected ten tasks from the DMC suite has been analysed with existing generalization gap formalism and the proposed ratio and decibel metrics. The results were presented with the proposed methods: average transfer metric and plot for environment normal distribution. These efforts allowed to highlight major changes in the model’s performance and add more insights about making decisions regarding models’ requirements.

Słowa kluczowe

deep reinforcement learning optimization generalization Sim2Sim transfer adaptation

Wydawca

University of Social Sciences

Czasopismo

Journal of Artificial Intelligence and Soft Computing Research

Rocznik

2024

Tom

Vol. 14, No. 1

Strony

45--61

Opis fizyczny

Bibliogr. 33 poz., rys.

Twórcy

autor

Aleksandrowicz Maciej

macal@agh.edu.pl

Department of Automatic Control and Robotics, AGH University of Krakow, al. A. Mickiewicza 30, Building B-1, 30-059 Krakow

autor

Jaworek-Korjakowska Joanna

Department of Automatic Control and Robotics, AGH University of Krakow, al. A. Mickiewicza 30, Building B-1, 30-059 Krakow

Bibliografia

[1] Robert Kirk, Amy Zhang, Edward Grefenstette, and Tim Rockt¨aschel. A Survey of Zero-shot Generalisation in Deep Reinforcement Learning. Journal of Artificial Intelligence Research, 76: 201–264, January 2023. ISSN 1076-9757. doi:10.1613/jair.1.14174.
[2] Katsuhiko Ogata. Modern Control Engineering. Prentice Hall, 2010. ISBN 978-0-13-615673-4.
[3] Richard S. Sutton and Andrew G. Barto. Sutton & Barto Book: Reinforcement Learning: An Introduction. 2018. ISBN 978-0-262-03924-6.
[4] Dimitri P. Bertsekas. Reinforcement Learning and Optimal Control. 2019. ISBN 978-1-886529-39-7.
[5] Hiroki Furuta and et al. Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning. In Proceedings of the 38th International Conference on Machine Learning, pages 3541–3552. PMLR, July 2021.
[6] Richard S. Sutton, Michael H. Bowling, and Patrick M. Pilarski. The Alberta Plan for AI Research, August 2022.
[7] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms. August 2017.
[8] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust Region Policy Optimization. April 2017.
[9] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv:1801.01290 [cs, stat], August 2018.
[10] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs, stat], July 2019.
[11] Assaf Hallak, Dotan Di Castro, and Shie Mannor. Contextual Markov Decision Processes, February 2015.
[12] Dibya Ghosh, Jad Rahme, Aviral Kumar, Amy Zhang, Ryan P. Adams, and Sergey Levine. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability, July 2021.
[13] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv:1606.01540 [cs], June 2016.
[14] Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Siqi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, Nicolas Heess, and Yuval Tassa. Dm control: Software and tasks for continuous control. Software Impacts, 6: 100022, November 2020. ISSN 26659638. doi:10.1016/j.simpa.2020.100022.
[15] Karl Cobbe, Chris Hesse, Jacob Hilton, and John Schulman. Leveraging Procedural Generation to Benchmark Reinforcement Learning. In Proceedings of the 37th International Conference on Machine Learning, pages 2048–2056. PMLR, November 2020.
[16] Kevin Frans and Phillip Isola. Powderworld: A Platform for Understanding Generalization via Rich Task Distributions, November 2022.
[17] Farama Foundation. Gymnasium, 2023. URL https://gymnasium.farama.org/.
[18] Sumukh Aithal K, Dhruva Kashyap, and Natarajan Subramanyam. Robustness to Augmentations as a Generalization metric. arXiv:2101.06459 [cs], January 2021.
[19] OpenAI, Ilge Akkaya, and et al. Solving Rubik’s Cube with a Robot Hand, October 2019.
[20] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic Algorithms and Applications. arXiv:1812.05905 [cs, stat], January 2019.
[21] Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, and Shimon Whiteson. A Survey of Meta-Reinforcement Learning, January 2023.
[22] Charles Packer, Katelyn Gao, Jernej Kos, Philipp Kr¨ahenbühl, Vladlen Koltun, and Dawn Song. Assessing Generalization in Deep Reinforcement Learning. March 2019.
[23] Jianda Chen and Sinno Pan. Learning representations via a robust behavioral metric for deep reinforcement learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 36654–36666. Curran Associates, Inc., 2022.
[24] Sam Witty, Jun K. Lee, Emma Tosch, Akanksha Atrey, Kaleigh Clary, Michael L. Littman, and David Jensen. Measuring and characterizing generalization in deep reinforcement learning. Applied AI Letters, 2(4), December 2021. ISSN 2689-5595, 2689-5595. doi:10.1002/ail2.45.
[25] Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, and John Schulman. Quantifying Generalization in Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, pages 1282–1289. PMLR, May 2019.
[26] Qucheng Peng, Zhengming Ding, Lingjuan Lyu, Lichao Sun, and Chen Chen. RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 4118–4126. International Joint Conferences on Artificial Intelligence Organization,. ISBN 978-1-956792-03-4. doi:10.24963/ijcai.2023/458. URL https://www.ijcai.org/proceedings/2023/458 .
[27] Qucheng Peng, Ce Zheng, and Chen Chen. Source-free Domain Adaptive Human Pose Estimation. pages 4826–4836. URL https://openaccess.thecvf.com/content/ICCV2023/html/Peng_Source-free_Domain_Adaptive_Human_Pose_Estimation_ICCV_2023_paper.html.
[28] Xingyou Song, Yilun Du, and Jacob Jackson. An Empirical Study on Hyperparameters and their Interdependence for RL Generalization. June 2019.
[29] Aravind Rajeswaran, Kendall Lowrey, Emanuel V. Todorov, and Sham M Kakade. Towards Generalization and Simplicity in Continuous Control. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[30] Philipp Moritz and et al. Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, pages 561–577, USA, October 2018. USENIX Association. ISBN 978-1-931971-47-8.
[31] Stephanie C. Y. Chan, Samuel Fishman, John Canny, Anoop Korattikara, and Sergio Guadarrama. Measuring the Reliability of Reinforcement Learning Algorithms, February 2020.
[32] Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning. In Deep RL Workshop NeurIPS 2021, 2021.
[33] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering Diverse Domains through World Models, January 2023.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-079f252c-0c70-430e-8f67-9766aceb59a0