Identifiers
Title variants
Publication languages
Abstracts
This study introduces a two-step reinforcement learning (RL) strategy tailored to "The Lord of the Rings: The Card Game", a complex multistage strategy card game. The research diverges from conventional RL methods by adopting a phased learning approach: the agent first learns in a simplified version of the game and then progresses to the complete, intricate game environment. This methodology notably enhances the AI agent's adaptability and performance in the face of the game's unpredictable and challenging nature. The paper also explores a multi-phase system in which distinct RL agents handle the game's different decision-making phases. This approach yields a marked improvement, with the RL agents achieving a win rate of 78.5% at the highest difficulty level.
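To make the two ideas in the abstract concrete, below is a minimal sketch, not the authors' implementation. Everything named in it is an illustrative assumption: the phase list, the `SimplifiedGameEnv`/`FullGameEnv` environment interface, and tabular Q-learning as a stand-in for whatever learner the paper actually employs.

```python
# A minimal sketch of the abstract's two ideas, NOT the authors' code:
# (1) one RL agent per decision-making phase, and (2) a two-step schedule
# that trains first on a simplified game, then on the full game.
import random
from collections import defaultdict

PHASES = ["planning", "quest", "travel", "encounter", "combat"]  # assumed phases

class QAgent:
    """Tabular Q-learning agent owning a single game phase."""
    def __init__(self, actions, alpha=0.1, gamma=0.99, eps=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:                 # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, s, a, r, s_next, done):
        # Standard one-step Q-learning target; no bootstrap at terminal states.
        best_next = 0.0 if done else max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

def train(env_factory, agents, episodes):
    """Route every decision to the agent that owns the current phase."""
    for _ in range(episodes):
        env = env_factory()
        state, phase, done = env.reset()    # hypothetical env interface
        while not done:
            agent = agents[phase]
            action = agent.act(state)
            state2, phase2, reward, done = env.step(action)
            agent.update(state, action, reward, state2, done)
            state, phase = state2, phase2

# One agent per decision-making phase.
agents = {p: QAgent(actions=list(range(8))) for p in PHASES}

# Two-step schedule: the same agents keep their learned values when they
# move from the simplified game to the full one.
# SimplifiedGameEnv / FullGameEnv are hypothetical stand-ins.
# train(lambda: SimplifiedGameEnv(), agents, episodes=50_000)  # step 1
# train(lambda: FullGameEnv(), agents, episodes=50_000)        # step 2
```

Keeping a separate agent per phase keeps each state-action space small, and the two-step schedule is the curriculum element: the agents carry what was learned in the simplified game into the full one rather than starting from scratch.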
Year
Volume
Pages
art. no. e151376
Physical description
Bibliography: 25 items, figures, tables
Authors
author
- Warsaw University of Technology, Poland
author
- Warsaw University of Technology, Poland
Bibliography
- [1] Statista, “Card games – worldwide,” https://www.statista.com/outlook/dmo/app/games/card-games/worldwide, 2024.
- [2] N. Brown and T. Sandholm, “Superhuman AI for multiplayer poker,” Science, vol. 365, no. 6456, pp. 885–890, 2019.
- [3] K. Godlewski and B. Sawicki, “Optimisation of MCTS player for The Lord of the Rings: The Card Game,” Bull. Pol. Acad. Sci. Tech. Sci., vol. 69, no. 3, p. e136752, 2021.
- [4] P. Barros, A. Tanevska, and A. Sciutti, “Learning from learners: Adapting reinforcement learning agents to be competitive in a card game,” in 2020 25th International Conference on Pattern Recognition (ICPR), pp. 2716–2723, IEEE, 2021.
- [5] G. Yang et al., “PerfectDou: Dominating DouDizhu with perfect information distillation,” Adv. Neural Inf. Process. Syst., vol. 35, pp. 34954–34965, 2022.
- [6] R. Vieira, A.R. Tavares, and L. Chaimowicz, “Drafting in collectible card games via reinforcement learning,” in 2020 19th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), pp. 54–61, IEEE, 2020.
- [7] D. Zha et al., “RLCard: A toolkit for reinforcement learning in card games,” arXiv preprint arXiv:1910.04376, 2019.
- [8] J. Zhao, W. Shu, Y. Zhao, W. Zhou, and H. Li, “Improving deep reinforcement learning with mirror loss,” IEEE Trans. Games, vol. 15, no. 3, pp. 337–347, 2023. doi: 10.1109/TG.2022.3164470.
- [9] Z. Yao et al., “Towards modern card games with large-scale action spaces through action representation,” in 2022 IEEE Conference on Games (CoG), pp. 576–579, IEEE, 2022.
- [10] N.R. Sturtevant and A.M. White, “Feature construction for reinforcement learning in Hearts,” in Computers and Games: 5th International Conference, CG 2006, Turin, Italy, May 29–31, 2006, Revised Papers, pp. 122–134, Springer, 2007.
- [11] N. Bard et al., “The Hanabi challenge: A new frontier for AI research,” Artif. Intell., vol. 280, p. 103216, 2020.
- [12] B. Grooten, J. Wemmenhove, M. Poot, and J. Portegies, “Is vanilla policy gradient overlooked? Analyzing deep reinforcement learning for Hanabi,” arXiv preprint arXiv:2203.11656, 2022.
- [13] R. Canaan, X. Gao, J. Togelius, A. Nealen, and S. Menzel, “Generating and adapting to diverse ad-hoc partners in Hanabi,” IEEE Trans. Games, vol. 15, no. 2, pp. 228–241, 2023. doi: 10.1109/TG.2022.3169168.
- [14] I. Bravi and S. Lucas, “Rinascimento: Playing Splendor-like games with event-value functions,” IEEE Trans. Games, vol. 15, no. 1, pp. 16–25, 2022.
- [15] J. Guo, B. Yang, P. Yoo, B.Y. Lin, Y. Iwasawa, and Y. Matsuo, “Suspicion-Agent: Playing imperfect information games with theory of mind aware GPT-4,” arXiv preprint arXiv:2309.17277, 2023.
- [16] J. Kowalski and R. Miernik, “Evolutionary approach to collectible arena deckbuilding using active card game genes,” in 2020 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8, IEEE, 2020.
- [17] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
- [18] X. Wang, Y. Chen, and W. Zhu, “A survey on curriculum learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 4555–4576, 2021.
- [19] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48, ACM, 2009.
- [20] Y. Tang, Y. Tian, J. Lu, P. Li, and J. Zhou, “Deep progressive reinforcement learning for skeleton-based action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5323–5332, 2018.
- [21] M.G. Madden and T. Howley, “Transfer of experience between reinforcement learning environments with progressive difficulty,” Artif. Intell. Rev., vol. 21, no. 3–4, pp. 375–398, 2004.
- [22] H.M. Fayek, L. Cavedon, and H.R. Wu, “Progressive learning: A deep learning framework for continual learning,” Neural Netw., vol. 128, pp. 345–357, 2020. doi: 10.1016/j.neunet.2020.05.011.
- [23] K. Godlewski, Monte Carlo Tree Search and Reinforcement Learning Methods for Multi-Stage Strategic Card Game. PhD thesis, Warsaw University of Technology, 2023. doi: 10.13140/RG.2.2.36103.16808.
- [24] R. Miernik and J. Kowalski, “Evolving evaluation functions for collectible card game AI,” arXiv preprint arXiv:2105.01115, 2021.
- [25] O. Vinyals et al., “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019.
Document type
YADDA identifier
bwmeta1.element.baztech-e634a3b9-a659-445b-85a9-e7c188cc55ad