Overview of selected reinforcement learning solutions to several game theory problems

Jarosz, Robert

doi:10.5604/01.3001.0053.9698





Title variants

Przegląd wybranych rozwiązań opartych na uczeniu ze wzmocnieniem dla problemów z teorii gier

Publication languages

Abstracts

This paper collects several applications of reinforcement learning to problems related to game theory. The methods were selected to show as wide a variety of problems and approaches as possible. The selection includes Thompson Sampling, Q-learning, DQN, and AlphaGo Zero, which uses the Monte Carlo Tree Search algorithm. The paper attempts to convey the intuition behind the algorithms, explaining technical details only briefly. This approach aims to present an overview of the topic without assuming deep knowledge of statistics or artificial intelligence.

The article gathers selected approaches to solving game theory problems using reinforcement learning. The applications were chosen to present the classes of problems, and the approaches to solving them, as broadly as possible. The selected algorithms are Thompson sampling, Q-learning, DQN, and AlphaGo Zero. The article emphasizes the intuition behind how the algorithms work, focusing on an overview of the technology rather than on technical details.

Keywords

artificial intelligence; game theory; Thompson sampling; Q-learning; DQN; Monte Carlo tree search; AlphaZero

Publisher

Institute of Computer and Information Systems, Faculty of Cybernetics, Military University of Technology

Journal

Computer Science and Mathematical Modelling

Year

2022

Volume

No. 15-16

Pages

13-22

Physical description

Bibliography: 27 items.

Authors

author

Jarosz Robert

robert.jarosz@wat.edu.pl

Military University of Technology, Faculty of Cybernetics, Kaliskiego Str. 2, 00-908 Warsaw, Poland

Bibliography

[1] Binmore K., Game theory: a very short introduction, OUP Oxford, 2007.
[2] Ameljańczyk A., „Teoria gier”, Vol. 690, p. 78, WAT, 1978.
[3] Lattimore T., Szepesvári C., Bandit algorithms, Cambridge University Press, 2020.
[4] Thompson W. R., „On the likelihood that one unknown probability exceeds another in view of the evidence of two samples”, Biometrika, Vol. 25, No. 3-4, 285-294 (1933).
[5] Thompson W. R., „On the theory of apportionment”, American Journal of Mathematics, Vol. 57, No. 2, 450-456 (1935).
[6] Agrawal S., Goyal N., „Further optimal regret bounds for Thompson sampling”, Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 31, pp. 99-107, USA, 2013.
[7] Chapelle O., Li L., „An empirical evaluation of Thompson sampling”, in: Advances in Neural Information Processing Systems 24 (NIPS 2011), Vol. 24, 1-9, 2011.
[8] Bolstad W. M., Curran J. M., Introduction to Bayesian statistics. John Wiley & Sons, 2016.
[9] Agrawal S., Goyal N., „Analysis of Thompson sampling for the multi-armed bandit problem”, Proceedings of the Conference on Learning Theory, Edinburgh, UK, 25-27 June 2012, pp. 31-39.
[10] Kaufmann E., Korda N., Munos R., „Thompson sampling: An asymptotically optimal finite-time analysis”, in: International Conference on Algorithmic Learning Theory, LNCS 7568, pp. 199-213, Springer 2012.
[11] Auer P., Cesa-Bianchi N., Fischer P., „Finite-time analysis of the multiarmed bandit problem”, Mach. Learn., Vol. 47, No. 2, 235-256 (2002).
[12] Audibert J.-Y., Bubeck S., „Minimax policies for adversarial and stochastic bandits”, in: COLT - The 22nd Conference on Learning Theory, Montreal, Quebec, Canada, June 18-21, 2009.
[13] Watkins C. J. C. H., Learning from delayed rewards, University of London 1989.
[14] Sutton R. S., Barto A. G., Reinforcement learning: An introduction, MIT Press, 2018.
[15] Bellman R., Kalaba R. E., Dynamic programming and modern control theory, Academic Press, NY 1965.
[16] Melo F. S., „Convergence of Q-learning: A simple proof”, Institute of Systems and Robotics Tech. Rep., pp. 1-4, 2001.
[17] Mnih V. and others, „Playing Atari with deep reinforcement learning”, arXiv preprint arXiv:1312.5602, 2013.
[18] Mnih V. and others, „Human-level control through deep reinforcement learning”, Nature, Vol. 518, No. 7540, 529-533 (2015).
[19] Li M., Zhang T., Chen Y., Smola A. J., „Efficient mini-batch training for stochastic optimization”, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, NY, 2014, pp. 661-670.
[20] Bellemare M. G., Naddaf Y., Veness J., Bowling M., „The arcade learning environment: An evaluation platform for general agents”, Journal of Artificial Intelligence Research, Vol. 47, 253-279 (2013).
[21] Bellemare M. G., Veness J., Bowling M., „Investigating contingency awareness using Atari 2600 games”, Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Vol. 26, No. 1, pp. 864-871, 2012.
[22] Tromp J., Farnebäck G., „Combinatorics of Go”, in: Computers and Games, LNCS 4630, pp. 84-99, Springer 2006.
[23] Silver D. and others, „Mastering the game of Go without human knowledge”, Nature, Vol. 550, No. 7676, 354-359 (2017).
[24] Browne C. B. and others, „A survey of Monte Carlo tree search methods”, IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4, No. 1, 1-43 (2012).
[25] Silver D. and others, „Mastering the game of Go with deep neural networks and tree search”, Nature, Vol. 529, No. 7587, 484-489 (2016).
[26] Rosin C.D., „Multi-armed bandits with episode context”, Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, 203-230 (2011).
[27] Zhang P. and others, „Reinforcement learning-based end-to-end parking for automatic parking system”, Sensors, Vol. 19(18), 3996 (2019).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-1b7dd8d5-97e6-4d52-b062-13211a914a6e