Analysis of selected reinforcement learning applications in contract bridge

Jarosz, Robert

doi:10.5604/01.3001.0053.9702

Article - details

Article title

Analysis of selected reinforcement learning applications in contract bridge

Authors

Jarosz Robert

Identifiers

DOI

10.5604/01.3001.0053.9702

Title variants

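Analiza wybranych zastosowań uczenia maszynowego w podejściu do problemu gry brydż (Polish; in English: Analysis of selected machine learning applications in approaching the problem of the game of bridge)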

Publication languages

Abstracts

This paper presents an overview of four selected solutions to the problem of bidding in the card game of contract bridge. It begins with the basic rules of the game and a basic estimate of the problem size. The collected work is then described briefly in chronological order, tracing the evolution of approaches to the problem. Each solution is accompanied by a short description of its mathematical basis. The paper closes with a comparison of the solutions, followed by an attempt to estimate how the techniques will develop in the future.

The article presents four selected approaches to bidding in bridge. The first part introduces the rules of bridge, the state of knowledge about the game, and a brief estimate of the problem's complexity. The main part gives brief descriptions of researchers' approaches to the bidding problem; the studies are presented in chronological order, showing how the approaches evolved. As the solutions are described, the mathematical principles behind the machine learning mechanisms they use are briefly outlined. The final part summarizes the comparison of the solutions and estimates the direction of future development.

Keywords

artificial intelligence; bidding; bid prediction; contract bridge; game theory; incomplete knowledge; machine learning; neural networks; Q-learning; reinforcement learning; supervised learning

bridge; bidding; incomplete information; Q-learning; neural networks; artificial intelligence; game theory; machine learning; reinforcement learning; supervised learning

Publisher

Institute of Computer and Information Systems, Faculty of Cybernetics, Military University of Technology

Journal

Computer Science and Mathematical Modelling

Year

2022

Volume

No. 15-16

Pages

23-31

Physical description

Bibliography: 20 items, figures.

Contributors

author

Jarosz Robert

robert.jarosz@wat.edu.pl

Military University of Technology, Faculty of Cybernetics, Kaliskiego Str. 2, 00-908 Warsaw, Poland

Bibliography

[1] Ameljańczyk A., „Teoria gier”, Vol. 690, p. 78, WAT, 1978.
[2] Binmore K., Game theory: a very short introduction. OUP Oxford, 2007.
[3] Silver D. and others, „Mastering the game of Go without human knowledge”, Nature, Vol. 550, No. 7676, pp. 354-359, 2017.
[4] Chang M.-S., „Building a fast double-dummy bridge solver”, Technical Report, NY 1996.
[5] Yeh C.-K., Hsieh C.-Y., Lin H.-T., „Automatic bridge bidding using deep reinforcement learning”, in: IEEE Transactions on Games, Vol. 10, No. 4, pp. 365-377, 2018.
[6] Gong Q., Jiang Y., Tian Y., „Simple is better: Training an end-to-end contract bridge bidding agent without human knowledge”, Real-world Sequential Decision Making, Workshop at ICML 2019, June 14, 2019, Long Beach, USA.
[7] Ho C.-Y., Lin H.-T., „Contract bridge bidding by learning”, Proceedings of the Workshop on Computer Poker and Imperfect Information at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[8] DeLooze L. L., Downey J., „Bridge bidding with imperfect information”, in: Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Games, pp. 368-373, 2007.
[9] Langford J., Zhang T., „The epoch-greedy algorithm for multi-armed bandits with side information”, in: Advances in Neural Information Processing Systems, Vol. 20, pp. 1-8, NIPS 2007.
[10] Auer P., Cesa-Bianchi N., Fischer P., „Finite-time analysis of the multiarmed bandit problem”, in: Machine Learning, Vol. 47, No. 2, pp. 235-256, Springer 2002.
[11] Beygelzimer A., Dani V., Hayes T., Langford J., Zadrozny B., „Error limiting reductions between classification tasks”, Proceedings of the 22nd International Conference on Machine Learning, pp. 49-56, 2005.
[12] Tu H.-H., Lin H.-T., „One-sided support vector regression for multiclass cost-sensitive classification”, Proceedings of the 27th International Conference on Machine Learning (ICML10), June 21-24, 2010, Haifa, Israel.
[13] Chu W., Li L., Reyzin L., Schapire R., „Contextual bandits with linear payoff functions”, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 208-214, 2011.
[14] Costel Y., WBridge5, 2014.
[15] Watkins C. J. C. H., Dayan P., „Q-learning”, in: Machine Learning, Vol. 8, No. 3, pp. 279-292, Springer, 1992.
[16] Melo F. S., „Convergence of Q-learning: A simple proof”, Institute of Systems and Robotics Tech. Rep., pp. 1-4, 2001.
[17] Rong J., Qin T., An B., „Competitive bridge bidding with deep neural networks”, arXiv preprint arXiv:1903.00900, 2019.
[18] https://www.bridgebase.com/vugraph_archives/vugraph_archives.php (accessed October 19, 2022).
[19] Zhang X., Liu W., Lou L., Yang F., „AI Enabled Bridge Bidding Supporting Interactive Visualization”, Sensors, Vol. 22, No. 5, 1877 (2022).
[20] http://www.synrey.com/root.html (accessed October 21, 2022).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-b1718991-caf0-443c-a885-cc90dbca5f0a