Article title

Learning Automata based Algorithms for Finding Optimal Policies in Fully Cooperative Markov Games

Title variants
PL
A learning automaton based algorithm for finding an optimal strategy in a cooperative Markov game
Publication languages
EN
Abstracts
EN
Markov games, as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems. In this paper, several learning automata based multi-agent algorithms for finding optimal policies in fully cooperative Markov games are proposed. In the proposed algorithms, the Markov game is described as a directed graph whose nodes are the states of the problem and whose directed edges represent the actions that cause transitions from one state to another. Each state of the environment is equipped with a variable-structure learning automaton whose actions correspond to moving to the adjacent states of that state. Each agent moves from state to state, trying to reach the goal state; in each state, it chooses its next transition with the help of the learning automaton residing there. The actions taken by the learning automata along the path traveled by the agent are then rewarded or penalized, based on the value of the traveled path, according to a learning algorithm. In the second group of proposed algorithms, the concept of entropy is introduced into the learning automata based multi-agent system to set the magnitude of the reinforcement signal given to the learning automata and thereby improve performance. Experimental results show that the proposed algorithms outperform the existing learning automata based algorithms in both the speed and the accuracy of reaching the optimal policy. (A minimal illustrative sketch of this scheme follows the abstracts below.)
PL
Several learning automata based multi-agent system algorithms for finding an optimal policy in cooperative Markov games are presented. The Markov process is described as a graph whose nodes describe the states of the problem and whose edges represent the actions.
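The abstracts describe the scheme only in prose. The following Python sketch illustrates the general idea under stated assumptions: each state holds a variable-structure learning automaton updated with a standard linear reward-penalty (L_R-P) rule, and in the entropy-driven variant the reinforcement signal is scaled by the normalized entropy of the action probabilities. The class and function names, the path_value callback, and the 0.5 + 0.5 * H scaling are illustrative assumptions, not details taken from the paper.

import math
import random

class LearningAutomaton:
    """Variable-structure learning automaton attached to one state of the
    Markov game graph; its actions are the transitions to adjacent states."""

    def __init__(self, adjacent_states, a=0.1, b=0.01):
        self.actions = list(adjacent_states)
        self.p = [1.0 / len(self.actions)] * len(self.actions)
        self.a = a  # reward step size
        self.b = b  # penalty step size

    def choose(self):
        # Sample the next transition from the current probability vector.
        return random.choices(range(len(self.actions)), weights=self.p)[0]

    def update(self, chosen, beta):
        """Linear reward-penalty (L_R-P) update; beta in [0, 1] is the
        reinforcement signal (1 = full reward, 0 = full penalty)."""
        r = len(self.p)
        if r < 2:
            return
        for i in range(r):
            if i == chosen:
                self.p[i] += (beta * self.a * (1 - self.p[i])
                              - (1 - beta) * self.b * self.p[i])
            else:
                self.p[i] += (-beta * self.a * self.p[i]
                              + (1 - beta) * self.b * (1.0 / (r - 1) - self.p[i]))

    def entropy(self):
        # Shannon entropy of the action probabilities, normalized to [0, 1];
        # high entropy means the strategy is still far from convergence.
        h = -sum(q * math.log(q) for q in self.p if q > 0)
        return h / math.log(len(self.p)) if len(self.p) > 1 else 0.0

def run_episode(automata, start, goal, path_value, entropy_scaled=False):
    """One traversal from start to goal; afterwards every automaton along
    the traveled path is reinforced according to the value of the path.
    automata maps each state to its LearningAutomaton, and path_value maps
    the traveled path to a signal in [0, 1] (both hypothetical names)."""
    path, state = [], start
    while state != goal:  # assumes the goal is reachable from every state
        la = automata[state]
        k = la.choose()
        path.append((state, k))
        state = la.actions[k]
    beta = path_value(path)
    for s, k in path:
        la = automata[s]
        if entropy_scaled:
            # Assumed scaling: strengthen the signal while the automaton is
            # still uncertain, and damp it as the strategy converges.
            la.update(k, beta * (0.5 + 0.5 * la.entropy()))
        else:
            la.update(k, beta)

The intent is that repeated calls to run_episode concentrate each automaton's probability mass on the transition lying on the optimal path, with the entropy-scaled variant adjusting how strongly each automaton is reinforced as it converges.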
Year
Pages
280–289
Physical description
Bibliography: 29 items, diagrams, charts.
Authors
author
author
author
Bibliography
  • [1] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, (Morgan and Claypool Publishers, Amsterdam, 2007).
  • [2] L. Panait, S. Luke, Cooperative Multi-Agent Learning: The State of the Art, Autonomous Agents and Multi-Agent Systems, 11 (2005) 387-434.
  • [3] L. Shapley, Stochastic Games, Proceedings of the National Academy of Sciences of the USA, 39 (1953) 1095-1100.
  • [4] M.J. Osborne, A. Rubinstein, A Course in Game Theory, (Cambridge, MA: MIT Press, 1994).
  • [5] A. Nowé, K. Verbeeck, Colonies of Learning Automata, IEEE Transactions on Systems, Man and Cybernetics, 32 (2002) 772-780.
  • [6] C. Claus, C. Boutilier, The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, in: The 15th National Conference on Artificial Intelligence (Wisconsin, USA, 1998), 746-752.
  • [7] G. Chalkiadakis, C. Boutilier, Coordination in Multiagent Reinforcement Learning: A Bayesian Approach, in: The 2nd International Joint Conference on Autonomous Agents and Multiagent Systems (Melbourne, Australia, 2003), 709-716.
  • [8] J. Hu, M.P. Wellman, Nash Q-Learning for General-Sum Stochastic Games, Journal of Machine Learning Research, 4 (2003) 1039-1069.
  • [9] M. Lauer, M. Riedmiller, An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems, in: The 17th International Conference on Machine Learning (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2000), 535-542.
  • [10] X. Wang, T. Sandholm, Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games, in: Advances in Neural Information Processing Systems (MIT Press, 2002), 1571-1578.
  • [11] P. Laroche, Y. Boniface, R. Schott, A New Decomposition Technique for Solving Markov Decision Processes, in: The 2001 ACM Symposium on Applied Computing (Las Vegas, Nevada, USA, 2001), 12-16.
  • [12] L. Peret, F. Garcia, On-line Search for Solving Large Markov Decision Processes, in: Sixth European Workshop on Reinforcement Learning (EWRL-6) (Nancy, France, 2003).
  • [13] M.R. Khojasteh, M.R. Meybodi, Evaluating Learning Automata as a Model for Cooperation in Complex Multi-Agent Domains, in: RoboCup 2006: Robot Soccer World Cup (Springer, Berlin, 2007), 410-417.
  • [14] A. Nowé, K. Verbeeck, M. Peeters, Learning Automata as a Basis for Multi-agent Reinforcement Learning, in: Learning and Adaption in Multi-Agent Systems (Springer, Berlin, 2006), 71-85.
  • [15] P. Sastry, V. Phansalkar, M.A.L. Thathachar, Decentralized Learning of Nash Equilibria in Multi-person Stochastic Games with Incomplete Information, IEEE Transactions on Systems, Man, and Cybernetics, 24 (1994) 769-777.
  • [16] P. Vrancx, K. Verbeeck, A. Nowé, Decentralized Learning in Markov Games, IEEE Transactions on Systems, Man and Cybernetics (Part B: Cybernetics), 38 (2008) 976-981.
  • [17] B. Masoumi, M.R. Meybodi, Utilization of Networks of Learning Automata for Solving Decision Problems in Decentralized Multiagent Systems, in: 17th Iranian Conference on Electrical Engineering (ICEE 2009) (Tehran, Iran, 2009).
  • [18] X. Zhuang, Z. Chen, Strategy Entropy as a Measure of Strategy Convergence in Reinforcement Learning, in: First International Conference on Intelligent Networks and Intelligent Systems (Wuhan, China, 2008), 81-84.
  • [19] E.H. Lieb, J. Yngvason, The Physics and Mathematics of the Second Law of Thermodynamics, Physics Reports, 310 (1999) 1-96.
  • [20] L. Guan, I.U. Awan, I. Phillips, A. Grigg, W. Dargie, Performance Analysis of a Threshold-Based Discrete-Time Queue Using Maximum Entropy, Simulation Modelling Practice and Theory, 17 (2009) 558-568.
  • [21] K.S. Narendra, M.A.L. Thathachar, Learning Automata: An Introduction, (Prentice-Hall Inc., 1989).
  • [22] M.A.L. Thathachar, P.S. Sastry, Networks of Learning Automata: Techniques for Online Stochastic Optimization, (Kluwer Academic Publishers, 2004).
  • [23] S. Lakshmivarahan, K. Narendra, Learning Algorithms for Two-person Zero-sum Stochastic Games with Incomplete Information, Mathematics of Operations Research, 6 (1981) 379-386.
  • [24] R.A. Howard, Dynamic Programming and Markov Processes, (Cambridge, MA: MIT Press, 1960).
  • [25] M. Littman, Markov Games as a Framework for Multi-agent Reinforcement Learning, in: 11th International Conference on Machine Learning (Morgan Kaufmann, San Francisco, 1994), 157-163.
  • [26] C. Boutilier, Planning, Learning and Coordination in Multiagent Decision Processes, in: The 6th Conference on Theoretical Aspects of Rationality and Knowledge (Renesse, Holland, 1996), 195-210.
  • [27] R.M. Wheeler, K.S. Narendra, Decentralized Learning in Finite Markov Chains, IEEE Transactions on Automatic Control, 31 (1986) 519-526.
  • [28] M. Costa, A. Goldberger, C. Peng, Multi-scale Entropy Analysis of Complex Physiologic Time Series, Physical Review Letters, 89 (2002) 1-4.
  • [29] Z. Dianhu, F. Shaohui, D. Xiaojun, Entropy: A Measure of Uncertainty of Random Variable, Systems Engineering and Electronics, 11 (1997) 1-3.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BPOK-0039-0066