Article title

Multi agent deep learning with cooperative communication

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
We consider the problem of multiple agents cooperating in a partially observable environment. The agents must learn to coordinate and to share relevant information in order to solve their tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach in which agents learn policies and communication protocols simultaneously. A3C2 follows a centralized-learning, distributed-execution paradigm and supports independent agents, dynamic team sizes, partially observable environments, and noisy communication. We show that A3C2 outperforms other state-of-the-art proposals across multiple environments.
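The abstract describes agents that learn policies and a communication protocol end-to-end, so a natural reading is that each agent's network has an extra differentiable output, a message, that teammates receive as input on the next step. The sketch below illustrates that idea only; it is a hypothetical PyTorch reconstruction, not the authors' A3C2 code. The layer sizes, the 4-dimensional message, the mean aggregation of incoming messages (which tolerates dynamic team sizes), and the Gaussian channel noise are all assumptions drawn from the abstract alone, and the asynchronous advantage actor-critic training loop is omitted.

    # Hypothetical sketch of a communication-augmented actor-critic agent.
    # NOT the authors' A3C2 implementation; sizes and noise are illustrative.
    import torch
    import torch.nn as nn

    OBS_DIM, MSG_DIM, N_ACTIONS, NOISE_STD = 16, 4, 5, 0.1

    class CommAgent(nn.Module):
        def __init__(self):
            super().__init__()
            # Input: own observation concatenated with the aggregated incoming message.
            self.body = nn.Sequential(nn.Linear(OBS_DIM + MSG_DIM, 64), nn.ReLU())
            self.policy = nn.Linear(64, N_ACTIONS)   # actor head (action logits)
            self.value = nn.Linear(64, 1)            # critic head (state value)
            self.message = nn.Linear(64, MSG_DIM)    # outgoing message head

        def forward(self, obs, msg_in):
            h = self.body(torch.cat([obs, msg_in], dim=-1))
            return self.policy(h), self.value(h), torch.tanh(self.message(h))

    def step(agents, observations, messages):
        """One synchronous team step; messages pass through a noisy channel."""
        logits, values, out_msgs = [], [], []
        for agent, obs, msg in zip(agents, observations, messages):
            l, v, m = agent(obs, msg)
            logits.append(l); values.append(v); out_msgs.append(m)
        # Each agent receives the mean of the other agents' messages plus
        # Gaussian noise; noise added only in the forward pass keeps the
        # channel differentiable, so gradients still reach the message heads.
        received = []
        for i in range(len(agents)):
            others = [m for j, m in enumerate(out_msgs) if j != i]
            mean_msg = torch.stack(others).mean(dim=0)
            received.append(mean_msg + NOISE_STD * torch.randn_like(mean_msg))
        return logits, values, received

    if __name__ == "__main__":
        team = [CommAgent() for _ in range(3)]
        obs = [torch.randn(OBS_DIM) for _ in team]
        msgs = [torch.zeros(MSG_DIM) for _ in team]
        logits, values, msgs = step(team, obs, msgs)
        print(logits[0].shape, values[0].item())

Because the message heads feed directly into the other agents' inputs, gradients of the shared task loss flow back through the channel, which is what allows a protocol to be learned jointly with the policies, as the abstract's "end-to-end differentiable" phrasing suggests.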
Year
Pages
189–207
Physical description
Bibliography: 32 items, figures.
Authors
  • Electronics, Telecommunications and Informatics Department, University of Aveiro, Portugal
  • Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal
author
  • Electronics, Telecommunications and Informatics Department, University of Aveiro, Portugal
  • Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal
  • Artificial Intelligence and Computer Science Lab, Oporto, Portugal
  • Informatics Engineering Department, Faculty of Engineering of the University of Porto, Portugal
Bibliography
  • [1] Maximilian Hüttenrauch, Adrian Šošić, and Gerhard Neumann. Guided deep reinforcement learning for swarm systems. arXiv preprint arXiv:1709.06011, 2017.
  • [2] Lili Ma and Naira Hovakimyan. Vision-based cyclic pursuit for cooperative target tracking. Journal of Guidance, Control, and Dynamics, 36(2):617–622, 2013.
  • [3] Patrick Mannion, Jim Duggan, and Enda Howley. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic road transport support systems, pages 47–66. Springer, 2016.
  • [4] P Skobelev, E Simonova, and A Zhilyaev. Using multi-agent technology for the distributed management of a cluster of remote sensing satellites. Complex Syst: Fundament Appl, 90:287, 2016.
  • [5] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, et al. StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782, 2017.
  • [6] Jonathan Raiman, Susan Zhang, and Filip Wolski. Long-term planning and situational awareness in OpenAI Five. arXiv preprint arXiv:1912.06721, 2019.
  • [7] Tabish Rashid, Mikayel Samvelyan, Christian Schröder de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. CoRR, abs/1803.11485, 2018.
  • [8] L. Busoniu, R. Babuska, and B. De Schutter. A comprehensive survey of multiagent reinforcement learning. Trans. Sys. Man Cyber Part C, 38(2):156–172, March 2008.
  • [9] Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, pages 2137–2145, 2016.
  • [10] Jakob N Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [11] Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, AAAI ’98/IAAI ’98, pages 746–752, Menlo Park, CA, USA, 1998. American Association for Artificial Intelligence.
  • [12] Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. Multiagent cooperation and competition with deep reinforcement learning. PLOS ONE, 12(4):1–15, 04 2017.
  • [13] Maxim Egorov. Multi-Agent Deep Reinforcement Learning. Technical report, Stanford University, Department of Computer Science, 2016.
  • [14] Jayesh K Gupta, Maxim Egorov, and Mykel Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems, pages 66–83. Springer, 2017.
  • [15] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pages 6379–6390, 2017.
  • [16] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, pages 2085–2087, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems.
  • [17] Sainbayar Sukhbaatar, Rob Fergus, et al. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, pages 2244–2252, 2016.
  • [18] Peng Peng, Quan Yuan, Ying Wen, Yaodong Yang, Zhenkun Tang, Haitao Long, and Jun Wang. Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games. CoRR, abs/1703.10069, 2017.
  • [19] Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, volume 157, pages 157–163, 1994.
  • [20] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533, 1986.
  • [21] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318, 2013.
  • [22] Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip HS Torr, Pushmeet Kohli, and Shimon Whiteson. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1146–1155. JMLR.org, 2017.
  • [23] Igor Mordatch and Pieter Abbeel. Emergence of grounded compositional language in multi-agent populations. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [24] Abhishek Das, Satwik Kottur, José MF Moura, Stefan Lee, and Dhruv Batra. Learning cooperative visual dialog agents with deep reinforcement learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2951–2960, 2017.
  • [25] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256, May 1992.
  • [26] David B D’Ambrosio, Skyler Goodell, Joel Lehman, Sebastian Risi, and Kenneth O Stanley. Multirobot behavior synchronization through direct neural network communication. In International Conference on Intelligent Robotics and Applications, pages 603–614. Springer, 2012.
  • [27] Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni. Multi-agent cooperation and the emergence of (natural) language. Proceedings of the International Conference on Learning Representations, 2017.
  • [28] Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, and Dhruv Batra. Deal or no deal? End-to-end learning of negotiation dialogues. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2443–2453. Association for Computational Linguistics, 2017.
  • [29] Qiyang Li, Xintong Du, Yizhou Huang, Quinlan Sykora, and Angela P. Schoellig. Learning of coordination policies for robotic swarms. CoRR, abs/1709.06620, 2017.
  • [30] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.
  • [31] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Yee Whye Teh and Mike Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 249–256, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. PMLR.
  • [32] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, San Diego, 2015.
Notes
Record created with funds from MNiSW (Ministry of Science and Higher Education), agreement No. 461252, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: Popularisation of Science and Promotion of Sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c9ac4d65-2aaa-4dd6-ac27-f19cc16bc599