Article title

Handling realistic noise in multi-agent systems with self-supervised learning and curiosity

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Most reinforcement learning benchmarks – especially multi-agent tasks – do not go beyond observations corrupted by simple noise; real scenarios, however, induce more elaborate vision-pipeline failures: false sightings, misclassifications, and occlusions. In this work, we propose a lightweight 2D environment for robot soccer and autonomous driving that can emulate the above discrepancies. Besides establishing a benchmark for accessible multi-agent reinforcement learning research, our work addresses the challenges the simulator poses. To handle realistic noise, we use self-supervised learning to enhance scene reconstruction and extend curiosity-driven learning to model longer horizons. Our extensive experiments show that the proposed methods achieve state-of-the-art performance compared against actor-critic methods, ICM, and PPO.
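The curiosity bonus mentioned above builds on forward-model prediction error in a learned feature space, in the spirit of ICM [18]. As an illustrative sketch, not the authors' implementation: the NumPy snippet below computes an ICM-style intrinsic reward, with fixed random projections standing in for the trained encoder and forward model. The names `embed`, `forward_model`, and the scale `eta` are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)


def embed(obs, W):
    # Toy feature encoder: fixed random projection followed by tanh.
    return np.tanh(obs @ W)


def forward_model(phi, action, F):
    # Toy forward dynamics: predicts the next embedding from the current
    # embedding concatenated with a one-hot action.
    return np.tanh(np.concatenate([phi, action]) @ F)


def intrinsic_reward(obs, action, next_obs, W, F, eta=0.5):
    # ICM-style curiosity bonus: scaled squared prediction error of the
    # forward model, measured in feature space rather than pixel space.
    phi, phi_next = embed(obs, W), embed(next_obs, W)
    pred = forward_model(phi, action, F)
    return eta * 0.5 * float(np.sum((pred - phi_next) ** 2))


obs_dim, feat_dim, n_actions = 8, 4, 3
W = rng.normal(size=(obs_dim, feat_dim))
F = rng.normal(size=(feat_dim + n_actions, feat_dim))

obs = rng.normal(size=obs_dim)
next_obs = obs + 0.1 * rng.normal(size=obs_dim)
action = np.eye(n_actions)[1]  # one-hot action

r_int = intrinsic_reward(obs, action, next_obs, W, F)
```

In a training loop, `r_int` would be added to the extrinsic reward, so transitions the forward model predicts poorly (i.e., novel ones) are visited more often.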
Year
Pages
135–148
Physical description
Bibliography: 36 items, figures.
Authors
  • Department of Control Engineering and Information Technology, Budapest University of Technology and Economics, 1117, Budapest, Magyar Tudosok krt. 2.
  • Department of Control Engineering and Information Technology, Budapest University of Technology and Economics, 1117, Budapest, Magyar Tudosok krt. 2.
Bibliography
  • [1] Bowen Baker, Ingmar Kanitscheider, Todor M. Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. Emergent tool use from multiagent autocurricula. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, 2020.
  • [2] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. June 2016.
  • [3] Yuri Burda, Harrison Edwards, Deepak Pathak, Amos J. Storkey, Trevor Darrell, and Alexei A. Efros. Large-scale study of curiosity-driven learning. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
  • [4] Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. May 2015.
  • [5] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. May 2016.
  • [6] Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with exemplar convolutional neural networks. 2014.
  • [7] Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Sheila A. McIlraith and Kilian Q. Weinberger, editors, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 2974–2982. AAAI Press, 2018.
  • [8] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. March 2018.
  • [9] David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montreal, Canada, pages 2455–2467, 2018.
  • [10] Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alex Novikov, Sergio Gomez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, and Nando de Freitas. Acme: A research framework for distributed reinforcement learning. June 2020.
  • [11] Eric Jang, Coline Devin, Vincent Vanhoucke, and Sergey Levine. Grasp2vec: Learning object representations from self-supervised grasping. Proceedings of The 2nd Conference on Robot Learning, in PMLR 87:99-112 (2018), November 2018.
  • [12] John K. Kruschke. Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2):573–603, 2013.
  • [13] Siqi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, and Thore Graepel. Emergent coordination through competition. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
  • [14] Felipe B. Martins, Mateus G. Machado, Hansenclever F. Bassani, Pedro H. M. Braga, and Edna S. Barros. rSoccer: A framework for studying reinforcement learning in small and very small size robot soccer. June 2021.
  • [15] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Maria-Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 1928–1937. JMLR.org, 2016.
  • [16] Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, and Sergey Levine. Visual reinforcement learning with imagined goals. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montreal, Canada, pages 9209–9220, 2018.
  • [17] Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, and Sergey Levine. Visual reinforcement learning with imagined goals. July 2018.
  • [18] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 2778–2787. PMLR, 2017.
  • [19] Deepak Pathak, Dhiraj Gandhi, and Abhinav Gupta. Self-supervised exploration via disagreement. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 5062–5071. PMLR, 2019.
  • [20] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 779–788. IEEE Computer Society, 2016.
  • [21] Joseph Redmon and Ali Farhadi. YOLO9000: better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 6517–6525. IEEE Computer Society, 2017.
  • [22] Patrik Reizinger and Marton Szemenyei. Attention-based curiosity-driven exploration in deep reinforcement learning. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020, pages 3542–3546. IEEE, 2020.
  • [23] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms, 2017.
  • [24] P. Sermanet, C. Lynch, J. Hsu, and S. Levine. Time-contrastive networks: Self-supervised learning from multi-view observation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 486–487, 2017.
  • [25] Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, and Sergey Levine. Time-contrastive networks: Self-supervised learning from video. April 2017.
  • [26] Marton Szemenyei and Vladimir Estivill-Castro. Real-time scene understanding using deep neural networks for RoboCup SPL. In RoboCup 2018: Robot World Cup XXII, pages 96–108. Springer International Publishing, 2019.
  • [27] Marton Szemenyei and Vladimir Estivill-Castro. Fully neural object detection solutions for robot soccer. Neural Computing and Applications, April 2021.
  • [28] Marton Szemenyei and Patrik Reizinger. Attention-based curiosity in multi-agent reinforcement learning environments. In 2019 International Conference on Control, Artificial Intelligence, Robotics & Optimization (ICCAIRO). IEEE, May 2019.
  • [29] Marton Szemenyei and Vladimir Estivill-Castro. ROBO: Robust, fully neural object detection for robot soccer. In RoboCup 2019: Robot World Cup XXIII, pages 309–322. Springer International Publishing, 2019.
  • [30] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008, 2017.
  • [31] Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michael Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, and David Silver. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.
  • [32] Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. Tracking emerges by colorizing videos. June 2018.
  • [33] Xiaolong Wang and Abhinav Gupta. Unsupervised learning of visual representations using videos. May 2015.
  • [34] Donglai Wei, Joseph Lim, Andrew Zisserman, and William T. Freeman. Learning and using the arrow of time. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, June 2018.
  • [35] Richard Zhang, Phillip Isola, and Alexei A. Efros. Colorful image colorization. March 2016.
  • [36] Richard Zhang, Phillip Isola, and Alexei A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. November 2016.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-a06cf8d1-dcfa-4eb7-bc56-c7438f7209b7