Article title

Deep reinforcement learning overview of the state of the art

Publication languages
EN
Abstracts
EN
Artificial intelligence has made great strides with reinforcement learning (RL) over the last century and, with the advent of deep learning (DL) in the 1990s, in particular the breakthrough of convolutional networks in computer vision. The adoption of DL neural networks in RL during the first decade of the 21st century led to an end-to-end framework, called deep reinforcement learning (DRL), that has enabled major advances in human-level agents and autonomous systems. In this paper, we go through the development timeline of RL and DL technologies, describing the main improvements made in both fields. We then dive into DRL and give an overview of the state of the art of this new and promising field, reviewing a set of algorithms (value optimization, policy optimization and actor-critic) and outlining current challenges and real-world applications, along with the hardware and frameworks used. Finally, we discuss some potential research directions in the field of deep RL, which we expect to lead to a truly human level of intelligence.
Authors
author
  • National School of Computer Science and Systems Analysis (ENSIAS), Mohammed V University, Rabat, Morocco.
author
  • National School of Computer Science and Systems Analysis (ENSIAS), Mohammed V University, Rabat, Morocco.
Bibliography
  • [1] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition. Available at: http://incompleteideas.net/book/the-book-2nd.html
  • [2] S. J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 3rd edition. ISBN-13: 978-0136042594.
  • [3] Y. LeCun, Y. Bengio, G. Hinton, “Deep learning”, Nature, vol. 521, no. 7553, May 2015, pp. 436–444. DOI: 10.1038/nature14539.
  • [4] V. Mnih et al., “Human-level control through deep reinforcement learning”, Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015. DOI: 10.1038/nature14236.
  • [5] A. D. Tijsma, M. M. Drugan, M. A. Wiering, “Comparing exploration strategies for Q-learning in random stochastic mazes”. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 2016, pp. 1–8. DOI: 10.1109/SSCI.2016.7849366.
  • [6] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv:1207.0580 [cs], Jul. 2012.
  • [7] R. Sutton, “Learning to Predict by the Method of Temporal Differences,” Mach. Learn., vol. 3, pp. 9–44, Aug. 1988. DOI: 10.1007/BF00115009.
  • [8] K. M. Gupta, “Performance Comparison of Sarsa(λ) and Watkin’s Q(λ) Algorithms,” p. 8. Available at: https://pdfs.semanticscholar.org/ccdc/3327f4da824825bb990ffb693ceaf7dc89f6.pdf
  • [9] G. A. Rummery, M. Niranjan, “On-Line Q-Learning Using Connectionist Systems,” 1994, CiteSeer.
  • [10] Y. Takahashi, G. Schoenbaum, Y. Niv, “Silencing the Critics: understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an Actor/Critic model”, Front. Neurosci., 15 July 2008, pp. 86–99. DOI: 10.3389/neuro.01.014.2008.
  • [11] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search”, Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. DOI: 10.1038/nature16961.
  • [12] S. Hölldobler, S. Möhle, A. Tigunova, “Lessons Learned from AlphaGo,” p. 10. In: S. Hölldobler, A. Malikov, C. Wernhard (eds.), YSIP2 – Proceedings of the Second Young Scientist’s International Workshop on Trends in Information Processing, Dombai, Russian Federation, May 16–20, 2017, published at http://ceur-ws.org.
  • [13] D. Silver, DeepMind, “UCL Course on RL”.
  • [14] L. Serrano, “A friendly introduction to Deep Learning and Neural Networks”. https://www.youtube.com/watch?v=BR9h47Jtqyw
  • [15] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747 [cs], Sep. 2016.
  • [16] N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Netw., vol. 12, no. 1, pp. 145–151, Jan. 1999. DOI: 10.1016/S0893-6080(98)00116-6.
  • [17] J. Duchi, E. Hazan, Y. Singer, “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”, JMLR, vol. 12 (Jul), 2011, pp. 2121–2159.
  • [18] “Rmsprop: Divide the gradient by a running average of its recent magnitude – Optimization: How to make the learning go faster,” Coursera.
  • [19] M. D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method,” arXiv:1212.5701 [cs], Dec. 2012.
  • [20] D. P. Kingma, J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 [cs], Dec. 2014.
  • [21] J. Bergstra and Y. Bengio, “Random Search for Hyper-parameter Optimization”, J. Mach. Learn. Res., vol. 13, pp. 281–305, Feb. 2012. ISSN: 1532-4435
  • [22] J. Snoek, H. Larochelle, R. P. Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”, p. 9. https://arxiv.org/pdf/1206.2944.pdf
  • [23] Yoshua Bengio, “Gradient-Based Optimization of Hyperparameters.” DOI: 10.1162/089976600300015187.
  • [24] M. Sazli, “A brief review of feed-forward neural networks”, Commun. Fac. Sci. Univ. Ank., vol. 50, pp. 11–17, Jan. 2006. DOI: 10.1501/0003168.
  • [25] Salman Khan, Hossein Rahmani, Syed Afaq Ali Shah, A Guide to Convolutional Neural Networks for Computer Vision. DOI: 10.2200/S00822ED1V01Y201712COV015
  • [26] H. H. Aghdam, E. J. Heravi, Guide to Convolutional Neural Networks: A Practical Application to Traffic-Sign Detection and Classification, Springer, 2017.
  • [27] S. Ioffe, C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167 [cs], Feb. 2015.
  • [28] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, “Gradient-Based Learning Applied to Document Recognition”. In: Proceedings of the IEEE, 1998, pp. 2278–2324. DOI: 10.1109/5.726791.
  • [29] A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems, 25, 2012, pp. 1097–1105. DOI: 10.1145/3065386.
  • [30] K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556 [cs], Sep. 2014.
  • [31] C. Szegedy et al., “Going Deeper with Convolutions,” arXiv:1409.4842 [cs], Sep. 2014. DOI: 10.1109/CVPR.2015.7298594.
  • [32] M. Lin, Q. Chen, S. Yan, “Network In Network,” arXiv:1312.4400 [cs], Dec. 2013.
  • [33] K. He, X. Zhang, S. Ren, J. Sun, “Deep Residual Learning for Image Recognition,” arXiv:1512.03385 [cs], Dec. 2015. DOI: 10.1109/CVPR.2016.90.
  • [34] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, “Densely Connected Convolutional Networks,” arXiv:1608.06993 [cs], Aug. 2016.
  • [35] S. Sabour, N. Frosst, G. E. Hinton, “Dynamic Routing Between Capsules,” arXiv:1710.09829 [cs], Oct. 2017.
  • [36] S. Hochreiter, J. Schmidhuber, “Long Short-term Memory,” Neural Comput., vol. 9, pp. 1735–1780, Dec. 1997. DOI: 10.1162/neco.1997.9.8.1735.
  • [37] “Understanding LSTM Networks”, Colah’s blog, 27/08/2015. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • [38] J. Yosinski, J. Clune, Y. Bengio, H. Lipson, “How transferable are features in deep neural networks?,” arXiv:1411.1792 [cs], Nov. 2014.
  • [39] N. Becherer, J. Pecarina, S. Nykl, K. Hopkinson, “Improving optimization of convolutional neural networks through parameter fine-tuning”, Neural Comput. Appl., pp. 1–11, Nov. 2017. DOI: 10.1007/s00521-017-3285-0.
  • [40] “Frame Skipping and Pre-Processing for Deep Q-Nets on Atari 2600 Games”, Daniel Takeshi blog, 25/11/2016. https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/
  • [41] T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv:1509.02971 [cs, stat], Sep. 2015.
  • [42] A. S. Lakshminarayanan, S. Sharma, B. Ravindran, “Dynamic Frame skip Deep Q Network,” arXiv:1605.05365 [cs], May 2016.
  • [43] S. Lewandowsky, S.-C. Li, “Catastrophic interference in neural networks: Causes, solutions, and data”, Dec. 1995. DOI: 10.1016/B978-012208930-5/50011-8.
  • [44] A. Nair et al., “Massively Parallel Methods for Deep Reinforcement Learning”, arXiv:1507.04296 [cs], Jul. 2015.
  • [45] M. Hausknecht, P. Stone, “Deep Recurrent Q-Learning for Partially Observable MDPs,” arXiv:1507.06527 [cs], Jul. 2015.
  • [46] H. van Hasselt, A. Guez, D. Silver, “Deep Reinforcement Learning with Double Q-learning,” arXiv:1509.06461 [cs], Sep. 2015.
  • [47] T. Schaul, J. Quan, I. Antonoglou, D. Silver, “Prioritized Experience Replay,” arXiv:1511.05952 [cs], Nov. 2015.
  • [48] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, N. de Freitas, “Dueling Network Architectures for Deep Reinforcement Learning”, arXiv:1511.06581 [cs], Nov. 2015.
  • [49] M. Fortunato et al., “Noisy Networks for Exploration,” arXiv:1706.10295 [cs, stat], Jun. 2017.
  • [50] M. Hessel et al., “Rainbow: Combining Improvements in Deep Reinforcement Learning,” arXiv:1710.02298 [cs], Oct. 2017.
  • [51] V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning,” arXiv:1602.01783 [cs], Feb. 2016.
  • [52] J. Schulman, P. Moritz, S. Levine, M. Jordan, P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation,” arXiv:1506.02438 [cs], Jun. 2015.
  • [53] M. Jaderberg et al., “Reinforcement Learning with Unsupervised Auxiliary Tasks,” arXiv:1611.05397 [cs], Nov. 2016.
  • [54] H. Noh, S. Hong, B. Han, “Learning Deconvolution Network for Semantic Segmentation”, arXiv:1505.04366 [cs], May 2015. DOI: 10.1109/ICCV.2015.178.
  • [55] Z. Wang et al., “Sample Efficient Actor-Critic with Experience Replay”, arXiv:1611.01224 [cs], Nov. 2016.
  • [56] R. Munos, T. Stepleton, A. Harutyunyan, M. G. Bellemare, “Safe and Efficient Off-Policy Reinforcement Learning”, arXiv:1606.02647 [cs, stat], Jun. 2016.
  • [57] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, P. Abbeel, “Trust Region Policy Optimization,” arXiv:1502.05477 [cs], Feb. 2015.
  • [58] S. M. Kakade, “A Natural Policy Gradient,” p. 8. https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf
  • [59] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv:1707.06347 [cs], Jul. 2017.
  • [60] Y. Wu, E. Mansimov, S. Liao, R. Grosse, J. Ba, “Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation,” arXiv:1708.05144 [cs], Aug. 2017.
  • [61] J. Martens, R. Grosse, “Optimizing Neural Networks with Kronecker-factored Approximate Curvature,” arXiv:1503.05671 [cs, stat], Mar. 2015.
  • [62] R. Grosse, J. Martens, “A Kronecker-factored approximate Fisher matrix for convolution layers,” arXiv:1602.01407 [cs, stat], Feb. 2016.
  • [63] Bonsai, “Writing Great Reward Functions”, YouTube. https://www.youtube.com/watch?v=0R3PnJEisqk
  • [64] X. Guo, “Deep Learning and Reward Design for Reinforcement Learning,” p. 117.
  • [65] A. Y. Ng, S. Russell, “Algorithms for Inverse Reinforcement Learning”. In: ICML 2000 Proc. Seventeenth Int. Conf. Mach. Learn., May 2000. ISBN: 1-55860-707-2.
  • [66] Y. Duan et al., “One-Shot Imitation Learning,” arXiv:1703.07326 [cs], Mar. 2017.
  • [67] “CS 294: Deep Reinforcement Learning, Fall 2017”, course.
  • [68] C. Finn, P. Abbeel, S. Levine, “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,” arXiv:1703.03400 [cs], Mar. 2017.
  • [69] T. D. Kulkarni, K. R. Narasimhan, A. Saeedi, J. B. Tenenbaum, “Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation,” arXiv:1604.06057 [cs].
  • [70] A. Gudimella et al., “Deep Reinforcement Learning for Dexterous Manipulation with Concept Networks,” arXiv:1709.06977 [cs], Sep. 2017.
  • [71] R. Negrinho, G. Gordon, “DeepArchitect: Automatically Designing and Training Deep Architectures,” arXiv:1704.08792 [cs, stat], Apr. 2017.
  • [72] J. X. Wang et al., “Learning to reinforcement learn”, arXiv:1611.05763 [cs, stat], Nov. 2016.
  • [73] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, P. Abbeel, “RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning,” arXiv:1611.02779 [cs, stat], Nov. 2016.
  • [74] M. T. J. Spaan, “Partially Observable Markov Decision Processes”, Reinf. Learn., p. 27. DOI: 10.1007/978-3-642-27645-3_12.
  • [75] Bonsai, M. Hammond, “Deep Reinforcement Learning in the Enterprise: Bridging the Gap from Games to Industry”, YouTube. https://www.youtube.com/watch?v=GOsUHlr4DKE
  • [76] E. Cengil, A. Çinar, “A GPU-based convolutional neural network approach for image classification”. DOI: 10.1109/IDAP.2017.8090194.
  • [77] “Why are GPUs necessary for training Deep Learning models?”, Analytics Vidhya, 18 May 2017.
Notes
Record prepared under agreement 509/P-DUN/2018 from the funds of the Ministry of Science and Higher Education (MNiSW) allocated to science-dissemination activities (2018).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c799f70c-d6bb-4e75-9bba-ef2313881015