Article title

Online learning algorithm for zero-sum games with integral reinforcement learning

Publication language
EN
Abstract
EN
In this paper we introduce an online algorithm that uses integral reinforcement learning to find the continuous-time zero-sum game solution for nonlinear systems with infinite-horizon costs and partial knowledge of the system dynamics. The algorithm is a data-based approach to the solution of the Hamilton-Jacobi-Isaacs equation and does not require explicit knowledge of the system's drift dynamics. A novel adaptive control algorithm is given that is based on policy iteration and implemented using an actor/disturbance/critic structure with three adaptive approximator networks, all of which are adapted simultaneously. A persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, disturbance and actor networks. Convergence to the Nash solution of the game is proven, and stability of the system is guaranteed. Simulation examples support the theoretical results.
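To make the actor/disturbance/critic structure concrete, the Python sketch below illustrates the general idea under invented assumptions: the dynamics f, g, k, the quadratic critic basis phi, all tuning gains, and the probing noise are hypothetical choices for this illustration (they are not the paper's simulation example), and the simplified gradient updates stand in for the paper's full tuning laws, which contain additional stabilizing terms.

```python
import numpy as np

# Minimal sketch of integral-reinforcement actor/disturbance/critic tuning
# for a two-player zero-sum game.  Everything below is an illustrative
# assumption, not the paper's example or exact tuning laws.

def f(x):   # drift dynamics: used only by the simulator, never by the learner
    return np.array([-x[0] + x[1], -0.5 * (x[0] + x[1])])

def g(x):   # control input gain (assumed known)
    return np.array([0.0, 1.0])

def k(x):   # disturbance input gain (assumed known)
    return np.array([0.0, 0.5])

def phi(x):   # quadratic critic basis, V(x) ~ Wc . phi(x)
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def dphi(x):  # Jacobian of phi, shape (3, 2)
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

gamma2 = 4.0                 # attenuation level gamma^2 of the game
R = 1.0                      # control weighting in the running cost
dt, T = 1e-3, 0.05           # Euler step and reinforcement interval
ac, aa, ad = 1.0, 0.5, 0.5   # critic/actor/disturbance gains (assumed)

Wc = np.ones(3)              # critic weights
Wa = np.ones(3)              # actor weights
Wd = np.ones(3)              # disturbance weights
x = np.array([1.0, -1.0])
t = 0.0

for _ in range(2000):
    # Policies induced by the actor and disturbance approximators:
    #   u = -1/(2R) g' dphi' Wa,   d = 1/(2 gamma^2) k' dphi' Wd
    u = -0.5 / R * g(x) @ (dphi(x).T @ Wa)
    d = 0.5 / gamma2 * k(x) @ (dphi(x).T @ Wd)
    u += 0.05 * np.sin(7.0 * t)   # probing noise for persistence of excitation

    phi0, rho = phi(x), 0.0
    for _ in range(int(round(T / dt))):   # measure cost over [t, t+T]
        rho += (x @ x + R * u ** 2 - gamma2 * d ** 2) * dt
        x = x + (f(x) + g(x) * u + k(x) * d) * dt
        t += dt

    # Integral temporal-difference error of the Bellman equation
    #   V(x(t)) = integral of running cost + V(x(t+T))
    delta = phi(x) - phi0
    e = rho + Wc @ delta
    m = delta / (1.0 + delta @ delta) ** 2   # normalized regressor
    Wc = Wc - ac * m * e                     # critic gradient step
    Wa = Wa - aa * (Wa - Wc) * T             # actor and disturbance weights
    Wd = Wd - ad * (Wd - Wc) * T             # track the critic estimate

print("critic weights:", Wc)
```

Note that the learner never evaluates f(x) directly: the drift dynamics enter only through the measured integral of the running cost and the observed state transition, which is what makes the integral-reinforcement approach independent of explicit drift knowledge.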
Pages
315–332
Physical description
Bibliography: 29 items, figures
Authors (affiliations)
  • Automation and Robotics Research Institute, University of Texas at Arlington, Texas, USA
  • United Technologies Research Center, Connecticut, USA
  • Automation and Robotics Research Institute, University of Texas at Arlington, Texas, USA
Bibliography
  • [1] Tijs S. Introduction to Game Theory. Hindustan Book Agency, India, 2003.
  • [2] Başar T., Olsder G. J. Dynamic Noncooperative Game Theory, 2nd ed. Philadelphia, PA: SIAM, 1999, vol. 23, SIAM's Classics in Applied Mathematics.
  • [3] Başar T., Bernhard P. H∞-Optimal Control and Related Minimax Design Problems. Boston, MA: Birkhäuser, 1995.
  • [4] van der Schaft A. J. L2-gain analysis of nonlinear systems and nonlinear state feedback H∞ control. IEEE Transactions on Automatic Control 1992; 37(6): 770-784.
  • [5] Abu-Khalaf M., Lewis F. L. Neurodynamic programming and zero-sum games for constrained control systems. IEEE Transactions on Neural Networks 2008; 19(7): 1243-1252.
  • [6] Abu-Khalaf M., Lewis F. L., Huang J. Policy iterations on the Hamilton-Jacobi-Isaacs equation for H∞ state feedback control with input saturation. IEEE Transactions on Automatic Control 2006; 51(12): 1989-1995.
  • [7] Abu-Khalaf M., Lewis F. L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005; 41(5): 779-791.
  • [8] Murray J. J., Cox C. J., Lendaris G. G., Saeks R. Adaptive dynamic programming. IEEE Transactions on Systems, Man and Cybernetics 2002; 32(2): 140-153.
  • [9] Bertsekas D. P., Tsitsiklis J. N. Neuro-Dynamic Programming. Athena Scientific, MA, 1996.
  • [10] Si J., Barto A., Powell W., Wunsch D. Handbook of Learning and Approximate Dynamic Programming. John Wiley, New Jersey, 2004.
  • [11] Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.
  • [12] Howard R. A. Dynamic Programming and Markov Processes. MIT Press, Cambridge, Massachusetts, 1960.
  • [13] Werbos P. J. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University, 1974.
  • [14] Werbos P. J. Approximate dynamic programming for real-time control and neural modeling. In: Handbook of Intelligent Control, ed. D. A. White and D. A. Sofge. New York: Van Nostrand Reinhold, 1992.
  • [15] Werbos P. J. Neural networks for control and system identification. Proc. IEEE Conference on Decision and Control, 1989.
  • [16] Prokhorov D., Wunsch D. Adaptive critic designs. IEEE Transactions on Neural Networks 1997; 8(5): 997-1007.
  • [17] Baird III L. C. Reinforcement learning in continuous time: advantage updating. Proc. IEEE International Conference on Neural Networks, 1994.
  • [18] Vamvoudakis K. G., Lewis F. L. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010; 46(5): 878-888.
  • [19] Vamvoudakis K. G., Lewis F. L. Online neural network solution of nonlinear two-player zero-sum games using synchronous policy iteration. To appear in International Journal of Robust and Nonlinear Control, 2011.
  • [20] Vrabie D., Pastravanu O., Lewis F., Abu-Khalaf M. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 2009; 45(2): 477-484.
  • [21] Kleinman D. On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control 1968; 13: 114-115.
  • [22] Dierks T., Jagannathan S. Optimal control of affine nonlinear continuous-time systems using an online Hamilton-Jacobi-Isaacs formulation. Proc. 49th IEEE Conference on Decision and Control 2010; 3048-3053.
  • [23] Johnson M., Bhasin S., Dixon W. E. Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. To appear in Proc. IEEE Conference on Decision and Control, Orlando, FL, 2011.
  • [24] Bhasin S., Johnson M., Dixon W. E. A model-free robust policy iteration algorithm for optimal control of nonlinear systems. Proc. 49th IEEE Conference on Decision and Control 2010; 3060-3065.
  • [25] Lewis F. L., Syrmos V. L. Optimal Control. John Wiley, 1995.
  • [26] Lewis F. L., Jagannathan S., Yesildirek A. Neural Network Control of Robot Manipulators and Nonlinear Systems. Taylor & Francis, 1999.
  • [27] Khalil H. K. Nonlinear Systems. Prentice-Hall, 1996.
  • [28] Stevens B., Lewis F. L. Aircraft Control and Simulation, 2nd ed. John Wiley, New Jersey, 2003.
  • [29] Nevistic V., Primbs J. A. Constrained nonlinear optimal control: a converse HJB approach. Technical Report 96-021, California Institute of Technology, 1996.
YADDA identifier
bwmeta1.element.baztech-32bdcf51-69b2-41c8-9fe1-e9522f3e53de