Article title

Adaptive Machine Reinforcement Learning

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
This article defines a reinforcement learning method in which the subject of learning is analyzed. The essence of the method is the selection of actions by trial and error and the awarding of delayed rewards. If the environment has the Markov property, its one-step dynamics allow the next state and the next reward to be predicted from the currently known state and action, in accordance with the Markov decision process. The relationship between the value of the current state and the values of possible future states is given by the Bellman equation. The article also discusses temporal-difference learning, the mechanism of eligibility traces, and the TD(0) and TD(Lambda) algorithms. The theoretical analysis is supplemented by a practical study based on an implementation of the Sarsa(Lambda) algorithm with replacing eligibility traces and an epsilon-greedy policy.
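The practical part described above centres on the Sarsa(Lambda) update, in which the one-step TD error (a sampled Bellman backup) is propagated to recently visited state-action pairs through eligibility traces. Below is a minimal tabular sketch of that scheme; the toy chain environment, function names, and parameter values are illustrative assumptions, not taken from the article.

import numpy as np

class ChainEnv:
    # Hypothetical 1-D chain: action 0 moves left, 1 moves right;
    # reward 1.0 on reaching the rightmost state, 0 otherwise.
    def __init__(self, n=10):
        self.n = n
        self.s = 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(self.n - 1, self.s + 1)
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done

def epsilon_greedy(Q, s, epsilon, rng):
    # Explore with probability epsilon, otherwise act greedily on Q.
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)          # eligibility traces, cleared each episode
        s = env.reset()
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, epsilon, rng)
            # One-step TD error: sampled Bellman backup minus current estimate.
            delta = r + (0.0 if done else gamma * Q[s2, a2]) - Q[s, a]
            # Replacing traces: reset the visited pair's trace to 1
            # instead of accumulating it (E[s, a] += 1).
            E[s, a] = 1.0
            Q += alpha * delta * E    # credit all recently visited pairs
            E *= gamma * lam          # decay every trace by gamma * lambda
            s, a = s2, a2
    return Q

Q = sarsa_lambda(ChainEnv(10), n_states=10, n_actions=2)
print(np.argmax(Q, axis=1))           # greedy policy: mostly "right" (1)

Replacing traces cap each trace at 1 when a state-action pair is revisited within an episode, which keeps the update well behaved at large Lambda where accumulating traces can grow without bound.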
Year
Volume
Pages
57-74
Physical description
Bibliography: 9 items, figures
Creators
author
Bibliography
  • [1] Boyan J.A. and Littman M.L.; Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach, Advances in Neural Information Processing Systems: Proceedings of the 1994 Conference, San Francisco, CA, USA, 1994.
  • [2] Doya K.; Reinforcement Learning in Continuous Time and Space, Neural Computation, Jan 2000, Vol. 12, No. 1, pp. 219-246.
  • [3] Kaelbling L.P., Littman M.L. and Moore A.W.; Reinforcement Learning: A Survey, Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 237-285.
  • [4] Lewis M.E. and Puterman M.L.; A Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes, IEEE Transactions on Automatic Control, Jan 2001, Vol. 46, No. 1, pp. 96-101.
  • [5] Mitchell T.; Machine Learning, McGraw-Hill, 1997.
  • [6] Poliscuk J.E.; A Contribution to Methodology of Development of Decision Support Systems and Expert Systems, Ph.D. Thesis, Faculty of Organization and Informatics, University of Zagreb, Croatia, 1992.
  • [7] Rolls E.T., Milward T. and Wiskott L.; A Model of Invariant Object Recognition in the Visual System: Learning Rules, Activation Functions, Lateral Inhibition, and Information-Based Performance Measures, Neural Computation, Nov 2000, Vol. 12, No. 11, pp. 2547-2573.
  • [8] Sutton R.S. and Barto A.G.; Reinforcement Learning: An Introduction, MIT Press/Bradford Books, Cambridge, MA, 1998.
  • [9] Szepesvari C. and Littman M.L.; A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms, Neural Computation, Nov 1999, Vol. 11, No. 8, pp. 2017-2061.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BUJ1-0016-0037