Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
This article defines a reinforcement learning method and analyzes the learning subject (the agent). The essence of the method is the selection of actions by trial and error combined with delayed rewards. If the environment has the Markov property, its one-step dynamics make it possible to predict the next state and the next reward from the current state and action alone, which is the setting of a Markov decision process. The relationship between the value of the current state and the values of its possible successor states is given by the Bellman equation. The article also discusses temporal difference learning and the mechanism of eligibility traces, together with the corresponding algorithms TD(0) and TD(λ). The theoretical analysis is supplemented by a practical study: an implementation of the Sarsa(λ) algorithm with replacing eligibility traces and an ε-greedy policy.
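The abstract names Sarsa(λ) with replacing eligibility traces and an ε-greedy policy but gives no details of the implementation. The following is only a minimal tabular sketch of that general technique, not the authors' code. The corridor environment `chain_step`, the function names, and all parameter values (`alpha`, `gamma`, `lam`, `epsilon`, the number of states and episodes) are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = q[state]
    best = max(values)
    # Break ties randomly so early episodes are not biased toward action 0.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def chain_step(state, action, n_states):
    """Hypothetical toy environment: a corridor with the goal at the right end.

    Action 0 moves left (staying put at the left wall), action 1 moves right.
    Reaching the last state ends the episode with reward 1; every other
    step gives reward 0.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def sarsa_lambda(n_states=6, n_actions=2, episodes=200, alpha=0.1,
                 gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with replacing traces (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Eligibility traces are reset at the start of every episode.
        e = [[0.0] * n_actions for _ in range(n_states)]
        state = 0
        action = epsilon_greedy(q, state, n_actions, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = chain_step(state, action, n_states)
            next_action = epsilon_greedy(q, next_state, n_actions, epsilon, rng)
            # TD error; the value of a terminal state is taken to be zero.
            target = reward if done else reward + gamma * q[next_state][next_action]
            delta = target - q[state][action]
            # Replacing trace: set to 1 instead of incrementing by 1.
            e[state][action] = 1.0
            for s in range(n_states):
                for a in range(n_actions):
                    q[s][a] += alpha * delta * e[s][a]
                    e[s][a] *= gamma * lam  # decay every trace
            state, action = next_state, next_action
    return q
```

Calling `sarsa_lambda()` returns the learned action-value table as a list of rows, one per state. In this toy corridor the entry for moving right from the state next to the goal converges toward 1, since that transition always yields the terminal reward. Setting `lam=0` reduces the update to one-step Sarsa.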
Journal
Year
Volume
Pages
57--74
Physical description
Contributors
author
- Department of Electrical Engineering, Podgorica, University of Montenegro, Yugoslavia, jaroslav@server1.cis.cg.ac.yu
Bibliography
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BUJ1-0016-0037