The article analyses a reinforcement learning method in which the subject of learning is defined. The essence of this method is the selection of actions by trial and error, with delayed rewards. The theoretical analysis is supplemented by practical studies based on an implementation of the Sarsa(λ) algorithm with replacing eligibility traces and an ε-greedy policy.
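To make the combination named above more concrete, the following is a minimal tabular sketch of Sarsa(λ) with replacing eligibility traces and an ε-greedy policy. It is not the article's implementation: the corridor environment (`N_STATES`, `ACTIONS`, `step`), the function names, and all parameter values (`alpha`, `gamma`, `lam`, `epsilon`, the number of episodes) are assumptions chosen only for illustration.

```python
import random

# Hypothetical toy task (not from the article): a corridor of states 0..5,
# where state 5 is terminal and is the only source of reward.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    # Ties between equally valued actions are broken at random.
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])


def sarsa_lambda(episodes=200, alpha=0.1, gamma=0.9, lam=0.8,
                 epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with replacing traces; returns the Q table."""
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        # Eligibility traces are cleared at the start of every episode.
        E = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, epsilon, rng)
            # On-policy TD error: bootstrap from the action actually chosen.
            target = r if done else r + gamma * Q[s2][a2]
            delta = target - Q[s][a]
            # Replacing trace: set to 1 instead of adding 1 (accumulating).
            E[s][a] = 1.0
            for i in range(N_STATES):
                for j in ACTIONS:
                    Q[i][j] += alpha * delta * E[i][j]
                    E[i][j] *= gamma * lam  # decay every trace
            s, a = s2, a2
    return Q
```

Calling `sarsa_lambda()` returns a table indexed as `Q[state][action]`. The delayed reward at the end of the corridor is propagated back to earlier state-action pairs through the decaying traces, which is the mechanism the abstract refers to; the specific defaults here carry no claim about the settings used in the article's experiments.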