Search results - BazTech

Results found: 1

Searched for:
in keywords: R(lambda)-learning

Experimental Study on Parameter Selection for Reinforcement Learning Algorithms

Zajdel R.

Theoretical and Applied Informatics

2008

Vol. 20, No. 2

71-85

The use of reinforcement learning algorithms is confronted with a number of practical problems related to the proper choice of learning parameters. There are three such factors in the case of the Q(lambda)-learning algorithm and five when AHC(lambda) is considered. On the other hand, the more rarely applied R-learning algorithm is parametrized by only two such factors; however, it does not possess an acceleration method. In order to compare the two algorithms mentioned earlier with the R-learning algorithm, an implementation of the temporal-difference method TD(lambda) is proposed for it. The main purpose of this study is to formulate, empirically, general recommendations regarding the selection of factors of reinforcement learning algorithms and to compare the efficiency of these algorithms. The criterion for factor selection is the highest probability of obtaining a learned system. The experiments are carried out with models of the cart-pole and the ball-beam systems.

The application of reinforcement learning algorithms encounters a number of practical problems related to the proper choice of learning coefficients, of which there are from 3 (Q(lambda)-learning) up to 5 (AHC(lambda)). In turn, the less frequently used R-learning algorithm is parametrized by only two such coefficients; however, it has no acceleration method. To enable a performance comparison of the 2 previously mentioned algorithms with the R-learning algorithm, an implementation of the temporal-difference method TD(lambda) will be proposed for it. The main aim of this study is to provide, empirically, general recommendations on the selection of coefficient values for reinforcement learning algorithms and to compare the performance of these algorithms. The criterion for coefficient selection was obtaining the highest probability of a learned system. The experiments were carried out using models of the inverted pendulum and the ball-beam system.