Article title

Epokowo-inkrementacyjny algorytm uczenia się ze wzmocnieniem wykorzystujący kryterium średniego wzmocnienia

Authors
Content / Full text
Identifiers
Title variants
EN
The epoch-incremental reinforcement learning algorithm based on the average reward
Publication languages
PL
Abstracts
PL
A new epoch-incremental reinforcement learning algorithm is proposed in this paper. The main idea of the algorithm is to perform, in the epoch mode, additional updates of the control policy based on the distances of previously active states from the terminal state. The proposed algorithm, together with the R(0)-learning, R(λ)-learning, Dyna-R and prioritized sweeping-R algorithms, was applied to the control of a mountain car model and of a model of a ball on a balancing beam.
EN
The application of average reward reinforcement learning algorithms to control problems is described in this paper. Moreover, a new epoch-incremental reinforcement learning algorithm (EIR(0)-learning for short) is proposed. In this algorithm, the basic R(0)-learning algorithm is executed in the incremental mode while a model of the environment is built. In the epoch mode, this model is used to determine the distances of previously active states from the terminal state, and these distances then drive additional updates of the policy. The proposed algorithm was applied to the mountain car (Fig. 4) and ball-beam (Fig. 5) models. EIR(0)-learning was empirically compared to R(0)-learning [4, 6], R(λ)-learning, and the model-based algorithms Dyna-R and prioritized sweeping-R [11]. In the case of the ball-beam system, the EIR(0)-learning algorithm reached a stable control strategy after the smallest number of trials (Tab. 1, column 2). For the mountain car system, the number of trials was smaller than for the R(0)-learning and R(λ)-learning algorithms, but greater than for Dyna-R and prioritized sweeping-R. It is worth noting that the execution times of the Dyna-R and prioritized sweeping-R algorithms in the incremental mode were, respectively, 5 and 50 times longer than that of the proposed EIR(0)-learning algorithm (Tab. 2, column 3). The main conclusion of this work is that the epoch-incremental learning algorithm provides a stable control strategy in a relatively small number of trials and with a short execution time per iteration.
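
The abstracts above describe the two modes of EIR(0)-learning: an incremental mode that runs the basic R(0)-learning update while building a model of the environment, and an epoch mode that uses that model to update the policy according to the distances of previously active states from the terminal state. The following Python sketch is a minimal illustration of this scheme for a discrete, tabular task. The R(0)-learning update follows Schwartz [4]; the function names, the step sizes ALPHA and BETA, and the 1/distance weighting of the epoch correction are illustrative assumptions of this sketch, not the exact rule from the paper.

from collections import defaultdict, deque

ALPHA = 0.1   # step size for the action values (assumed value)
BETA = 0.01   # step size for the average-reward estimate (assumed value)

Q = defaultdict(float)           # R(s, a): relative action values
rho = 0.0                        # running estimate of the average reward
predecessors = defaultdict(set)  # learned model: state -> states seen to lead to it

def greedy(state, actions):
    return max(actions, key=lambda a: Q[(state, a)])

def incremental_step(state, action, reward, next_state, actions):
    """Incremental mode: one R(0)-learning update (Schwartz [4]) plus
    recording the observed transition in the environment model."""
    global rho
    was_greedy = action == greedy(state, actions)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward - rho + best_next - Q[(state, action)])
    if was_greedy:
        # rho is adjusted only on greedy steps, as in R-learning
        rho += BETA * (reward + best_next
                       - max(Q[(state, a)] for a in actions) - rho)
    predecessors[next_state].add(state)

def epoch_update(terminal_state, actions):
    """Epoch mode: breadth-first search over the learned model assigns each
    previously active state its distance to the terminal state; the value of
    the greedy action is then reinforced with a correction that decays with
    that distance (the 1/d weighting is an assumption of this sketch)."""
    dist = {terminal_state: 0}
    queue = deque([terminal_state])
    while queue:
        s = queue.popleft()
        for p in predecessors[s]:
            if p not in dist:
                dist[p] = dist[s] + 1
                queue.append(p)
    for s, d in dist.items():
        if d > 0:
            Q[(s, greedy(s, actions))] += ALPHA / d

In a complete trial, incremental_step would run at every control step and epoch_update once after the terminal state is reached; the extra epoch pass is what lets states far from the goal profit from the terminal reward without the per-step planning cost that, per the abstract, makes Dyna-R and prioritized sweeping-R slower in the incremental mode.
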
Publisher
Year
Pages
700-703
Physical description
Bibliography: 14 items, figures, tables, formulas
Creators
author
  • Politechnika Rzeszowska, Al. Powstańców Warszawy 12, 35-959 Rzeszów
Bibliography
  • [1] Watkins C. J. C. H.: Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England, 1989.
  • [2] Barto A., Sutton R., Anderson C.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. SMC, 13, pp. 834-847, 1983.
  • [3] Rummery G., Niranjan M.: On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
  • [4] Schwartz A.: A reinforcement learning method for maximizing undiscounted rewards. Proc. of the Tenth International Conference on Machine Learning, Amherst, Massachusetts, Morgan Kaufmann, pp. 298-305, 1993.
  • [5] Tadepalli P., Ok D.: Model-Based Average Reward Reinforcement Learning. Artificial Intelligence, 100, pp. 177-224, 1998.
  • [6] Sutton R., Barto A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
  • [7] Cichosz P.: Systemy uczące się. WNT, Warszawa, 2000.
  • [8] Sutton R.: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. Proc. of the Seventh Int. Conf. on Machine Learning, pp. 216-224, 1990.
  • [9] Moore A., Atkeson C.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13, pp. 103-130, 1993.
  • [10] Peng J., Williams R.: Efficient learning and planning within the Dyna framework. Proc. of the 2nd International Conference on Simulation of Adaptive Behavior, pp. 281-290, 1993.
  • [11] Zajdel R.: Algorytmy uczenia ze wzmocnieniem Dyna-R i prioritized sweeping-R. In: Inżynieria wiedzy i systemy ekspertowe, Akademicka Oficyna Wydawnicza EXIT, Warszawa, pp. 161-169, 2009.
  • [12] Zajdel R.: Epoch-Incremental Queue-Dyna Algorithm. The Ninth International Conference on Artificial Intelligence and Soft Computing, Zakopane, Lecture Notes in Artificial Intelligence 5097, pp. 1160-1170, 2008.
  • [13] Wellstead P. E.: Introduction to Physical System Modelling. Control Systems Principles, 2000.
  • [14] Kaelbling L. P., Littman M. L., Moore A. W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, pp. 237-285, 1996.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-6761d4a6-ca68-4166-8dca-391927df5c1d