Accidental exploration through value predictors

Kisielewski, Tomasz; Leśniak, Damian

Article details

Journal

Schedae Informaticae

2018 | Vol. 27

Article title

Accidental exploration through value predictors

Authors

Kisielewski, Tomasz; Leśniak, Damian

Selected full texts from this journal

http://www.ejournals.eu/Schedae-Informaticae/

Abstract

Infinite trajectory length is an almost universal assumption in the theoretical foundations of reinforcement learning. In practice, learning occurs on finite trajectories. In this paper we examine a specific consequence of this disparity, namely a strong bias of the time-bounded every-visit Monte Carlo value estimator. This bias manifests as vastly different learning dynamics for algorithms that use value predictors, including encouraging or discouraging exploration. We investigate these claims theoretically for a one-dimensional random walk and empirically on a number of simple environments. We use GAE as an algorithm involving a value predictor and evolution strategies as a reference point.
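
The bias described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or experimental setup: the constant reward of 1 per step, the horizon of 50 steps, the discount factor of 0.99, and the function name `every_visit_mc` are all assumptions chosen for illustration. With a constant reward, every state of an infinite random walk has the same true value, 1 / (1 - gamma), yet the truncated every-visit estimate is lower everywhere and lowest for states far from the start, because those states can only be reached late in an episode.

```python
import random
from collections import defaultdict


def every_visit_mc(num_episodes, horizon, gamma, reward, seed=0):
    """Every-visit Monte Carlo value estimates from time-bounded episodes.

    Hypothetical setup: a symmetric 1-D random walk starting at 0 that pays
    a constant `reward` on every step and is cut off after `horizon` steps.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)  # sum of returns observed from each state
    counts = defaultdict(int)    # number of visits to each state
    for _ in range(num_episodes):
        # Roll out one truncated trajectory and record the visited states.
        states = []
        position = 0
        for _ in range(horizon):
            states.append(position)
            position += rng.choice((-1, 1))
        # Walk backwards from the cutoff, accumulating the discounted return.
        ret = 0.0
        for state in reversed(states):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


gamma = 0.99
true_value = 1.0 / (1.0 - gamma)  # infinite-horizon value of every state
values = every_visit_mc(num_episodes=2000, horizon=50, gamma=gamma, reward=1.0)
for state in (0, 5, 10):
    print(f"state {state:>2}: estimate {values[state]:6.2f}, true {true_value:.2f}")
```

Under these assumptions a positive reward makes distant states look worse than the origin, which would discourage exploration; passing `reward=-1.0` flips the sign, so distant states look better and exploration is encouraged.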

Keywords

reinforcement learning; value predictors; exploration

Journal

Schedae Informaticae

Year

2018

Volume

Vol. 27

Contributors

author

Kisielewski, Tomasz

Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland

author

Leśniak, Damian

Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland

Notes

Record prepared under agreement 509/P-DUN/2018, funded by the Polish Ministry of Science and Higher Education (MNiSW) from resources allocated to science dissemination activities (2019).

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.baztech-44ce9b3c-4551-4aad-9e31-1a944723cb04