Results found: 3
Wyniki wyszukiwania
help Sortuj według:

help Ogranicz wyniki do:
first rewind previous Strona / 1 next fast forward last
1
A Recursive Classifier System for Partially Observable Environments
Previously we introduced Parallel Specialized XCS (PSXCS), a distributed-architecture classifier system that detects aliased environmental states and assigns their handling to created subordinate XCS classifier systems. PSXCS uses a history-window approach, but with novel efficiency, since the subordinate XCSs, which employ the windows, are spawned only for the parts of the state space that are actually aliased. However, because the window lengths are finite and set manually, PSXCS may fail to be optimal in difficult test mazes. This paper introduces Recursive PSXCS (RPSXCS), which automatically spawns windows wherever more history is required. Experimental results show that RPSXCS is both more powerful and learns faster than PSXCS. The present research suggests new potential for history approaches to partially observable environments.
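The abstract does not describe PSXCS's internals beyond the history-window idea, so the following is only a minimal sketch of the generic fixed-length window mechanism such systems build on: two aliased raw observations can be told apart when the recent observation/action traces leading to them differ. All names below are illustrative, not taken from the paper.

```python
from collections import deque

class HistoryWindowAgent:
    """Augments raw observations with a fixed-length history window,
    the generic idea behind window-based approaches to aliasing.
    """

    def __init__(self, window_length):
        # Finite, manually set window length -- the limitation the
        # abstract says RPSXCS removes by spawning more history on demand.
        self.history = deque(maxlen=window_length)  # recent (obs, action) pairs

    def reset(self):
        self.history.clear()

    def extended_state(self, observation):
        # Two aliased raw observations become distinguishable when the
        # observation/action trace leading to them differs.
        return (observation, tuple(self.history))

    def record(self, observation, action):
        self.history.append((observation, action))
```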
2
Learning Classifier Systems are evolutionary learning mechanisms that combine Genetic Algorithms with the Reinforcement Learning paradigm. They try to evolve state-action-reward mappings in order to propose the best action for each environmental state and thereby maximize the achieved reward. In the first versions of learning classifier systems, a state-action pair could only be mapped to a constant real-valued reward, so to model a fairly complex environment an LCS had to develop redundant state-action pairs mapped to different reward values. An extension of the well-known Accuracy-Based Learning Classifier System (XCS) was recently developed that is able to map state-action pairs to a linear reward function. This extension, called XCSF, can develop a more compact population than the original XCS, but further research has shown that it fails to develop proper mappings when the input parameters come from certain intervals. As a solution to this issue, in our previous work we proposed a novel approach inspired by the idea of using an evolutionary method to approximate the reward landscape. The first results seemed promising, but our approach, called XCSFG, converged to the goal very slowly. In this paper, we propose a new extension of XCSFG that employs a micro-GA, whose required population is far smaller than that of a simple GA, so we expect the micro-GA to help XCSFG converge faster. Reported results show that this new extension can be considered an alternative within the XCSF family with respect to convergence speed, approximation accuracy and population compactness.
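For readers unfamiliar with XCSF, the linear reward mapping mentioned in this abstract can be illustrated with a small sketch: each classifier keeps a weight vector and predicts payoff as a linear function of the input, adapting the weights with a normalized delta rule. The class name and parameter values below are assumptions for illustration, not details from the paper.

```python
import numpy as np

class LinearPredictor:
    """Per-classifier linear reward prediction in the style of XCSF:
    payoff is approximated as w0*x0 + w . s instead of a constant scalar.
    """

    def __init__(self, n_inputs, eta=0.2, x0=1.0):
        self.weights = np.zeros(n_inputs + 1)  # w0 plus one weight per input
        self.eta = eta    # learning rate (assumed value, not from the paper)
        self.x0 = x0      # constant input augmenting the state vector

    def predict(self, state):
        x = np.concatenate(([self.x0], state))
        return float(self.weights @ x)

    def update(self, state, target):
        # Normalized least-mean-squares step toward the received payoff.
        x = np.concatenate(([self.x0], state))
        error = target - self.weights @ x
        self.weights += (self.eta / (x @ x)) * error * x
        return error
```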
3
A New Architecture for Learning Classifier Systems to Solve POMDP Problems
Reinforcement Learning is a learning paradigm that helps an agent learn to act optimally in an unknown environment through trial and error. An RL-based agent senses its environmental state, proposes an action, and applies it to the environment; a reinforcement signal, called the reward, is then sent back from the environment to the agent. The agent is expected to learn, through its internal mechanisms, how to maximize the overall environmental reward. One of the most challenging issues in the RL area arises from the limited sensory ability of the agent, when it is not able to sense its current environmental state completely. Such environments are called partially observable environments. In them, the agent may fail to distinguish the actual environmental state and thus may fail to propose the optimal action in particular states, so an extended mechanism must be added to the agent's architecture to enable it to perform optimally. One of the most widely used approaches to reinforcement learning is evolutionary learning, and one of the most widely used techniques in this family is the learning classifier system, which tries to evolve state-action-reward mappings that model its current environment through trial and error. In this paper we propose a new architecture for learning classifier systems that is able to perform optimally in partially observable environments. The architecture uses a novel method to detect aliased states in the environment and disambiguates them through multiple instances of classifier systems that interact with the environment in parallel. The model is applied to some well-known benchmark problems and compared with some of the best classifier systems proposed for these environments. Our results and detailed discussion show that our approach is among the best learning classifier system techniques for partially observable environments.
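The abstract does not specify how the proposed architecture detects aliased states, so the sketch below illustrates only one simple heuristic consistent with the general idea: in a deterministic environment, an (observation, action) pair that produces more than one distinct outcome indicates perceptual aliasing. All identifiers are hypothetical, not the paper's method.

```python
from collections import defaultdict

class AliasDetector:
    """Flags observations that behave inconsistently across visits."""

    def __init__(self):
        # (obs, action) -> set of observed (next_obs, reward) outcomes
        self.outcomes = defaultdict(set)

    def record(self, obs, action, next_obs, reward):
        self.outcomes[(obs, action)].add((next_obs, reward))

    def aliased_observations(self):
        # Deterministic-environment assumption: multiple distinct outcomes
        # for the same perceived state/action imply the observation maps
        # to more than one underlying environmental state.
        return {obs for (obs, _a), outs in self.outcomes.items()
                if len(outs) > 1}
```

In an architecture like the one described, each observation flagged this way could then be handed to its own classifier system instance running in parallel.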