

Article title

A New Architecture for Learning Classifier Systems to Solve POMDP Problems

Authors
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Reinforcement learning (RL) is a learning paradigm in which an agent learns to act optimally in an unknown environment through trial and error. An RL-based agent senses the state of its environment, proposes an action, and applies it to the environment. The environment then returns a reinforcement signal, called the reward, to the agent. The agent is expected to learn, through its internal mechanisms, how to maximize the overall reward. One of the most challenging issues in RL arises from the limited sensory ability of the agent, when it cannot sense its current environmental state completely. Such environments are called partially observable environments. In them, the agent may fail to distinguish the actual environmental state and so may fail to propose the optimal action in particular states. An extended mechanism must therefore be added to the architecture of the agent to enable it to perform optimally in these environments. One of the most widely used approaches to reinforcement learning is evolutionary learning, and one of the most widely used techniques in this family is the learning classifier system. Learning classifier systems try to evolve state-action-reward mappings to model their current environment through trial and error. In this paper we propose a new architecture for learning classifier systems that is able to perform optimally in partially observable environments. This new architecture uses a novel method to detect aliased states in the environment and disambiguates them through multiple instances of classifier systems that interact with the environment in parallel. The model is applied to some well-known benchmark problems and is compared with some of the best classifier systems proposed for these environments. Our results and detailed discussion show that our approach is among the best learning classifier system techniques for partially observable environments.
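The notion of aliased states mentioned in the abstract can be illustrated with a minimal sketch. It is not the detection method proposed in the paper; the corridor layout, the goal position, and the helper names `observe`, `optimal_action`, and `find_aliased_observations` are all hypothetical and chosen only for illustration. The sketch shows that when several distinct states produce the same percept but require different optimal actions, a purely reactive agent cannot act optimally on that percept.

```python
from collections import defaultdict

# Hypothetical 1-D corridor with cells 0..4 and the goal in cell 2.
# The agent perceives only adjacent walls, never its actual position.
N_CELLS = 5
GOAL = 2


def observe(cell):
    """Return the agent's limited percept for a given cell."""
    if cell == 0:
        return "wall-left"
    if cell == N_CELLS - 1:
        return "wall-right"
    return "open"


def optimal_action(cell):
    """Return the move that brings the agent closer to the goal."""
    if cell < GOAL:
        return "right"
    if cell > GOAL:
        return "left"
    return "stay"


def find_aliased_observations():
    """Collect percepts that require more than one optimal action."""
    actions_by_obs = defaultdict(set)
    for cell in range(N_CELLS):
        actions_by_obs[observe(cell)].add(optimal_action(cell))
    # A percept is aliased here if no single action is optimal for it.
    return {obs: acts for obs, acts in actions_by_obs.items() if len(acts) > 1}


aliased = find_aliased_observations()
print(aliased)
```

In this toy corridor only the percept `open` is aliased, because cells 1, 2, and 3 look identical to the agent yet call for different moves. This sketch reads the optimal actions directly from the environment definition, whereas a learning agent would have to infer such conflicts from experience.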
Keywords
Publisher
Year
Pages
329--351
Physical description
35 bibliography entries, tables, charts
Contributors
author
author
  • Department of Computer Engineering, Iran University of Science and Technology, Narmak, Teheran, Iran, hamzeh@iust.ac.ir
Bibliography
  • [1] J.H. Holland, 1995, Escaping Brittleness: The Possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-based Systems. In Computation and Intelligence: Collected Readings American Association for Artificial Intelligence, Menlo Park, CA, pages 275-304.
  • [2] R. Sutton, A. Barto, 1998, Reinforcement learning, Cambridge, MIT Press, ISBN: 0262193981.
  • [3] T. Kovacs, P.L. Lanzi, 2003, The 2003 Learning Classifier Systems Bibliography, In IWLCS2003, volume LNCS 2661, pages 187-230, Springer.
  • [4] L. Lin, 1993, Reinforcement Learning for Robots Using Neural Networks. Technical Report CMU-CS-93-103, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.
  • [5] G.G. Robertson, R.L. Riolo, 1988, A Tale of Two Classifier Systems, Machine Learning, 3, pages 139-159.
  • [6] M. Colombetti, and M. Dorigo, 1994, Training Agents to Perform Sequential Behavior, Adaptive Behavior Journal 2(3): pages 247-275.
  • [7] L.P. Kaelbling, M.L. Littman, and A.R. Cassandra, 1998, Planning and Acting in Partially Observable Stochastic Domains, Artificial Intelligence, Vol. 101.
  • [8] S.W. Wilson, 1995, Classifier Fitness Based on Accuracy, Evolutionary Computation 3(2): 149-175.
  • [9] L.P. Kaelbling, M.L. Littman, and A. Moore, 1996, Reinforcement Learning: A Survey, Artificial Intelligence Research, vol. 4, pages 237-285.
  • [10] B. Widrow, M.E. Hoff, 1988, Adaptive Switching Circuits, In Neurocomputing: Foundations of Research, pp. 126-134. Cambridge: The MIT Press.
  • [11] P.L. Lanzi, 1997, A Model of the Environment to Avoid Local Learning, Technical Report Number 9746, Dipartimento di Elettronica e Informazione, Politecnico di Milano.
  • [12] P.L. Lanzi, 1999, An Analysis of Generalization in the XCS Classifier System, Evolutionary Computation 7(2): 125-149.
  • [13] S. Russell, P. Norvig, 2003, Artificial Intelligence: A Modern Approach, Second Edition, Prentice Hall Series in Artificial Intelligence. Englewood Cliffs, New Jersey.
  • [14] D. Cliff, S. Ross, 1994, Adding Memory to ZCS, Adaptive Behavior Journal 3(2), pages 101-150.
  • [15] S.W. Wilson, 1994, ZCS: A Zeroth Level Classifier System, Evolutionary Computation, 2(1): 1-18.
  • [16] P.L. Lanzi, 1998, Adding Memory to XCS, In Proceedings of the IEEE Conference on Evolutionary Computation, IEEE Press.
  • [17] P.L. Lanzi, 1998, An Analysis of the Memory Mechanism of XCSM, In Proceedings of the Third Annual Conference on Genetic Programming. Morgan Kaufmann: San Francisco, CA, pp. 643-651.
  • [18] A. Tomlinson, and L. Bull, 1998, A Corporate Classifier System. In Proceedings of the Fifth International Conference on Parallel Problem Solving from Nature - PPSN V, number 1498 in LNCS, pp. 550-559, Springer-Verlag.
  • [19] A. Tomlinson, and L. Bull, 1999, On Corporate Classifier Systems: Increasing the Benefits of Rule Linkage, In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 549-656, Morgan Kaufmann: San Francisco CA.
  • [20] A. Tomlinson, and L. Bull, 1999, A Zeroth Level Corporate Classifier System, In Proceedings of the Genetic and Evolutionary Computation Conference Workshop Program, pp. 306-313, Morgan Kaufmann: San Francisco CA.
  • [21] A. Tomlinson, L. Bull, 2002, An Accuracy Based Corporate Classifier System, Soft Computing. 6(3-4): 200-215.
  • [22] P.L. Lanzi, S.W. Wilson, 2000, Toward Optimal Classifier System Performance in Non-Markov Environments, Evolutionary Computation 8(4), pp. 393-418.
  • [23] T. Kovacs, 2002, What Should a Classifier System Learn and How Should We Measure It, Journal of Soft Computing, 6(3, 4), pages 171-182.
  • [24] M.V. Butz, T. Kovacs, P.L. Lanzi, and S.W. Wilson, 2004, Toward a Theory of Generalization and Learning in XCS, IEEE Transactions on Evolutionary Computation, 8(1), pages 28-46.
  • [25] L. Bull, 2005, Two Simple Learning Classifier Systems, In L. Bull & T. Kovacs (eds.) Foundations of Learning Classifier Systems, Springer-Verlag.
  • [26] A.J. Bagnall, Z. Zatuchna, 2005, On the Classification of Maze Problems, Applications of Learning Classifier Systems, Studies in, Edited by Bull, L. and Kovacs, T., Springer, pp. 307-316.
  • [27] P.L. Lanzi, 1997, A Model of the Environment to Avoid Local Learning (An Analysis of the Generalization Mechanism of XCS), Technical Report 9746, Politecnico di Milano, Department of Electronic Engineering and Information Sciences.
  • [28] C.H. Watkins, 1989, Learning from Delayed Rewards, PhD thesis, King's College, Cambridge, UK.
  • [29] M. Metivier, C. Lattaud, 2002, Anticipatory Classifier System using Behavioral Sequences in non-Markov Environments, In Proceedings of IWLCS2002, pages 143-162, Springer-Verlag.
  • [30] W. Stolzmann, 1998, Anticipatory Classifier Systems, In Proceedings of the Third Annual Genetic Programming Conference, pages 658-664. Morgan Kaufmann.
  • [31] M.V. Butz, 2002, Biasing Exploration in an Anticipatory Learning Classifier System, In Advances in Learning Classifier Systems, volume 2321 of LNAI. Springer-Verlag, Berlin, pages 3-22.
  • [32] Z. Zatuchna, 2005, AgentP: A Learning Classifier System with Associative Perception in Maze Environments, PhD thesis, University of East Anglia.
  • [33] L. Bull, 2002, Look ahead and latent learning in ZCS. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 897-904, New York, Morgan Kaufmann Publishers.
  • [34] P. Gerard, O. Sigaud, 2001, Adding a Generalization Mechanism to YACS, In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 951-957, San Francisco, California, USA, 7-11 July 2001. Morgan Kaufmann.
  • [35] G. Kanji, 1994, 100 Statistical Tests, SAGE Publications.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BUS5-0015-0077