Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Learning board evaluation function for Othello by hybridizing coevolution with temporal difference learning

Treść / Zawartość
Warianty tytułu
Języki publikacji
Hybridization of global and local search techniques has already produced promising results in the fields of optimization and machine learning. It is commonly presumed that approaches employing this idea, like memetic algorithms combining evolutionary algorithms and local search, benefit from complementarity of constituent methods and maintain the right balance between exploration and exploitation of the search space. While such extensions of evolutionary algorithms have been intensively studied, hybrids of local search with coevolutionary algorithms have not received much attention. In this paper we attempt to fill this gap by presenting Coevolutionary Temporal Difference Learning (CTDL) that works by interlacing global search provided by competitive coevolution and local search by means of temporal difference learning. We verify CTDL by applying it to the board game of Othello, where it learns board evaluation functions represented by a linear architecture of weighted piece counter. The results of a computational experiment show CTDL superiority compared to coevolutionary algorithm and temporal difference learning alone, both in terms of performance of elaborated strategies and computational cost. To further exploit CTDL potential, we extend it by an archive that keeps track of selected well-performing solutions found so far and uses them to improve search convergence. The overall conclusion is that the fusion of various forms of coevolution with a gradient-based local search can be highly beneficial and deserves further study.
Opis fizyczny
Bibliogr. 34 poz., wykr.
  • Angeline, P.J. and Pollack, J.B. (1993) Competitive Environments Evolve Better Solutions for Complex Tasks. In: S. Forrest, ed., Proceedings of the 5th International Conference on Genetic Algorithms, ICGA-93. Morgan Kaufmann, University of Illinois at Urbana-Champaign, 264-270.
  • Archer, K.J. and Kimes, R.V. (2008) Empirical Characterization of Random Forest Variable Importance Measures. Comp. Stat. &Data Anal., 52(4), 2249-2260.
  • Azaria, Y. and Sipper, M. (2005) GP-Gammon: Genetically Programming Backgammon Players. Genetic Programming and Evolvable Machines, 6(3), 283-300.
  • Bucci, A. (2007) Emergent Geometric Organization and Informative Dimensions in Coevolutionary Algorithms. Ph.D. thesis, Michtom School of Computer Science, Brandeis University.
  • Buro, M. (2002) Improving heuristic mini-max search by supervised learning. Artificial Intelligence 134 (1-2), 85-99.
  • Caverlee, J.B. (2000) A Genetic Algorithm Approach to Discovering an Optimal Blackjack Strategy. In: Koza, J.R., ed., Genetic Algorithms and Genetic Programming at Stanford 2000. Stanford Bookstore, Stanford, California, 70-79.
  • de Jong,E.D. (2005) The MaxSolve algorithm for coevolution. In: H.G. Beyer et al., eds., GECCO 2005: Proceedings of the 2005 conference on Genetic and evolutionary computation, vol. 1. ACM Press, Washington DC, USA, 483-489.
  • de Jong, E.D. (2007) A Monotonic Archive for Pareto-Coevolution. Evolutionary Computation, 15(1), 61-93.
  • Ficici, S.G. (2004) Solution concepts in coevolutionary algorithms. Ph.D. thesis, Waltham, MA, USA.
  • Ficici, S.G. and Pollack, J.B. (2003) A Game-Theoretic Memory Mechanism for Coevolution. In: E. Cantú-Paz et al., eds., Genetic and Evolutionary Computation - GECCO 2003. LNCS 2723. Springer, Chicago, IL, 286-297.
  • Fogel, D.B. (2001) Blondie24: Playing at the Edge of AI. Morgan Kaufmann Publishers.
  • Hauptman, A. and Sipper, M. (2007) Evolution of an Efficient Search Algorithm for the Mate-In-N Problem in Chess. In: M. Ebner et al., eds., Proceedings of the 10th European Conference on Genetic Programming. LNCS 4445. Springer, Valencia, Spain, 78-89.
  • Jaśkowski, W., Krawiec, K. and Wieloch, B. (2008) Evolving Strategy for a Probabilistic Game of Imperfect Information using Genetic Programming. Genetic Programming and Evolvable Machiness, 9(4), 281-294.
  • Kim, K.-J., Choi, H. and Cho, S.-B. (2007) Hybrid of Evolution and Reinforcement Learning for Othello Players. IEEE Symposium on Computational Intelligence and Games, 203-209.
  • Krawiec,K., Jaśkowski,W. and Szubert,M. (2011) Evolving Small-Board Go Players using Coevolutionary Temporal Difference Learning with Archive. International Journal of Applied Mathematics and Computer Science, 21(4).
  • Lee, K.-F. and Mahajan, S. (1990) The development of a world class Othello program. Artificial Intelligence Journal, 43(1), 21-36.
  • Lubberts, A. and Miikkulainen, R. (2001) Co -Evolving a Go-Playing Neural Network. In: R.K. Belew & H. Juillé, eds., Coevolution: Turning Adaptive Algorithms upon Themselves. Birds-of-a-Feather Workshop. GECCO 2001, 14-19.
  • Lucas, S.M. and Runarsson, T.P. (2006) Temporal difference learning versus co-evolution for acquiring othello position evaluation. IEEE Symposium on Computational Intelligence and Games. IEEE. 52-59.
  • Luke, S. (1998) Genetic Programming Produced Competitive Soccer Softbot Teams for RoboCup97. In: J.R. Koza et al., eds., Genetic Programming 1998: Proceedings of the Third Annual Conference. Morgan Kaufmann, University of Wisconsin, Madison, 214-222.
  • Luke, S. (2008) ECJ 18 - A Java-based Evolutionary Computation Research System. eclab/projects/ecj/.
  • Luke, S. and Wiegand, R.P. (2002) When coevolutionary algorithms exhibit evolutionary dynamics. In: A.M. Barry, ed., GECCO 2002: Proc. of the Bird of a Feather Workshops, Genetic and Evolutionary Computation conference. AAAI, Menlo Park, CA, 236-241.
  • Manning, E.P. (2007) Temporal Difference Learning of an Othello Evaluation Function for a Small Neural Network with Shared Weights. IEEE Symposium on Computational Intelligence and Games, 216-223.
  • Miconi, T. (2009)Why Coevolution Doesn’t „Work”: Superiorityand Progress in Coevolution. In: Proc. of the 12th European Conference on Genetic Programming. LNCS 5481, Springer, 49-60.
  • Monroy,G.A., Stanley,K.O. and Miikkulainen,R. (2006) Coevolution of neural networks using a layered Pareto archive. In: M. Keijzer et al., eds., GECCO 2006: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, vol. 1. ACM Press, Seattle, Washington, USA, 329-336.
  • Pollack, J.B. and Blair, A.D. (1998) Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning, 32(3), 225-240.
  • Rosin, C.D. and Belew, R.K. (1997) New Methods for Competitive Coevolution. Evolutionary Computation, 5(1), 1-29.
  • Runarsson, T.P. and Lucas, S.M. (2005) Coevolution versus self-play temporal difference learning for acquiring position evaluation in small-board go. IEEE Transactions on Evolutionary Computation, 9(6), 628-640.
  • Samuel, A.L. (1959) Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, 3(3), 211-229.
  • Singer, J.A. (2001) Co-evolving a Neural-Net Evaluation Function for Othello by Combining Genetic Algorithms and Reinforcement Learning. Proc. of International Conference on Computational Science (2). Part II. ICCS 2001, Springer, London, 377-389.
  • Stanley, K.O., Bryant, B, and Miikkulainen, R. (2005) Real-time neuroevolution in the NERO video game. IEEE Transactions on Evolutionary Computation, 9(6), 653-668.
  • Sutton, R.S. (1988) Learning to Predict by the Methods of Temporal Differences. Machine Learning, 3, 9-44.
  • Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning. Vol. 9. MIT Press.
  • Szubert, M., Jaśkowski,W. and Krawiec, K. (2009) Coevolutionary Temporal Difference Learning for Othello. IEEE Symposium on Computational Intelligence and Games, 104-111.
  • Tesauro, G. (1995) Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58-68.
Typ dokumentu
Identyfikator YADDA
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.