Coevolution versus self-play temporal difference learning for acquiring position evaluation in small-board Go

  • Authors:
  • T. P. Runarsson; S. M. Lucas

  • Affiliations:
  • Science Institute, University of Iceland, Reykjavik, Iceland; -

  • Venue:
  • IEEE Transactions on Evolutionary Computation
  • Year:
  • 2005

Abstract

Two learning methods for acquiring position evaluation for small Go boards are studied and compared. In each case the function to be learned is a position-weighted piece counter, and only the learning method differs. The methods studied are temporal difference learning (TDL), using a self-play gradient-descent method, and coevolutionary learning, using an evolution strategy. The two approaches are compared with the aim of gaining greater insight into the problem of searching for "optimal" zero-sum game strategies. Using tuned standard setups for each algorithm, it was found that the temporal-difference method learned faster, and in most cases also achieved a higher level of play than coevolution, provided that the gradient-descent step size was chosen suitably. The performance of the coevolution method was found to be sensitive to the design of the evolutionary algorithm in several respects. Given the right configuration, however, coevolution achieved a higher level of play than TDL. Self-play results in optimal play against a copy of itself. A self-play player will prefer moves from which it is unlikely to lose even when it occasionally makes random exploratory moves. An evolutionary player forced to perform exploratory moves in the same way can achieve superior strategies to those acquired through self-play alone. The reason for this is that the evolutionary player is exposed to more varied game-play, because it plays against a diverse population of players.
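
As a rough illustration of the self-play TDL setup described in the abstract, the sketch below applies a TD(0) gradient-descent update to a position-weighted piece counter. The 5x5 board size, the tanh squashing, the step size, and the exploration rate are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

BOARD_POINTS = 5 * 5      # 5x5 board; the paper studies small Go boards
ALPHA = 0.001             # gradient-descent step size (illustrative value)
EPSILON = 0.1             # chance of a random exploratory move (assumed)

def evaluate(weights, board):
    """Position-weighted piece counter, squashed to (-1, 1) with tanh.
    `board` holds +1 for own stones, -1 for opponent stones, 0 for empty."""
    return np.tanh(weights @ board)

def choose_move(weights, afterstates):
    """Pick the afterstate with the highest value, occasionally exploring."""
    if rng.random() < EPSILON:
        return afterstates[rng.integers(len(afterstates))]
    return max(afterstates, key=lambda b: evaluate(weights, b))

def td0_update(weights, board, next_board, reward=None):
    """One TD(0) step: pull the value of `board` toward the value of the
    successor position (or the final game outcome when `reward` is given)."""
    v = evaluate(weights, board)
    target = reward if reward is not None else evaluate(weights, next_board)
    grad = (1.0 - v ** 2) * board      # derivative of tanh(w . board) w.r.t. w
    return weights + ALPHA * (target - v) * grad
```

A coevolutionary counterpart, as described in the abstract, would instead maintain a population of weight vectors, score them by playing games against one another, and update them with an evolution strategy rather than this gradient step.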