Two learning methods for acquiring position evaluation for small Go boards are studied and compared. In each case the function to be learned is a position-weighted piece counter; only the learning method differs. The methods studied are temporal difference learning (TDL), using the self-play gradient-descent method, and coevolutionary learning, using an evolution strategy. The two approaches are compared in the hope of gaining greater insight into the problem of searching for "optimal" zero-sum game strategies. Using tuned standard setups for each algorithm, it was found that the temporal-difference method learned faster, and in most cases also achieved a higher level of play than coevolution, provided that the gradient-descent step size was chosen suitably. The performance of the coevolution method was found to be sensitive to the design of the evolutionary algorithm in several respects. Given the right configuration, however, coevolution achieved a higher level of play than TDL. Self-play results in optimal play against a copy of itself: a self-play player will prefer moves from which it is unlikely to lose, even when it occasionally makes random exploratory moves. An evolutionary player forced to make exploratory moves in the same way can achieve strategies superior to those acquired through self-play alone, because it plays against a diverse population of players and is therefore exposed to more varied game-play.
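To make the two learning methods more concrete, the following is a minimal sketch of a position-weighted piece counter together with one TD(0) weight update and one Gaussian mutation step of the kind an evolution strategy might use. It is an illustration under stated assumptions, not the setup used in the study: the 5x5 board size, the tanh squashing of the weighted sum, the +1/-1/0 board encoding, the step size `alpha`, the mutation strength `sigma`, and all function names are hypothetical choices made for this example.

```python
import math
import random

BOARD_SIZE = 5                      # assumed small-board size (hypothetical)
N_POINTS = BOARD_SIZE * BOARD_SIZE  # one weight per board intersection


def wpc_value(weights, board):
    """Position-weighted piece counter: squashed weighted sum of the board.

    `board` is a flat list with +1 (own stone), -1 (opponent stone), 0 (empty).
    The tanh squashing is an assumption of this sketch.
    """
    return math.tanh(sum(w * x for w, x in zip(weights, board)))


def td0_update(weights, board, next_board, alpha=0.01, reward=None):
    """Return new weights after one TD(0) gradient-descent step.

    The target is the value of the next position, or `reward` if the game
    has ended. The gradient of tanh(w . x) with respect to w is (1 - v^2) * x.
    """
    v = wpc_value(weights, board)
    target = reward if reward is not None else wpc_value(weights, next_board)
    delta = target - v
    grad_scale = 1.0 - v * v
    return [w + alpha * delta * grad_scale * x for w, x in zip(weights, board)]


def mutate(weights, sigma, rng):
    """Gaussian mutation of every weight, as a simple evolution-strategy step."""
    return [w + sigma * rng.gauss(0.0, 1.0) for w in weights]


# Example: a single stone on the first point, followed by a win (reward = +1).
weights = [0.0] * N_POINTS
board = [1] + [0] * (N_POINTS - 1)
updated = td0_update(weights, board, next_board=None, reward=1.0)

# Example: an offspring produced from the updated weights by mutation.
offspring = mutate(updated, sigma=0.1, rng=random.Random(0))
```

In this sketch only the weight of the occupied point changes in the TD step, because the gradient is proportional to the board encoding. In a coevolutionary setting the mutated offspring would instead be scored by playing games against other members of the population, which is omitted here.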