Temporal difference learning and TD-Gammon
Communications of the ACM
Genetic algorithms + data structures = evolution programs (3rd ed.)
Co-Evolution in the Successful Learning of Backgammon Strategy
Machine Learning
Learning to evaluate Go positions via temporal difference methods
Computational intelligence in games
Computer Go: an AI oriented survey
Artificial Intelligence
Blondie24: playing at the edge of AI
Introduction to Reinforcement Learning
Learning to Predict by the Methods of Temporal Differences
Machine Learning
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Competitive Environments Evolve Better Solutions for Complex Tasks
Proceedings of the 5th International Conference on Genetic Algorithms
Solution concepts in coevolutionary algorithms
The MaxSolve algorithm for coevolution
GECCO '05 Proceedings of the 7th annual conference on Genetic and evolutionary computation
GP-Gammon: Genetically Programming Backgammon Players
Genetic Programming and Evolvable Machines
Coevolution of neural networks using a layered pareto archive
Proceedings of the 8th annual conference on Genetic and evolutionary computation
A Monotonic Archive for Pareto-Coevolution
Evolutionary Computation
New methods for competitive coevolution
Evolutionary Computation
Emergent geometric organization and informative dimensions in coevolutionary algorithms
Evolving strategy for a probabilistic game of imperfect information using genetic programming
Genetic Programming and Evolvable Machines
Why Coevolution Doesn't "Work": Superiority and Progress in Coevolution
EuroGP '09 Proceedings of the 12th European Conference on Genetic Programming
Reinforcement learning of local shape in the game of go
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Some studies in machine learning using the game of checkers
IBM Journal of Research and Development
Coevolutionary temporal difference learning for Othello
CIG'09 Proceedings of the 5th international conference on Computational Intelligence and Games
A game-theoretic memory mechanism for coevolution
GECCO'03 Proceedings of the 2003 international conference on Genetic and evolutionary computation: Part I
Evolution of an efficient search algorithm for the mate-in-N problem in chess
EuroGP'07 Proceedings of the 10th European conference on Genetic programming
Winning ant wars: evolving a human-competitive game strategy using fitnessless selection
EuroGP'08 Proceedings of the 11th European conference on Genetic programming
IEEE Transactions on Evolutionary Computation
Real-time neuroevolution in the NERO video game
IEEE Transactions on Evolutionary Computation
Improving coevolution by random sampling
Proceedings of the 15th annual conference on Genetic and evolutionary computation
Shaping fitness function for evolutionary learning of game strategies
Proceedings of the 15th annual conference on Genetic and evolutionary computation
Quantitative analysis of the hall of fame coevolutionary archives
Proceedings of the 15th annual conference companion on Genetic and evolutionary computation
We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interweaves two search processes operating in intra-game and inter-game modes. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to the differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution from which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to its key parameters, including the trace decay constant that controls the lookahead horizon of TDL and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects search performance, and find that the archive-based variant is superior to the other techniques considered here, producing strategies that outperform a handcrafted weighted piece counter strategy and simple liberty-based heuristics. This encouraging result can potentially be generalized not only to other strategy representations for small-board Go, but also to other games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.
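To illustrate the intra-game component described above, here is a minimal sketch (not the authors' code) of a gradient-descent TD(0) update for a weighted piece counter: the board evaluation is linear, v(s) = w · s, where s encodes each intersection as +1 (black stone), -1 (white stone), or 0 (empty). The board size, learning rate, and function names are illustrative assumptions; the paper's method additionally uses eligibility traces governed by the trace decay constant.

```python
import random

BOARD_CELLS = 25   # assumed 5x5 small-board Go encoding
ALPHA = 0.01       # learning rate (illustrative value)

def evaluate(weights, state):
    """Linear weighted-piece-counter evaluation: v(s) = sum_i w_i * s_i."""
    return sum(w * x for w, x in zip(weights, state))

def td_update(weights, state, next_state, reward=0.0, gamma=1.0):
    """One TD(0) step on a linear model.

    The TD error delta compares the evaluations of two consecutively
    visited states; since grad_w v(s) = s, each weight moves by
    ALPHA * delta * s_i.
    """
    delta = reward + gamma * evaluate(weights, next_state) - evaluate(weights, state)
    return [w + ALPHA * delta * x for w, x in zip(weights, state)]

# Usage: apply one update for a single observed state transition.
w = [0.0] * BOARD_CELLS
s = [random.choice((-1, 0, 1)) for _ in range(BOARD_CELLS)]
s_next = [random.choice((-1, 0, 1)) for _ in range(BOARD_CELLS)]
w = td_update(w, s, s_next)
```

In CTDL, updates of this kind run within each game, while the coevolutionary algorithm operates between games, selecting and varying whole weight vectors based on game outcomes.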