Reinforcement learning of local shape in the game of Go

Authors:
David Silver; Richard Sutton; Martin Müller
Affiliations:
Department of Computing Science, University of Alberta, Edmonton, Canada; Department of Computing Science, University of Alberta, Edmonton, Canada; Department of Computing Science, University of Alberta, Edmonton, Canada
Venue:
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Year:
2007

Abstract

We apply to the game of Go a reinforcement learning approach based on a linear evaluation function and large numbers of binary features. This strategy has proved effective in game-playing programs and other reinforcement learning applications. We apply it to Go by creating over a million features based on templates for small fragments of the board, and then using temporal difference learning and self-play. This method identifies hundreds of low-level shapes with recognisable significance to expert Go players, and provides quantitative estimates of their values. We analyse the relative contributions to performance of templates of different types and sizes. Our results show that small, translation-invariant templates are surprisingly effective. We assess the performance of our program by playing against the Average Liberty Player and a variety of computer opponents on the 9×9 Computer Go Server. Our linear evaluation function appears to outperform all other static evaluation functions that do not incorporate substantial domain knowledge.
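
The combination of local shape templates, a linear evaluation function and temporal difference learning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the board encoding, the 2×2 template size, the base-3 shape key, the step size and the function names `shape_counts`, `evaluate` and `td0_update` are all assumptions chosen for illustration. Translation invariance is modelled here by letting every occurrence of a shape share one weight, wherever it appears on the board.

```python
from collections import Counter

# Assumed cell encoding (hypothetical, not taken from the paper).
EMPTY, BLACK, WHITE = 0, 1, 2


def shape_counts(board, size=2):
    """Count each size x size local shape, ignoring where it occurs."""
    n = len(board)
    counts = Counter()
    for r in range(n - size + 1):
        for c in range(n - size + 1):
            # Encode the window contents as a base-3 integer key.
            key = 0
            for dr in range(size):
                for dc in range(size):
                    key = key * 3 + board[r + dr][c + dc]
            counts[key] += 1
    return counts


def evaluate(weights, counts):
    """Linear evaluation: weighted sum over the shapes that are present."""
    return sum(weights.get(k, 0.0) * n for k, n in counts.items())


def td0_update(weights, counts, target, alpha=0.05):
    """One TD(0)-style step moving the estimate towards `target`."""
    error = target - evaluate(weights, counts)
    for k, n in counts.items():
        weights[k] = weights.get(k, 0.0) + alpha * error * n
    return error
```

In a self-play loop, `target` would typically be the reward plus the current estimate for the successor position, so that each position's value is pulled towards the value of the position that followed it. Sharing one weight per shape keeps the weight table small, which is one plausible reading of why small translation-invariant templates can be learned efficiently.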