Learning to play Go using recursive neural networks

  • Authors:
  • Lin Wu;Pierre Baldi

  • Affiliations:
  • School of Information and Computer Sciences, Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, CA 92697, USA;School of Information and Computer Sciences, Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, CA 92697, USA

  • Venue:
  • Neural Networks
  • Year:
  • 2008

Abstract

Go is an ancient board game that poses unique opportunities and challenges for artificial intelligence. Currently, there are no computer Go programs that can play at the level of a good human player. However, the emergence of large repositories of games is opening the door for new machine learning approaches to address this challenge. Here we develop a machine learning approach to Go, and related board games, focusing primarily on the problem of learning a good evaluation function in a scalable way. Scalability is essential at multiple levels, from the library of local tactical patterns, to the integration of patterns across the board, to the size of the board itself. The system we propose is capable of automatically learning the propensity of local patterns from a library of games. Propensity and other local tactical information are fed into recursive neural networks, derived from a probabilistic Bayesian network architecture. The recursive neural networks in turn integrate local information across the board in all four cardinal directions and produce local outputs that represent local territory ownership probabilities. The aggregation of these probabilities provides an effective strategic evaluation function that is an estimate of the expected area at the end, or at various other stages, of the game. Local area targets for training can be derived from datasets of games played by human players. In this approach, while requiring a learning time proportional to N^4, skills learned on a board of size N^2 can easily be transferred to boards of other sizes. A system trained using only 9x9 amateur game data performs surprisingly well on a test set derived from 19x19 professional game data. Possible directions for further improvements are briefly discussed.
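
The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes the network has already produced, for each intersection, a probability that the point ends up owned by Black; by linearity of expectation, summing those probabilities gives the expected Black area. The function names `expected_area` and `expected_margin`, the toy 3x3 grid, and the assumption that every point ends up owned by exactly one player are illustrative choices, not details taken from the paper.

```python
from typing import List

Grid = List[List[float]]  # Grid[i][j] = assumed P(point (i, j) is owned by Black)


def expected_area(black_ownership: Grid) -> float:
    """Expected number of points owned by Black.

    By linearity of expectation this is the sum of the per-point
    ownership probabilities, whatever the board size.
    """
    return sum(p for row in black_ownership for p in row)


def expected_margin(black_ownership: Grid) -> float:
    """Expected Black area minus White area.

    Assumption (not from the paper): each point ends up owned by exactly
    one player, so White's probability at a point is 1 - p.
    """
    return sum(2.0 * p - 1.0 for row in black_ownership for p in row)


# Hypothetical local outputs on a toy 3x3 board (made-up numbers).
probs: Grid = [
    [0.9, 0.8, 0.5],
    [0.7, 0.5, 0.3],
    [0.5, 0.2, 0.1],
]

area = expected_area(probs)
margin = expected_margin(probs)
print(f"expected Black area: {area:.2f}, expected margin: {margin:.2f}")
```

Because both functions only sum over whatever grid they are given, the same evaluation applies unchanged to 9x9 or 19x19 boards, which is in the spirit of the board-size transfer the abstract describes.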