Multi-dimensional deep memory Atari-go players for parameter exploring policy gradients

Authors:
Mandy Grüttner; Frank Sehnke; Tom Schaul; Jürgen Schmidhuber
Affiliations:
Faculty of Computer Science, Technische Universität München, Germany; Faculty of Computer Science, Technische Universität München, Germany; IDSIA, University of Lugano, Switzerland; IDSIA, University of Lugano, Switzerland
Venue:
ICANN'10 Proceedings of the 20th international conference on Artificial neural networks: Part II
Year:
2010

Abstract

Developing superior artificial board-game players is a widely studied area of Artificial Intelligence. Among the most challenging games is the Asian game of Go, which, despite its deceptively simple rules, has so far eluded the development of expert-level artificial players. In this paper we attempt to tackle this challenge by combining two recent developments in Machine Learning. We employ Multi-Dimensional Recurrent Neural Networks with Long Short-Term Memory cells, which handle the multi-dimensional data of the board game naturally. To improve both the convergence rate and the ultimate performance, we train those networks using Policy Gradients with Parameter-based Exploration, a recently developed Reinforcement Learning algorithm that has been found to have numerous advantages over Evolution Strategies. Our empirical results confirm the promise of this approach, and we discuss how it can be scaled up to expert-level Go players.
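
The abstract names Policy Gradients with Parameter-based Exploration (PGPE) without describing the update. As a rough illustration only, and not the authors' implementation, the sketch below shows a simplified symmetric-sampling form of PGPE: parameters are drawn from a Gaussian with mean `mu` and per-parameter standard deviation `sigma`, a mirrored pair of samples is evaluated, and both `mu` and `sigma` are nudged using the two rewards. Everything specific here is an assumption made for the example: the toy quadratic reward `toy_reward`, the constant `TARGET`, the learning rates, the moving-average baseline, and the clipping of `sigma` to fixed bounds. Reward normalization is omitted. In the setting of the paper, the parameter vector would be the network weights and the reward would presumably come from game play.

```python
import random

TARGET = [1.0, -2.0]  # hypothetical optimum of the toy problem (assumption)


def toy_reward(theta):
    """Toy stand-in for an episode return: negative squared distance to TARGET."""
    return -sum((x - t) ** 2 for x, t in zip(theta, TARGET))


def pgpe(reward, dim, iterations=5000, lr_mu=0.05, lr_sigma=0.01,
         sigma_bounds=(0.1, 0.5), seed=0):
    """Simplified symmetric-sampling PGPE sketch (hyperparameters are assumptions)."""
    rng = random.Random(seed)
    sigma_min, sigma_max = sigma_bounds
    mu = [0.0] * dim              # mean of the parameter distribution
    sigma = [sigma_max] * dim     # per-parameter exploration width
    baseline = None               # moving average of past mean rewards

    for _ in range(iterations):
        # Draw one perturbation and evaluate the mirrored pair mu + eps, mu - eps.
        eps = [rng.gauss(0.0, s) for s in sigma]
        r_plus = reward([m + e for m, e in zip(mu, eps)])
        r_minus = reward([m - e for m, e in zip(mu, eps)])
        r_mean = 0.5 * (r_plus + r_minus)
        r_diff = 0.5 * (r_plus - r_minus)
        if baseline is None:
            baseline = r_mean

        for i in range(dim):
            # Move the mean toward whichever side of the pair scored better.
            mu[i] += lr_mu * r_diff * eps[i]
            # Widen sigma when a better-than-baseline pair came from a large
            # perturbation, shrink it otherwise; clip to keep the sketch stable.
            step = (lr_sigma * (r_mean - baseline)
                    * (eps[i] ** 2 - sigma[i] ** 2) / sigma[i])
            sigma[i] = min(max(sigma[i] + step, sigma_min), sigma_max)

        baseline = 0.9 * baseline + 0.1 * r_mean

    return mu, sigma
```

Calling `pgpe(toy_reward, dim=2)` returns the learned mean and exploration widths; on this toy problem the mean should end up close to `TARGET`. Because exploration happens in parameter space, each sampled parameter vector is held fixed for a whole evaluation, which is the property that distinguishes PGPE from policy gradient methods that perturb actions at every step.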