Combining online and offline knowledge in UCT

Authors:
Sylvain Gelly;David Silver
Affiliations:
Univ. Paris Sud, INRIA, France;University of Alberta, Edmonton, Alberta
Venue:
Proceedings of the 24th international conference on Machine learning
Year:
2007

Abstract

The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
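
The third approach, using an offline value function as prior knowledge in the search tree, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption for illustration: the `Node` class, the `prior_value` callable standing in for an offline value function, the `prior_count` parameter (how many simulated visits the prior is treated as being worth), and the exploration constant `c` are all hypothetical names, and the selection rule is the standard UCB1 bound rather than anything confirmed as the authors' exact method.

```python
import math


class Node:
    """Per-state action statistics for a UCT-style search (hypothetical structure)."""

    def __init__(self, actions, prior_value=None, prior_count=0):
        # prior_value: optional callable mapping an action to a value in [0, 1];
        # an assumed stand-in for an offline value function.
        # prior_count: assumed number of "virtual" visits the prior is worth.
        self.q = {}  # mean value estimate per action
        self.n = {}  # visit count per action (including virtual visits)
        for a in actions:
            if prior_value is not None and prior_count > 0:
                self.q[a] = prior_value(a)
                self.n[a] = prior_count
            else:
                self.q[a] = 0.0
                self.n[a] = 0

    def select(self, c=1.0):
        """Pick an action by the standard UCB1 rule; unvisited actions go first."""
        total = sum(self.n.values())
        best, best_score = None, -math.inf
        for a in self.q:
            if self.n[a] == 0:
                return a
            score = self.q[a] + c * math.sqrt(math.log(total) / self.n[a])
            if score > best_score:
                best, best_score = a, score
        return best

    def update(self, action, reward):
        """Fold one simulation outcome into the running mean for `action`."""
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]
```

With a prior, a newly created node prefers the action the offline function rates highly instead of trying every action once; as real simulation outcomes arrive through `update`, the running mean moves away from the prior at a rate governed by `prior_count`.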