The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy but, surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
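As a rough illustration of the third approach, the sketch below shows one plausible way an offline value estimate could seed the statistics of a node in a UCT search tree. The abstract does not specify the mechanism, so everything here is an assumption: the names `ActionStats`, `init_with_prior`, `update`, and `ucb1_select` are hypothetical, the prior is treated as if it were the mean of `equivalent_visits` earlier simulations, and selection uses the standard UCB1 rule with an illustrative exploration constant.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Running statistics for one action at a tree node (hypothetical structure)."""
    value: float = 0.0  # mean outcome of simulations through this action
    visits: int = 0     # number of simulations, real or "equivalent"


def init_with_prior(prior_value: float, equivalent_visits: int) -> ActionStats:
    # Assumption: the offline value counts as the mean of
    # `equivalent_visits` simulations that were never actually run.
    return ActionStats(value=prior_value, visits=equivalent_visits)


def update(stats: ActionStats, outcome: float) -> None:
    # Incremental mean update after one simulation outcome.
    stats.visits += 1
    stats.value += (outcome - stats.value) / stats.visits


def ucb1_select(stats_by_action: dict, exploration: float = 1.0):
    # Standard UCB1: mean value plus an exploration bonus that shrinks
    # as an action accumulates visits. Unvisited actions are tried first.
    total = sum(s.visits for s in stats_by_action.values())
    best_action, best_score = None, -math.inf
    for action, s in stats_by_action.items():
        if s.visits == 0:
            return action
        score = s.value + exploration * math.sqrt(math.log(total) / s.visits)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


if __name__ == "__main__":
    # Two candidate moves seeded from a made-up offline evaluation.
    node = {
        "move_a": init_with_prior(prior_value=0.8, equivalent_visits=10),
        "move_b": init_with_prior(prior_value=0.2, equivalent_visits=10),
    }
    chosen = ucb1_select(node)
    update(node[chosen], outcome=0.0)  # one simulated loss through that move
    print(chosen, node[chosen])
```

With equal equivalent visit counts the exploration bonus is identical for both moves, so the prior alone decides the first selection; real simulation outcomes then gradually outweigh the prior as visits accumulate.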