Bandit based Monte-Carlo planning

Authors:
Levente Kocsis;Csaba Szepesvári
Affiliations:
Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary;Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary
Venue:
ECML'06 Proceedings of the 17th European conference on Machine Learning
Year:
2006

Citing 7
Cited 151

An analysis of forward pruning

AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
The challenge of poker

Artificial Intelligence - Chips challenging champions: games, computers and Artificial Intelligence
World-championship-caliber Scrabble

Artificial Intelligence - Chips challenging champions: games, computers and Artificial Intelligence
The Nonstochastic Multiarmed Bandit Problem

SIAM Journal on Computing
Finite-time Analysis of the Multiarmed Bandit Problem

Machine Learning
An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Operations Research
A sparse sampling algorithm for near-optimal planning in large Markov decision processes

IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2

Combining online and offline knowledge in UCT

Proceedings of the 24th international conference on Machine learning
PickPocket: A computer billiards shark

Artificial Intelligence
Sample-based learning and search with permanent and transient memories

Proceedings of the 25th international conference on Machine learning
Rollout sampling approximate policy iteration

Machine Learning
Single-Player Monte-Carlo Tree Search

CG '08 Proceedings of the 6th international conference on Computers and Games
Amazons Discover Monte-Carlo

CG '08 Proceedings of the 6th international conference on Computers and Games
Monte-Carlo Tree Search Solver

CG '08 Proceedings of the 6th international conference on Computers and Games
An Analysis of UCT in Multi-player Games

CG '08 Proceedings of the 6th international conference on Computers and Games
Multi-player Go

CG '08 Proceedings of the 6th international conference on Computers and Games
Parallel Monte-Carlo Tree Search

CG '08 Proceedings of the 6th international conference on Computers and Games
A Parallel Monte-Carlo Tree Search Algorithm

CG '08 Proceedings of the 6th international conference on Computers and Games
Using Artificial Boundaries in the Game of Go

CG '08 Proceedings of the 6th international conference on Computers and Games
A Fast Indexing Method for Monte-Carlo Go

CG '08 Proceedings of the 6th international conference on Computers and Games
Symbolic Classification of General Two-Player Games

KI '08 Proceedings of the 31st annual German conference on Advances in Artificial Intelligence
Learning to play Go using recursive neural networks

Neural Networks
Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration

Recent Advances in Reinforcement Learning
Optimistic Planning of Deterministic Systems

Recent Advances in Reinforcement Learning
Knowledge Generation for Improving Simulations in UCT for General Game Playing

AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
General Game Playing with Ants

SEAL '08 Proceedings of the 7th International Conference on Simulated Evolution and Learning
A game theory approach to high-level strategic planning in first person shooters

IE '08 Proceedings of the 5th Australasian Conference on Interactive Entertainment
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

Theoretical Computer Science
Grid Coevolution for Adaptive Simulations: Application to the Building of Opening Books in the Game of Go

EvoWorkshops '09 Proceedings of the EvoWorkshops 2009 on Applications of Evolutionary Computing: EvoCOMNET, EvoENVIRONMENT, EvoFIN, EvoGAMES, EvoHOT, EvoIASP, EvoINTERACTION, EvoMUSART, EvoNUM, EvoSTOC, EvoTRANSLOG
To create neuro-controlled game opponent from UCT-created data

Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation
Bandit-based optimization on graphs with application to library performance tuning

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Monte-Carlo simulation balancing

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Optimal robust expensive optimization is tractable

Proceedings of the 11th Annual conference on Genetic and evolutionary computation
Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm

ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
Visualization and adjustment of evaluation functions based on evaluation values and win probability

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Simulation-based approach to general game playing

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 1
Achieving master level play in 9×9 computer go

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3
A machine learning approach for statistical software testing

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
UCT for tactical assault planning in real-time strategy games

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Monte Carlo tree search techniques in the game of Kriegspiel

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Minimum proof graphs and fastest-cut-first search heuristics

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Improving state evaluation, inference, and search in trick-based card games

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Monte-Carlo exploration for deterministic planning

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Introduction of a new paraphrase generation tool based on Monte-Carlo sampling

ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Coevolving intelligent game players in a cultural framework

CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation
Monte-Carlo Tree Search in Poker Using Expected Reward Distributions

ACML '09 Proceedings of the 1st Asian Conference on Machine Learning: Advances in Machine Learning
A novel ontology for computer go knowledge management

FUZZ-IEEE'09 Proceedings of the 18th international conference on Fuzzy Systems
Monte Carlo search applied to card selection in magic: the gathering

CIG'09 Proceedings of the 5th international conference on Computational Intelligence and Games
A study on security evaluation methodology for image-based biometrics authentication systems

BTAS'09 Proceedings of the 3rd IEEE international conference on Biometrics: Theory, applications and systems
Provably Efficient Learning with Typed Parametric Models

The Journal of Machine Learning Research
Bandit-based Monte-Carlo planning for the single-machine total weighted tardiness scheduling problem

EUROCAST'07 Proceedings of the 11th international conference on Computer aided systems theory
Backpropagation modification in Monte-Carlo game tree search

IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
To create intelligent adaptive neuro-controller of game opponent from UCT-created data

FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 2
Monte Carlo tree search in Kriegspiel

Artificial Intelligence
Pure exploration in multi-armed bandits problems

ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
A mean-based approach for real-time planning

Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Multi-agent plan adaptation using coordination patterns in team adversarial games

Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Instantiating general games using prolog or dependency graphs

KI'10 Proceedings of the 33rd annual German conference on Advances in artificial intelligence
Consistency modifications for automatically tuned Monte-Carlo tree search

LION'10 Proceedings of the 4th international conference on Learning and intelligent optimization
Intelligent agents for the game of go

IEEE Computational Intelligence Magazine
Adaptation-based programming in java

Proceedings of the 20th ACM SIGPLAN workshop on Partial evaluation and program manipulation
The true score of statistical paraphrase generation

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Pheromones, probabilities, and multiple futures

MABS'10 Proceedings of the 11th international conference on Multi-agent-based simulation
Planning with noisy probabilistic relational rules

Journal of Artificial Intelligence Research
On the scalability of parallel UCT

CG'10 Proceedings of the 7th international conference on Computers and games
Scalability and parallelization of Monte-Carlo tree search

CG'10 Proceedings of the 7th international conference on Computers and games
Biasing Monte-Carlo simulations through RAVE values

CG'10 Proceedings of the 7th international conference on Computers and games
Computational experiments with the RAVE heuristic

CG'10 Proceedings of the 7th international conference on Computers and games
Monte-Carlo simulation balancing in practice

CG'10 Proceedings of the 7th international conference on Computers and games
Score bounded Monte-Carlo tree search

CG'10 Proceedings of the 7th international conference on Computers and games
Improving Monte-Carlo tree search in Havannah

CG'10 Proceedings of the 7th international conference on Computers and games
Node-expansion operators for the UCT algorithm

CG'10 Proceedings of the 7th international conference on Computers and games
Monte-Carlo opening books for amazons

CG'10 Proceedings of the 7th international conference on Computers and games
Enhancements for multi-player Monte-Carlo tree search

CG'10 Proceedings of the 7th international conference on Computers and games
Pure exploration in finitely-armed and continuous-armed bandits

Theoretical Computer Science
Computer poker: A review

Artificial Intelligence
Monte-Carlo tree search and rapid action value estimation in computer Go

Artificial Intelligence
Multiple tree for partially observable Monte-Carlo tree search

EvoApplications'11 Proceedings of the 2011 international conference on Applications of evolutionary computation - Volume Part I
Revisiting Monte-Carlo tree search on a normal form game: NoGo

EvoApplications'11 Proceedings of the 2011 international conference on Applications of evolutionary computation - Volume Part I
Towards procedural strategy game generation: evolving complementary unit types

EvoApplications'11 Proceedings of the 2011 international conference on Applications of evolutionary computation - Volume Part I
Upper confidence trees with short term partial information

EvoApplications'11 Proceedings of the 2011 international conference on Applications of evolutionary computation - Volume Part I
A Monte-Carlo AIXI approximation

Journal of Artificial Intelligence Research
X-Armed Bandits

The Journal of Machine Learning Research
Partial order methods for statistical model checking and simulation

FMOODS'11/FORTE'11 Proceedings of the joint 13th IFIP WG 6.1 and 30th IFIP WG 6.1 international conference on Formal techniques for distributed systems
Applying UCT to boolean satisfiability

SAT'11 Proceedings of the 14th international conference on Theory and application of satisfiability testing
Empirical evaluation of ad hoc teamwork in the pursuit domain

The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Parallel Monte-Carlo tree search for HPC systems

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Incorporating variance in impact-based search

CP'11 Proceedings of the 17th international conference on Principles and practice of constraint programming
Monte-carlo style UCT search for boolean satisfiability

AI*IA'11 Proceedings of the 12th international conference on Artificial intelligence around man and beyond
Execution control for crowdsourcing

Proceedings of the 24th annual ACM symposium adjunct on User interface software and technology
Deviations of stochastic bandit regret

ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
Multi-armed bandits with episode context

Annals of Mathematics and Artificial Intelligence
The grand challenge of computer Go: Monte Carlo tree search and extensions

Communications of the ACM
Multiple overlapping tiles for contextual monte carlo tree search

EvoApplications'10 Proceedings of the 2010 international conference on Applications of Evolutionary Computation - Volume Part I
Adding expert knowledge and exploration in monte-carlo tree search

ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
Monte-Carlo tree search in settlers of catan

ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
Evaluation function based monte-carlo LOA

ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
A study of UCT and its enhancements in an artificial game

ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
Creating an upper-confidence-tree program for havannah

ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
Plans, patterns, and move categories guiding a highly selective search

ACG'09 Proceedings of the 12th international conference on Advances in Computer Games
Bandit-Based genetic programming

EuroGP'10 Proceedings of the 13th European conference on Genetic Programming
Continuous upper confidence trees

LION'05 Proceedings of the 5th international conference on Learning and Intelligent Optimization
Parallel monte carlo tree search scalability discussion

AI'11 Proceedings of the 24th international conference on Advances in Artificial Intelligence
Computing approximate Nash Equilibria and robust best-responses using sampling

Journal of Artificial Intelligence Research
Towards more intelligent adaptive video game agents: a computational intelligence perspective

Proceedings of the 9th conference on Computing Frontiers
Monte-Carlo tree search for the physical travelling salesman problem

EvoApplications'12 Proceedings of the 2012 European conference on Applications of Evolutionary Computation
Online planning for ad hoc autonomous agent teams

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume One
Real-time solving of quantified CSPs based on Monte-Carlo game tree search

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume One
A real-time opponent modeling system for rush football

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Embedding system dynamics in agent based models for complex adaptive systems

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Learning data transformation rules through examples: preliminary results

Proceedings of the Ninth International Workshop on Information Integration on the Web
Feature reinforcement learning in practice

EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Guiding combinatorial optimization with UCT

CPAIOR'12 Proceedings of the 9th international conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems
Strong mitigation: nesting search for good policies within search for good reward

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Combining human and machine intelligence in large-scale crowdsourcing

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Playing repeated Stackelberg games with unknown opponents

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Anytime algorithms for multi-agent visibility-based pursuit-evasion games

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Single-player Monte-Carlo tree search for SameGame

Knowledge-Based Systems
Nested Monte-Carlo Search with simulation reduction

Knowledge-Based Systems
Dynamic randomization and domain knowledge in Monte-Carlo Tree Search for Go knowledge-based systems

Knowledge-Based Systems
UCD: Upper confidence bound for rooted directed acyclic graphs

Knowledge-Based Systems
Bitboard knowledge base system and elegant search architectures for Connect6

Knowledge-Based Systems
Genetic fuzzy markup language for game of NoGo

Knowledge-Based Systems
Monte-Carlo tree search parallelisation for computer go

Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
Efficient search for transformation-based inference

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Priority level planning in kriegspiel

ICEC'12 Proceedings of the 11th international conference on Entertainment Computing
Bootstrapping monte carlo tree search with an imperfect heuristic

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Searching with partial belief states in general games with incomplete information

KI'12 Proceedings of the 35th Annual German conference on Advances in Artificial Intelligence
Confidence bounds for statistical model checking of probabilistic hybrid systems

FORMATS'12 Proceedings of the 10th international conference on Formal Modeling and Analysis of Timed Systems
Pilot, rollout and monte carlo tree search methods for job shop scheduling

LION'12 Proceedings of the 6th international conference on Learning and Intelligent Optimization
Upper confidence tree-based consistent reactive planning application to minesweeper

LION'12 Proceedings of the 6th international conference on Learning and Intelligent Optimization
Bandit-based structure learning for bayesian network classifiers

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part II
Modeling information exchange opportunities for effective human-computer teamwork

Artificial Intelligence
TEXPLORE: real-time sample-efficient reinforcement learning for robots

Machine Learning
Replanning in domains with partial information and sensing actions

Journal of Artificial Intelligence Research
Learning non-myopically from human-generated reward

Proceedings of the 2013 international conference on Intelligent user interfaces
Content recommendation on web portals

Communications of the ACM
Testing probabilistic equivalence through Reinforcement Learning

Information and Computation
Distributed Gibbs: a memory-bounded sampling-based DCOP algorithm

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Finding objects through stochastic shortest path problems

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Light at the end of the tunnel: a Monte Carlo approach to computing value of information

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Baseline: practical control variates for agent evaluation in zero-sum domains

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Ranked bandits in metric spaces: learning diverse rankings over large document collections

The Journal of Machine Learning Research
Design and parametric considerations for artificial neural network pruning in UCT game playing

Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
Evaluation of drivers interaction with assistant systems using criticality driven guided simulation

DHM'13 Proceedings of the 4th International conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics, and Risk Management: healthcare and safety of the environment and transport - Volume Part I
Sufficiency-based selection strategy for MCTS

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Monte Carlo *-minimax search

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Causal belief decomposition for planning with sensing: completeness results and practical approximation

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Lifelong learning for acquiring the wisdom of the crowd

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Using reinforcement learning to find an optimal set of features

Computers & Mathematics with Applications
Remarks on history and presence of game tree search and research

Information Theory, Combinatorics, and Search Theory
Monte-Carlo tree search for Bayesian reinforcement learning

Applied Intelligence
The arcade learning environment: an evaluation platform for general agents

Journal of Artificial Intelligence Research
Robustness of stochastic bandit policies

Theoretical Computer Science
BoostingTree: parallel selection of weak learners in boosting, with application to ranking

Machine Learning
A survey of multi-objective sequential decision-making

Journal of Artificial Intelligence Research
Scalable and efficient bayes-adaptive reinforcement learning based on monte-carlo tree search

Journal of Artificial Intelligence Research
A tour of machine learning: An AI perspective

AI Communications - ECAI 2012 Turing and Anniversary Track

Quantified Score

Hi-index	0.03

Abstract

For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs, the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.
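
To make the idea of bandit-guided Monte-Carlo planning concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical two-step toy problem (`step`, `HORIZON`, `ACTIONS`), a UCB1-style selection rule with an exploration constant `c`, a uniformly random rollout policy, and a recommendation by highest empirical mean return at the root. None of these specifics come from the abstract; they are one common way of instantiating the approach.

```python
import math
import random

HORIZON = 2          # assumed finite horizon of the toy problem
ACTIONS = (0, 1)     # assumed action set


def step(depth, action):
    """Hypothetical toy model: deterministic reward, larger at depth 0."""
    reward = (1.0 if depth == 0 else 0.25) * action
    return depth + 1, reward


class Node:
    def __init__(self):
        self.visits = 0
        self.stats = {}  # action -> [pull count, summed return, child Node]


def select(node, c):
    """UCB1-style rule: try each action once, then maximise mean + bonus."""
    for a in ACTIONS:
        if a not in node.stats:
            return a
    log_n = math.log(node.visits)
    return max(
        ACTIONS,
        key=lambda a: node.stats[a][1] / node.stats[a][0]
        + c * math.sqrt(log_n / node.stats[a][0]),
    )


def rollout(depth, rng):
    """Uniformly random default policy from `depth` to the horizon."""
    total = 0.0
    while depth < HORIZON:
        depth, r = step(depth, rng.choice(ACTIONS))
        total += r
    return total


def simulate(node, depth, rng, c):
    """Run one sampled episode and return the return seen from `node` on."""
    if depth == HORIZON:
        return 0.0
    a = select(node, c)
    next_depth, r = step(depth, a)
    if a not in node.stats:
        # First time this action is tried here: add a child, then roll out.
        node.stats[a] = [0, 0.0, Node()]
        ret = r + rollout(next_depth, rng)
    else:
        ret = r + simulate(node.stats[a][2], next_depth, rng, c)
    node.visits += 1
    node.stats[a][0] += 1
    node.stats[a][1] += ret
    return ret


def uct_plan(n_simulations=200, c=1.0, seed=0):
    """Return the root action with the highest empirical mean return."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(n_simulations):
        simulate(root, 0, rng, c)
    return max(ACTIONS, key=lambda a: root.stats[a][1] / root.stats[a][0])
```

In this toy problem every episode that starts with action 1 returns at least 1.0, while every episode that starts with action 0 returns at most 0.25, so `uct_plan()` recommends action 1 as soon as both root actions have been sampled. In a stochastic or larger problem the choice of `c` and of the rollout policy would matter considerably more than it does here.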