A sparse sampling algorithm for near-optimal planning in large Markov decision processes

  • Authors:
  • Michael Kearns; Yishay Mansour; Andrew Y. Ng

  • Affiliations:
  • AT&T Labs; AT&T Labs and Tel-Aviv University; UC Berkeley

  • Venue:
  • IJCAI'99: Proceedings of the 16th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1999

Abstract

An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even infinite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales linearly with the state space size. In this paper we present a new algorithm that, given only a generative model (simulator) for an arbitrary MDP, performs near-optimal planning with a running time that has no dependence on the number of states. Although the running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy), our results establish for the first time that there are no theoretical barriers to computing near-optimal policies in arbitrarily large, unstructured MDPs. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs [KMN99].
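
To make the sparse-sampling idea concrete, here is a minimal Python sketch of look-ahead planning from a generative model. It is an illustrative reading of the abstract, not the authors' exact algorithm: the simulator interface `generative_model(state, action) -> (next_state, reward)`, the sample width `C`, and the horizon `H` are assumed names and parameters chosen for this example.

```python
# Hypothetical sketch of sparse-sampling planning from a generative model.
# Assumptions (not from the paper's text): generative_model(state, action)
# returns (next_state, reward); `actions` is a small finite list; gamma is
# the discount factor; C is the per-action sample width; H is the horizon.

def estimate_q(state, action, depth, generative_model, actions, gamma, C, H):
    """Estimate Q(state, action) by averaging C sampled one-step outcomes,
    each followed by a recursive value estimate for the sampled next state."""
    if depth >= H:
        return 0.0
    total = 0.0
    for _ in range(C):
        next_state, reward = generative_model(state, action)
        total += reward + gamma * estimate_v(
            next_state, depth + 1, generative_model, actions, gamma, C, H)
    return total / C

def estimate_v(state, depth, generative_model, actions, gamma, C, H):
    """Value estimate at a tree node: max over actions of sampled Q estimates."""
    if depth >= H:
        return 0.0
    return max(estimate_q(state, a, depth, generative_model, actions, gamma, C, H)
               for a in actions)

def sparse_sampling_action(state, generative_model, actions, gamma=0.95, C=10, H=3):
    """Return a near-greedy action at `state`. The cost is on the order of
    (len(actions) * C) ** H simulator calls, independent of the number of
    states in the MDP, since only sampled successors are ever expanded."""
    return max(actions,
               key=lambda a: estimate_q(state, a, 0, generative_model,
                                        actions, gamma, C, H))
```

In this sketch the sample width C and horizon H trade accuracy for computation, mirroring the abstract's claim: the runtime grows exponentially in the horizon but does not depend on the size of the state space.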