A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

Authors:
Michael Kearns; Yishay Mansour; Andrew Y. Ng
Affiliations:
Department of Computer and Information Science, University of Pennsylvania, Moore School Building, 200 South 33rd Street, Philadelphia, PA 19104-6389, USA. mkearns@cis.upenn.edu; Department of Computer Science, Tel Aviv University, 69978 Tel Aviv, Israel. mansour@math.tau.ac.il; Department of Computer Science, University of California, Berkeley, Berkeley, CA 94704, USA. ang@cs.berkeley.edu
Venue:
Machine Learning
Year:
2002

Abstract

A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration: rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter.

Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs (Kearns, Mansour, & Ng. Neural information processing systems 13, to appear).
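The sparse look-ahead idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `effective_horizon`, `sparse_sample_action`, and `chain_model`, the `(state, action, rng) -> (reward, next_state)` interface for the generative model, and the toy chain MDP are all assumptions made for illustration. The `depth` and `width` arguments are left as free parameters here, whereas the paper's analysis derives them from the discount factor and the desired accuracy. The horizon helper uses only the standard truncation bound γ^H · R_max / (1 − γ) ≤ ε, which may differ from the exact constants in the paper.

```python
import random
from typing import Callable, Hashable, Optional, Sequence, Tuple

State = Hashable
Action = Hashable
# Assumed generative-model interface: one call draws one (reward, next_state) sample.
GenerativeModel = Callable[[State, Action, random.Random], Tuple[float, State]]


def effective_horizon(gamma: float, epsilon: float, r_max: float = 1.0) -> int:
    """Smallest H such that gamma**H * r_max / (1 - gamma) <= epsilon."""
    if not 0.0 < gamma < 1.0 or epsilon <= 0.0:
        raise ValueError("need 0 < gamma < 1 and epsilon > 0")
    tail = r_max / (1.0 - gamma)  # largest possible discounted return
    horizon = 0
    while tail > epsilon:
        tail *= gamma
        horizon += 1
    return horizon


def sparse_sample_action(
    model: GenerativeModel,
    state: State,
    actions: Sequence[Action],
    depth: int,
    width: int,
    gamma: float,
    rng: Optional[random.Random] = None,
) -> Action:
    """Pick an action at `state` from a randomly sampled look-ahead tree.

    Each (state, action) pair in the tree is expanded with `width` samples,
    so the work is about (width * len(actions)) ** depth model calls,
    regardless of how many states the MDP has.
    """
    rng = rng or random.Random()

    def value(s: State, d: int) -> float:
        # Estimated value of s with d steps of look-ahead remaining.
        if d == 0:
            return 0.0
        return max(q_value(s, a, d) for a in actions)

    def q_value(s: State, a: Action, d: int) -> float:
        # Average sampled reward plus discounted estimated value of the child.
        total = 0.0
        for _ in range(width):
            reward, next_state = model(s, a, rng)
            total += reward + gamma * value(next_state, d - 1)
        return total / width

    return max(actions, key=lambda a: q_value(state, a, depth))


def chain_model(state: int, action: str, rng: random.Random) -> Tuple[float, int]:
    """Hypothetical toy MDP on the integers (an infinite state space).

    Moving "right" pays reward 1, "left" pays 0; with probability 0.1 the
    agent slips and stays where it is.
    """
    step = 1 if action == "right" else -1
    if rng.random() < 0.1:
        step = 0
    return (1.0 if action == "right" else 0.0), state + step


best = sparse_sample_action(chain_model, 0, ["left", "right"],
                            depth=3, width=2, gamma=0.5)
```

In this sketch the planner never enumerates states: it touches only the states that the generative model happens to return, which is why the toy model can live on an unbounded integer line. The price is the exponential growth in `depth`, mirroring the trade-off stated in the abstract.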