Artificial intelligence: a modern approach
An Upper Bound on the Loss from Approximate Optimal-Value Functions
Machine Learning
Solving very large weakly coupled Markov decision processes
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Finite-sample convergence rates for Q-learning and indirect algorithms
Proceedings of the 1998 conference on Advances in neural information processing systems II
The Design and Analysis of Computer Algorithms
Tractable inference for complex stochastic processes
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Bayesian sparse sampling for on-line reward optimization
ICML '05 Proceedings of the 22nd international conference on Machine learning
Regularized Fitted Q-Iteration: Application to Planning
Recent Advances in Reinforcement Learning
Using Conditional Random Fields for Decision-Theoretic Planning
MDAI '09 Proceedings of the 6th International Conference on Modeling Decisions for Artificial Intelligence
The Knowledge Engineering Review
Efficient selectivity and backup operators in Monte-Carlo tree search
CG'06 Proceedings of the 5th international conference on Computers and games
Amsaa: a multistep anticipatory algorithm for online stochastic combinatorial optimization
CPAIOR'08 Proceedings of the 5th international conference on Integration of AI and OR techniques in constraint programming for combinatorial optimization problems
Backpropagation modification in Monte-Carlo game tree search
IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
Pheromones, probabilities, and multiple futures
MABS'10 Proceedings of the 11th international conference on Multi-agent-based simulation
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
The Journal of Machine Learning Research
Fast planning in stochastic games
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Bandit based monte-carlo planning
ECML'06 Proceedings of the 17th European conference on Machine Learning
Monte-Carlo optimizations for resource allocation problems in stochastic network systems
UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
Lossy stochastic game abstraction with bounds
Proceedings of the 13th ACM Conference on Electronic Commerce
Light at the end of the tunnel: a Monte Carlo approach to computing value of information
Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Scalable and efficient bayes-adaptive reinforcement learning based on monte-carlo tree search
Journal of Artificial Intelligence Research
An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even infinite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales linearly with the state space size. In this paper we present a new algorithm that, given only a generative model (simulator) for an arbitrary MDP, performs near-optimal planning with a running time that has no dependence on the number of states. Although the running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy), our results establish for the first time that there are no theoretical barriers to computing near-optimal policies in arbitrarily large, unstructured MDPs. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs [KMN99].
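The sparse sampling idea described in the abstract can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's exact algorithm: it assumes a hypothetical generative-model interface `simulator(state, action)` that returns a sampled `(next_state, reward)` pair, and estimates Q-values by recursively sampling a fixed number of next states per action, so the cost depends on the sampling width and depth but not on the size of the state space.

```python
def sparse_sample_q(simulator, state, actions, depth, width, gamma):
    """Estimate Q-values at `state` via a sparse look-ahead tree.

    `simulator(state, action)` is an assumed generative-model interface
    returning one sampled (next_state, reward) pair. The tree has
    branching factor len(actions) * width and depth `depth`, so the
    running time is independent of the number of MDP states.
    """
    if depth == 0:
        return {a: 0.0 for a in actions}  # terminal estimate
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):  # fixed number of samples per action
            next_state, reward = simulator(state, a)
            child_q = sparse_sample_q(
                simulator, next_state, actions, depth - 1, width, gamma
            )
            total += reward + gamma * max(child_q.values())
        q[a] = total / width
    return q


def sparse_sample_action(simulator, state, actions, depth, width, gamma):
    """Return a near-greedy action with respect to the sampled Q-estimates."""
    q = sparse_sample_q(simulator, state, actions, depth, width, gamma)
    return max(q, key=q.get)
```

In the paper, the width and depth are chosen as functions of γ and the desired approximation accuracy, which yields the exponential-in-horizon but state-space-independent running time mentioned above.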