Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and generates an asymptotically unbiased estimator whose bias is bounded by a quantity that converges to zero at rate O((ln N)/N), where N is the total number of samples used per sampled state in each stage. The worst-case running-time complexity of the algorithm is O((|A|N)^H), independent of the size of the state space, where |A| is the size of the action space and H is the horizon length. The algorithm can also be used to create an approximate receding-horizon control for solving infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.
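The recursive structure behind the stated O((|A|N)^H) complexity can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it uses a UCB1-style bandit rule for the adaptive action choice and a count-weighted value estimate, and the single-state toy MDP, the `adaptive_sample` name, and the dictionary-based MDP interface are all hypothetical choices made for the example.

```python
import math

def adaptive_sample(mdp, state, stage, H, N):
    """Estimate the optimal value of `state` at `stage` of a horizon-H MDP,
    using N samples per sampled state per stage (so O((|A|N)^H) work total)."""
    if stage == H:
        return 0.0
    actions = mdp["actions"]
    counts = {a: 0 for a in actions}   # times each action was sampled
    qsums = {a: 0.0 for a in actions}  # accumulated sampled Q-values

    def sample(a):
        # One sample of action a: immediate reward plus a recursive
        # estimate of the next stage's optimal value at the next state.
        reward, nxt = mdp["step"](state, a)
        qsums[a] += reward + adaptive_sample(mdp, nxt, stage + 1, H, N)
        counts[a] += 1

    for a in actions:                  # initialization: sample each action once
        sample(a)
    for n in range(len(actions), N):   # adaptive phase: UCB1-style selection
        a = max(actions, key=lambda a: qsums[a] / counts[a]
                + math.sqrt(2.0 * math.log(n) / counts[a]))
        sample(a)

    # Count-weighted estimate: actions sampled more often (i.e. judged
    # better by the bandit rule) dominate the returned value.
    return sum((counts[a] / N) * (qsums[a] / counts[a]) for a in actions)

# Hypothetical toy MDP with a single state: action 0 always pays 1,
# action 1 always pays 0, so the optimal 2-stage value is 2.
toy = {"actions": [0, 1], "step": lambda s, a: (1.0 if a == 0 else 0.0, s)}
est = adaptive_sample(toy, "s", 0, H=2, N=16)
```

Because every recursive call itself draws N samples, the total work is on the order of (|A|N)^H, and no enumeration of the state space is ever needed; the estimate `est` lies below the true optimum (here 2) since the inferior action is still sampled a logarithmic number of times, which is exactly the (ln N)/N bias term in the abstract.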