We consider Markov decision processes (MDPs) with unknown transition probabilities and unknown single-period expected cost functions, and we study a method for estimating these quantities from historical or simulated data. The method requires knowledge of the system equations that govern state transitions and of the single-period cost functions, but not of the single-period expected cost functions. The estimation procedure takes expectations with respect to the empirical distribution functions of the data. Once the estimates are in place, the method computes a policy by solving the resulting “empirical” MDP as if the estimates were correct. For MDPs that satisfy certain conditions, we provide explicit, easily computed expressions for the probability that the procedure produces a policy whose true expected cost is within any specified absolute distance of the optimal expected cost of the true MDP. We also provide expressions for the number of historical or simulated data values that suffices for the procedure to produce a policy whose true expected cost is, with a prescribed probability, within a prescribed absolute distance of the optimal expected cost of the true MDP. We apply our results to multiperiod inventory models. In addition, we provide a specialized analysis of such inventory models that yields relative, rather than absolute, accuracy guarantees. We make comparisons with related results that have recently appeared, and we provide numerical examples.
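The idea can be sketched in a few lines of code. The toy MDP below (states, actions, horizon, and a noise level for cost observations) is a hypothetical stand-in, not taken from the paper; the sketch builds the empirical transition kernel and empirical expected costs from simulated samples, solves the empirical MDP by backward induction as if the estimates were exact, and then evaluates the resulting policy under the true model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: the "true" kernel and expected costs below stand in
# for the unknown quantities that the method must estimate from data.
n_states, n_actions, horizon = 3, 2, 5
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
true_cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def empirical_mdp(n_samples):
    """Estimate transitions and expected costs by empirical averages
    (expectations under the empirical distribution of the samples)."""
    P_hat = np.zeros_like(true_P)
    c_hat = np.zeros_like(true_cost)
    for s in range(n_states):
        for a in range(n_actions):
            nxt = rng.choice(n_states, size=n_samples, p=true_P[s, a])
            P_hat[s, a] = np.bincount(nxt, minlength=n_states) / n_samples
            # Noisy single-period cost observations around the true mean.
            obs = true_cost[s, a] + rng.normal(0.0, 0.1, n_samples)
            c_hat[s, a] = obs.mean()
    return P_hat, c_hat

def solve_finite_horizon(P, c):
    """Backward induction on a finite-horizon MDP with kernel P, costs c."""
    V = np.zeros(n_states)
    stage_policies = []
    for _ in range(horizon):
        Q = c + P @ V                      # Q[s, a] = c[s, a] + E[V(next)]
        stage_policies.append(Q.argmin(axis=1))
        V = Q.min(axis=1)
    return V, stage_policies[::-1]         # policies in stage order 0..H-1

def evaluate(stage_policies, P, c):
    """True expected cost of a (possibly suboptimal) stage-dependent policy."""
    V = np.zeros(n_states)
    for pi in reversed(stage_policies):
        idx = np.arange(n_states)
        V = c[idx, pi] + np.einsum('ij,j->i', P[idx, pi], V)
    return V

V_true, _ = solve_finite_horizon(true_P, true_cost)    # optimal under true MDP
P_hat, c_hat = empirical_mdp(n_samples=2000)
_, pi_emp = solve_finite_horizon(P_hat, c_hat)         # policy from empirical MDP
V_emp_true = evaluate(pi_emp, true_P, true_cost)       # its true expected cost
gap = V_emp_true - V_true                              # nonnegative by optimality
```

The quantity `gap` is exactly what the paper's guarantees bound: the paper gives expressions for how many samples suffice so that, with prescribed probability, each entry of `gap` is below a prescribed tolerance.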