In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. DPP is an incremental algorithm that forces a gradual change in the policy at each update. This allows us to prove finite-iteration and asymptotic l∞-norm performance-loss bounds in the presence of approximation/estimation error that depend on the average accumulated error, as opposed to the standard bounds, which are expressed in terms of the supremum of the errors. The dependence on the average error is important in problems with a limited number of samples per iteration, for which the average of the errors can be significantly smaller than their supremum. Based on these theoretical results, we prove that a sampling-based variant of DPP (DPP-RL) asymptotically converges to the optimal policy. Finally, we numerically illustrate the applicability of these results on benchmark problems and compare the performance of the approximate variants of DPP with some existing reinforcement learning (RL) methods.
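To make the "gradual change in the policy" idea concrete, the sketch below implements an incremental, softmax-weighted value-iteration update on a tabular MDP: instead of overwriting action values at each sweep, it adds the Bellman residual to a table of action preferences, so the induced (softmax) policy changes only gradually. This is a minimal illustration in the spirit of DPP, not the paper's exact operator; the function name `dpp_sketch`, the temperature `eta`, and the specific update rule are assumptions for the example.

```python
import numpy as np

def dpp_sketch(P, R, gamma=0.9, eta=5.0, iters=500):
    """Illustrative tabular sketch of a DPP-style incremental update.

    P: (S, A, S) transition probabilities; R: (S, A) rewards.
    Psi holds action preferences. Each sweep *adds* the
    softmax-weighted Bellman residual to Psi rather than replacing
    it, which makes the induced policy change gradually.
    """
    S, A = R.shape
    Psi = np.zeros((S, A))
    for _ in range(iters):
        # softmax weights over actions (max-shifted for stability)
        w = np.exp(eta * (Psi - Psi.max(axis=1, keepdims=True)))
        w /= w.sum(axis=1, keepdims=True)
        v = (w * Psi).sum(axis=1)            # soft state value
        # incremental update: add the residual instead of overwriting
        Psi = Psi + R + gamma * (P @ v) - v[:, None]
    # greedy policy from the final preferences
    return Psi.argmax(axis=1), Psi
```

On a small MDP, the preferences of suboptimal actions drift downward while the optimal action's preference settles near its optimal value, so the greedy policy extracted at the end is optimal; a sampling-based variant in the spirit of DPP-RL would replace the exact expectation `P @ v` with single-sample estimates.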