Q(λ) is a reinforcement learning algorithm that combines Q-learning and TD(λ). Online implementations of Q(λ) that use eligibility traces have been shown to speed up basic Q-learning. In this paper we present an asymptotic analysis of Watkins' Q(λ) with accumulating eligibility traces. We first introduce an asymptotic approximation of Q(λ) that appears to be a gain-matrix variant of basic Q-learning. Using the ODE method, we then determine an optimal gain matrix for Q-learning that maximizes its rate of convergence toward the optimal value function Q*. The similarity between this optimal gain and the asymptotic gain of Q(λ) explains the relative efficiency of the latter for λ > 0. Furthermore, by minimizing the difference between these two gains, optimal values for the parameter λ and for the decreasing learning rates can be determined. This optimal λ depends strongly on the exploration policy followed during learning. A robust approximation of these learning parameters leads to the definition of a new, efficient algorithm called AQ-learning (Average Q-learning), which closely resembles Schwartz's R-learning. We demonstrate our results through numerical simulations.
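For reference, here is a minimal sketch of tabular Watkins' Q(λ) with accumulating eligibility traces, the algorithm the abstract analyzes. The Gymnasium-style environment interface (reset/step), the ε-greedy exploration policy, and all hyperparameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def watkins_q_lambda(env, n_states, n_actions, n_episodes=500,
                     alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
    """Tabular Watkins' Q(lambda) with accumulating eligibility traces.

    Assumes a Gymnasium-style environment API:
    reset() -> (state, info), step(a) -> (state, reward,
    terminated, truncated, info). This interface and the
    hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        # Behavior policy: explore with probability epsilon.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(n_episodes):
        e = np.zeros_like(Q)                  # eligibility traces
        s, _ = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            a2 = eps_greedy(s2)               # next action (behavior policy)
            a_star = int(np.argmax(Q[s2]))    # greedy action (target policy)
            target = r if terminated else r + gamma * Q[s2, a_star]
            delta = target - Q[s, a]
            e[s, a] += 1.0                    # accumulate, don't replace
            Q += alpha * delta * e            # back up along the whole trace
            if Q[s2, a2] == Q[s2, a_star]:
                e *= gamma * lam              # next action greedy: decay traces
            else:
                e[:] = 0.0                    # exploratory action: cut traces
            s, a = s2, a2
    return Q
```

With λ = 0 the decayed trace vanishes after every step, so only the fresh entry e(s, a) contributes and the update collapses to basic one-step Q-learning, which is the baseline against which the paper's gain-matrix comparison is made.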