Machine Learning - Special issue on reinforcement learning
Finite-sample convergence rates for Q-learning and indirect algorithms
Proceedings of the 1998 conference on Advances in neural information processing systems II
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Near-Optimal Reinforcement Learning in Polynomial Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Learning in embedded systems
R-max - a general polynomial time algorithm for near-optimal reinforcement learning
The Journal of Machine Learning Research
While direct, model-free reinforcement learning often outperforms model-based approaches in practice, to date only the latter enjoy finite-sample convergence guarantees. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and a constant learning rate is admissible. This notion justifies a greedy strategy that we believe performs well in practice, and it is a key step toward deriving finite-sample convergence results for direct reinforcement learning. We present empirical evidence supporting this idea.
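The algorithm the abstract describes — tabular Q-learning with optimistic initial values, a constant learning rate, and purely greedy action selection — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function name, parameters, and the toy two-state MDP in the usage note are assumptions introduced here.

```python
def q_learning_optimistic(step, n_states, n_actions, gamma=0.9,
                          alpha=0.1, v_max=10.0, episodes=300, horizon=30):
    """Tabular Q-learning with optimistic initialization, a constant
    learning rate ``alpha``, and greedy (no-epsilon) action selection.

    ``step(s, a)`` is an environment callback returning ``(reward, next_state)``.
    All names here are illustrative assumptions, not the paper's notation.
    """
    # Optimistic initialization: every Q-value starts at an upper bound
    # v_max, so untried actions look attractive and greedy play explores.
    Q = [[v_max] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # each episode restarts from state 0
        for _ in range(horizon):
            # Greedy action with respect to the current Q estimates.
            a = max(range(n_actions), key=lambda act: Q[s][act])
            r, s_next = step(s, a)
            # Constant-step-size Q-learning update.
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


# Hypothetical two-state chain: action 1 moves to state 1 with reward 1,
# action 0 moves to state 0 with reward 0, so action 1 is optimal everywhere.
def toy_step(s, a):
    return (1.0, 1) if a == 1 else (0.0, 0)


Q = q_learning_optimistic(toy_step, n_states=2, n_actions=2)
```

Because every entry starts at the optimistic bound `v_max`, an action's estimate only decreases once it has actually been tried, so the greedy policy is driven to sample under-explored actions without any explicit exploration schedule.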