A Generalization Error for Q-Learning

Authors:
Susan A. Murphy
Affiliations:
-
Venue:
The Journal of Machine Learning Research
Year:
2005

Citing 0
Cited 5

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Machine Learning
Finite-Time Bounds for Fitted Value Iteration

The Journal of Machine Learning Research
Reinforcement learning with a bilinear q function

EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits

The Journal of Supercomputing
Performance Guarantees for Empirical Markov Decision Processes with Applications to Multiperiod Inventory Models

Operations Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.