A Generalization Error for Q-Learning

  • Authors: Susan A. Murphy

  • Venue: The Journal of Machine Learning Research
  • Year: 2005

Abstract

Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
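To make the setting concrete, the following is a minimal sketch of batch Q-learning with linear function approximation on a single training set of finite-horizon trajectories. It is not taken from the paper: the backward least-squares fitting scheme, the feature map `features`, the simulator `simulate`, the two-action space, and all other names are assumptions chosen for illustration only.

```python
import random

ACTIONS = (0, 1)  # hypothetical binary action space


def features(obs, action):
    # Hypothetical linear features: intercept, observation, action, interaction.
    return [1.0, obs, float(action), obs * action]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(xs, ys, ridge=1e-8):
    """Least squares via the normal equations, with a tiny ridge for stability."""
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(xtx, xty)


def q_value(theta, obs, action):
    return sum(w * f for w, f in zip(theta, features(obs, action)))


def fit_batch_q(trajectories, horizon):
    """Fit one linear Q-function per stage, working backward from the last stage.

    Each trajectory is a list of (obs, action, reward) tuples, one per stage.
    """
    thetas = [None] * horizon
    for t in reversed(range(horizon)):
        xs, ys = [], []
        for traj in trajectories:
            obs, action, reward = traj[t]
            target = reward
            if t + 1 < horizon:
                # Regression target: reward plus the max of the next-stage Q.
                next_obs = traj[t + 1][0]
                target += max(q_value(thetas[t + 1], next_obs, a) for a in ACTIONS)
            xs.append(features(obs, action))
            ys.append(target)
        thetas[t] = least_squares(xs, ys)
    return thetas


def greedy_policy(thetas):
    """Policy that picks the action maximizing the fitted Q at each stage."""
    def policy(t, obs):
        return max(ACTIONS, key=lambda a: q_value(thetas[t], obs, a))
    return policy


def simulate(n, horizon, seed=0):
    """Toy data: random actions; reward is obs if action is 1, else -obs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        traj = []
        for _ in range(horizon):
            obs = rng.uniform(-1.0, 1.0)
            action = rng.choice(ACTIONS)
            traj.append((obs, action, obs * (2 * action - 1)))
        data.append(traj)
    return data


thetas = fit_batch_q(simulate(500, 2), horizon=2)
policy = greedy_policy(thetas)
```

In this toy problem the best action is 1 when the observation is positive and 0 otherwise, so the learned greedy policy can be checked against that rule. The squared regression residuals at each stage are one example of the kind of quantity a Q-learning algorithm minimizes, and the dimension of the feature map is a crude stand-in for the complexity of the approximation space.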