Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms

  • Authors:
  • Satinder Singh
  • Tommi Jaakkola
  • Michael L. Littman
  • Csaba Szepesvári

  • Affiliations:
  • AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, USA. baveja@research.att.com
  • Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. tommi@ai.mit.edu
  • Department of Computer Science, Duke University, Durham, NC 27708-0129, USA. mlittman@cs.duke.edu
  • Mindmaker Ltd., Konkoly Thege M. u. 29-33, Budapest 1121, Hungary. szepes@mindmaker.hu

  • Venue:
  • Machine Learning
  • Year:
  • 2000

Abstract

An important application of reinforcement learning (RL) is to finite-state control problems, and one of the most difficult problems in learning for control is balancing the exploration/exploitation tradeoff. Existing theoretical results for RL give very little guidance on reasonable ways to perform exploration. In this paper, we examine the convergence of single-step on-policy RL algorithms for control. On-policy algorithms cannot separate exploration from learning and therefore must confront the exploration problem directly. We prove convergence results for several related on-policy algorithms with both decaying exploration and persistent exploration. We also provide examples of exploration strategies that can be followed during learning that result in convergence to both optimal values and optimal policies.
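As a concrete illustration (not taken from the paper), the sketch below shows tabular Sarsa(0), a canonical single-step on-policy algorithm, combined with a decaying ε-greedy exploration schedule of the kind covered by such convergence results: exploration decays over time while step sizes shrink per state-action pair. The toy chain MDP, the 1/episode decay, and all hyperparameters are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of tabular Sarsa(0) with a decaying epsilon-greedy
# exploration schedule. The toy chain MDP, step sizes, and decay schedule
# below are illustrative assumptions only.

import random

N_STATES = 5          # states 0..4; state 4 is terminal in this toy chain
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor

def step(state, action):
    """Deterministic toy chain: stepping right out of state 3 pays 1 and
    ends the episode; every other transition pays 0."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if (state == N_STATES - 2 and action == 1) else 0.0
    return nxt, reward, nxt == N_STATES - 1

def epsilon_greedy(Q, state, eps):
    """On-policy action selection: random with probability eps, otherwise
    greedy with ties broken at random."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def sarsa(episodes=2000):
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    visits = {}
    for ep in range(1, episodes + 1):
        eps = 1.0 / ep                      # decaying exploration schedule
        s = 0
        a = epsilon_greedy(Q, s, eps)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, eps)
            visits[(s, a)] = visits.get((s, a), 0) + 1
            alpha = 1.0 / visits[(s, a)]    # per-pair decaying step size
            # Single-step on-policy update: bootstrap on the action the
            # behavior policy actually selects in the next state.
            target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

if __name__ == "__main__":
    Q = sarsa()
    greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
    print("Greedy policy after learning (0=left, 1=right):", greedy)
```

Because the exploration probability decays while every state-action pair keeps being visited, the behavior policy becomes greedy in the limit, which is the flavor of decaying-exploration strategy under which convergence to optimal values and policies is claimed.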