Preconditioned temporal difference learning

  • Authors: Hengshuai Yao; Zhi-Qiang Liu
  • Affiliations: City University of Hong Kong, Hong Kong, China (both authors)
  • Venue: Proceedings of the 25th International Conference on Machine Learning
  • Year: 2008

Abstract

This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE), and a variant of incremental LSTD (iLSTD). The extension rests on a preconditioning technique for solving a stochastic model equation. The paper also studies three significant issues in the new framework: it presents a new step-size rule that can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of the related algorithms to near that of temporal difference (TD) learning.
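To make the idea of preconditioning concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the model equation can be written as A θ = b and applies the generic preconditioned iteration θ ← θ + α C⁻¹ (b − A θ) to the exact (expected) equation of a hypothetical two-state Markov chain with tabular features. The chain, the rewards, the sign convention, the function names, and the association of each preconditioner C with a named algorithm are illustrative assumptions; the abstract specifies none of them.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve2(C, v):
    """Solve the 2x2 system C x = v by Cramer's rule."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    return [(v[0] * C[1][1] - C[0][1] * v[1]) / det,
            (C[0][0] * v[1] - C[1][0] * v[0]) / det]


def preconditioned_iteration(A, b, C, alpha=1.0, tol=1e-10, max_iter=10000):
    """Iterate theta <- theta + alpha * C^{-1} (b - A theta).

    Returns the final theta and the number of updates performed.
    """
    theta = [0.0, 0.0]
    for k in range(max_iter):
        residual = [bi - ai for bi, ai in zip(b, matvec(A, theta))]
        if max(abs(x) for x in residual) < tol:
            return theta, k
        step = solve2(C, residual)
        theta = [t + alpha * s for t, s in zip(theta, step)]
    return theta, max_iter


# Hypothetical two-state chain (assumption, not taken from the paper).
gamma = 0.9
P = [[0.9, 0.1], [0.1, 0.9]]   # transition matrix, stationary dist. (0.5, 0.5)
r = [1.0, 0.0]                 # expected reward in each state
d = [0.5, 0.5]                 # stationary state weights

# With tabular features: A = D (I - gamma P), b = D r.
A = [[d[i] * ((1.0 if i == j else 0.0) - gamma * P[i][j]) for j in range(2)]
     for i in range(2)]
b = [d[i] * r[i] for i in range(2)]

identity = [[1.0, 0.0], [0.0, 1.0]]
D = [[d[0], 0.0], [0.0, d[1]]]

# Three choices of preconditioner C (the labels are an informal reading).
theta_I, iters_I = preconditioned_iteration(A, b, identity)  # plain, TD-like
theta_D, iters_D = preconditioned_iteration(A, b, D)         # LSPE-like
theta_A, iters_A = preconditioned_iteration(A, b, A)         # LSTD-like

print(iters_I, iters_D, iters_A)
print(theta_A)
```

All three runs reach the same fixed point; only the number of updates differs. Choosing C = A solves the system in a single update, at the cost of a full linear solve, while C = I makes each update cheap but needs many more of them. The framework described in the abstract concerns this trade-off in the stochastic, sample-based setting, which this deterministic sketch does not model.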