Preconditioned temporal difference learning

Authors: Hengshuai Yao; Zhi-Qiang Liu
Affiliation: City University of Hong Kong, Hong Kong, China
Venue: Proceedings of the 25th International Conference on Machine Learning
Year: 2008

Abstract

This paper extends several popular recent policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE), and a variant of incremental LSTD (iLSTD). The extension rests on a preconditioning technique for solving a stochastic model equation. The paper also addresses three issues in the new framework: it presents a step-size rule that can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of the related algorithms to near that of temporal difference (TD) learning.
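
To make the idea of one framework covering several algorithms concrete, here is a minimal sketch of a preconditioned fixed-point iteration on a model equation A θ = b. It is not the paper's algorithm. It assumes the exact (expected) model of a small hypothetical Markov chain rather than stochastic estimates, a fixed step size rather than the paper's online rule, and the sign convention A θ = b. The function `preconditioned_iteration`, the chain `P`, the rewards `r`, and the features `Phi` are all illustrative names and values. Reading the identity preconditioner as TD-like, `Phi.T @ D @ Phi` as LSPE-like, and `A` itself as LSTD-like is a common interpretation and is assumed here, not quoted from the abstract.

```python
import numpy as np


def preconditioned_iteration(A, b, C, step=1.0, iters=500):
    """Iterate theta <- theta + step * C^{-1} (b - A theta), starting from zero."""
    theta = np.zeros_like(b)
    for _ in range(iters):
        residual = b - A @ theta  # how far theta is from solving A theta = b
        theta = theta + step * np.linalg.solve(C, residual)
    return theta


# Hypothetical 3-state Markov chain (illustrative values, not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])      # transitions under the evaluated policy
r = np.array([0.0, 0.0, 1.0])        # expected reward in each state
gamma = 0.9                          # discount factor
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])         # one feature row per state
# P is doubly stochastic, so its stationary distribution is uniform.
D = np.diag(np.full(3, 1.0 / 3.0))

# Expected model equation A theta = b (sign convention assumed here).
A = Phi.T @ D @ (np.eye(3) - gamma * P) @ Phi
b = Phi.T @ D @ r

theta_star = np.linalg.solve(A, b)   # direct solution, used as the reference

# Three preconditioner choices applied to the same equation.
theta_td = preconditioned_iteration(A, b, C=np.eye(2))          # identity
theta_lspe = preconditioned_iteration(A, b, C=Phi.T @ D @ Phi)  # feature covariance
theta_lstd = preconditioned_iteration(A, b, C=A, iters=1)       # C = A, one step

for name, theta in [("identity", theta_td),
                    ("Phi^T D Phi", theta_lspe),
                    ("A", theta_lstd)]:
    print(name, np.allclose(theta, theta_star))
```

In this sketch all three choices reach the same fixed point. They differ in how much work each step costs and how many steps are needed, which is the trade-off a preconditioner is meant to control.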