Stochastic simulation.
Linear least-squares algorithms for temporal difference learning. Machine Learning, special issue on reinforcement learning.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Least Squares Policy Evaluation Algorithms with Linear Function Approximation. Discrete Event Dynamic Systems.
Eligibility Traces for Off-Policy Policy Evaluation. ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Bias-Variance Error Bounds for Temporal Difference Updates. COLT '00: Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems.
Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics.
Error bounds for approximate value iteration. AAAI'05: Proceedings of the 20th National Conference on Artificial Intelligence, Volume 2.
Journal of Artificial Intelligence Research.
COLT'06: Proceedings of the 19th Annual Conference on Learning Theory.
In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory, possibly generated by some other policy. We describe a systematic approach for adapting least-squares on-policy learning algorithms from the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning with eligibility traces. This recovers two known algorithms, off-policy LSTD(λ) and LSPE(λ) [21], and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) [21] remains the best least-squares algorithm.
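To make the recursive, off-policy setting concrete, below is a minimal sketch of recursive off-policy LSTD(λ) with per-decision importance sampling. It follows one common formulation (past importance ratios accumulated in the trace, the current ratio correcting the current transition) and maintains the inverse of the LSTD matrix with a Sherman-Morrison update; it is not the paper's exact algorithm, and all names (phi_t, rho_t, etc.) are illustrative.

import numpy as np

class RecursiveOffPolicyLSTD:
    """One common recursive form of off-policy LSTD(lambda)."""

    def __init__(self, n_features, gamma, lam, eps=1e-3):
        self.gamma, self.lam = gamma, lam
        self.z = np.zeros(n_features)      # eligibility trace
        self.b = np.zeros(n_features)
        self.M = np.eye(n_features) / eps  # M ~ (A + eps*I)^{-1}
        self.prev_rho = 1.0

    def update(self, phi_t, rho_t, r_t, phi_next, episode_start=False):
        """Process one transition (s_t, a_t, r_t, s_{t+1}),
        with rho_t = pi(a_t|s_t) / mu(a_t|s_t)."""
        if episode_start:
            self.z[:] = 0.0
            self.prev_rho = 1.0
        # Trace carries the product of past importance ratios.
        self.z = self.gamma * self.lam * self.prev_rho * self.z + phi_t
        # Current transition corrected by rho_t.
        u = phi_t - self.gamma * rho_t * phi_next
        # Sherman-Morrison update of M for the rank-one change A <- A + z u^T.
        Mz = self.M @ self.z
        uM = u @ self.M
        self.M -= np.outer(Mz, uM) / (1.0 + uM @ self.z)
        self.b += rho_t * r_t * self.z
        self.prev_rho = rho_t

    @property
    def theta(self):
        """Current weight estimate, theta = M b."""
        return self.M @ self.b

The rank-one inverse update is what makes the implementation recursive: each transition costs O(d^2) in the number of features d, instead of re-solving the d x d linear system A theta = b from scratch at every step.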