Stochastic simulation.
Linear least-squares algorithms for temporal difference learning. Machine Learning, special issue on reinforcement learning.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Least Squares Policy Evaluation Algorithms with Linear Function Approximation. Discrete Event Dynamic Systems.
Eligibility Traces for Off-Policy Policy Evaluation. ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Bias-Variance Error Bounds for Temporal Difference Updates. COLT '00: Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems.
Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics.
Error bounds for approximate value iteration. AAAI'05: Proceedings of the 20th National Conference on Artificial Intelligence, Volume 2.
Journal of Artificial Intelligence Research.
COLT'06: Proceedings of the 19th Annual Conference on Learning Theory.
In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory, possibly generated by some other policy. We describe a systematic approach for adapting least-squares on-policy learning algorithms from the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning with eligibility traces. This recovers two known algorithms, off-policy LSTD(λ) and LSPE(λ) [21], and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) [21] remains the best least-squares algorithm.
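To make the recursive, off-policy setting concrete, below is a minimal sketch of recursive off-policy LSTD(λ) with per-decision importance sampling. It follows one common formulation (past importance ratios accumulated in the trace, the current ratio correcting the current transition) and maintains the inverse of the LSTD matrix with a Sherman-Morrison update; it is not the paper's exact algorithm, and all names (phi_t, rho_t, etc.) are illustrative.

import numpy as np

class RecursiveOffPolicyLSTD:
    """One common recursive form of off-policy LSTD(lambda)."""

    def __init__(self, n_features, gamma, lam, eps=1e-3):
        self.gamma, self.lam = gamma, lam
        self.z = np.zeros(n_features)      # eligibility trace
        self.b = np.zeros(n_features)
        self.M = np.eye(n_features) / eps  # M ~ (A + eps*I)^{-1}
        self.prev_rho = 1.0

    def update(self, phi_t, rho_t, r_t, phi_next, episode_start=False):
        """Process one transition (s_t, a_t, r_t, s_{t+1}),
        with rho_t = pi(a_t|s_t) / mu(a_t|s_t)."""
        if episode_start:
            self.z[:] = 0.0
            self.prev_rho = 1.0
        # Trace carries the product of past importance ratios.
        self.z = self.gamma * self.lam * self.prev_rho * self.z + phi_t
        # Current transition corrected by rho_t.
        u = phi_t - self.gamma * rho_t * phi_next
        # Sherman-Morrison update of M for the rank-one change A <- A + z u^T.
        Mz = self.M @ self.z
        uM = u @ self.M
        self.M -= np.outer(Mz, uM) / (1.0 + uM @ self.z)
        self.b += rho_t * r_t * self.z
        self.prev_rho = rho_t

    @property
    def theta(self):
        """Current weight estimate, theta = M b."""
        return self.M @ self.b

The rank-one inverse update is what makes the implementation recursive: each transition costs O(d^2) in the number of features d, instead of re-solving the d x d linear system A theta = b from scratch at every step.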