An efficient L2-norm regularized least-squares temporal difference learning algorithm

  • Authors:
  • Shenglei Chen, Geng Chen, Ruijun Gu

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2013

Abstract

In reinforcement learning, when samples are limited, as in many real applications, Least-Squares Temporal Difference (LSTD) learning is prone to over-fitting, which can be mitigated by introducing regularization. However, solving regularized LSTD still requires costly matrix inversion. In this paper we investigate L2-norm regularized LSTD learning and propose an efficient algorithm that avoids this expensive computation. We derive LSTD using the Bellman operator together with the projection operator, introduce an L2-norm penalty to prevent over-fitting, and describe the difference between Bellman residual minimization and LSTD. We then propose an efficient recursive least-squares algorithm for L2-norm regularized LSTD, which eliminates matrix inversion and substantially reduces computational complexity. Empirical comparisons on the Boyan chain problem show that the new algorithm outperforms regularized LSTD.
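The recursive scheme the abstract describes can be illustrated with a minimal sketch. LSTD accumulates A = Σ φ_t(φ_t − γφ_{t+1})ᵀ and b = Σ r_t φ_t, and the L2-regularized solution is θ = (A + βI)⁻¹b. Instead of inverting that matrix, a recursive least-squares variant maintains the inverse directly via rank-one Sherman-Morrison updates. The function and parameter names below (`rls_lstd`, `beta`) are illustrative, not the paper's notation, and this is a generic recursive-LSTD sketch rather than the authors' exact algorithm:

```python
import numpy as np

def rls_lstd(samples, gamma=0.9, beta=1.0):
    """Sketch of recursive L2-regularized LSTD.

    samples: iterable of (phi, reward, phi_next) with feature
    vectors phi, phi_next of equal length.
    beta: L2 regularization strength (illustrative name).
    Maintains C ~ (A + beta*I)^{-1} by Sherman-Morrison rank-one
    updates, so no explicit matrix inversion is ever performed.
    """
    k = len(samples[0][0])
    C = np.eye(k) / beta          # inverse of the initial matrix beta*I
    b = np.zeros(k)
    for phi, r, phi_next in samples:
        u = np.asarray(phi, dtype=float)
        v = u - gamma * np.asarray(phi_next, dtype=float)
        Cu = C @ u
        vC = v @ C
        # (M + u v^T)^{-1} = M^{-1} - M^{-1} u v^T M^{-1} / (1 + v^T M^{-1} u)
        C -= np.outer(Cu, vC) / (1.0 + v @ Cu)
        b += r * u
    return C @ b                  # theta = (A + beta*I)^{-1} b
```

Each update costs O(k²) in the feature dimension k, versus the O(k³) of solving the regularized system from scratch, which is the complexity saving the abstract refers to.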