Regularized least squares temporal difference learning with nested ℓ2 and ℓ1 penalization

Authors:
Matthew W. Hoffman;Alessandro Lazaric;Mohammad Ghavamzadeh;Rémi Munos
Affiliations:
Computer Science, University of British Columbia, Vancouver, Canada;INRIA Lille - Nord Europe, Team SequeL, France;INRIA Lille - Nord Europe, Team SequeL, France;INRIA Lille - Nord Europe, Team SequeL, France
Venue:
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Year:
2011

Abstract

The construction of a suitable set of features to approximate value functions is a central problem in reinforcement learning (RL). A popular approach to this problem is to use high-dimensional feature spaces together with least-squares temporal difference learning (LSTD). Although this combination allows for very accurate approximations, it often exhibits poor prediction performance because of overfitting when the number of samples is small compared to the number of features in the approximation space. In the linear regression setting, regularization is commonly used to overcome this problem. In this paper, we review some regularized approaches to policy evaluation and we introduce a novel scheme (L21) which uses ℓ2 regularization in the projection operator and an ℓ1 penalty in the fixed-point step. We show that this formulation reduces to a standard Lasso problem. As a result, any off-the-shelf solver can be used to compute its solution, and standardization techniques can be applied to the data. We report experimental results showing that L21 is effective in avoiding overfitting and that it compares favorably to existing ℓ1-regularized methods.
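The abstract states the reduction to a Lasso problem without giving equations, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a sampled feature matrix `phi`, next-state features `phi_next`, a reward vector `rewards`, a discount `gamma`, an ℓ2 weight `beta2`, and an ℓ1 weight `beta1`; all of these names, and the specific form X = Φ − γΣΦ′, y = ΣR with Σ = Φ(ΦᵀΦ + β2·I)⁻¹Φᵀ, are assumptions made for illustration. A small coordinate-descent routine stands in for the off-the-shelf solver.

```python
import numpy as np


def l21_lasso_problem(phi, phi_next, rewards, gamma, beta2):
    """Build (X, y) for the assumed Lasso form of the nested l2/l1 scheme."""
    k = phi.shape[1]
    # Assumed l2-regularized projection: Sigma = phi (phi^T phi + beta2 I)^-1 phi^T
    gram = phi.T @ phi + beta2 * np.eye(k)
    sigma = phi @ np.linalg.solve(gram, phi.T)
    # Fixed point  phi theta ~ Sigma (R + gamma phi' theta)  rearranged as X theta ~ y
    X = phi - gamma * (sigma @ phi_next)
    y = sigma @ rewards
    return X, y


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal map of the l1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(X, y, theta, beta1):
    """0.5 * ||y - X theta||^2 + beta1 * ||theta||_1"""
    r = y - X @ theta
    return 0.5 * (r @ r) + beta1 * np.abs(theta).sum()


def lasso_cd(X, y, beta1, n_iters=500):
    """Minimise lasso_objective by cyclic coordinate descent (illustrative solver)."""
    k = X.shape[1]
    theta = np.zeros(k)
    col_sq = (X ** 2).sum(axis=0)
    residual = y - X @ theta
    for _ in range(n_iters):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot reduce the residual
            # Correlation of column j with the residual that excludes coordinate j
            rho = X[:, j] @ residual + col_sq[j] * theta[j]
            new = soft_threshold(rho, beta1) / col_sq[j]
            residual += X[:, j] * (theta[j] - new)
            theta[j] = new
    return theta


# Synthetic data purely for illustration; not from the paper's experiments.
rng = np.random.default_rng(0)
n, k = 50, 20
phi = rng.normal(size=(n, k))
phi_next = rng.normal(size=(n, k))
rewards = rng.normal(size=n)

X, y = l21_lasso_problem(phi, phi_next, rewards, gamma=0.9, beta2=1.0)
theta = lasso_cd(X, y, beta1=5.0)
```

Because the result is an ordinary (X, y) regression pair, the hand-written solver could presumably be replaced by a library Lasso routine such as scikit-learn's `Lasso`, keeping in mind that libraries often scale the squared-error term differently, so `beta1` would need rescaling to match.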