Linear least-squares algorithms for temporal difference learning. Machine Learning, special issue on reinforcement learning.
Atomic Decomposition by Basis Pursuit. SIAM Journal on Scientific Computing.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Constructing basis functions from directed graphs for value function approximation. Proceedings of the 24th International Conference on Machine Learning.
Proceedings of the 25th International Conference on Machine Learning.
Regularization and feature selection in least-squares temporal difference learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Fast gradient-descent methods for temporal-difference learning with linear function approximation. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Kernelized value function approximation for reinforcement learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Algorithms for Reinforcement Learning.
Regularized least squares temporal difference learning with nested ℓ2 and ℓ1 penalization. Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning (EWRL '11).
We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach is to combine the Least-Squares Temporal Difference (LSTD) algorithm with ℓ1-regularization, which has proven effective in the supervised learning community. This was done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an ℓ1-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an ℓ1-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. This comes at the cost of a higher computational complexity if only a part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which makes it easy to envision extensions to other penalties.
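To make the reduction concrete, here is a minimal sketch (not the authors' code) of how an ℓ1-penalized projected Bellman residual can be solved as an ordinary Lasso problem. All names (Phi, Phi_next, R, gamma, lam) and the use of scikit-learn's Lasso solver are illustrative assumptions; the only point is that, because the empirical projection onto the feature span is a linear operator, the penalized objective is a standard supervised ℓ1-regularized regression in the weight vector.

```python
# Hypothetical sketch: l1-penalized projected Bellman residual as a Lasso problem.
# Assumed inputs (illustrative names, not from the paper):
#   Phi      : (n, k) features of sampled states
#   Phi_next : (n, k) features of sampled next states
#   R        : (n,)   sampled rewards
import numpy as np
from sklearn.linear_model import Lasso


def l1_pbr(Phi, Phi_next, R, gamma=0.95, lam=0.01, ridge=1e-6):
    n, k = Phi.shape
    # Gram matrix of the features; a small ridge keeps the solve stable.
    G = Phi.T @ Phi + ridge * np.eye(k)

    # Apply the empirical projection Pi = Phi (Phi^T Phi)^-1 Phi^T to Phi_next and R
    # without forming the n x n projection matrix explicitly.
    C = np.linalg.solve(G, Phi.T @ Phi_next)   # (k, k)
    d = np.linalg.solve(G, Phi.T @ R)          # (k,)

    # ||Pi(R + gamma Phi_next theta) - Phi theta||^2 + lam ||theta||_1
    # becomes the Lasso problem ||y - X theta||^2 + lam ||theta||_1 with:
    X = Phi - gamma * (Phi @ C)
    y = Phi @ d

    # scikit-learn's Lasso minimizes (1/2n)||y - X theta||^2 + alpha ||theta||_1,
    # so the penalty is rescaled to match the objective above.
    lasso = Lasso(alpha=lam / (2 * n), fit_intercept=False, max_iter=10000)
    lasso.fit(X, y)
    return lasso.coef_


# Toy usage with random data, purely for illustration.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 200, 20
    Phi = rng.normal(size=(n, k))
    Phi_next = rng.normal(size=(n, k))
    R = rng.normal(size=n)
    theta = l1_pbr(Phi, Phi_next, R)
    print("non-zero features:", np.flatnonzero(theta))
```

Because the final step is an ordinary sparse regression, any other penalty or off-the-shelf solver could be substituted there, which is the kind of extension the abstract alludes to; this contrasts with the LARS-TD route, which instead solves a fixed point of an ℓ1-penalized projection.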