ℓ1-Penalized Projected Bellman Residual

  • Authors:
  • Matthieu Geist; Bruno Scherrer

  • Affiliations:
  • Supélec, IMS Research Group, Metz, France; INRIA, MAIA Project-Team, Nancy, France

  • Venue:
  • EWRL'11: Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning
  • Year:
  • 2011

Abstract

We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach consists in combining the Least-Squares Temporal Difference (LSTD) algorithm with ℓ1-regularization, which has proven to be effective in the supervised learning community. This has recently been done with the LARS-TD algorithm, which replaces the projection operator of LSTD with an ℓ1-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an ℓ1-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. However, this comes at the cost of a higher computational complexity if only a part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which lets us envision easy extensions to other penalties.
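
To make the reduction mentioned in the abstract concrete: minimizing the ℓ1-penalized projected Bellman residual, min_θ ‖Π(R + γΦ′θ − Φθ)‖² + λ‖θ‖₁, where Π is the orthogonal projection onto the span of the feature matrix Φ, is a standard Lasso problem with design matrix Π(Φ − γΦ′) and target ΠR, since Φθ already lies in that span. The following Python sketch illustrates this idea under assumed data shapes; the function name l1_pbr, all variable names, and the use of scikit-learn's Lasso solver are illustrative choices, not the authors' implementation.

    import numpy as np
    from sklearn.linear_model import Lasso

    def l1_pbr(Phi, PhiNext, r, gamma, lam):
        """Illustrative sketch of the l1-penalized projected Bellman residual.

        Minimizes ||Pi (R + gamma*Phi' theta - Phi theta)||^2 + lam*||theta||_1
        by casting it as a plain Lasso problem. This is a hypothetical
        reconstruction under stated assumptions, not the paper's code.
        """
        # Orthogonal projection onto span(Phi); pinv copes with rank deficiency.
        Pi = Phi @ np.linalg.pinv(Phi.T @ Phi) @ Phi.T
        A = Pi @ (Phi - gamma * PhiNext)  # projected design matrix Pi(Phi - gamma*Phi')
        b = Pi @ r                        # projected reward vector Pi R
        # scikit-learn's Lasso minimizes (1/(2n))||b - A theta||^2 + alpha*||theta||_1,
        # so alpha = lam / (2n) matches ||b - A theta||^2 + lam*||theta||_1.
        model = Lasso(alpha=lam / (2 * len(r)), fit_intercept=False, max_iter=10000)
        model.fit(A, b)
        return model.coef_

    # Usage on synthetic data (shapes only; real features come from sampled transitions).
    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(500, 20))      # features phi(s_i) of sampled states
    PhiNext = rng.normal(size=(500, 20))  # features phi(s'_i) of successor states
    r = rng.normal(size=500)              # observed rewards
    theta = l1_pbr(Phi, PhiNext, r, gamma=0.95, lam=1.0)

Because the problem is an ordinary Lasso once projected, any off-the-shelf ℓ1 solver applies, which is what makes extensions to other penalties straightforward.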