Linear least-squares algorithms for temporal difference learning. Machine Learning, special issue on reinforcement learning.
Atomic Decomposition by Basis Pursuit. SIAM Journal on Scientific Computing.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Constructing basis functions from directed graphs for value function approximation. Proceedings of the 24th International Conference on Machine Learning.
Proceedings of the 25th International Conference on Machine Learning.
Regularization and feature selection in least-squares temporal difference learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Fast gradient-descent methods for temporal-difference learning with linear function approximation. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Kernelized value function approximation for reinforcement learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Algorithms for Reinforcement Learning.
Regularized least squares temporal difference learning with nested ℓ2 and ℓ1 penalization. Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning (EWRL '11).
We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach is to combine the Least-Squares Temporal Difference (LSTD) algorithm with ℓ1-regularization, which has proven effective in the supervised learning community. This was done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an ℓ1-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an ℓ1-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. This comes at the cost of a higher computational complexity if only a part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which makes it easy to envision extensions to other penalties.
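To make the reduction concrete, here is a minimal sketch (not the authors' code) of how an ℓ1-penalized projected Bellman residual can be solved as an ordinary Lasso problem. All names (Phi, Phi_next, R, gamma, lam) and the use of scikit-learn's Lasso solver are illustrative assumptions; the only point is that, because the empirical projection onto the feature span is a linear operator, the penalized objective is a standard supervised ℓ1-regularized regression in the weight vector.

```python
# Hypothetical sketch: l1-penalized projected Bellman residual as a Lasso problem.
# Assumed inputs (illustrative names, not from the paper):
#   Phi      : (n, k) features of sampled states
#   Phi_next : (n, k) features of sampled next states
#   R        : (n,)   sampled rewards
import numpy as np
from sklearn.linear_model import Lasso


def l1_pbr(Phi, Phi_next, R, gamma=0.95, lam=0.01, ridge=1e-6):
    n, k = Phi.shape
    # Gram matrix of the features; a small ridge keeps the solve stable.
    G = Phi.T @ Phi + ridge * np.eye(k)

    # Apply the empirical projection Pi = Phi (Phi^T Phi)^-1 Phi^T to Phi_next and R
    # without forming the n x n projection matrix explicitly.
    C = np.linalg.solve(G, Phi.T @ Phi_next)   # (k, k)
    d = np.linalg.solve(G, Phi.T @ R)          # (k,)

    # ||Pi(R + gamma Phi_next theta) - Phi theta||^2 + lam ||theta||_1
    # becomes the Lasso problem ||y - X theta||^2 + lam ||theta||_1 with:
    X = Phi - gamma * (Phi @ C)
    y = Phi @ d

    # scikit-learn's Lasso minimizes (1/2n)||y - X theta||^2 + alpha ||theta||_1,
    # so the penalty is rescaled to match the objective above.
    lasso = Lasso(alpha=lam / (2 * n), fit_intercept=False, max_iter=10000)
    lasso.fit(X, y)
    return lasso.coef_


# Toy usage with random data, purely for illustration.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 200, 20
    Phi = rng.normal(size=(n, k))
    Phi_next = rng.normal(size=(n, k))
    R = rng.normal(size=n)
    theta = l1_pbr(Phi, Phi_next, R)
    print("non-zero features:", np.flatnonzero(theta))
```

Because the final step is an ordinary sparse regression, any other penalty or off-the-shelf solver could be substituted there, which is the kind of extension the abstract alludes to; this contrasts with the LARS-TD route, which instead solves a fixed point of an ℓ1-penalized projection.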