Regularization and feature selection in least-squares temporal difference learning

Authors: J. Zico Kolter, Andrew Y. Ng
Affiliations: Stanford University, CA (both authors)
Venue: ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year: 2009

Abstract

We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large, LSTD can overfit the data and becomes computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on ℓ1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the ℓ1-regularized LSTD solution cannot be expressed as the solution to a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.
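
As a rough illustration of the setting described in the abstract, the sketch below computes a plain LSTD solution from sampled transitions and shows how a regularization term shrinks the weights. It is not the paper's algorithm: it uses a simple ℓ2 penalty, which has a closed form, in place of the ℓ1 penalty, which requires the LARS-like procedure the abstract mentions. The function name `lstd`, its parameters, and the two-state chain are all hypothetical choices made for this example.

```python
import numpy as np


def lstd(phi, phi_next, rewards, gamma, l2=0.0):
    """Illustrative LSTD solve with an optional l2 penalty (an assumption,
    not the l1 method of the paper).

    phi      : (n_samples, n_features) features of visited states
    phi_next : (n_samples, n_features) features of successor states
    rewards  : (n_samples,) observed rewards
    gamma    : discount factor
    l2       : l2 regularization strength (0 gives plain LSTD)
    """
    n_features = phi.shape[1]
    # A = Phi^T (Phi - gamma * Phi') + l2 * I,  b = Phi^T r
    A = phi.T @ (phi - gamma * phi_next) + l2 * np.eye(n_features)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


# Hypothetical two-state chain with one-hot features:
# state 0 -> state 1 with reward 0; state 1 -> state 1 with reward 1.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])
rewards = np.array([0.0, 1.0])
gamma = 0.5

w_plain = lstd(phi, phi_next, rewards, gamma)        # unregularized weights
w_reg = lstd(phi, phi_next, rewards, gamma, l2=1.0)  # shrunk toward zero
```

With one-hot features the unregularized weights are the exact state values of this chain, V(1) = 1 / (1 - 0.5) = 2 and V(0) = 0.5 · 2 = 1. The ℓ2 penalty shrinks the weights but does not set any of them exactly to zero, which is one reason an ℓ1 penalty is the natural choice when feature selection is the goal.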