Hybrid Least-Squares Algorithms for Approximate Policy Evaluation
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
The goal of approximate policy evaluation is to "best" represent a target value function according to a specific criterion, and different algorithms correspond to different choices of that criterion. Two popular least-squares algorithms for this task are the Bellman residual method, which minimizes the Bellman residual, and the fixed point method, which minimizes the projection of the Bellman residual. When used within policy iteration, the fixed point method tends to ultimately find better-performing policies, whereas the Bellman residual method exhibits more stable behavior between rounds of policy iteration. We propose two hybrid least-squares algorithms that aim to combine the advantages of both. We provide an analytical and geometric interpretation of the hybrid algorithms and demonstrate their utility on a simple problem. Experimental results on both small and large domains suggest that the hybrid algorithms may find solutions leading to better policies when used within policy iteration.
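To make the contrast concrete, the following sketch solves a tiny policy-evaluation problem with linear value-function approximation V = Φw. The Bellman residual method solves the normal equations of min‖Φw − (R + γPΦw)‖², the fixed point (LSTD-style) method solves Φᵀ(Φ − γPΦ)w = ΦᵀR, and a simple convex combination of the two systems serves as an illustrative hybrid. The transition matrix, rewards, features, and the particular mixing scheme are all assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Tiny 3-state chain under a fixed policy (illustrative numbers only).
gamma = 0.9
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])      # policy-induced transition matrix
R = np.array([0.0, 0.0, 1.0])        # expected immediate rewards
Phi = np.array([[1.0, 0.0],
                [1.0, 0.5],
                [1.0, 1.0]])         # two linear features per state

# A w = R is the (overdetermined) Bellman-residual system: A = Phi - gamma*P*Phi.
A = Phi - gamma * P @ Phi

def solve(xi):
    """xi = 1 -> Bellman residual; xi = 0 -> fixed point; otherwise a hybrid.

    Multiplying A w = R on the left by C = xi*A.T + (1-xi)*Phi.T recovers
    the BR normal equations at xi=1 and the fixed-point system at xi=0.
    """
    C = xi * A.T + (1.0 - xi) * Phi.T
    return np.linalg.solve(C @ A, C @ R)

w_br = solve(1.0)   # minimizes the Bellman residual
w_fp = solve(0.0)   # minimizes the projected Bellman residual
w_hyb = solve(0.5)  # one point on the continuum between the two
```

By construction, `w_br` achieves the smallest Bellman residual norm ‖Aw − R‖ of the three, while `w_fp` matches what LSTD would return on this problem; the hybrid trades off between the two criteria via the mixing weight.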