Least-squares policy iteration

Authors:
Michail G. Lagoudakis; Ronald Parr
Affiliations:
Department of Computer Science, Duke University, Durham, NC; Department of Computer Science, Duke University, Durham, NC
Venue:
The Journal of Machine Learning Research
Year:
2003

Abstract

We propose a new approach to reinforcement learning for control problems that combines value-function approximation with linear architectures and approximate policy iteration. The approach is motivated by the least-squares temporal-difference learning algorithm (LSTD) for prediction problems, which is known for using sample experiences more efficiently than pure temporal-difference algorithms. Until now, LSTD has not had a straightforward application to control problems, mainly because it learns the state value function of a fixed policy, and that function cannot be used for action selection and control without a model of the underlying process. Our new algorithm, least-squares policy iteration (LSPI), learns the state-action value function instead, which allows for action selection without a model and for incremental policy improvement within a policy-iteration framework. LSPI is a model-free, off-policy method that can efficiently use (and reuse in each iteration) sample experiences collected in any manner. By separating the sample collection method, the choice of the linear approximation architecture, and the solution method, LSPI allows for focused attention on the distinct elements that contribute to practical reinforcement learning. LSPI is tested on the simple task of balancing an inverted pendulum and the harder task of balancing and riding a bicycle to a target location. In both cases, LSPI learns to control the pendulum or the bicycle merely by observing a relatively small number of trials in which actions are selected randomly. LSPI is also compared against Q-learning (both with and without experience replay) using the same value function architecture. While LSPI achieves good performance fairly consistently on the difficult bicycle task, the Q-learning variants were rarely able to balance for more than a small fraction of the time needed to reach the target location.
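The loop described in the abstract (fit a linear state-action value function by least squares, act greedily with respect to it, and reuse the same samples in every iteration) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four-state chain, its rewards, the one-hot features, the small ridge term, and the names `phi`, `step`, `greedy`, `lstdq`, and `lspi`.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9   # toy chain; all values are illustrative
K = N_STATES * N_ACTIONS                 # number of basis functions

def phi(s, a):
    """One-hot (tabular) features: a trivial linear architecture."""
    f = np.zeros(K)
    f[s * N_ACTIONS + a] = 1.0
    return f

def step(s, a):
    """Hypothetical chain dynamics: action 0 moves left, action 1 moves right.
    Reward is 1 whenever the next state is the rightmost one."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next

def greedy(w, s):
    """Model-free action selection: argmax over a of phi(s, a) @ w."""
    return max(range(N_ACTIONS), key=lambda a: phi(s, a) @ w)

def lstdq(samples, w_old, ridge=1e-6):
    """Evaluate the greedy policy of w_old from samples by solving A w = b."""
    A = ridge * np.eye(K)   # ridge term: an added safeguard against a singular A
    b = np.zeros(K)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w_old, s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, tol=1e-6, max_iter=50):
    """Alternate evaluation and greedy improvement, reusing the same samples."""
    w = np.zeros(K)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Off-policy data: uniformly random states and actions, collected once.
rng = np.random.default_rng(0)
samples = []
for _ in range(500):
    s, a = int(rng.integers(N_STATES)), int(rng.integers(N_ACTIONS))
    r, s_next = step(s, a)
    samples.append((s, a, r, s_next))

w = lspi(samples)
policy = [greedy(w, s) for s in range(N_STATES)]
print(policy)
```

In this sketch the sample collection (the random loop), the approximation architecture (`phi`), and the solver (`lstdq`) are separate pieces, mirroring the separation the abstract emphasizes. Replacing the one-hot features with a smaller set of basis functions would change only `phi` and `K`.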