Online exploration in least-squares policy iteration

  • Authors:
  • Lihong Li; Michael L. Littman; Christopher R. Mansley

  • Affiliations:
  • Rutgers University, Piscataway, NJ; Rutgers University, Piscataway, NJ; Rutgers University, Piscataway, NJ

  • Venue:
  • Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
  • Year:
  • 2009

Abstract

One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation must be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, Rmax, into a state-of-the-art learning algorithm, least-squares policy iteration (LSPI). The approach combines the strengths of both methods, and experiments on several benchmark problems demonstrate its effectiveness and its superiority over LSPI paired with two other popular exploration rules.
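
The core idea is Rmax-style optimism layered on top of LSPI's sample-based policy evaluation: state-action pairs that have not been visited often enough are treated as maximally valuable, which drives the agent to try them. The sketch below illustrates this idea on a hypothetical toy chain MDP with tabular one-hot features; the knownness threshold M_KNOWN, the optimistic bound VMAX, and the periodic refit schedule are illustrative assumptions, not the paper's exact LSPI-Rmax construction.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2          # toy chain MDP: action 0 = left, 1 = right
GAMMA = 0.95
M_KNOWN = 5                         # visits before a state-action pair counts as "known"
VMAX = 1.0 / (1.0 - GAMMA)          # optimistic value assigned to unknown pairs

def step(s, a):
    """Chain dynamics: reward 1 only when reaching the rightmost state."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def phi(s, a):
    """One-hot (tabular) features over state-action pairs."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def greedy(w, s):
    """Greedy action under the linear Q-function Q(s, a) = phi(s, a) . w."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))

def lstdq(samples, w_old):
    """One LSTDQ pass: fit the Q-function of the policy greedy w.r.t. w_old."""
    k = N_STATES * N_ACTIONS
    A, b = 1e-6 * np.eye(k), np.zeros(k)   # small ridge term keeps A invertible
    for s, a, r, s_next in samples:
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s_next, greedy(w_old, s_next)))
        b += r * f
    return np.linalg.solve(A, b)

def q_optimistic(w, counts, s, a):
    """Rmax-style optimism: under-visited pairs are assumed maximally valuable."""
    return VMAX if counts[s, a] < M_KNOWN else phi(s, a) @ w

w = np.zeros(N_STATES * N_ACTIONS)
counts = np.zeros((N_STATES, N_ACTIONS))
samples, s = [], 0
for t in range(2000):
    # Exploration rule: act greedily with respect to the optimistic Q-values.
    a = max(range(N_ACTIONS), key=lambda a: q_optimistic(w, counts, s, a))
    s_next, r = step(s, a)
    samples.append((s, a, r, s_next))
    counts[s, a] += 1
    s = s_next
    if (t + 1) % 200 == 0:          # periodically rerun policy iteration on all samples
        for _ in range(10):         # a few LSTDQ sweeps approximate policy iteration
            w = lstdq(samples, w)

print("greedy action per state:", [greedy(w, s) for s in range(N_STATES)])
```

In the large or continuous MDPs the paper targets, a tabular visit counter like the one above would not suffice; knownness would instead have to be measured with respect to the feature representation used by LSPI.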