Reducing reinforcement learning to KWIK online regression

  • Authors: Lihong Li; Michael L. Littman

  • Affiliations: Yahoo! Research, Santa Clara, USA 95054; Rutgers Laboratory for Real-Life Reinforcement Learning (RL3), Department of Computer Science, Rutgers University, Piscataway, USA 08854

  • Venue: Annals of Mathematics and Artificial Intelligence
  • Year: 2010

Abstract

One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be used. This paper introduces REKWIRE, a provably efficient, model-free algorithm for finite-horizon RL problems with value function approximation (VFA) that addresses the exploration-exploitation tradeoff in a principled way. The crucial element of this algorithm is a reduction of RL to online regression in the recently proposed KWIK learning model. We show that, if the KWIK online regression problem can be solved efficiently, then the sample complexity of exploration of REKWIRE is polynomial. Therefore, the reduction suggests a new and sound direction to tackle general RL problems. The efficiency of our algorithm is verified on a set of proof-of-concept experiments where popular, ad hoc exploration approaches fail.
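The central notion in the abstract is the KWIK ("knows what it knows") protocol: on each input, an online regressor must either return an accurate prediction or explicitly abstain with an "I don't know" answer, and the total number of abstentions must stay polynomially bounded. The sketch below is only a minimal illustration of that interface, not the paper's REKWIRE algorithm; the class names and the toy memorizing learner (valid only for a deterministic target over a finite input set) are assumptions made here for exposition.

```python
from typing import Hashable, Optional


class KWIKRegressor:
    """Minimal KWIK online-regression interface (hypothetical names).

    predict() returns either a value or None, where None stands in for
    the KWIK "I don't know" symbol; a valid KWIK learner must keep the
    number of None answers polynomially bounded.
    """

    def predict(self, x: Hashable) -> Optional[float]:
        raise NotImplementedError

    def update(self, x: Hashable, y: float) -> None:
        raise NotImplementedError


class MemorizingKWIK(KWIKRegressor):
    """Toy KWIK learner for a deterministic target on a finite input set.

    It abstains exactly once per distinct input, so the number of
    "I don't know" answers is at most the size of the input space.
    """

    def __init__(self) -> None:
        self.table: dict[Hashable, float] = {}

    def predict(self, x: Hashable) -> Optional[float]:
        return self.table.get(x)  # None = "I don't know"

    def update(self, x: Hashable, y: float) -> None:
        self.table[x] = y  # record the observed label


# Usage: an online loop in the KWIK style.  When the learner abstains,
# the true label is revealed and the learner is updated; whenever it
# does answer, the answer is required to be accurate.
learner = MemorizingKWIK()
stream = [("s1", 0.5), ("s2", 1.0), ("s1", 0.5)]
for x, y in stream:
    y_hat = learner.predict(x)
    if y_hat is None:
        learner.update(x, y)  # pay one "don't know"
    else:
        assert abs(y_hat - y) < 1e-9  # accurate whenever it answers
```

Roughly speaking, abstentions of this kind are what the reduction exploits for exploration: where the value-function regressor admits it does not know, the agent can treat the corresponding state-action values optimistically and gather the data needed to learn them.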