One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs), where compact function approximation has to be used. This paper introduces REKWIRE, a provably efficient, model-free algorithm for finite-horizon RL problems with value function approximation (VFA) that addresses the exploration-exploitation tradeoff in a principled way. The crucial element of this algorithm is a reduction of RL to online regression in the recently proposed KWIK ("knows what it knows") learning model. We show that, if the KWIK online regression problem can be solved efficiently, then the sample complexity of exploration of REKWIRE is polynomial. The reduction therefore suggests a new and sound direction for tackling general RL problems. The efficiency of our algorithm is verified in a set of proof-of-concept experiments in which popular, ad hoc exploration approaches fail.
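To make the KWIK protocol mentioned above concrete, the following is a minimal sketch of its interaction loop. It is not REKWIRE and not the paper's regression algorithm. It assumes a deterministic target over a finite input set, and the names `MemorizingKWIKLearner` and `run_protocol` are hypothetical. In the KWIK model the learner must either return an accurate prediction or admit "I don't know" (here `None`), and it sees the true label only after such an admission.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


class MemorizingKWIKLearner:
    """Illustrative KWIK learner (hypothetical, not from the paper).

    Assumes the target is deterministic, so one observed label per input
    is enough to predict that input exactly from then on.
    """

    def __init__(self) -> None:
        self._labels: Dict[Hashable, float] = {}
        self.unknown_count = 0  # number of "I don't know" answers so far

    def predict(self, x: Hashable) -> Optional[float]:
        """Return the known label for x, or None for "I don't know"."""
        if x in self._labels:
            return self._labels[x]
        self.unknown_count += 1
        return None

    def observe(self, x: Hashable, y: float) -> None:
        """Store the label revealed after an "I don't know" answer."""
        self._labels[x] = y


def run_protocol(
    learner: MemorizingKWIKLearner,
    stream: Iterable[Tuple[Hashable, float]],
) -> List[Optional[float]]:
    """Run the KWIK loop: labels are revealed only when the learner abstains."""
    outputs: List[Optional[float]] = []
    for x, y in stream:
        prediction = learner.predict(x)
        outputs.append(prediction)
        if prediction is None:
            learner.observe(x, y)
    return outputs
```

Under these assumptions the learner abstains at most once per distinct input, which is the kind of bound on "I don't know" answers that the KWIK model requires. An RL agent built on such a learner could treat each abstention as a signal that a state-action pair still needs exploration, which is the intuition behind reducing RL to KWIK online regression.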