This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods against the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations, which combine a sparse model with a model-free Q-value function constructed from the samples not used by the model. We prove that, unlike regular sparse model-based methods, best-match learning is guaranteed to converge to the optimal Q-values in the tabular case. Empirical results demonstrate that best-match learning can substantially outperform regular sparse model-based methods, as well as several model-free methods that strive to improve the sample efficiency of temporal-difference methods. In addition, we demonstrate that best-match learning can be successfully combined with function approximation.
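As background for the trade-off the abstract describes, the model-free side it builds on is the standard tabular Q-learning temporal-difference update. The sketch below is purely illustrative and is not the paper's best-match method: it runs plain Q-learning on a hypothetical 5-state chain MDP (all dynamics, rewards, and hyperparameters here are assumptions for the example).

```python
import random

# Illustrative sketch only: standard tabular Q-learning (the model-free
# temporal-difference baseline), NOT the paper's best-match algorithm.
# Toy problem: a hypothetical 5-state chain; state 4 is terminal.

N_STATES = 5
ALPHA = 0.1      # learning rate
GAMMA = 0.95     # discount factor
EPSILON = 0.1    # epsilon-greedy exploration rate

def step(state, action):
    """Toy dynamics: action 1 moves right, action 0 moves left.
    Reaching state 4 yields reward 1 and ends the episode."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

random.seed(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection over the two actions.
        if random.random() < EPSILON:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Temporal-difference (Q-learning) update toward the bootstrapped target.
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# After training, moving right is preferred in every non-terminal state.
policy = [max((0, 1), key=lambda x: Q[s][x]) for s in range(N_STATES - 1)]
print(policy)
```

Note that each sample is consumed by a single value update and then discarded; sparse model-based methods instead store a subset of samples in a model and replan from it, which is exactly the space/sample trade-off the abstract refers to.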