Reinforcement learning (RL) is a paradigm for learning sequential decision-making tasks. Typically, however, the user must hand-tune exploration parameters for each domain and algorithm they use. In this work, we present an algorithm, called LEO, for learning these exploration strategies on-line. LEO uses bandit-type algorithms to adaptively select exploration strategies based on the rewards received when following them. We show empirically that this method performs well across a set of five domains, whereas for any given algorithm no single set of parameters is best across all domains. Our results demonstrate that LEO successfully learns the best exploration strategies on-line, increasing the reward received over static parameterizations of exploration and reducing the need to hand-tune exploration parameters.