Learning in embedded systems
Temporal difference learning and TD-Gammon
Communications of the ACM
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Reinforcement Learning
A Bayesian Framework for Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Optimal learning: computational procedures for bayes-adaptive markov decision processes
Optimal learning: computational procedures for bayes-adaptive markov decision processes
Bayesian sparse sampling for on-line reward optimization
ICML '05 Proceedings of the 22nd international conference on Machine learning
Perseus: randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research
A decision-theoretic approach to task assistance for persons with dementia
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Model based Bayesian exploration
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Point-Based Value Iteration for Continuous POMDPs
The Journal of Machine Learning Research
Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs
Proceedings of the 25th international conference on Machine learning
Spoken language interaction with model uncertainty: an adaptive human-robot interaction system
Connection Science - Language and Robots
Reinforcement Learning with the Use of Costly Features
Recent Advances in Reinforcement Learning
Near-Bayesian exploration in polynomial time
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Online exploration in least-squares policy iteration
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Bayesian reinforcement learning in continuous pomdps with Gaussian processes
IROS'09 Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems
Provably Efficient Learning with Typed Parametric Models
The Journal of Machine Learning Research
Reinforcement Learning in Finite MDPs: PAC Analysis
The Journal of Machine Learning Research
Posterior weighted reinforcement learning with state uncertainty
Neural Computation
Computer Vision and Image Understanding
MICAI'07 Proceedings of the artificial intelligence 6th Mexican international conference on Advances in artificial intelligence
A Bayesian sampling approach to exploration in reinforcement learning
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
PAC-MDP learning with knowledge-based admissible models
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1
Smarter sampling in model-based Bayesian reinforcement learning
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Exploration in relational worlds
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Reducing reinforcement learning to KWIK online regression
Annals of Mathematics and Artificial Intelligence
Representing uncertainty about complex user goals in statistical dialogue systems
SIGDIAL '10 Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue
A Monte-Carlo AIXI approximation
Journal of Artificial Intelligence Research
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
The Journal of Machine Learning Research
Preference elicitation and inverse reinforcement learning
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Robust bayesian reinforcement learning through tight lower bounds
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Planning and evaluating multiagent influences under reward uncertainty
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Bayes-optimal reinforcement learning for discrete uncertainty domains
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
People, sensors, decisions: Customizable and adaptive technologies for assistance in healthcare
ACM Transactions on Interactive Intelligent Systems (TiiS) - Special issue on highlights of the decade in interactive intelligent systems
Exploration in relational domains for model-based reinforcement learning
The Journal of Machine Learning Research
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Prior-free exploration bonus for and beyond near bayes-optimal behavior
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Linear Bayesian reinforcement learning
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Monte-Carlo tree search for Bayesian reinforcement learning
Applied Intelligence
Scalable and efficient bayes-adaptive reinforcement learning based on monte-carlo tree search
Journal of Artificial Intelligence Research
Hi-index | 0.00 |
Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving this goal because the amount of exploration required is often too costly and/or too time consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.