An analytic solution to discrete Bayesian reinforcement learning

Authors:
Pascal Poupart; Nikos Vlassis; Jesse Hoey; Kevin Regan
Affiliations:
University of Waterloo, Waterloo, Ontario, Canada; University of Amsterdam, Amsterdam, The Netherlands; University of Toronto, Toronto, Ontario, Canada; University of Waterloo, Waterloo, Ontario, Canada
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006

Abstract

Reinforcement learning (RL) was originally proposed as a framework that allows agents to learn online as they interact with their environment. Existing RL algorithms fall short of this goal because the exploration they require is often too costly or too time-consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process (POMDP). Our two main contributions are an analytical derivation showing that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
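
To make the phrase "upper envelope of a set of multivariate polynomials" concrete, the following is a minimal sketch and not the BEETLE algorithm itself. It assumes a hypothetical representation in which each polynomial is a dictionary mapping exponent tuples to coefficients, and it simply evaluates every polynomial at one parameter point and keeps the largest value. The names `evaluate`, `upper_envelope`, and the example polynomials are illustrative assumptions, not taken from the paper.

```python
from math import prod

# Assumed representation (hypothetical): a polynomial is a dict that maps an
# exponent tuple to a coefficient, e.g. {(1, 1): 4.0} stands for 4 * x0 * x1.

def evaluate(poly, theta):
    """Evaluate one multivariate polynomial at the point theta."""
    return sum(
        coef * prod(t ** e for t, e in zip(theta, exps))
        for exps, coef in poly.items()
    )

def upper_envelope(polys, theta):
    """Return (max value, index of the maximizing polynomial) at theta."""
    values = [evaluate(p, theta) for p in polys]
    best = max(range(len(values)), key=values.__getitem__)
    return values[best], best

# Three illustrative polynomials in two variables.
polys = [
    {(1, 0): 1.0},   # x0
    {(0, 1): 1.0},   # x1
    {(1, 1): 4.0},   # 4 * x0 * x1
]

value_mid, index_mid = upper_envelope(polys, (0.5, 0.5))
value_edge, index_edge = upper_envelope(polys, (1.0, 0.0))
print(value_mid, index_mid)
print(value_edge, index_edge)
```

Which polynomial attains the maximum changes with the evaluation point, which is what makes the envelope piecewise: here the product term dominates in the interior, while a single-variable term dominates at the boundary.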