Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Authors:
Finale Doshi-Velez;Joelle Pineau;Nicholas Roy
Affiliations:
Massachusetts Institute of Technology, Cambridge, USA;McGill University, Montreal, Canada;Massachusetts Institute of Technology, Cambridge, USA
Venue:
Artificial Intelligence
Year:
2012

Abstract

Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades off actions that contribute to the agent's knowledge against actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More important for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
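
As a rough illustration of what choosing actions by immediate Bayes risk could look like, the sketch below assumes the uncertainty over models is represented by a finite set of weighted model samples, and that each sample already supplies a Q-value for every action at its current belief. The function name `bayes_risk_action`, its arguments, and the rule of asking the expert when the smallest risk exceeds a fixed `query_cost` are assumptions made for this example; the abstract does not spell out these details.

```python
import numpy as np


def bayes_risk_action(q_values, weights, query_cost):
    """Pick the action with the lowest expected loss over sampled models.

    q_values   : (n_models, n_actions) array; q_values[m, a] is the value of
                 action a under sampled model m at that model's current belief
                 (hypothetical input, assumed to be precomputed).
    weights    : (n_models,) nonnegative posterior weights of the samples.
    query_cost : assumed fixed cost of asking the expert a policy query.
    """
    q = np.asarray(q_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights form a distribution

    # Loss of action a under model m: shortfall relative to m's best action.
    loss = q.max(axis=1, keepdims=True) - q

    # Bayes risk of each action: expected loss under the model posterior.
    risk = w @ loss

    best = int(np.argmin(risk))
    # Assumed rule: query the expert when even the safest action is riskier
    # than the cost of asking.
    ask_expert = bool(risk[best] > query_cost)
    return best, risk, ask_expert


if __name__ == "__main__":
    # Two equally likely models that disagree about actions 0 and 1;
    # action 2 is a reasonable compromise under both.
    q = [[10.0, 0.0, 8.0],
         [0.0, 10.0, 8.0]]
    best, risk, ask = bayes_risk_action(q, [0.5, 0.5], query_cost=1.0)
    print(best, risk, ask)
```

In this toy case the two models each favor a different action, so actions 0 and 1 carry an expected loss of 5 while the compromise action 2 carries an expected loss of 2. With an assumed query cost of 1 the sketch would ask the expert; with a cost of 3 it would simply take action 2.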