Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs
Proceedings of the 25th international conference on Machine learning
Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades off actions that contribute to the agent's knowledge against actions that increase the agent's immediate reward. However, specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often unintuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when the transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show that it performs well on a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
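The core of the Bayes-risk criterion can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes we have already sampled a set of candidate POMDP models with posterior weights, and that each model provides Q-values for the current belief (e.g. from a point-based solver); the function name `bayes_risk_action` and the array layout are hypothetical. The risk of an action under a model is how far it falls short of that model's best action, and the agent picks the action with the smallest expected shortfall across models.

```python
import numpy as np

def bayes_risk_action(q_values, weights):
    """Pick the action minimizing immediate Bayes risk.

    q_values: (n_models, n_actions) array; entry [m, a] is Q_m(b, a), the
      value of action a at the current belief b under sampled model m
      (assumed precomputed, e.g. by a point-based POMDP solver).
    weights:  (n_models,) posterior probabilities of the sampled models.
    Returns the index of the minimum-risk action and the per-action risks.
    """
    q = np.asarray(q_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the model posterior
    # Loss of taking action a in model m: shortfall from m's best action.
    loss = q.max(axis=1, keepdims=True) - q   # shape (n_models, n_actions)
    risk = w @ loss                           # expected loss per action
    return int(np.argmin(risk)), risk
```

Note that an action optimal in every sampled model has zero risk; when even the best action's expected loss is large, the models disagree, which is the natural moment for the agent to fall back on a policy query rather than act.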