A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
The Journal of Machine Learning Research
This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is unknown or only poorly specified. We propose two approaches to this problem. The first incorporates a model of the uncertainty directly into the POMDP planning problem; it offers theoretical guarantees, but becomes impractical when many of the parameters are uncertain. The second, called MEDUSA, incrementally improves the POMDP model through selected queries while still optimizing reward. Empirical results show that the algorithm performs well even in large problems: the most useful parameters of the model are learned quickly, and the agent continues to accumulate high reward throughout the learning process.
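The core of the second approach can be sketched in code. The following is a minimal, illustrative sketch (not the paper's implementation): model uncertainty over the POMDP's transition probabilities is represented with Dirichlet counts, concrete models are sampled from the posterior for planning, and an oracle query that reveals a true transition strengthens the corresponding count. All class and method names here are hypothetical.

```python
import random

class DirichletTransitionModel:
    """Dirichlet-distributed uncertainty over P(s' | s, a), as in a
    MEDUSA-style setup (illustrative sketch, not the paper's code)."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # counts[s][a][s'] is the Dirichlet parameter for P(s' | s, a).
        self.counts = [[[prior] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]

    def sample_model(self, rng=random):
        """Draw one concrete transition model from the Dirichlet posterior
        (normalized Gamma draws give a Dirichlet sample per (s, a) pair)."""
        model = []
        for s_row in self.counts:
            model.append([])
            for a_counts in s_row:
                draws = [rng.gammavariate(c, 1.0) for c in a_counts]
                total = sum(draws)
                model[-1].append([d / total for d in draws])
        return model

    def update_from_query(self, s, a, s_next):
        """An oracle query revealed the transition (s, a, s');
        increment its count to sharpen the posterior."""
        self.counts[s][a][s_next] += 1.0

    def mean(self, s, a):
        """Posterior mean estimate of P(. | s, a)."""
        row = self.counts[s][a]
        total = sum(row)
        return [c / total for c in row]
```

In this sketch the agent would plan with models drawn from `sample_model`, act to collect reward, and issue a query (calling `update_from_query`) only when the sampled models disagree enough that the extra information is worth the query cost, so that the most influential parameters get refined first.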