Active learning in partially observable markov decision processes

Authors:
Robin Jaulmes;Joelle Pineau;Doina Precup
Affiliations:
School of Computer Science, McGill University, Montreal, QC;School of Computer Science, McGill University, Montreal, QC;School of Computer Science, McGill University, Montreal, QC
Venue:
ECML'05 Proceedings of the 16th European conference on Machine Learning
Year:
2005

Citing 4
Cited 8

Reinforcement learning with selective perception and hidden state

Reinforcement learning with selective perception and hidden state
Point-based value iteration: an anytime algorithm for POMDPs

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Planning and acting in partially observable stochastic domains

Artificial Intelligence
Model based Bayesian exploration

UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence

Reasoning for a multi-modal service robot considering uncertainty in human-robot interaction

Proceedings of the 3rd ACM/IEEE international conference on Human robot interaction
A Reinforcement Learning approach to evaluating state representations in spoken dialogue systems

Speech Communication
A bayesian reinforcement learning approach for customizing human-robot interfaces

Proceedings of the 14th international conference on Intelligent user interfaces
Robust adaptive Markov decision processes in multi-vehicle applications

ACC'09 Proceedings of the 2009 conference on American Control Conference
Robot self-initiative and personalization by learning through repeated interactions

Proceedings of the 6th international conference on Human-robot interaction
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

The Journal of Machine Learning Research
Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Artificial Intelligence
Lifelong learning for acquiring the wisdom of the crowd

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not known or is only poorly specified. We propose two approaches to this problem. The first relies on a model of the uncertainty that is added directly into the POMDP planning problem. This has theoretical guarantees, but is impractical when many of the parameters are uncertain. The second, called MEDUSA, incrementally improves the POMDP model using selected queries, while still optimizing reward. Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.