Acting optimally in partially observable stochastic domains
AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
Reinforcement learning with selective perception and hidden state
State-aggregation algorithms for learning probabilistic models for robot control
Utile distinction hidden Markov models
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Reinforcement learning with perceptual aliasing: the perceptual distinctions approach
AAAI'92 Proceedings of the tenth national conference on Artificial intelligence
Learning finite-state controllers for partially observable environments
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Improving approximate value iteration using memories and predictive state representations
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Improving anytime point-based value iteration using principled point selections
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems
Neural Processing Letters
Closing the learning-planning loop with predictive state representations
International Journal of Robotics Research
Recognizing internal states of other agents to anticipate and coordinate interactions
EUMAS'11 Proceedings of the 9th European conference on Multi-Agent Systems
Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning problem. Research in the area has focused on model-free methods, which learn a policy without learning a model of the world. As sensor noise increases, however, model-free methods yield less accurate policies. The model-based approach, in which a POMDP model of the world is learned and an optimal policy is computed for the learned model, may produce superior results in the presence of sensor noise, but learning and solving a model of the environment is itself a difficult problem. We have previously shown how such a model can be obtained from the policy learned by model-free methods, but that approach imposes an undesirable separation between a learning phase and an acting phase. In this paper we present a novel method for learning a POMDP model online, based on McCallum's Utile Suffix Memory (USM), in conjunction with an approximate policy obtained using an incremental POMDP solver. We show that the incrementally improving policy outperforms the original USM algorithm, especially in the presence of increasing sensor and action noise.
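To make the USM component concrete, the following is a minimal Python sketch of a suffix-tree memory with Q-value backups at its leaves. All names (SuffixNode, td_update, the observations and actions) are illustrative assumptions, not the authors' code, and the statistical fringe test that USM uses to decide when a leaf should be split is replaced here by an unconditional split.

```python
# Hypothetical sketch of a USM-style suffix-tree memory (illustrative names,
# not the authors' implementation). Histories are lists of
# (observation, action) pairs; each leaf groups histories that share a recent
# suffix and keeps Q-value estimates for that group.
from collections import defaultdict

GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.1   # learning rate (assumed)

class SuffixNode:
    def __init__(self, depth=0):
        self.depth = depth            # number of history steps this node matches
        self.children = {}            # (observation, action) -> SuffixNode
        self.q = defaultdict(float)   # action -> Q estimate at this leaf

    def leaf_for(self, history):
        """Descend by matching progressively older (observation, action) pairs."""
        if len(history) > self.depth:
            child = self.children.get(history[-(self.depth + 1)])
            if child is not None:
                return child.leaf_for(history)
        return self

    def split(self, histories):
        """Refine this leaf by one more step of history. Real USM splits only
        when a statistical test shows that the deeper distinction changes the
        predicted discounted return; here we split unconditionally."""
        for h in histories:
            if len(h) > self.depth:
                key = h[-(self.depth + 1)]
                self.children.setdefault(key, SuffixNode(self.depth + 1))

def td_update(tree, history, action, reward, next_history):
    """One-step Q-learning backup between the leaves matching the two histories."""
    leaf = tree.leaf_for(history)
    next_leaf = tree.leaf_for(next_history)
    target = reward + GAMMA * max(next_leaf.q.values(), default=0.0)
    leaf.q[action] += ALPHA * (target - leaf.q[action])

# Tiny usage example on made-up observations and actions.
tree = SuffixNode()
h0 = [("wall", "turn-left")]
h1 = h0 + [("corridor", "forward")]
td_update(tree, h0, "forward", 1.0, h1)
print(tree.leaf_for(h0).q["forward"])   # 0.1 after a single backup
```

In the setting the abstract describes, such a tree would additionally be converted into a POMDP model whose states correspond to the leaves, with the incremental solver's approximate policy replacing the greedy choice over leaf Q-values; that model-construction and planning step is beyond this sketch.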