Safe Q-Learning on Complete History Spaces

Authors:
Stephan Timmer; Martin Riedmiller
Affiliations:
Neuroinformatics Group, University of Osnabrueck, Germany; Neuroinformatics Group, University of Osnabrueck, Germany
Venue:
ECML '07: Proceedings of the 18th European Conference on Machine Learning
Year:
2007

Abstract

In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze certain conditions under which a history-based approach is able to learn policies comparable to the optimal solution on belief states. The presented algorithm is model-free and can be combined with any method for learning history spaces. We also present a procedure for learning history spaces that are especially suited to our Q-learning algorithm.
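
To make the idea of a Q-function on a history space concrete, the following is a minimal sketch and not the authors' algorithm: plain tabular Q-learning whose table is indexed by a fixed-length window of recent observations and actions. The toy two-step environment, the function names (reset, step, q_learning_on_histories, greedy_return), the window length k, and all hyperparameters are assumptions chosen for illustration. The safety mechanism and the history-space learning procedure described in the abstract are not reproduced here.

```python
import random
from collections import defaultdict

# Hypothetical toy deterministic POMDP (an assumption, not from the paper):
# the first observation reveals a hidden cue (0 or 1); the second cell always
# emits observation 2, and the action taken there is rewarded iff it equals the cue.

def reset(cue):
    """Start an episode; the hidden state is (cue, position)."""
    return (cue, 0), cue  # state, first observation

def step(state, action):
    """Deterministic transition: (next_state, observation, reward, done)."""
    cue, pos = state
    if pos == 0:
        return (cue, 1), 2, 0.0, False  # move to the aliased cell
    return state, 2, (1.0 if action == cue else 0.0), True

def extend(hist, action, obs, k):
    """Append (action, obs) and keep the last k observations and k-1 actions."""
    return (hist + (action, obs))[-(2 * k - 1):]

def q_learning_on_histories(episodes=2000, k=2, alpha=0.5, gamma=0.9,
                            eps=0.2, seed=0):
    """Tabular Q-learning where the 'state' is a truncated history tuple."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # history -> Q-value per action
    for _ in range(episodes):
        state, obs = reset(rng.randrange(2))
        hist, done = (obs,), False
        while not done:
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[hist][x])
            state, obs, r, done = step(state, a)
            nxt = extend(hist, a, obs, k)
            target = r if done else r + gamma * max(q[nxt])
            q[hist][a] += alpha * (target - q[hist][a])
            hist = nxt
    return q

def greedy_return(q, cue, k=2):
    """Undiscounted return of the greedy policy for a given hidden cue."""
    state, obs = reset(cue)
    hist, total, done = (obs,), 0.0, False
    while not done:
        a = max((0, 1), key=lambda x: q[hist][x])
        state, obs, r, done = step(state, a)
        hist = extend(hist, a, obs, k)
        total += r
    return total

if __name__ == "__main__":
    q = q_learning_on_histories(k=2)
    print([greedy_return(q, cue, k=2) for cue in (0, 1)])
```

With k=2 the history at the second cell still contains the cue, so the two cues map to different table entries. With k=1 the table sees only the current observation, the second cell is aliased, and the greedy policy can be correct for at most one of the two cues.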