A reinforcement learning algorithm with polynomial interaction complexity for only-costly-observable MDPs

Authors:
Roy Fox; Moshe Tennenholtz
Affiliations:
Computer Science Department, Technion IIT, Israel; Faculty of Industrial Engineering and Management, Technion IIT, Israel
Venue:
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Year:
2007

Abstract

An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP by adding a particular costly action that completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
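
To make the OCOMDP definition concrete, the following is a minimal Python sketch of the interaction model only; it is not UR-MAX and is not taken from the paper. The names `OCOMDP`, `OBSERVE`, `observe_cost`, and `total_reward` are hypothetical, and the sketch assumes state-based rewards that accrue without being shown to the agent, a fixed cost for observing, and an observing action that leaves the state unchanged.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

OBSERVE = "observe"  # hypothetical name for the single costly observing action


@dataclass
class OCOMDP:
    # transitions[state][action] -> list of (next_state, probability)
    transitions: Dict[int, Dict[str, List[Tuple[int, float]]]]
    rewards: Dict[int, float]   # assumption: reward depends on the state reached
    observe_cost: float         # assumption: fixed cost paid for each observation
    state: int = 0              # hidden from the agent
    total_reward: float = 0.0   # accrued by the environment, not shown to the agent
    rng: random.Random = field(default_factory=random.Random)

    def step(self, action: str) -> Optional[int]:
        """Apply an action; return the state only for the observing action."""
        if action == OBSERVE:
            # Assumption: observing reveals the state without changing it.
            self.total_reward -= self.observe_cost
            return self.state
        next_states, probs = zip(*self.transitions[self.state][action])
        self.state = self.rng.choices(next_states, weights=probs, k=1)[0]
        self.total_reward += self.rewards[self.state]
        return None  # ordinary actions yield no observation, as in a UMDP


# Two-state example with deterministic transitions.
env = OCOMDP(
    transitions={0: {"go": [(1, 1.0)]}, 1: {"go": [(0, 1.0)]}},
    rewards={0: 0.0, 1: 1.0},
    observe_cost=0.25,
)
blind = env.step("go")      # no observation is returned
seen = env.step(OBSERVE)    # the hidden state is revealed, at a cost
```

If the `OBSERVE` action is never used, the agent acts blindly, as in a UMDP. The learning problem the abstract describes is deciding when paying `observe_cost` is worthwhile while the transition model is still unknown.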