An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP with one particular costly action that fully observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
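To make the OCOMDP definition concrete, the following is a minimal sketch of such an environment, not the authors' formulation and not UR-MAX itself. The class name `OCOMDP`, the `OBSERVE` constant, and the table layout are hypothetical. It also assumes that the observing action leaves the state unchanged and only incurs a fixed cost, which the abstract does not specify.

```python
import random

OBSERVE = "observe"  # hypothetical name for the costly observing action


class OCOMDP:
    """Toy only-costly-observable MDP (illustrative sketch only)."""

    def __init__(self, transitions, rewards, observe_cost, start_state=0, seed=0):
        # transitions[s][a] is a list of next-state probabilities.
        # rewards[s][a] is the immediate reward for action a in state s.
        self.transitions = transitions
        self.rewards = rewards
        self.observe_cost = observe_cost
        self.state = start_state
        self.rng = random.Random(seed)

    def step(self, action):
        """Return (reward, observation); observation is None unless OBSERVE is used."""
        if action == OBSERVE:
            # Assumption: observing does not move the state; it only costs reward.
            return -self.observe_cost, self.state
        probs = self.transitions[self.state][action]
        reward = self.rewards[self.state][action]
        # Sample the next state; the agent receives no observation of it.
        self.state = self.rng.choices(range(len(probs)), weights=probs)[0]
        return reward, None


# Two states and one ordinary action "a" that deterministically flips the state.
transitions = {0: {"a": [0.0, 1.0]}, 1: {"a": [1.0, 0.0]}}
rewards = {0: {"a": 1.0}, 1: {"a": 0.0}}
env = OCOMDP(transitions, rewards, observe_cost=0.5)

blind_step = env.step("a")        # ordinary action: reward only, no observation
observed_step = env.step(OBSERVE)  # costly action: reveals the current state
```

Dropping the `OBSERVE` branch turns this sketch into a UMDP, since every remaining action returns `None` as its observation.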