A reinforcement learning algorithm with polynomial interaction complexity for only-costly-observable MDPs

  • Authors:
  • Roy Fox; Moshe Tennenholtz

  • Affiliations:
  • Computer Science Department, Technion IIT, Israel; Faculty of Industrial Engineering and Management, Technion IIT, Israel

  • Venue:
  • AAAI'07 Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 1
  • Year:
  • 2007

Abstract

An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP with a special costly action that completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
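To make the interaction model concrete, here is a minimal sketch of an OCOMDP simulator, assuming a finite state and action set with tabular transitions. It is not UR-MAX and not the authors' formalization: the class name `OCOMDP`, the methods `act` and `observe`, the parameter `observe_cost`, and the choice to model observation as a separate call that only reveals the current state are all illustrative assumptions. Ordinary actions return no observation (the UMDP part); only the costly action reveals the state.

```python
import random
from dataclasses import dataclass, field


@dataclass
class OCOMDP:
    """Hypothetical minimal OCOMDP simulator (illustrative sketch only).

    transitions[s][a] is a list of next-state probabilities,
    rewards[s][a] is the immediate reward, and observe_cost is an
    assumed cost C charged for the special fully-observing action.
    """
    transitions: list
    rewards: list
    observe_cost: float
    state: int = 0
    total_return: float = 0.0
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def act(self, action: int) -> None:
        # Ordinary action: reward accrues and the hidden state changes,
        # but the agent gets no observation back (the UMDP part).
        self.total_return += self.rewards[self.state][action]
        probs = self.transitions[self.state][action]
        self.state = self.rng.choices(range(len(probs)), weights=probs)[0]
        return None

    def observe(self) -> int:
        # Special costly action: completely reveals the current state,
        # and its cost is subtracted from the accumulated return.
        self.total_return -= self.observe_cost
        return self.state


# Two states; action 0 toggles the state, action 1 keeps it.
# Being in state 1 yields reward 1 for either action.
transitions = [[[0.0, 1.0], [1.0, 0.0]],
               [[1.0, 0.0], [0.0, 1.0]]]
rewards = [[0.0, 0.0], [1.0, 1.0]]
env = OCOMDP(transitions, rewards, observe_cost=0.5)
```

In this toy setup, `env.act(0)` moves the hidden state from 0 to 1 and returns `None`; a following `env.observe()` returns `1` and lowers `total_return` by the assumed cost 0.5. An agent that observes after every step pays that cost repeatedly, which is the trade-off between information and cost that learning in an OCOMDP must manage.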