A POMDP approximation algorithm that anticipates the need to observe

  • Authors:
  • Valentina Bayer Zubek; Thomas Dietterich

  • Affiliations:
  • Department of Computer Science, Oregon State University, Corvallis, OR (both authors)

  • Venue:
  • PRICAI'00: Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence
  • Year:
  • 2000

Abstract

This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function, V_2MDP, can be combined online with a 2-step lookahead search to provide a good POMDP policy. We prove that this gives an approximation to the POMDP's optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP. We present experimental evidence that the method finds a good policy for a POMDP with 10,000 states and observations.
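As a concrete illustration of the online step, the Python sketch below runs a 2-step lookahead from a belief state on a small hypothetical tiger-style POMDP, bottoming out at the frontier with a value function defined over states (reflecting the even-odd assumption that the state is observable after two steps). The toy model, its numbers, and the use of the underlying MDP's value function in place of the paper's V_2MDP are all assumptions made for illustration; the full method would compute V_2MDP from the even-odd construction.

    import numpy as np

    # Hypothetical toy POMDP (2 states, 3 actions, 2 observations), tiger-style.
    # All numbers are illustrative and not taken from the paper.
    nS, nA, nO = 2, 3, 2
    gamma = 0.95

    # T[a, s, s'] = P(s' | s, a); actions: 0=listen, 1=open-left, 2=open-right
    T = np.zeros((nA, nS, nS))
    T[0] = np.eye(nS)                     # listening leaves the state unchanged
    T[1] = T[2] = np.full((nS, nS), 0.5)  # opening a door resets the state

    # Z[a, s', o] = P(o | s', a); listening is 85% accurate, opening uninformative
    Z = np.full((nA, nS, nO), 0.5)
    Z[0] = np.array([[0.85, 0.15], [0.15, 0.85]])

    # R[s, a]: small cost to listen, +10 for the correct door, -100 for the wrong one
    R = np.array([[-1.0, -100.0,   10.0],
                  [-1.0,   10.0, -100.0]])

    def mdp_value_function(T, R, gamma, iters=500):
        """Value iteration on the underlying fully observable MDP.
        This V stands in for the paper's offline V_2MDP; the true 2MDP value
        function would be computed from the even-odd construction."""
        V = np.zeros(R.shape[0])
        for _ in range(iters):
            Q = R + gamma * np.einsum('ask,k->sa', T, V)
            V = Q.max(axis=1)
        return V

    def belief_update(b, a, o):
        """Bayes filter: b'(s') is proportional to Z[a,s',o] * sum_s T[a,s,s'] b(s).
        Returns the updated belief and the normalizer P(o | b, a)."""
        bp = Z[a, :, o] * (T[a].T @ b)
        norm = bp.sum()
        return bp / norm, norm

    def two_step_lookahead_policy(b, V):
        """Online 2-step lookahead from belief b, evaluating frontier nodes with
        the state value function V, mirroring the paper's use of V_2MDP."""
        best_a, best_q = None, -np.inf
        for a in range(nA):
            q = b @ R[:, a]                    # expected immediate reward
            for o in range(nO):
                bp, p_o = belief_update(b, a, o)
                if p_o == 0:
                    continue
                # Second step: by the even-odd assumption the state is observed
                # afterwards, so bottom out with V over successor states.
                q2 = max(bp @ (R[:, a2] + gamma * T[a2] @ V) for a2 in range(nA))
                q += gamma * p_o * q2
            if q > best_q:
                best_a, best_q = a, q
        return best_a, best_q

    V = mdp_value_function(T, R, gamma)
    b = np.array([0.5, 0.5])                   # uniform initial belief
    print(two_step_lookahead_policy(b, V))     # with a flat belief, listening wins

Note the design point this sketch exposes: the value function is computed offline over states only, so it stays the size of an MDP value function, while the belief state is handled entirely by the shallow online search.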