A POMDP approximation algorithm that anticipates the need to observe

  • Authors:
  • Valentina Bayer Zubek; Thomas Dietterich

  • Affiliations:
  • Department of Computer Science, Oregon State University, Corvallis, OR (both authors)

  • Venue:
  • PRICAI'00: Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence
  • Year:
  • 2000

Abstract

This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function, V_2MDP, can be combined online with a 2-step lookahead search to provide a good POMDP policy. We prove that this gives an approximation to the POMDP's optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP. We present experimental evidence that the method finds a good policy for a POMDP with 10,000 states and observations.
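As a concrete illustration of the online step, the Python sketch below runs a 2-step lookahead from a belief state on a small hypothetical tiger-style POMDP, bottoming out at the frontier with a value function defined over states (reflecting the even-odd assumption that the state is observable after two steps). The toy model, its numbers, and the use of the underlying MDP's value function in place of the paper's V_2MDP are all assumptions made for illustration; the full method would compute V_2MDP from the even-odd construction.

    import numpy as np

    # Hypothetical toy POMDP (2 states, 3 actions, 2 observations), tiger-style.
    # All numbers are illustrative and not taken from the paper.
    nS, nA, nO = 2, 3, 2
    gamma = 0.95

    # T[a, s, s'] = P(s' | s, a); actions: 0=listen, 1=open-left, 2=open-right
    T = np.zeros((nA, nS, nS))
    T[0] = np.eye(nS)                     # listening leaves the state unchanged
    T[1] = T[2] = np.full((nS, nS), 0.5)  # opening a door resets the state

    # Z[a, s', o] = P(o | s', a); listening is 85% accurate, opening uninformative
    Z = np.full((nA, nS, nO), 0.5)
    Z[0] = np.array([[0.85, 0.15], [0.15, 0.85]])

    # R[s, a]: small cost to listen, +10 for the correct door, -100 for the wrong one
    R = np.array([[-1.0, -100.0,   10.0],
                  [-1.0,   10.0, -100.0]])

    def mdp_value_function(T, R, gamma, iters=500):
        """Value iteration on the underlying fully observable MDP.
        This V stands in for the paper's offline V_2MDP; the true 2MDP value
        function would be computed from the even-odd construction."""
        V = np.zeros(R.shape[0])
        for _ in range(iters):
            Q = R + gamma * np.einsum('ask,k->sa', T, V)
            V = Q.max(axis=1)
        return V

    def belief_update(b, a, o):
        """Bayes filter: b'(s') is proportional to Z[a,s',o] * sum_s T[a,s,s'] b(s).
        Returns the updated belief and the normalizer P(o | b, a)."""
        bp = Z[a, :, o] * (T[a].T @ b)
        norm = bp.sum()
        return bp / norm, norm

    def two_step_lookahead_policy(b, V):
        """Online 2-step lookahead from belief b, evaluating frontier nodes with
        the state value function V, mirroring the paper's use of V_2MDP."""
        best_a, best_q = None, -np.inf
        for a in range(nA):
            q = b @ R[:, a]                    # expected immediate reward
            for o in range(nO):
                bp, p_o = belief_update(b, a, o)
                if p_o == 0:
                    continue
                # Second step: by the even-odd assumption the state is observed
                # afterwards, so bottom out with V over successor states.
                q2 = max(bp @ (R[:, a2] + gamma * T[a2] @ V) for a2 in range(nA))
                q += gamma * p_o * q2
            if q > best_q:
                best_a, best_q = a, q
        return best_a, best_q

    V = mdp_value_function(T, R, gamma)
    b = np.array([0.5, 0.5])                   # uniform initial belief
    print(two_step_lookahead_policy(b, V))     # with a flat belief, listening wins

Note the design point this sketch exposes: the value function is computed offline over states only, so it stays the size of an MDP value function, while the belief state is handled entirely by the shallow online search.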