A POMDP Approximation Algorithm that Anticipates the Need to Observe
This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Processes) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function, V2MDP, can be combined online with a 2-step lookahead search to yield a good POMDP policy. We prove that this gives an approximation to the POMDP's optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP. We present experimental evidence that the method finds a good policy for a POMDP with 10,000 states and observations.
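The online procedure the abstract describes — evaluate each action by a 2-step lookahead over belief states, scoring leaf beliefs with a precomputed value function — can be sketched on a toy problem. Everything below is a hypothetical illustration, not the paper's implementation: the tiger-style POMDP, all numeric parameters, and the use of the underlying MDP's value function as the leaf evaluator in place of the paper's V2MDP are assumptions made for the sketch.

```python
# Hedged sketch: 2-step lookahead over beliefs with value-function leaves.
# The toy POMDP and the substitution of the underlying MDP's value function
# for V2MDP are illustrative assumptions, not taken from the paper.

S = [0, 1]                        # hidden states (which door hides the tiger)
A = ["listen", "open0", "open1"]  # actions
O = [0, 1]                        # observations (noisy hint about the state)
GAMMA = 0.95

def T(s, a):
    """Transition distribution P(s' | s, a)."""
    if a == "listen":
        return {s: 1.0}           # listening leaves the state unchanged
    return {0: 0.5, 1: 0.5}       # opening a door resets the problem

def R(s, a):
    """Immediate reward."""
    if a == "listen":
        return -1.0
    return 10.0 if a == f"open{1 - s}" else -100.0  # avoid the tiger

def Zfn(sp, a):
    """Observation distribution P(o | s', a)."""
    if a == "listen":
        return {sp: 0.85, 1 - sp: 0.15}
    return {0: 0.5, 1: 0.5}

def mdp_value(iters=200):
    """Value iteration on the fully observable underlying MDP."""
    V = {s: 0.0 for s in S}
    for _ in range(iters):
        V = {s: max(R(s, a) + GAMMA * sum(p * V[sp] for sp, p in T(s, a).items())
                    for a in A)
             for s in S}
    return V

def belief_update(b, a, o):
    """Bayes filter: return (new belief, P(o | b, a))."""
    nb = {sp: Zfn(sp, a).get(o, 0.0) *
              sum(T(s, a).get(sp, 0.0) * b[s] for s in S)
          for sp in S}
    z = sum(nb.values())
    if z == 0.0:
        return b, 0.0
    return {s: v / z for s, v in nb.items()}, z

def lookahead(b, V, depth):
    """Depth-limited expectimax over beliefs; leaves scored by E_b[V(s)]."""
    if depth == 0:
        return sum(b[s] * V[s] for s in S)
    best = float("-inf")
    for a in A:
        q = sum(b[s] * R(s, a) for s in S)
        for o in O:
            nb, po = belief_update(b, a, o)
            if po > 0.0:
                q += GAMMA * po * lookahead(nb, V, depth - 1)
        best = max(best, q)
    return best

def policy(b, V):
    """Greedy action from a 2-step lookahead rooted at belief b."""
    def q2(a):
        q = sum(b[s] * R(s, a) for s in S)
        for o in O:
            nb, po = belief_update(b, a, o)
            if po > 0.0:
                q += GAMMA * po * lookahead(nb, V, 1)
        return q
    return max(A, key=q2)
```

At the uniform belief this lookahead picks the information-gathering `listen` action rather than guessing a door, illustrating how an observation branch in the search can reward sensing even when the leaf value function assumes full observability.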