Many planning tasks for autonomous robots can be modeled as partially observable Markov decision process (POMDP) problems. Point-based algorithms are a well-known family of methods for solving large-scale POMDP problems. Several leading point-based algorithms eschew some flawed but very useful heuristics in order to find an ε-optimal policy. This paper aims to exploit these avoided heuristics through a simple framework. The main idea of the framework is to construct a greedy strategy and combine it with the leading algorithms. We present an implementation to verify the framework's validity. The greedy strategy in this implementation stems from heuristics that are commonly ignored by three leading algorithms, and it can therefore be combined well with them. Experimental results show that the combined algorithms are more efficient than the original algorithms. On some benchmark problems, the combined algorithms achieve about an order of magnitude improvement in runtime. These results provide empirical evidence for the efficiency of the proposed framework.