Deciding how to act in partially observable environments remains an active area of research. Identifying good sequences of decisions is particularly challenging when good control performance requires planning multiple steps into the future in domains with many states. Towards addressing this challenge, we present an online, forward-search algorithm called the Posterior Belief Distribution (PBD) algorithm. PBD leverages a novel method for calculating the posterior distribution over beliefs that result after a sequence of actions is taken, given the set of observation sequences that could be received during this process. This method allows us to efficiently evaluate the expected reward of a sequence of primitive actions, which we refer to as a macro-action. We present a formal analysis of our approach and examine its performance in two very large simulation domains: scientific exploration and target monitoring. We also demonstrate our algorithm controlling a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in large, real-world partially observable domains where a multi-step lookahead is required to achieve good performance.
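One way the core idea can be made concrete is in the linear-Gaussian setting: there, the posterior covariance after a fixed sequence of actions does not depend on which observations are actually received, so the uncertainty reached at the end of a macro-action can be propagated once with Kalman-filter recursions instead of branching on every observation sequence. The sketch below is an illustrative assumption on our part, not code from the paper; the model matrices `A`, `C`, `Q`, `R` are hypothetical placeholders.

```python
import numpy as np

def propagate_covariance(Sigma, A, C, Q, R, horizon):
    """Propagate belief covariance over a macro-action of length `horizon`.

    Under linear-Gaussian dynamics x' = A x + w, w ~ N(0, Q) and
    observations z = C x + v, v ~ N(0, R), the covariance recursion is
    observation-independent, so one pass suffices per macro-action.
    """
    n = Sigma.shape[0]
    for _ in range(horizon):
        # Kalman predict step: Sigma' = A Sigma A^T + Q
        Sigma = A @ Sigma @ A.T + Q
        # Kalman update step: innovation covariance, gain, corrected covariance
        S = C @ Sigma @ C.T + R
        K = Sigma @ C.T @ np.linalg.inv(S)
        Sigma = (np.eye(n) - K @ C) @ Sigma
    return Sigma

# Example: a 2-D random walk with direct (noisy) state observations.
# The posterior covariance contracts toward a small steady-state value.
A = np.eye(2)
C = np.eye(2)
Q = 0.01 * np.eye(2)   # process noise (assumed)
R = 0.10 * np.eye(2)   # observation noise (assumed)
Sigma_end = propagate_covariance(np.eye(2), A, C, Q, R, horizon=10)
```

Because `Sigma_end` summarizes the uncertainty reached after the whole macro-action, a forward search can score candidate macro-actions by expected reward at their endpoints without enumerating individual observation sequences.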