Deciding how to act in partially observable environments remains an active area of research. Identifying good sequences of decisions is particularly challenging when good control performance requires planning multiple steps into the future in domains with many states. Towards addressing this challenge, we present an online, forward-search algorithm called the Posterior Belief Distribution (PBD) algorithm. PBD leverages a novel method for calculating the posterior distribution over beliefs that result after a sequence of actions is taken, given the set of observation sequences that could be received during this process. This method allows us to efficiently evaluate the expected reward of a sequence of primitive actions, which we refer to as a macro-action. We present a formal analysis of our approach and examine its performance in two very large simulation domains: scientific exploration and target monitoring. We also demonstrate our algorithm controlling a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in large, real-world partially observable domains where a multi-step lookahead is required to achieve good performance.
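One way the core idea can be made concrete is in the linear-Gaussian setting: there, the posterior covariance after a fixed sequence of actions does not depend on which observations are actually received, so the uncertainty reached at the end of a macro-action can be propagated once with Kalman-filter recursions instead of branching on every observation sequence. The sketch below is an illustrative assumption on our part, not code from the paper; the model matrices `A`, `C`, `Q`, `R` are hypothetical placeholders.

```python
import numpy as np

def propagate_covariance(Sigma, A, C, Q, R, horizon):
    """Propagate belief covariance over a macro-action of length `horizon`.

    Under linear-Gaussian dynamics x' = A x + w, w ~ N(0, Q) and
    observations z = C x + v, v ~ N(0, R), the covariance recursion is
    observation-independent, so one pass suffices per macro-action.
    """
    n = Sigma.shape[0]
    for _ in range(horizon):
        # Kalman predict step: Sigma' = A Sigma A^T + Q
        Sigma = A @ Sigma @ A.T + Q
        # Kalman update step: innovation covariance, gain, corrected covariance
        S = C @ Sigma @ C.T + R
        K = Sigma @ C.T @ np.linalg.inv(S)
        Sigma = (np.eye(n) - K @ C) @ Sigma
    return Sigma

# Example: a 2-D random walk with direct (noisy) state observations.
# The posterior covariance contracts toward a small steady-state value.
A = np.eye(2)
C = np.eye(2)
Q = 0.01 * np.eye(2)   # process noise (assumed)
R = 0.10 * np.eye(2)   # observation noise (assumed)
Sigma_end = propagate_covariance(np.eye(2), A, C, Q, R, horizon=10)
```

Because `Sigma_end` summarizes the uncertainty reached after the whole macro-action, a forward search can score candidate macro-actions by expected reward at their endpoints without enumerating individual observation sequences.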