Escaping local optima in POMDP planning as inference

  • Authors:
  • Pascal Poupart; Tobias Lang; Marc Toussaint

  • Affiliations:
  • University of Waterloo, Ontario, Canada; Machine Learning and Robotics Lab, FU Berlin, Germany; Machine Learning and Robotics Lab, FU Berlin, Germany

  • Venue:
  • The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
  • Year:
  • 2011

Abstract

Planning as inference recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single- and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Since planning as inference essentially tackles a non-convex optimization problem when the states are partially observable, there is a need for techniques that can robustly escape local optima. We propose two algorithms: the first adds nodes to the finite-state controller according to an increasingly deep forward search, while the second splits nodes in a greedy fashion to improve the reward likelihood.
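The policies both algorithms operate on are finite-state controllers: graphs whose nodes carry actions and whose edges route observations to successor nodes. The sketch below is not the paper's algorithms — it only illustrates the setting on an assumed tiger-style toy POMDP (with a milder penalty than the classic benchmark, chosen so one observation is informative enough to act on). It evaluates a controller by fixed-point iteration and shows how a one-node "always listen" controller is a poor solution that a hand-picked split of the listen node into listen/open nodes escapes. All names, parameters, and the split itself are illustrative assumptions.

```python
# Toy FSC evaluation on an assumed tiger-style POMDP (illustration only;
# the model parameters and the node split are not taken from the paper).

GAMMA = 0.95
STATES = (0, 1)   # tiger behind the left / right door
OBS = (0, 1)      # hear the tiger on the left / right

def transition(s, a):
    """P(s' | s, a): listening leaves the tiger; opening resets it uniformly."""
    return {s: 1.0} if a == "listen" else {0: 0.5, 1: 0.5}

def observation(o, s_next, a):
    """P(o | s', a): listening is 85% accurate; opening is uninformative."""
    if a == "listen":
        return 0.85 if o == s_next else 0.15
    return 0.5

def reward(s, a):
    if a == "listen":
        return -1.0
    door = 0 if a == "open-left" else 1
    return -25.0 if door == s else 10.0   # assumed mild penalty for the tiger

def evaluate(nodes, iters=500):
    """Fixed-point iteration on V(n, s) for a controller given as a list of
    (action, {observation: successor-node}) pairs."""
    V = {(n, s): 0.0 for n in range(len(nodes)) for s in STATES}
    for _ in range(iters):
        V = {
            (n, s): reward(s, a) + GAMMA * sum(
                p * observation(o, s2, a) * V[(succ[o], s2)]
                for s2, p in transition(s, a).items()
                for o in OBS
            )
            for n, (a, succ) in enumerate(nodes)
            for s in STATES
        }
    return V

def value(nodes, start=0):
    """Controller value at the uniform initial belief."""
    V = evaluate(nodes)
    return 0.5 * (V[(start, 0)] + V[(start, 1)])

# A one-node controller that always listens: a weak solution (value -1 per step).
listen_only = [("listen", {0: 0, 1: 0})]

# Splitting the listen node into listen / open-right / open-left, routing each
# observation to the action that opens the door away from the heard tiger.
split = [
    ("listen",     {0: 1, 1: 2}),
    ("open-right", {0: 0, 1: 0}),
    ("open-left",  {0: 0, 1: 0}),
]

print(value(listen_only))   # -> about -20.0
print(value(split))         # -> about 36.0
```

Note that a greedy local search that changes one node at a time can stall at the listen-only controller on harsher reward settings, which is exactly the kind of local optimum the paper's deeper forward search and node-splitting schemes are designed to escape.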