Bayesian Reinforcement Learning in Continuous POMDPs with Gaussian Processes

  • Authors:
  • Patrick Dallaire;Camille Besse;Stephane Ross;Brahim Chaib-draa

  • Affiliations:
  • Department of Computer Science, Laval University, Quebec, Canada; Department of Computer Science, Laval University, Quebec, Canada; Robotics Institute, Carnegie Mellon University, Pittsburgh, PA; Department of Computer Science, Laval University, Quebec, Canada

  • Venue:
  • IROS '09: Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • Year:
  • 2009

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework for real-world sequential decision problems, but most solution approaches require the model to be known in advance. Moreover, mainstream POMDP research focuses on the discrete case, which complicates application to realistic problems that are naturally modeled with continuous state spaces. In this paper, we consider the problem of optimal control in continuous, partially observable environments whose model parameters are unknown. We advocate the use of Gaussian Process Dynamical Models (GPDMs), which allow the model to be learned through experience with the environment. Our results on the blimp problem show that the approach can learn good models of the sensors and actuators and use them to maximize long-term rewards.
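As a rough illustration of the model-learning idea described above (not the paper's actual algorithm), the sketch below fits a Gaussian process regression model of one-step dynamics from sampled transitions. The kernel choice, the toy linear dynamics, and all names (`rbf_kernel`, `GPDynamicsModel`) are assumptions for illustration; a full GPDM as used in the paper would additionally learn a latent state and a separate observation GP.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between row-vector sets A (N,d) and B (M,d)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * sq / length_scale**2)

class GPDynamicsModel:
    """GP regression from (state, action) inputs to the next state.

    A toy stand-in for one component of a GPDM: it learns the transition
    function from data rather than assuming a known parametric model.
    """
    def __init__(self, noise_var=1e-4):
        self.noise_var = noise_var

    def fit(self, X, Y):
        # X: N x (state_dim + action_dim) inputs, Y: N x state_dim targets
        self.X = X
        K = rbf_kernel(X, X) + self.noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} Y via two triangular solves
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))
        return self

    def predict(self, Xs):
        # Posterior mean and per-point predictive variance at test inputs Xs
        Ks = rbf_kernel(Xs, self.X)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.diag(rbf_kernel(Xs, Xs) - v.T @ v)
        return mean, var

# Learn a 1-D toy dynamics s' = 0.9*s + a from noisy sampled transitions
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, (50, 1))
A = rng.uniform(-0.5, 0.5, (50, 1))
X = np.hstack([S, A])
Y = 0.9 * S + A + 0.01 * rng.standard_normal((50, 1))

model = GPDynamicsModel().fit(X, Y)
mean, var = model.predict(np.array([[0.5, 0.2]]))  # true next state is 0.65
```

The predictive variance is what makes GPs attractive in the Bayesian RL setting: the agent gets not only a point estimate of the next state but also a measure of model uncertainty it can weigh during planning.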