POMDP solving: what rewards do you really expect at execution?

  • Authors:
  • Caroline Ponzoni Carvalho Chanel, Jean-Loup Farges, Florent Teichteil-Königsbuch, Guillaume Infantes

  • Affiliations:
  • All authors: ONERA (Office National d'Etudes et de Recherches Aérospatiales), Toulouse, France; email: name.surname@onera.fr. Caroline Ponzoni Carvalho Chanel is also affiliated with ISAE (Institut Supérieur de l'Aéronautique et de l'Espace).

  • Venue:
  • STAIRS 2010: Proceedings of the Fifth Starting AI Researchers' Symposium
  • Year:
  • 2010

Abstract

Partially Observable Markov Decision Processes (POMDPs) have gained increasing interest in many research communities, owing to noticeable improvements in their optimization algorithms and in computing capabilities. Yet most research focuses on optimizing either the average accumulated reward (AI planning) or the entropy directly (active perception), and neither criterion matches the rewards actually gathered at execution. Indeed, the first criterion linearly averages over all belief states, so it does not exploit the information provided by different observations, while the second discards rewards altogether. Motivated by simple demonstrative examples, we therefore study an additive combination of these two criteria in order to get the best of both reward gathering and information acquisition at execution. We then compare our criterion with the classical ones and highlight the need to consider new hybrid non-linear criteria, on a realistic multi-target recognition and tracking mission.
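
To make the idea of an additive combination concrete, the following is a minimal illustrative sketch of how a hybrid reward/entropy value of a belief state might be computed. The function names, the weighting parameter `lam`, and the example numbers are assumptions chosen for illustration; they are not the formulation used in the paper.

```python
import numpy as np

def belief_entropy(belief):
    """Shannon entropy of a belief state (a distribution over hidden states)."""
    p = belief[belief > 0.0]
    return float(-np.sum(p * np.log(p)))

def expected_reward(belief, rewards):
    """Classical criterion: reward averaged linearly over the belief."""
    return float(np.dot(belief, rewards))

def hybrid_value(belief, rewards, lam=0.5):
    """Additive combination of expected reward and negated belief entropy.

    `lam` weighs information acquisition against reward gathering; its value
    and this exact weighting scheme are illustrative assumptions only.
    """
    return expected_reward(belief, rewards) - lam * belief_entropy(belief)

# Two beliefs with the same linearly averaged reward: the hybrid criterion
# prefers the sharper (lower-entropy) belief, which a purely reward-based
# criterion cannot distinguish.
rewards = np.array([5.0, 5.0])
uniform = np.array([0.5, 0.5])
peaked = np.array([0.9, 0.1])
print(hybrid_value(uniform, rewards))  # ~4.65
print(hybrid_value(peaked, rewards))   # ~4.84
```

In this sketch the entropy term penalizes uncertain beliefs, so actions that improve state information raise the hybrid value even when the linearly averaged reward is unchanged, which is the intuition behind combining the two classical criteria.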