POMDP solving: what rewards do you really expect at execution?

Authors:
Caroline Ponzoni Carvalho Chanel;Jean-Loup Farges;Florent Teichteil-Königsbuch;Guillaume Infantes
Affiliations:
ONERA --Office National d'Etudes et de Recherches Aérospatiales, Toulouse, France. Email: name.surname@onera.fr and ISAE --Institut Supérieur de l'Aéronautique et de l'Espace;ONERA --Office National d'Etudes et de Recherches Aérospatiales, Toulouse, France. Email: name.surname@onera.fr;ONERA --Office National d'Etudes et de Recherches Aérospatiales, Toulouse, France. Email: name.surname@onera.fr;ONERA --Office National d'Etudes et de Recherches Aérospatiales, Toulouse, France. Email: name.surname@onera.fr
Venue:
STAIRS 2010: Proceedings of the Fifth Starting AI Researchers' Symposium
Year:
2010

Abstract

Partially Observable Markov Decision Processes (POMDPs) have attracted increasing interest in many research communities, owing to significant improvements in their optimization algorithms and in computing power. Yet most research focuses on optimizing either average accumulated rewards (AI planning) or direct entropy (active perception), and neither criterion matches the rewards actually gathered at execution. Indeed, the first criterion linearly averages over all belief states, so it does not extract the best information from different observations, while the second discards rewards entirely. Motivated by simple demonstrative examples, we therefore study an additive combination of these two criteria to get the best of both reward gathering and information acquisition at execution. We then compare our criterion with classical ones on a realistic multi-target recognition and tracking mission, and highlight the need to consider new hybrid non-linear criteria.
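
To make the idea of an additive criterion concrete, here is a minimal sketch of one possible form: the expected reward under a belief state plus a weighted negative-entropy term. The abstract does not give the exact formula, so the function names, the `weight` parameter, and the choice of Shannon entropy with natural logarithm are all assumptions made for illustration, not the authors' definition.

```python
import math


def expected_reward(belief, rewards):
    """Linear criterion: average the per-state rewards over the belief."""
    return sum(b * r for b, r in zip(belief, rewards))


def negative_entropy(belief):
    """Information criterion: sum of b * log(b); 0 when the state is certain."""
    return sum(b * math.log(b) for b in belief if b > 0.0)


def hybrid_criterion(belief, rewards, weight=1.0):
    """Hypothetical additive combination of the two criteria.

    `weight` is an illustrative trade-off parameter (an assumption),
    not a quantity taken from the paper.
    """
    return expected_reward(belief, rewards) + weight * negative_entropy(belief)


# Two beliefs with the same expected reward but different uncertainty.
rewards = [1.0, 1.0]
certain = [1.0, 0.0]
uncertain = [0.5, 0.5]

print(expected_reward(certain, rewards) == expected_reward(uncertain, rewards))
print(hybrid_criterion(certain, rewards) > hybrid_criterion(uncertain, rewards))
```

In this toy case the purely linear criterion cannot tell the two beliefs apart, while the combined value prefers the belief with less uncertainty. With `weight=0.0` the function reduces to the expected reward alone.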