Potential-based reward shaping for POMDPs

  • Authors:
  • Adam Eck, Leen-Kiat Soh, Sam Devlin, Daniel Kudenko

  • Affiliations:
  • University of Nebraska-Lincoln, Lincoln, Nebraska, USA (Eck, Soh); University of York, York, United Kingdom (Devlin, Kudenko)

  • Venue:
  • Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems
  • Year:
  • 2013


Abstract

We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning in order to improve planning without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards, and thus achieve greater cumulative rewards.
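The shaping idea the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the potential function, discount factor, and belief representation below are assumptions chosen for illustration. Potential-based shaping adds a term F(b, b′) = γΦ(b′) − Φ(b) to the immediate reward, where Φ here is a hypothetical potential that rates more certain (lower-entropy) beliefs higher, hinting the planner toward informative belief states without changing the optimal policy.

```python
import math

GAMMA = 0.95  # discount factor (assumed value, not from the paper)

def potential(belief):
    """Hypothetical potential Phi(b): negative entropy of the belief,
    so peaked (more certain) beliefs receive a higher potential."""
    return sum(p * math.log(p) for p in belief if p > 0.0)

def shaping(belief, next_belief, gamma=GAMMA):
    """Potential-based shaping term F(b, b') = gamma * Phi(b') - Phi(b)."""
    return gamma * potential(next_belief) - potential(belief)

def shaped_reward(reward, belief, next_belief, gamma=GAMMA):
    """Reward the planner optimizes: true reward plus the shaping hint."""
    return reward + shaping(belief, next_belief, gamma)

# Moving from a uniform belief toward a peaked one yields a positive
# shaping bonus, steering a short-horizon planner toward beliefs that
# are likely to pay off beyond its horizon.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
bonus = shaping(uniform, peaked)
```

With γ = 1, the shaping terms telescope along any trajectory, which is why this form of shaping leaves the underlying optimal policy unchanged while still guiding search.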