Potential-based reward shaping for POMDPs

  • Authors:
  • Adam Eck, Leen-Kiat Soh, Sam Devlin, Daniel Kudenko

  • Affiliations:
  • University of Nebraska-Lincoln, Lincoln, Nebraska, USA (Eck, Soh); University of York, York, United Kingdom (Devlin, Kudenko)

  • Venue:
  • Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems
  • Year:
  • 2013


Abstract

We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning in order to improve planning without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards, and thus achieve greater cumulative rewards.
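The shaping idea the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the potential function, discount factor, and belief representation below are assumptions chosen for illustration. Potential-based shaping adds a term F(b, b′) = γΦ(b′) − Φ(b) to the immediate reward, where Φ here is a hypothetical potential that rates more certain (lower-entropy) beliefs higher, hinting the planner toward informative belief states without changing the optimal policy.

```python
import math

GAMMA = 0.95  # discount factor (assumed value, not from the paper)

def potential(belief):
    """Hypothetical potential Phi(b): negative entropy of the belief,
    so peaked (more certain) beliefs receive a higher potential."""
    return sum(p * math.log(p) for p in belief if p > 0.0)

def shaping(belief, next_belief, gamma=GAMMA):
    """Potential-based shaping term F(b, b') = gamma * Phi(b') - Phi(b)."""
    return gamma * potential(next_belief) - potential(belief)

def shaped_reward(reward, belief, next_belief, gamma=GAMMA):
    """Reward the planner optimizes: true reward plus the shaping hint."""
    return reward + shaping(belief, next_belief, gamma)

# Moving from a uniform belief toward a peaked one yields a positive
# shaping bonus, steering a short-horizon planner toward beliefs that
# are likely to pay off beyond its horizon.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
bonus = shaping(uniform, peaked)
```

With γ = 1, the shaping terms telescope along any trajectory, which is why this form of shaping leaves the underlying optimal policy unchanged while still guiding search.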