We address the problem of suboptimal behavior caused by short planning horizons in online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning, improving plan quality without increasing the planning horizon. In our extension, information about the quality of belief states is added to the objective function the agent optimizes during planning. This information hints at where the agent may find high future rewards, helping it achieve greater cumulative reward.
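As a minimal sketch of the underlying idea (not the paper's actual shaping function), potential-based shaping over belief states adds a term gamma * Phi(b') - Phi(b) to each reward, where Phi scores belief quality. The negative-entropy potential below is an illustrative assumption; any heuristic estimate of belief quality could be substituted.

```python
import math

def potential(belief):
    """Hypothetical potential over a belief state (a probability
    distribution over hidden states): negative entropy, so that more
    certain beliefs receive higher potential. The choice of potential
    is an assumption for illustration only."""
    entropy = -sum(p * math.log(p) for p in belief if p > 0.0)
    return -entropy

def shaped_reward(reward, belief, next_belief, gamma=0.95):
    """Potential-based shaping term added to the planning objective:
    r + gamma * Phi(b') - Phi(b). In this difference-of-potentials
    form, shaping leaves the optimal policy unchanged
    (Ng, Harada & Russell, 1999)."""
    return reward + gamma * potential(next_belief) - potential(belief)

# Example: a transition from an uncertain belief to a certain one earns
# a positive shaping bonus, steering the planner toward informative
# actions even when the true reward lies beyond the planning horizon.
uncertain = [0.25, 0.25, 0.25, 0.25]
certain = [1.0, 0.0, 0.0, 0.0]
bonus = shaped_reward(0.0, uncertain, certain)  # positive
```

Because the shaping term telescopes along any trajectory, it changes which branches look promising within a short horizon without changing which policy is ultimately optimal.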