References:
Technical Note: Q-Learning. Machine Learning.
Competitive Markov Decision Processes.
The dynamics of reinforcement learning in cooperative multiagent systems. AAAI '98/IAAI '98: Proceedings of the Fifteenth National Conference on Artificial Intelligence / Tenth Conference on Innovative Applications of Artificial Intelligence.
Team-partitioned, opaque-transition reinforcement learning. Proceedings of the Third Annual Conference on Autonomous Agents.
Markov Decision Processes: Discrete Stochastic Dynamic Programming.
Introduction to Reinforcement Learning.
Reinforcement Learning in the Multi-Robot Domain. Autonomous Robots.
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning.
Learning to Drive a Bicycle Using Reinforcement Learning and Shaping. ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning.
Temporal credit assignment in reinforcement learning.
Theory and application of reward shaping in reinforcement learning.
If multi-agent learning is the answer, what is the question? Artificial Intelligence.
Automatic shaping and decomposition of reward functions. Proceedings of the 24th International Conference on Machine Learning.
Dynamic Programming and Optimal Control, Vol. II.
Social reward shaping in the prisoner's dilemma. Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 3.
Potential-based shaping in model-based reinforcement learning. AAAI'08: Proceedings of the 23rd National Conference on Artificial Intelligence, Volume 2.
Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research.
Sequential optimality and coordination in multiagent systems. IJCAI'99: Proceedings of the 16th International Joint Conference on Artificial Intelligence, Volume 1.
Theoretical considerations of potential-based reward shaping for multi-agent systems. The 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 1.
A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews.
Potential-based reward shaping for POMDPs. Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems.
Active sensing in complex multiagent environments. Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems.
Learning potential functions and their representations for multi-task reinforcement learning. Autonomous Agents and Multi-Agent Systems.
Potential-based reward shaping can significantly reduce the time needed to learn an optimal policy and, in multi-agent systems, improve the performance of the final joint policy. It has been proven not to alter the optimal policy of an agent learning alone, nor the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during learning. This assumption is often broken, especially when the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method for extending potential-based reward shaping that allows dynamic shaping while maintaining the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
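To make the idea concrete, the sketch below applies a dynamic, time-varying potential to tabular Q-learning on a toy chain MDP. The environment, the hypothetical decaying potential function, and all hyperparameters are illustrative assumptions, not taken from the paper; the shaping term follows the dynamic form F = γΦ(s', t+1) − Φ(s, t), where t is the global learning time step, so the potential is allowed to change while the agent learns.

```python
import random

random.seed(0)

GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1
N = 5  # toy chain MDP: states 0..4, actions -1/+1, reward 1 on reaching state 4

def potential(state, t):
    # Hypothetical dynamic potential: biased toward the goal state, but its
    # magnitude decays with learning time t, so it changes during learning.
    return (state / (N - 1)) / (1 + 0.001 * t)

def shaped_q_learning(episodes=500):
    Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
    t = 0  # global time step: the second argument of the dynamic potential
    for _ in range(episodes):
        s = 0
        while s != N - 1:
            if random.random() < EPSILON:
                a = random.choice((-1, 1))       # explore
            else:
                a = max((-1, 1), key=lambda b: Q[(s, b)])  # exploit
            s2 = min(max(s + a, 0), N - 1)
            r = 1.0 if s2 == N - 1 else 0.0
            # Dynamic potential-based shaping: F = gamma*Phi(s', t+1) - Phi(s, t)
            F = GAMMA * potential(s2, t + 1) - potential(s, t)
            best_next = 0.0 if s2 == N - 1 else max(Q[(s2, b)] for b in (-1, 1))
            Q[(s, a)] += ALPHA * (r + F + GAMMA * best_next - Q[(s, a)])
            s, t = s2, t + 1
    return Q

Q = shaped_q_learning()
# Greedy policy after learning: move right (+1) in every non-terminal state.
print(all(Q[(s, 1)] > Q[(s, -1)] for s in range(N - 1)))
```

Because the shaping reward telescopes over any trajectory, the greedy policy recovered here is the same one unshaped Q-learning converges to on this chain; the dynamic potential only redistributes intermediate reward during learning.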