Potential-based reward shaping has previously been proven both to be equivalent to Q-table initialisation and to guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees still hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background that explains the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and that the Nash equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
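To make the technique concrete, the following is a minimal sketch of potential-based reward shaping in tabular Q-learning. The shaping term follows the standard form F(s, s') = γΦ(s') − Φ(s); the grid-world potential function, goal cell, and all parameter values are illustrative assumptions, not taken from the paper.

```python
def potential(state, goal=(3, 3)):
    # Hypothetical potential Phi(s): negative Manhattan distance to a goal cell.
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaping_reward(s, s_next, gamma=0.95):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    return gamma * potential(s_next) - potential(s)

def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # Standard Q-learning update applied to the shaped reward r + F(s, s').
    shaped = r + shaping_reward(s, s_next, gamma)
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (shaped + gamma * best_next - old)
    return q[(s, a)]
```

Because the shaping term telescopes along any trajectory, learning with it is equivalent to initialising the Q-table with Φ(s) for every action, which is the single-agent equivalence the paper extends to the multi-agent setting.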