Policy invariance under reward transformations for general-sum stochastic games

  • Authors:
  • Xiaosong Lu; Howard M. Schwartz; Sidney N. Givigi

  • Affiliations:
  • Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada; Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada; Department of Electrical and Computer Engineering, Royal Military College of Canada, Kingston, ON, Canada

  • Venue:
  • Journal of Artificial Intelligence Research
  • Year:
  • 2011

Abstract

We extend the potential-based shapingmethod fromMarkov decision processes to multiplayer general-sum stochastic games. We prove that the Nash equilibria in a stochastic game remains unchanged after potential-based shaping is applied to the environment. The property of policy invariance provides a possible way of speeding convergence when learning to play a stochastic game.