Social reward shaping in the prisoner's dilemma

  • Authors:
  • Monica Babes; Enrique Munoz de Cote; Michael L. Littman

  • Affiliations:
  • Rutgers University, Piscataway, NJ; Politecnico di Milano, DEI, Milan, Italy; Rutgers University, Piscataway, NJ

  • Venue:
  • Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) - Volume 3
  • Year:
  • 2008

Abstract

Reward shaping is a well-known technique for helping reinforcement-learning agents converge more quickly to near-optimal behavior. In this paper, we introduce social reward shaping, which is reward shaping applied in the multiagent-learning framework. We present preliminary experiments in the iterated Prisoner's Dilemma setting showing that agents that use social reward shaping appropriately can behave more effectively than other classical learning and non-learning strategies. In particular, we show that these agents can both lead (encourage adaptive opponents to cooperate stably) and follow (adopt a best-response strategy when paired with a fixed opponent), whereas better-known approaches achieve only one of these objectives.
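The abstract does not specify the paper's shaping function or learning algorithm, but the general idea can be sketched with potential-based reward shaping (in the style of Ng, Harada, and Russell) layered on a tabular Q-learner playing the iterated Prisoner's Dilemma against a tit-for-tat opponent. The payoff values, the potential function favoring an opponent's cooperation, and all hyperparameters below are illustrative assumptions, not the authors' method:

```python
import random

# Standard prisoner's-dilemma payoffs (assumed; the paper's exact matrix
# is not given in the abstract).
# Key: (my action, opponent action) -> (my reward, opponent reward).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1


def potential(state):
    """Hypothetical potential: favor states where the opponent just
    cooperated. Potential-based shaping adds
    F(s, s') = GAMMA * potential(s') - potential(s) to the game payoff,
    which is known to leave optimal policies unchanged."""
    return 2.0 if state[1] == "C" else 0.0


class ShapedQLearner:
    """Tabular Q-learner whose state is the previous joint action,
    trained on the shaped reward rather than the raw payoff."""

    def __init__(self, rng):
        self.q = {}  # (state, action) -> estimated value
        self.rng = rng

    def choose(self, state):
        if self.rng.random() < EPSILON:  # epsilon-greedy exploration
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Shaped reward = game payoff + discounted potential difference.
        shaped = reward + GAMMA * potential(next_state) - potential(state)
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + ALPHA * (shaped + GAMMA * best_next - old)


rng = random.Random(0)
agent = ShapedQLearner(rng)
state = ("C", "C")  # (my last action, opponent's last action)
for _ in range(5000):
    mine = agent.choose(state)
    opp = state[0]  # tit-for-tat: opponent copies my previous move
    reward, _ = PAYOFF[(mine, opp)]
    next_state = (mine, opp)
    agent.update(state, mine, reward, next_state)
    state = next_state
```

Because the shaping term is a potential difference, it can steer the learner toward mutual cooperation faster without changing which policies are optimal; against a fixed opponent, the same learner still converges toward a best response, matching the lead/follow distinction the abstract draws.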