Social reward shaping in the prisoner's dilemma

  • Authors:
  • Monica Babes; Enrique Munoz de Cote; Michael L. Littman

  • Affiliations:
  • Rutgers University, Piscataway, NJ; Politecnico di Milano, DEI, Milan, Italy; Rutgers University, Piscataway, NJ

  • Venue:
  • Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) - Volume 3
  • Year:
  • 2008

Abstract

Reward shaping is a well-known technique for helping reinforcement-learning agents converge more quickly to near-optimal behavior. In this paper, we introduce social reward shaping, which is reward shaping applied in the multiagent-learning framework. We present preliminary experiments in the iterated Prisoner's Dilemma setting showing that agents that use social reward shaping appropriately can behave more effectively than other classical learning and non-learning strategies. In particular, we show that these agents can both lead (encourage adaptive opponents to cooperate stably) and follow (adopt a best-response strategy when paired with a fixed opponent), whereas better-known approaches achieve only one of these objectives.
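The abstract does not specify the paper's shaping function or learning algorithm, but the general idea can be sketched with potential-based reward shaping (in the style of Ng, Harada, and Russell) layered on a tabular Q-learner playing the iterated Prisoner's Dilemma against a tit-for-tat opponent. The payoff values, the potential function favoring an opponent's cooperation, and all hyperparameters below are illustrative assumptions, not the authors' method:

```python
import random

# Standard prisoner's-dilemma payoffs (assumed; the paper's exact matrix
# is not given in the abstract).
# Key: (my action, opponent action) -> (my reward, opponent reward).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1


def potential(state):
    """Hypothetical potential: favor states where the opponent just
    cooperated. Potential-based shaping adds
    F(s, s') = GAMMA * potential(s') - potential(s) to the game payoff,
    which is known to leave optimal policies unchanged."""
    return 2.0 if state[1] == "C" else 0.0


class ShapedQLearner:
    """Tabular Q-learner whose state is the previous joint action,
    trained on the shaped reward rather than the raw payoff."""

    def __init__(self, rng):
        self.q = {}  # (state, action) -> estimated value
        self.rng = rng

    def choose(self, state):
        if self.rng.random() < EPSILON:  # epsilon-greedy exploration
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Shaped reward = game payoff + discounted potential difference.
        shaped = reward + GAMMA * potential(next_state) - potential(state)
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + ALPHA * (shaped + GAMMA * best_next - old)


rng = random.Random(0)
agent = ShapedQLearner(rng)
state = ("C", "C")  # (my last action, opponent's last action)
for _ in range(5000):
    mine = agent.choose(state)
    opp = state[0]  # tit-for-tat: opponent copies my previous move
    reward, _ = PAYOFF[(mine, opp)]
    next_state = (mine, opp)
    agent.update(state, mine, reward, next_state)
    state = next_state
```

Because the shaping term is a potential difference, it can steer the learner toward mutual cooperation faster without changing which policies are optimal; against a fixed opponent, the same learner still converges toward a best response, matching the lead/follow distinction the abstract draws.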