Towards a Pareto-Optimal Solution in General-Sum Games
AAMAS '03: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems
Learning to converge to an efficient, i.e., Pareto-optimal, Nash equilibrium of the repeated game is an open problem in multiagent learning. Our goal is to facilitate the learning of efficient outcomes in repeated plays of incomplete-information games in which only the opponent's actions, but not its payoffs, are observable. We use a two-stage protocol that allows a player to unilaterally commit to an action, after which the other player chooses its action knowing the action chosen by the committed player. The motivation behind commitment is to promote trust between the players and to prevent mutually harmful choices made to preclude worst-case outcomes. Our agents learn whether or not commitment is beneficial. Interestingly, the decision to commit can be thought of as expanding the action space, and our proposed protocol can be incorporated into any learning strategy used for playing repeated games. We show that the outcome efficiency of standard learning algorithms improves when they use our proposed commitment protocol. We propose convergence to a Pareto-optimal Nash equilibrium of the repeated game as a desirable learning outcome. The performance evaluation in this paper uses a similarly motivated metric that measures the percentage of Nash equilibria of the repeated game that dominate the observed outcome.
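The two-stage protocol and the "commitment as an expanded action space" idea can be illustrated with a minimal sketch. Everything below is illustrative rather than taken from the paper: the stage game is a standard Stag Hunt (where risk-averse simultaneous play is inefficient), the committing player is a fixed "leader", and the other player is modeled as a simple best-responder; in the paper, both players instead learn from repeated play whether committing is beneficial.

```python
# Illustrative two-stage commitment protocol on a Stag Hunt stage game.
# Payoffs are (leader, follower); names and numbers are assumptions for
# this sketch, not the paper's experimental games.
PAYOFFS = {
    ("S", "S"): (4, 4),  # both hunt stag: Pareto-optimal outcome
    ("S", "H"): (0, 3),
    ("H", "S"): (3, 0),
    ("H", "H"): (2, 2),  # both hunt hare: safe but inefficient
}
ACTIONS = ["S", "H"]

def follower_best_response(leader_action):
    """Second stage: the follower observes the committed action and
    best-responds to it (a simplification of a learning opponent)."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(leader_action, a)][1])

def play_round(leader_choice, follower_default="H"):
    """leader_choice is either a plain action ("S"/"H") or a tuple
    ("commit", action) -- the expanded action space. Without commitment,
    the follower plays its (here fixed, risk-averse) default action."""
    if isinstance(leader_choice, tuple) and leader_choice[0] == "commit":
        a1 = leader_choice[1]
        a2 = follower_best_response(a1)  # informed second-stage choice
    else:
        a1, a2 = leader_choice, follower_default
    return PAYOFFS[(a1, a2)]
```

Under these assumptions, unilaterally committing to "S" lets the follower safely match it, reaching the Pareto-optimal (4, 4) outcome, whereas uncommitted play against a risk-averse follower yields at best the inefficient (2, 2); a learner choosing over the expanded action space can discover this by comparing the payoffs of committing versus not committing.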