Reaching pareto-optimality in prisoner's dilemma using conditional joint action learning

Authors:
Dipyaman Banerjee; Sandip Sen
Affiliations:
Department of Computer Science, University of Tulsa, Tulsa, USA; Department of Computer Science, University of Tulsa, Tulsa, USA
Venue:
Autonomous Agents and Multi-Agent Systems
Year:
2007

Abstract

We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other's actions but not the payoffs received by the other player. The concept of Nash Equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash Equilibrium strategy for the single-shot game in every iteration. Such a strategy, however, can sometimes lead to a Pareto-dominated outcome in games like the Prisoner's Dilemma. We therefore prefer learning strategies that converge to a Pareto-Optimal outcome that also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enables us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL), which learns the conditional probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically show that, under self-play and provided the payoff structure of the Prisoner's Dilemma game satisfies certain conditions, a CJAL learner using a random exploration strategy followed by a completely greedy exploitation technique will learn to converge to a Pareto-Optimal solution. We also show that such learning will generate Pareto-Optimal payoffs in a large majority of other two-player general-sum games. We compare the performance of CJAL with that of existing algorithms such as WoLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs.
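
The learning rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name, the payoff values (3, 0, 5, 1), the uniform prior for unseen actions, and the fixed-length exploration phase are all assumptions made for illustration. The learner counts joint actions, estimates the probability of each opponent action conditioned on its own action, explores randomly for a fixed number of rounds, and then greedily picks the action with the highest estimated expected payoff. It uses only its own payoffs, in line with the assumption that the other player's payoffs are not observed.

```python
import random

# Assumed Prisoner's Dilemma payoffs for one player (illustrative values,
# not taken from the paper): PAYOFF[my_action][opponent_action].
C, D = 0, 1
PAYOFF = [[3, 0],   # I cooperate: reward 3, sucker's payoff 0
          [5, 1]]   # I defect: temptation 5, punishment 1


class ConditionalJointActionLearner:
    """Hypothetical sketch: tracks P(opponent action | own action)."""

    def __init__(self, payoff, explore_rounds, rng):
        self.payoff = payoff
        self.explore_rounds = explore_rounds  # length of random phase (assumption)
        self.rng = rng
        self.rounds = 0
        n_own, n_opp = len(payoff), len(payoff[0])
        # joint_counts[a][b]: times we played a while the opponent played b.
        self.joint_counts = [[0] * n_opp for _ in range(n_own)]

    def cond_prob(self, own, opp):
        """Estimate P(opponent plays opp | we play own) from counts."""
        total = sum(self.joint_counts[own])
        if total == 0:
            # Uniform prior for actions never tried (assumption).
            return 1.0 / len(self.joint_counts[own])
        return self.joint_counts[own][opp] / total

    def expected_utility(self, own):
        """Expected own payoff of an action under the conditional estimates."""
        return sum(self.cond_prob(own, b) * self.payoff[own][b]
                   for b in range(len(self.payoff[own])))

    def act(self):
        n_own = len(self.payoff)
        if self.rounds < self.explore_rounds:
            return self.rng.randrange(n_own)               # random exploration
        return max(range(n_own), key=self.expected_utility)  # greedy exploitation

    def observe(self, own, opp):
        """Record the joint action played in the round just finished."""
        self.joint_counts[own][opp] += 1
        self.rounds += 1


def self_play(rounds=2000, explore_rounds=400, seed=0):
    """Let two such learners play each other; return the joint-action history."""
    rng = random.Random(seed)
    a = ConditionalJointActionLearner(PAYOFF, explore_rounds, rng)
    b = ConditionalJointActionLearner(PAYOFF, explore_rounds, rng)
    history = []
    for _ in range(rounds):
        x, y = a.act(), b.act()
        a.observe(x, y)
        b.observe(y, x)
        history.append((x, y))
    return history
```

As a worked case under these assumed payoffs: a learner that has observed (C, C) three times, (C, D) once, and (D, D) once estimates P(C | C) = 0.75 and P(C | D) = 0, giving expected payoffs of 2.25 for cooperating and 1.0 for defecting, so its greedy choice is to cooperate. Whether a full `self_play` run settles on mutual cooperation depends on the seed, the exploration length, and the payoff values.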