Towards a Pareto-Optimal Solution in General-Sum Games
AAMAS '03: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems
Learning to converge to an efficient, i.e., Pareto-optimal, Nash equilibrium of the repeated game is an open problem in multiagent learning. Our goal is to facilitate the learning of efficient outcomes in repeated plays of incomplete-information games in which only the opponent's actions, but not its payoffs, are observable. We use a two-stage protocol that allows a player to unilaterally commit to an action, after which the other player chooses its action knowing the action chosen by the committed player. The motivation behind commitment is to promote trust between the players and to prevent mutually harmful choices made to preclude worst-case outcomes. Our agents learn whether or not commitment is beneficial. Interestingly, the decision to commit can be thought of as expanding the action space, and our proposed protocol can be incorporated into any learning strategy used for playing repeated games. We show that the outcome efficiency of standard learning algorithms improves when they use our proposed commitment protocol. We propose convergence to a Pareto-optimal Nash equilibrium of the repeated game as a desirable learning outcome. The performance evaluation in this paper uses a similarly motivated metric that measures the percentage of Nash equilibria of the repeated game that dominate the observed outcome.
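The two-stage protocol and the "commitment as an expanded action space" idea can be illustrated with a minimal sketch. Everything below is illustrative rather than taken from the paper: the stage game is a standard Stag Hunt (where risk-averse simultaneous play is inefficient), the committing player is a fixed "leader", and the other player is modeled as a simple best-responder; in the paper, both players instead learn from repeated play whether committing is beneficial.

```python
# Illustrative two-stage commitment protocol on a Stag Hunt stage game.
# Payoffs are (leader, follower); names and numbers are assumptions for
# this sketch, not the paper's experimental games.
PAYOFFS = {
    ("S", "S"): (4, 4),  # both hunt stag: Pareto-optimal outcome
    ("S", "H"): (0, 3),
    ("H", "S"): (3, 0),
    ("H", "H"): (2, 2),  # both hunt hare: safe but inefficient
}
ACTIONS = ["S", "H"]

def follower_best_response(leader_action):
    """Second stage: the follower observes the committed action and
    best-responds to it (a simplification of a learning opponent)."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(leader_action, a)][1])

def play_round(leader_choice, follower_default="H"):
    """leader_choice is either a plain action ("S"/"H") or a tuple
    ("commit", action) -- the expanded action space. Without commitment,
    the follower plays its (here fixed, risk-averse) default action."""
    if isinstance(leader_choice, tuple) and leader_choice[0] == "commit":
        a1 = leader_choice[1]
        a2 = follower_best_response(a1)  # informed second-stage choice
    else:
        a1, a2 = leader_choice, follower_default
    return PAYOFFS[(a1, a2)]
```

Under these assumptions, unilaterally committing to "S" lets the follower safely match it, reaching the Pareto-optimal (4, 4) outcome, whereas uncommitted play against a risk-averse follower yields at best the inefficient (2, 2); a learner choosing over the expanded action space can discover this by comparing the payoffs of committing versus not committing.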