Achieving Socially Optimal Outcomes in Multiagent Systems with Reinforcement Social Learning
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
During multi-agent interactions, robust strategies are needed to help agents coordinate their actions on efficient outcomes. A large body of previous work focuses on designing strategies that converge to a Nash equilibrium under self-play, which can be extremely inefficient in many situations. On the other hand, apart from performing well under self-play, a good strategy should also respond well against opponents that adopt different strategies. In this paper, we consider a particular class of opponents whose strategies are based on a best-response policy, and we target the goal of social optimality. We propose a novel learning strategy, TaFSO, which exploits the characteristics of best-response learners to effectively steer the opponent's behavior towards socially optimal outcomes. Extensive simulations show that TaFSO achieves better performance than previous work both under self-play and against the class of best-response learners.
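The gap between Nash equilibrium and social optimality that motivates this work can be made concrete in the Prisoner's Dilemma, a standard benchmark in this literature: the unique pure Nash equilibrium is mutual defection, while the socially optimal outcome is mutual cooperation. The sketch below is illustrative only (the payoff values are the conventional ones, not taken from the paper, and the code is not part of TaFSO itself):

```python
from itertools import product

# Prisoner's Dilemma bimatrix: PAYOFFS[(i, j)] = (row payoff, column payoff)
# for actions 0 = Cooperate, 1 = Defect. Conventional payoff values.
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}
ACTIONS = (0, 1)

def pure_nash_equilibria(payoffs):
    """Joint actions where neither player gains by unilaterally deviating."""
    eqs = []
    for (i, j) in product(ACTIONS, repeat=2):
        row, col = payoffs[(i, j)]
        row_ok = all(payoffs[(i2, j)][0] <= row for i2 in ACTIONS)
        col_ok = all(payoffs[(i, j2)][1] <= col for j2 in ACTIONS)
        if row_ok and col_ok:
            eqs.append((i, j))
    return eqs

def socially_optimal(payoffs):
    """Joint action maximizing the sum of both players' payoffs."""
    return max(payoffs, key=lambda ij: sum(payoffs[ij]))

print(pure_nash_equilibria(PAYOFFS))  # [(1, 1)] - mutual defection, total payoff 2
print(socially_optimal(PAYOFFS))      # (0, 0)  - mutual cooperation, total payoff 6
```

Here the Nash outcome yields a total payoff of 2 versus 6 at the social optimum, which is the inefficiency gap a strategy like TaFSO aims to close by steering a best-response opponent toward the cooperative outcome.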