We study the decision problems facing agents in repeated matching environments with learning, or two-sided bandit problems. As an example, we examine the dating market, in which men and women repeatedly go out on dates and learn about each other. We consider three natural matching mechanisms and empirically examine their properties, focusing on the asymptotic stability of the resulting matchings when the agents use a simple learning rule coupled with an ε-greedy exploration policy. Matchings tend to be more stable when agents are patient in either of two ways: if they are more likely to explore early, or if they are more optimistic. However, the two forms of patience do not interact well in terms of increasing the probability of stable outcomes. We also define a notion of regret for the two-sided problem and study the distribution of regrets under the different matching mechanisms.
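To make the two central notions concrete, the following is a minimal sketch of an ε-greedy choice over estimated partner values and of a stability check on a one-to-one matching. It is illustrative only, not the paper's implementation. The function names, the sample-mean update (standing in for the "simple learning rule"), and the value-table representation of preferences are all assumptions, and the three matching mechanisms are not modeled.

```python
import random


def epsilon_greedy_choice(estimates, epsilon, rng):
    """Return an index into `estimates`.

    With probability `epsilon` explore uniformly at random; otherwise
    exploit the highest current estimate, breaking ties at random.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    best = max(estimates)
    return rng.choice([i for i, q in enumerate(estimates) if q == best])


def update_sample_mean(estimates, counts, partner, reward):
    """Assumed learning rule: running sample mean of observed rewards."""
    counts[partner] += 1
    estimates[partner] += (reward - estimates[partner]) / counts[partner]


def is_stable(matching, men_values, women_values):
    """Check a complete one-to-one matching for blocking pairs.

    `matching` maps man -> woman. `men_values[m][w]` is man m's true value
    for woman w, and `women_values[w][m]` is woman w's true value for man m.
    A pair (m, w) blocks the matching if both strictly prefer each other
    to their current partners.
    """
    woman_to_man = {w: m for m, w in matching.items()}
    for m, current_w in matching.items():
        for w, current_m in woman_to_man.items():
            if w == current_w:
                continue
            man_prefers = men_values[m][w] > men_values[m][current_w]
            woman_prefers = women_values[w][m] > women_values[w][current_m]
            if man_prefers and woman_prefers:
                return False
    return True
```

In a simulation along these lines, each agent would call `epsilon_greedy_choice` once per period, a mechanism would turn the choices into dates, and `update_sample_mean` would record the observed reward. `is_stable` could then be applied to the matching the process settles on, using the true values rather than the agents' estimates.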