No-regret learning has been described as one framework on which game theorists and computer scientists have converged for designing and evaluating multi-agent learning algorithms. However, Shoham, Powers, and Grenager also point out that the framework has serious deficiencies, such as behaving sub-optimally against certain reactive opponents. But all is not lost. With some simple modifications, regret-minimizing algorithms can achieve many of the properties we want from multi-agent learning algorithms, including safety and adaptability against reactive opponents. We argue that the research community should have no regrets about no-regret methods.
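For readers unfamiliar with the baseline framework the abstract refers to, the sketch below shows a standard no-regret learner: the multiplicative-weights (Hedge) rule, whose average regret against the best fixed action vanishes over time. This is only an illustration of an off-the-shelf regret minimizer under assumed payoffs in [0, 1]; it is not the modified algorithm the paper advocates, and the class name, learning rate, and demo payoffs are illustrative choices.

```python
import math
import random

class HedgeLearner:
    """Minimal multiplicative-weights (Hedge) learner over a fixed action set.

    A textbook no-regret algorithm, shown only to illustrate the framework
    discussed in the abstract; the paper's proposed modifications are not
    implemented here.
    """

    def __init__(self, n_actions, eta=0.1):
        self.n_actions = n_actions
        self.eta = eta                      # learning rate (assumed value)
        self.weights = [1.0] * n_actions    # one weight per action

    def act(self):
        """Sample an action with probability proportional to its weight."""
        total = sum(self.weights)
        probs = [w / total for w in self.weights]
        return random.choices(range(self.n_actions), weights=probs)[0]

    def update(self, payoffs):
        """Reweight actions using the full payoff vector (payoffs in [0, 1])."""
        self.weights = [w * math.exp(self.eta * p)
                        for w, p in zip(self.weights, payoffs)]

# Toy usage: two actions, a fixed (non-reactive) payoff vector each round.
learner = HedgeLearner(n_actions=2, eta=0.1)
for _ in range(100):
    action = learner.act()
    learner.update([0.2, 0.8])   # hypothetical payoffs; action 1 is better
```

Against a fixed payoff stream like this, the learner's weights concentrate on the better action; the abstract's point is that such guarantees alone can still be exploited by reactive opponents, which motivates the modifications the paper discusses.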