We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of the scalar reward of only the action played. We show that, with high probability and in polynomial time, the average actual payoff of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of the minimax payoff against arbitrary opponents. In addition, the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, unlike previous work, we do not need to assume that the learner knows the game matrices and can observe the opponents' actions.
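The augmentation described above, sampling an action stochastically and observing only the scalar reward of the action played, can be illustrated with a minimal Python sketch. This is not the paper's construction: it assumes one common instance of the idea, a Hedge (multiplicative weights) learner fed importance-weighted reward estimates and mixed with uniform exploration. The names `BanditHedge` and `run_against_stationary`, the parameters `eta` and `gamma`, and the Bernoulli reward model standing in for a stationary opponent are all hypothetical choices made for illustration.

```python
import math
import random


class BanditHedge:
    """Hedge learner that sees only the reward of the action it played.

    Illustrative assumption: rewards lie in [0, 1]. eta is the learning
    rate and gamma the uniform exploration rate (both hypothetical names).
    """

    def __init__(self, n_actions, eta=0.05, gamma=0.1, rng=None):
        self.n = n_actions
        self.eta = eta
        self.gamma = gamma
        self.log_w = [0.0] * n_actions  # log-weights, for numerical stability
        self.rng = rng or random.Random()

    def policy(self):
        # Normalised exponential weights, mixed with uniform exploration so
        # every action keeps probability at least gamma / n.
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n for wi in w]

    def act(self):
        # Sample an action and return it with the probability it was played at.
        p = self.policy()
        a = self.rng.choices(range(self.n), weights=p)[0]
        return a, p[a]

    def update(self, action, prob, reward):
        # Importance-weighted estimate: reward / prob for the played action,
        # implicitly zero for all others. Its expectation equals the true
        # reward vector, so the full-information update applies on average.
        self.log_w[action] += self.eta * reward / prob


def run_against_stationary(learner, win_probs, rounds, rng):
    """Average realised payoff when action i pays 1 with probability win_probs[i]."""
    total = 0.0
    for _ in range(rounds):
        a, p = learner.act()
        r = 1.0 if rng.random() < win_probs[a] else 0.0
        learner.update(a, p, r)
        total += r
    return total / rounds
```

In this sketch the learner never sees a payoff matrix or the opponent's action, only its own sampled action and the resulting scalar reward. Against the fixed reward probabilities assumed here, the policy should concentrate on the better action, up to the exploration floor `gamma / n`.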