On Fixed Convex Combinations of No-Regret Learners

Authors: Jan-P. Calliess
Affiliations: Machine Learning Dept., Carnegie Mellon University, Pittsburgh, USA
Venue: MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Year: 2009

Abstract

No-regret algorithms for online convex optimization are potent online learning tools and have proved successful in a wide range of applications. Considering affine and external regret, we investigate what happens when a set of no-regret learners (voters) merge their respective decisions in each learning iteration into a single, common one in the form of a convex combination. We show that an agent (or algorithm) that executes this merged decision in each iteration of the online learning process, and each time feeds back a copy of its own reward function to the voters, incurs sublinear regret itself. As a by-product, we obtain a simple method for constructing new no-regret algorithms out of known ones.
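
The merging scheme described in the abstract can be illustrated with a minimal sketch for the external-regret case. Everything concrete here is an assumption made for illustration and is not taken from the paper: the voters are projected online gradient ascent learners on the interval [-1, 1], the rewards are the concave functions r_t(x) = -(x - c_t)^2, and the names `GradientAscentVoter` and `run` are hypothetical. Because each reward is concave, the merged decision earns at least the weighted average of the voters' rewards in every round, so its regret is at most the same convex combination of the voters' regrets.

```python
import math
import random


class GradientAscentVoter:
    """Hypothetical voter: projected online gradient ascent on [lo, hi]."""

    def __init__(self, scale, lo=-1.0, hi=1.0, start=0.0):
        self.scale, self.lo, self.hi = scale, lo, hi
        self.x = start
        self.t = 0

    def decide(self):
        return self.x

    def feedback(self, reward_grad):
        # The voter receives a copy of the round's reward (here: its
        # derivative) and evaluates it at its OWN decision.
        self.t += 1
        step = self.scale / math.sqrt(self.t)
        moved = self.x + step * reward_grad(self.x)
        self.x = min(self.hi, max(self.lo, moved))


def run(targets, weights, scales):
    """Play one round per target; return (merged regret, per-voter regrets)."""
    voters = [GradientAscentVoter(s) for s in scales]
    merged_reward = 0.0
    voter_rewards = [0.0] * len(voters)
    for c in targets:
        proposals = [v.decide() for v in voters]
        # Fixed convex combination of the voters' decisions.
        merged = sum(w * x for w, x in zip(weights, proposals))

        def reward(x, c=c):
            return -(x - c) ** 2

        def grad(x, c=c):
            return -2.0 * (x - c)

        merged_reward += reward(merged)
        for i, (voter, x) in enumerate(zip(voters, proposals)):
            voter_rewards[i] += reward(x)
            voter.feedback(grad)
    # Best fixed decision in hindsight for these quadratic rewards:
    # the mean of the targets, clipped to the feasible interval.
    best = min(1.0, max(-1.0, sum(targets) / len(targets)))
    best_reward = sum(-(best - c) ** 2 for c in targets)
    regrets = [best_reward - r for r in voter_rewards]
    return best_reward - merged_reward, regrets


if __name__ == "__main__":
    random.seed(0)
    targets = [random.uniform(-0.5, 0.5) for _ in range(2000)]
    weights = [0.3, 0.7]  # nonnegative and summing to 1
    merged, per_voter = run(targets, weights, scales=[0.1, 1.0])
    print("merged regret per round:", merged / len(targets))
    print("voter regrets per round:", [r / len(targets) for r in per_voter])
```

The sketch covers only external regret against the best fixed decision in hindsight; the affine-regret setting mentioned in the abstract is not illustrated here.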