External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares the loss of an online algorithm to the loss of a modified online algorithm that consistently replaces one action by another. In this paper we give a simple generic reduction that, given an algorithm for the external regret problem, converts it into an efficient online algorithm for the internal regret problem.

We provide methods that work both in the full information model, in which the loss of every action is observed at each time step, and in the partial information (bandit) model, where at each time step only the loss of the selected action is observed. The importance of internal regret in game theory stems from the fact that in a general game, if every player has sublinear internal regret, then the empirical frequencies of play converge to a correlated equilibrium.

For external regret we also derive a quantitative regret bound in a very general setting, which allows an arbitrary set of modification rules (which may modify the online algorithm's actions) and an arbitrary set of time selection functions (each giving a different weight to each time step). The regret for a given time selection function and modification rule is the difference between the cost of the online algorithm and the cost of the modified online algorithm, where the costs are weighted by the time selection function. This can be viewed as a generalization of the previously studied sleeping experts setting.
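As a rough illustration of the kind of reduction described above (a simplified sketch, not the paper's exact construction or constants), one can run N copies of an external-regret algorithm, one per action, combine their distributions by taking the stationary distribution of the resulting row-stochastic matrix, and feed copy i the loss vector scaled by the probability placed on action i. Here Hedge (multiplicative weights) stands in as the external-regret subroutine; the class names and the learning rate `eta` are illustrative choices, not from the paper:

```python
import numpy as np

class Hedge:
    """Multiplicative-weights learner: a standard external-regret algorithm."""
    def __init__(self, n_actions, eta=0.5):
        self.w = np.ones(n_actions)
        self.eta = eta

    def probs(self):
        return self.w / self.w.sum()

    def update(self, losses):
        # Exponentially down-weight each action in proportion to its loss.
        self.w *= np.exp(-self.eta * np.asarray(losses))

def stationary(Q, iters=200):
    """Stationary distribution p = pQ of a row-stochastic matrix (power iteration)."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q
    return p

class ExternalToInternal:
    """Sketch of an external-to-internal regret reduction.

    Copy i "owns" action i. We play the stationary distribution p of the
    matrix Q whose i-th row is copy i's distribution, and after observing
    the loss vector we feed copy i that vector scaled by p_i.
    """
    def __init__(self, n_actions, eta=0.5):
        self.copies = [Hedge(n_actions, eta) for _ in range(n_actions)]

    def probs(self):
        Q = np.array([c.probs() for c in self.copies])
        return stationary(Q)

    def update(self, losses):
        p = self.probs()
        for i, c in enumerate(self.copies):
            c.update(p[i] * np.asarray(losses))
```

On a toy sequence where action 0 always has loss 0 and action 1 always has loss 1, the combined algorithm's distribution concentrates on action 0, so both its external and internal regret stay small; the full-information analysis in the paper makes this quantitative.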