The weighted majority algorithm. Information and Computation.
A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences (special issue on STOC '94, May 23–25, 1994, and EuroCOLT '95, March 13–15, 1995).
The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing.
Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS '95).
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC '04).
Approximation algorithms and online mechanisms for item pricing. Proceedings of the 7th ACM Conference on Electronic Commerce (EC '06).
Playing games with approximation algorithms. Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC '07).
Online linear optimization and adaptive routing. Journal of Computer and System Sciences.
Regret minimization and the price of total anarchy. Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC '08).
Multi-armed bandits in metric spaces. Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC '08).
Following the Perturbed Leader to Gamble at Multi-armed Bandits. Proceedings of the 18th International Conference on Algorithmic Learning Theory (ALT '07).
Better algorithms for benign bandits. Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09).
Characterizing truthful multi-armed bandit mechanisms (extended abstract). Proceedings of the 10th ACM Conference on Electronic Commerce.
Sharp dichotomies for regret minimization in metric spaces. Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '10).
Return of the boss problem: competing online against a non-adaptive adversary. Proceedings of the 5th International Conference on Fun with Algorithms (FUN '10).
Better Algorithms for Benign Bandits. The Journal of Machine Learning Research.
Understanding and protecting privacy: formal semantics and principled audit mechanisms. Proceedings of the 7th International Conference on Information Systems Security (ICISS '11).
We consider "online bandit geometric optimization," a problem of iterated decision making in a largely unknown and constantly changing environment. The goal is to minimize "regret," defined as the difference between the actual loss of an online decision-making procedure and that of the best single decision in hindsight. "Geometric optimization" refers to a generalization of the well-known multi-armed bandit problem in which the decision space is a bounded subset of R^d, the adversary is restricted to linear loss functions, and regret bounds should depend on the dimensionality d rather than on the total number of possible decisions. "Bandit" refers to the setting in which the algorithm is told only its loss on each round, rather than the entire loss function.

McMahan and Blum [10] presented the best known algorithm in this setting, and proved that its expected additive regret is O(poly(d) T^{3/4}). We simplify and improve their analysis of this algorithm to obtain regret O(poly(d) T^{2/3}).

We also prove that, for a large class of full-information online optimization problems, the optimal regret against an adaptive adversary is the same as against a non-adaptive adversary.
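The notion of regret used above can be made concrete with a small toy computation. The sketch below is a hypothetical illustration (not the paper's algorithm): the decision space is a finite subset of R^d, each round's loss is a linear function given by a loss vector, and regret compares the cumulative loss of the played sequence against the best single decision in hindsight. The names `regret`, `decisions`, and `choices` are ours, introduced only for this example.

```python
import random

def dot(x, y):
    """Inner product; a linear loss function applied to a decision point."""
    return sum(a * b for a, b in zip(x, y))

def regret(decisions, loss_vectors, choices):
    """Cumulative loss of the played choices minus the cumulative loss of
    the best single (fixed) decision in hindsight."""
    actual = sum(dot(decisions[c], l) for c, l in zip(choices, loss_vectors))
    best_fixed = min(sum(dot(x, l) for l in loss_vectors) for x in decisions)
    return actual - best_fixed

if __name__ == "__main__":
    random.seed(0)
    d, T = 3, 100
    # Toy decision set: the standard basis vectors of R^d.
    decisions = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    # Adversary's linear losses, one vector per round.
    losses = [[random.random() for _ in range(d)] for _ in range(T)]
    # A naive learner that plays uniformly at random each round.
    choices = [random.randrange(d) for _ in range(T)]
    print(regret(decisions, losses, choices))
```

By construction, a learner that plays the hindsight-optimal fixed decision on every round incurs zero regret, while any other fixed sequence incurs the gap between its total loss and the optimum.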