The K-armed dueling bandits problem

Authors:
Yisong Yue; Josef Broder; Robert Kleinberg; Thorsten Joachims
Affiliations:
H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA 15213, United States; Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, United States; Department of Computer Science, Cornell University, Ithaca, NY 14853, United States; Department of Computer Science, Cornell University, Ithaca, NY 14853, United States
Venue:
Journal of Computer and System Sciences
Year:
2012

Abstract

We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation for this setting and present an algorithm that achieves information-theoretically optimal regret bounds (up to a constant factor).
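
To make the feedback model concrete, the following is a minimal sketch of the setting only, not of the algorithm referred to in the abstract. It assumes a hypothetical preference matrix P, where P[i][j] is the probability that strategy i wins a noisy comparison against strategy j, and it selects pairs uniformly at random as a naive baseline. The regret measure used here (the summed preference margin of the best strategy over the two chosen strategies) is one common way to formalize regret under relative feedback; it is an assumption for illustration, since the abstract does not spell out the formulation.

```python
import random

def duel(p, i, j, rng):
    """Return True if strategy i wins one noisy comparison against j.

    p[i][j] is an assumed (hypothetical) probability that i beats j.
    """
    return rng.random() < p[i][j]

def simulate(p, best, horizon, seed=0):
    """Run a naive uniform-random pair selection for `horizon` rounds.

    This is a baseline for illustration, not the algorithm from the paper.
    Returns the win-count matrix and the accumulated regret.
    """
    rng = random.Random(seed)
    k = len(p)
    wins = [[0] * k for _ in range(k)]
    regret = 0.0
    for _ in range(horizon):
        # Pick two distinct strategies to compare.
        i, j = rng.sample(range(k), 2)
        # Only the binary outcome is observed, never an absolute reward.
        if duel(p, i, j, rng):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        # Assumed regret: margin of the best strategy over each chosen one.
        regret += (p[best][i] - 0.5) + (p[best][j] - 0.5)
    return wins, regret

# Hypothetical 3-strategy instance; strategy 0 beats the others on average.
P = [
    [0.5, 0.6, 0.8],
    [0.4, 0.5, 0.7],
    [0.2, 0.3, 0.5],
]
wins, regret = simulate(P, best=0, horizon=1000)
print("win counts:", wins)
print("cumulative regret:", round(regret, 2))
```

Because the baseline never stops comparing poor strategies, its regret under this measure grows linearly with the horizon; a learning algorithm would aim to concentrate comparisons on the best strategy so that regret grows much more slowly.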