We study a partial-information online learning problem in which actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches, which require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that noisy binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation for this setting and present an algorithm that achieves information-theoretically optimal regret bounds (up to a constant factor).
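To make the feedback model concrete, the following is a minimal Python sketch of the kind of noisy pairwise comparison described above. It is illustrative only and is not the algorithm proposed in the paper. The preference matrix `PREF`, the helper names `duel` and `estimate_win_rate`, and all numeric values are assumptions introduced for this example; the only property taken from the text is that the learner observes a noisy binary outcome of a comparison between two chosen strategies, never an absolute reward.

```python
import random


def duel(pref, i, j, rng):
    """Return True if strategy i wins one noisy comparison against strategy j.

    pref[i][j] is a hypothetical probability that i beats j; the learner
    never sees this number, only the binary outcome returned here.
    """
    return rng.random() < pref[i][j]


def estimate_win_rate(pref, i, j, trials, rng):
    """Estimate how often i beats j by repeating the noisy comparison."""
    wins = sum(duel(pref, i, j, rng) for _ in range(trials))
    return wins / trials


# Hypothetical 3-strategy preference matrix (assumed for illustration):
# PREF[i][j] + PREF[j][i] == 1, and strategy 0 is preferred to the others.
PREF = [
    [0.5, 0.6, 0.8],
    [0.4, 0.5, 0.7],
    [0.2, 0.3, 0.5],
]

rng = random.Random(0)  # fixed seed so the run is repeatable
rate = estimate_win_rate(PREF, 0, 2, 2000, rng)
print(f"empirical win rate of strategy 0 over strategy 2: {rate:.3f}")
```

A single call to `duel` carries very little information, which is why the sketch repeats the comparison many times; how to allocate such comparisons across pairs is exactly the question a dueling-bandits algorithm has to answer.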