Ranked bandits in metric spaces: learning diverse rankings over large document collections

Authors:
Aleksandrs Slivkins; Filip Radlinski; Sreenivas Gollapudi
Affiliations:
Microsoft Research Silicon Valley, Mountain View, CA; Microsoft Research Cambridge, Cambridge, UK; Microsoft Research Silicon Valley, Mountain View, CA
Venue:
The Journal of Machine Learning Research
Year:
2013

References (27):

The Continuum-Armed Bandit Problem. SIAM Journal on Control and Optimization
The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing
Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning
Optimizing search engines using clickthrough data. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Probabilistic approximation of metric spaces and its algorithmic applications. FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
Using confidence bounds for exploitation-exploration trade-offs. The Journal of Machine Learning Research
Bounded Geometries, Fractals, and Low-Distortion Embeddings. FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
A tight bound on approximating arbitrary metrics by tree metrics. Journal of Computer and System Sciences - Special issue: STOC 2003
Online convex optimization in the bandit setting: gradient descent without a gradient. SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Gaussian Processes for Ordinal Regression. The Journal of Machine Learning Research
Learning to rank using gradient descent. ICML '05 Proceedings of the 22nd international conference on Machine learning
Prediction, Learning, and Games
Online linear optimization and adaptive routing. Journal of Computer and System Sciences
SoftRank: optimizing non-smooth rank metrics. WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Multi-armed bandits in metric spaces. STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Learning diverse rankings with multi-armed bandits. Proceedings of the 25th international conference on Machine learning
Better algorithms for benign bandits. SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Improved rates for the stochastic continuum-armed bandit problem. COLT'07 Proceedings of the 20th annual conference on Learning theory
Online learning with prior knowledge. COLT'07 Proceedings of the 20th annual conference on Learning theory
A contextual-bandit approach to personalized news article recommendation. Proceedings of the 19th international conference on World wide web
Sharp dichotomies for regret minimization in metric spaces. SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Online learning in adversarial Lipschitz environments. ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Algorithms for adversarial bandit problems with multiple plays. ALT'10 Proceedings of the 21st international conference on Algorithmic learning theory
Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. Proceedings of the fourth ACM international conference on Web search and data mining
X-Armed Bandits. The Journal of Machine Learning Research
Bandit based monte-carlo planning. ECML'06 Proceedings of the 17th European conference on Machine Learning

Abstract

Most learning-to-rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this assumption either lack theoretical foundations or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, together with several scalable algorithms that explicitly take document similarity and ranking context into account. Our formulation is a non-trivial common generalization of two multi-armed bandit models from the literature: ranked bandits (Radlinski et al., 2008) and Lipschitz bandits (Kleinberg et al., 2008b). We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than previous approaches.
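
As a rough illustration of the ranked-bandits model that the abstract builds on, the following Python sketch runs one bandit per ranking slot and measures the fraction of satisfied users. It is a minimal sketch under stated assumptions, not the paper's algorithm: UCB1 is used as the per-slot bandit, users are simulated as sets of relevant documents, each user clicks only the first relevant result, and the names `UCB1`, `ranked_bandits`, and `users` are hypothetical. It does not use document similarity, which is the part the paper adds.

```python
import math
import random


class UCB1:
    """Standard UCB1 bandit over a fixed set of arms (documents)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before using confidence bounds.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def ranked_bandits(n_docs, k, users, rounds, seed=0):
    """Run one bandit per slot; return the fraction of satisfied users.

    users: list of sets (assumed simulation input); each set holds the
    documents that satisfy one user type. Requires n_docs >= k.
    """
    rng = random.Random(seed)
    slots = [UCB1(n_docs) for _ in range(k)]
    satisfied = 0
    for _ in range(rounds):
        proposed, shown = [], []
        for bandit in slots:
            doc = bandit.select()
            proposed.append(doc)
            if doc in shown:
                # Duplicate: show an arbitrary unused document instead.
                doc = rng.choice([d for d in range(n_docs) if d not in shown])
            shown.append(doc)
        relevant = rng.choice(users)
        # The simulated user scans top-down and clicks the first relevant document.
        clicked = next((i for i, d in enumerate(shown) if d in relevant), None)
        if clicked is not None:
            satisfied += 1
        for i, bandit in enumerate(slots):
            # Reward a slot only if its own proposal was shown and clicked.
            reward = 1.0 if (i == clicked and proposed[i] == shown[i]) else 0.0
            bandit.update(proposed[i], reward)
    return satisfied / rounds
```

For example, `ranked_bandits(n_docs=4, k=2, users=[{0}, {1}], rounds=5000)` simulates two user types with different needs, so a ranking that repeats the same document in both slots can satisfy at most half of the users. Because each slot treats every document as an unrelated arm, this sketch scales poorly with collection size, which is the limitation the metric-space formulation in the abstract is meant to address.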