The Continuum-Armed Bandit Problem
SIAM Journal on Control and Optimization
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The Nonstochastic Multiarmed Bandit Problem
SIAM Journal on Computing
Finite-time Analysis of the Multiarmed Bandit Problem
Machine Learning
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Probabilistic approximation of metric spaces and its algorithmic applications
FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
Using confidence bounds for exploitation-exploration trade-offs
The Journal of Machine Learning Research
Bounded Geometries, Fractals, and Low-Distortion Embeddings
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
A tight bound on approximating arbitrary metrics by tree metrics
Journal of Computer and System Sciences - Special issue: STOC 2003
Online convex optimization in the bandit setting: gradient descent without a gradient
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Gaussian Processes for Ordinal Regression
The Journal of Machine Learning Research
Learning to rank using gradient descent
ICML '05 Proceedings of the 22nd international conference on Machine learning
Prediction, Learning, and Games
Prediction, Learning, and Games
Online linear optimization and adaptive routing
Journal of Computer and System Sciences
SoftRank: optimizing non-smooth rank metrics
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Multi-armed bandits in metric spaces
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Learning diverse rankings with multi-armed bandits
Proceedings of the 25th international conference on Machine learning
Better algorithms for benign bandits
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Improved rates for the stochastic continuum-armed bandit problem
COLT'07 Proceedings of the 20th annual conference on Learning theory
Online learning with prior knowledge
COLT'07 Proceedings of the 20th annual conference on Learning theory
A contextual-bandit approach to personalized news article recommendation
Proceedings of the 19th international conference on World wide web
Sharp dichotomies for regret minimization in metric spaces
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Online learning in adversarial Lipschitz environments
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Algorithms for adversarial bandit problems with multiple plays
ALT'10 Proceedings of the 21st international conference on Algorithmic learning theory
Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms
Proceedings of the fourth ACM international conference on Web search and data mining
The Journal of Machine Learning Research
Bandit based monte-carlo planning
ECML'06 Proceedings of the 17th European conference on Machine Learning
Most learning-to-rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this assumption either lack theoretical foundations or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with several scalable algorithms that explicitly take document similarity and ranking context into account. Our formulation is a non-trivial common generalization of two multi-armed bandit models from the literature: ranked bandits (Radlinski et al., 2008) and Lipschitz bandits (Kleinberg et al., 2008b). We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than previous approaches.
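To make the "fraction of satisfied users" objective concrete, here is a minimal Python sketch of the ranked-bandits idea that the abstract attributes to Radlinski et al. (2008): one bandit instance per rank slot, with a slot rewarded only when the user's first click lands on the document that slot itself proposed. This is not the algorithm of this paper; it ignores document similarity entirely. The per-slot learner is UCB1, and the toy user population, the names `UCB1`, `ranked_bandits_round`, and `simulate`, and all parameter values are assumptions made for illustration.

```python
import math
import random


class UCB1:
    """UCB1 index policy over a fixed, finite set of arms."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before relying on confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def ranked_bandits_round(slots, n_docs, is_relevant, rng):
    """Build one ranking, observe the first click, and update every slot."""
    ranking, proposed = [], []
    for bandit in slots:
        doc = bandit.select()
        proposed.append(doc)
        if doc in ranking:
            # Duplicate proposal: show an arbitrary unused document instead.
            doc = rng.choice([d for d in range(n_docs) if d not in ranking])
        ranking.append(doc)
    # The user scans top-down and clicks the first relevant document, if any.
    clicked = next((i for i, d in enumerate(ranking) if is_relevant(d)), None)
    for i, bandit in enumerate(slots):
        # A slot earns reward only for a click on its own proposal.
        hit = clicked == i and proposed[i] == ranking[i]
        bandit.update(proposed[i], 1.0 if hit else 0.0)
    return ranking, clicked is not None


def simulate(n_rounds=5000, seed=0):
    """Return the fraction of satisfied users on a hypothetical toy population."""
    rng = random.Random(seed)
    n_docs, k = 5, 2
    # Assumed population: 60% of users want doc 0 or 1 (redundant), 40% want doc 2.
    majority, minority = {0, 1}, {2}
    slots = [UCB1(n_docs) for _ in range(k)]
    satisfied = 0
    for _ in range(n_rounds):
        relevant = majority if rng.random() < 0.6 else minority
        _, ok = ranked_bandits_round(slots, n_docs, lambda d: d in relevant, rng)
        satisfied += ok
    return satisfied / n_rounds


if __name__ == "__main__":
    print(f"fraction of satisfied users: {simulate():.3f}")
```

In this toy setup, a ranking that treats documents as independent would tend to place the redundant documents 0 and 1 together and satisfy at most 60% of users, whereas pairing one of them with document 2 satisfies everyone. The sketch has to learn that from clicks alone, which is the kind of slow learning the similarity-aware formulation is meant to speed up.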