We provide a framework for exploiting dependencies among arms in multi-armed bandit problems when the dependencies take the form of a generative model on clusters of arms. We derive an optimal MDP-based policy for the discounted-reward case and give an approximation to it with a formal error guarantee. We discuss lower bounds on regret in the undiscounted-reward setting and propose a general two-level bandit policy for it. We present three instantiations of this general policy and provide theoretical justification of how the regret of each instantiated policy depends on the characteristics of the clusters. Finally, we empirically demonstrate the efficacy of our policies on large-scale real-world and synthetic data, showing that they significantly outperform classical policies designed for bandits with independent arms.
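The abstract describes the two-level policy only at a high level. The following is a minimal illustrative sketch of the general idea, assuming a standard UCB1 index at both levels (this particular instantiation, and all names in the code, are assumptions for illustration, not the paper's exact policies): the top level treats each cluster as a meta-arm and selects a cluster by its UCB index; the bottom level then runs UCB among the arms of the chosen cluster.

```python
import math

class TwoLevelUCB:
    """Illustrative two-level cluster bandit: UCB1 over clusters,
    then UCB1 over the arms within the selected cluster.
    A hypothetical sketch of the general scheme, not the paper's exact policy."""

    def __init__(self, clusters):
        # clusters: list of lists of arm indices, e.g. [[0, 1], [2, 3]]
        self.clusters = clusters
        self.c_counts = [0] * len(clusters)      # pulls per cluster
        self.c_rewards = [0.0] * len(clusters)   # cumulative reward per cluster
        n_arms = sum(len(c) for c in clusters)
        self.a_counts = [0] * n_arms             # pulls per arm
        self.a_rewards = [0.0] * n_arms          # cumulative reward per arm
        self.t = 0                               # global time step

    def _ucb(self, mean, n, t):
        # Standard UCB1 index: empirical mean plus exploration bonus.
        return mean + math.sqrt(2.0 * math.log(t) / n)

    def select(self):
        self.t += 1
        # Top level: play each cluster once, then pick the cluster
        # with the highest UCB index on its aggregate reward.
        unplayed = [i for i, n in enumerate(self.c_counts) if n == 0]
        if unplayed:
            cluster = unplayed[0]
        else:
            cluster = max(
                range(len(self.clusters)),
                key=lambda i: self._ucb(
                    self.c_rewards[i] / self.c_counts[i], self.c_counts[i], self.t))
        # Bottom level: UCB among the arms of the chosen cluster.
        arms = self.clusters[cluster]
        for a in arms:
            if self.a_counts[a] == 0:
                return cluster, a
        arm = max(arms, key=lambda a: self._ucb(
            self.a_rewards[a] / self.a_counts[a], self.a_counts[a], self.t))
        return cluster, arm

    def update(self, cluster, arm, reward):
        # Feed the observed reward back into both levels.
        self.c_counts[cluster] += 1
        self.c_rewards[cluster] += reward
        self.a_counts[arm] += 1
        self.a_rewards[arm] += reward
```

The point of the two-level structure is that a reward observed for one arm also updates the statistics of its cluster, so information is shared across correlated arms instead of each arm being learned independently, as UCB1 on flat, independent arms would.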