Finite-time Analysis of the Multiarmed Bandit Problem
Machine Learning
Online decision problems with large strategy sets
An incentive-compatible multi-armed bandit mechanism
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Characterizing truthful multi-armed bandit mechanisms: extended abstract
Proceedings of the 10th ACM conference on Electronic commerce
The price of truthfulness for pay-per-click auctions
Proceedings of the 10th ACM conference on Electronic commerce
Designing incentives for online question and answer forums
Proceedings of the 10th ACM conference on Electronic commerce
The role of game theory in human computation systems
Proceedings of the ACM SIGKDD Workshop on Human Computation
Truthful mechanisms with implicit payment computation
Proceedings of the 11th ACM conference on Electronic commerce
Incentivizing high-quality user-generated content
Proceedings of the 20th international conference on World wide web
A game-theoretic analysis of rank-order mechanisms for user-generated content
Proceedings of the 12th ACM conference on Electronic commerce
Implementing optimal outcomes in social computing: a game-theoretic approach
Proceedings of the 21st international conference on World Wide Web
A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities
Proceedings of the 13th ACM Conference on Electronic Commerce
Mean field equilibria of multiarmed bandit games
Proceedings of the 13th ACM Conference on Electronic Commerce
A game-theoretic analysis of the ESP game
ACM Transactions on Economics and Computation - Inaugural Issue
Incentives, gamification, and game theory: an economic approach to badge design
Proceedings of the fourteenth ACM conference on Electronic commerce
Social computing and user-generated content: a game-theoretic approach
ACM SIGecom Exchanges
Motivated by the problem of learning the qualities of user-generated content on the Web, we study a multi-armed bandit problem where the number and success probabilities of the arms of the bandit are endogenously determined by strategic agents in response to the incentives provided by the learning algorithm. We model the contributors of user-generated content as attention-motivated agents who derive benefit when their contribution is displayed and incur a cost for quality, where a contribution's quality is the probability that it receives a positive viewer vote. Agents strategically choose whether to contribute, and at what quality, in response to the algorithm that decides how to display contributions. The algorithm, which would like to eventually display only the highest-quality contributions, can learn a contribution's quality only from the viewer votes the contribution receives when displayed. The problem of inferring the relative qualities of contributions from viewer feedback, so as to optimize overall viewer satisfaction over time, can then be modeled as the classic multi-armed bandit problem, except that the arms available to the bandit, and therefore the achievable regret, are endogenously determined by strategic agents. A good algorithm for this setting must therefore not only quickly identify the best contributions, but also incentivize high-quality contributions to choose among in the first place. We first analyze the well-known UCB algorithm Ma [Auer et al. 2002] as a mechanism in this setting, where the total number of potential contributors or arms, K, can grow with the total number of viewers or available periods, T, and the maximum possible success probability of an arm, γ, may be bounded away from 1 to model malicious or error-prone viewers in the audience.
We show that while Ma can incentivize high-quality arms and achieve strong sublinear equilibrium regret when K(T) does not grow too quickly with T, it incentivizes very low-quality contributions when K(T) scales proportionally with T. We then show that modifying the UCB mechanism to explore a randomly chosen restricted subset of √T arms provides excellent incentive properties: this modified mechanism achieves strong sublinear regret, that is, sublinear regret measured against the maximum achievable quality γ, in every equilibrium, for all K(T) ≤ T and for all possible values of the audience parameter γ.
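The restricted-subset idea can be illustrated with a minimal Python sketch. It is not the paper's mechanism as specified: it assumes arm qualities are fixed numbers rather than strategic choices, uses the standard UCB1 index from Auer et al. 2002 with the round number inside the logarithm, and takes the subset size to be ⌈√T⌉. The names `ucb_on_random_subset` and `strong_regret` are hypothetical, introduced only for this illustration.

```python
import math
import random


def ucb_on_random_subset(qualities, T, rng):
    """Run UCB1 for T rounds on a random subset of ceil(sqrt(T)) arms.

    Assumption: qualities[k] is a fixed success probability for arm k.
    In the paper these are chosen strategically; this sketch ignores that.
    Returns (subset of arm indices, pulls per subset arm, total reward).
    """
    K = len(qualities)
    m = min(K, math.ceil(math.sqrt(T)))   # size of the explored subset
    subset = rng.sample(range(K), m)      # uniformly random, no repeats
    pulls = [0] * m
    wins = [0] * m
    total = 0
    for t in range(1, T + 1):
        if t <= m:
            i = t - 1                     # display each chosen arm once
        else:
            # UCB1 index: empirical mean plus confidence radius
            i = max(
                range(m),
                key=lambda j: wins[j] / pulls[j]
                + math.sqrt(2 * math.log(t) / pulls[j]),
            )
        reward = 1 if rng.random() < qualities[subset[i]] else 0
        pulls[i] += 1
        wins[i] += reward
        total += reward
    return subset, pulls, total


def strong_regret(qualities, subset, pulls, gamma, T):
    """Expected regret measured against the maximum achievable quality gamma."""
    expected = sum(n * qualities[k] for k, n in zip(subset, pulls))
    return gamma * T - expected


rng = random.Random(0)
gamma = 0.9                               # hypothetical audience parameter
qualities = [rng.uniform(0.0, gamma) for _ in range(50)]
subset, pulls, total = ucb_on_random_subset(qualities, 100, rng)
regret = strong_regret(qualities, subset, pulls, gamma, 100)
```

With K = 50 and T = 100, the sketch explores only 10 of the 50 arms. Because regret is measured against γ rather than against the best arm actually present, it stays nonnegative whenever every quality is at most γ.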