Learning and incentives in user-generated content: multi-armed bandits with endogenous arms

  • Authors:
  • Arpita Ghosh; Patrick Hummel

  • Affiliations:
  • Cornell University, Ithaca, NY, USA; Google Inc., Mountain View, CA, USA

  • Venue:
  • Proceedings of the 4th conference on Innovations in Theoretical Computer Science

  • Year:
  • 2013

Abstract

Motivated by the problem of learning the qualities of user-generated content on the Web, we study a multi-armed bandit problem in which the number and success probabilities of the bandit's arms are endogenously determined by strategic agents responding to the incentives provided by the learning algorithm. We model contributors of user-generated content as attention-motivated agents who derive benefit when their contribution is displayed and incur a cost to produce quality, where a contribution's quality is the probability that it receives a positive viewer vote. Agents strategically choose whether to contribute, and at what quality, in response to the algorithm that decides how contributions are displayed. The algorithm, which would ultimately like to display only the highest-quality contributions, can learn a contribution's quality only from the viewer votes the contribution receives when displayed.

The problem of inferring the relative qualities of contributions from viewer feedback, so as to optimize overall viewer satisfaction over time, can then be modeled as the classic multi-armed bandit problem, except that the arms available to the bandit, and therefore the achievable regret, are endogenously determined by strategic agents: a good algorithm for this setting must not only quickly identify the best contributions, but also incentivize high-quality contributions to choose among in the first place.

We first analyze the well-known UCB algorithm [Auer et al. 2002] as a mechanism in this setting, where the total number of potential contributors (arms), K, can grow with the total number of viewers (available periods), T, and the maximum possible success probability of an arm, γ, may be bounded away from 1 to model malicious or error-prone viewers in the audience. We show that while the UCB mechanism can incentivize high-quality arms and achieve strong sublinear equilibrium regret when K(T) does not grow too quickly with T, it incentivizes very low-quality contributions when K(T) scales proportionally with T. We then show that modifying the UCB mechanism to explore a randomly chosen restricted subset of √T arms yields excellent incentive properties: this modified mechanism achieves strong sublinear regret, i.e., regret measured against the maximum achievable quality γ, in every equilibrium, for all ranges of K(T) ≤ T, and for all possible values of the audience parameter γ.
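To make the modified mechanism concrete, the following is a minimal Python sketch, assuming the simplest reading of the abstract: draw a uniformly random subset of ⌈√T⌉ of the K arms, then run standard UCB1 [Auer et al. 2002] on that subset. Everything here is illustrative rather than the paper's exact mechanism; in particular, the `success_probs` vector stands in for arm qualities that the paper models as strategic choices by contributors, which this sketch does not capture.

```python
import math
import random

def restricted_ucb(success_probs, T, rng=random):
    """Sketch of the restricted-exploration UCB mechanism: run UCB1 on a
    uniformly random subset of ~sqrt(T) of the K arms.

    `success_probs` is a hypothetical stand-in for the (unknown to the
    learner) arm qualities; it is used here only to simulate viewer votes.
    """
    K = len(success_probs)
    m = min(K, math.ceil(math.sqrt(T)))
    subset = rng.sample(range(K), m)       # restricted arm set

    counts = {a: 0 for a in subset}        # times each arm was displayed
    rewards = {a: 0.0 for a in subset}     # positive votes received
    total_reward = 0.0

    for t in range(1, T + 1):
        # Display each arm in the subset once before using UCB indices.
        untried = [a for a in subset if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            # UCB1 index: empirical mean plus confidence radius.
            arm = max(subset, key=lambda a: rewards[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        vote = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += vote
        total_reward += vote
    return total_reward

if __name__ == "__main__":
    # Hypothetical demo: K = 1000 potential arms, audience cap gamma = 0.9.
    qualities = [random.uniform(0.0, 0.9) for _ in range(1000)]
    print(restricted_ucb(qualities, T=10_000))
```

The design point the sketch illustrates is that restricting exploration to ~√T arms caps each contributor's chance of ever being displayed, which, per the abstract, is what changes contributors' equilibrium incentives relative to unrestricted UCB when K(T) grows proportionally with T.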