A large fraction of user-generated content on the Web, such as posts or comments on popular online forums, consists of abuse or spam. Given the volume of contributions on popular sites, a few trusted moderators cannot identify all such abusive content, so viewer ratings of contributions must be used for moderation. But not all viewers who rate content are trustworthy and accurate. What is a principled approach to assigning trust and aggregating user ratings in order to accurately identify abusive content? In this paper, we introduce a framework for moderating online content using crowdsourced ratings. Our framework encompasses users who are untrustworthy or inaccurate to an unknown extent: that is, both the content and the raters are of unknown quality. With no knowledge whatsoever about the raters, it is impossible to do better than a random estimate. We present efficient algorithms that accurately detect abuse while requiring knowledge of the identity of only a single 'good' agent, one who rates contributions accurately more than half the time. We prove that our algorithm infers the quality of contributions with error that converges rapidly to zero as the number of observations increases; we also demonstrate numerically that the algorithm attains very high accuracy with far fewer observations. Finally, we analyze the robustness of our algorithms to manipulation by adversarial or strategic raters, an important issue in moderating online content, and quantify how the algorithm's performance degrades with the number of manipulating agents.
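The core idea, anchoring trust in one rater known to be accurate more than half the time and using agreement with that rater to weight everyone else, can be illustrated with a toy weighted-voting sketch. This is not the paper's actual algorithm; all names and the agreement-based weighting rule below are illustrative assumptions.

```python
def estimate_labels(ratings, trusted_id):
    """Toy sketch (not the paper's algorithm): label items as good (+1) or
    abusive (-1) by weighting each rater by their empirical agreement with
    a single trusted rater, then taking a weighted vote per item.

    ratings: dict rater_id -> dict item_id -> vote in {+1, -1}
    trusted_id: the one rater assumed correct more than half the time
    """
    trusted = ratings[trusted_id]
    # Weight each rater in [-1, 1]: positive if they mostly agree with the
    # trusted rater on commonly rated items, negative if they mostly disagree.
    weights = {}
    for rater, votes in ratings.items():
        common = [item for item in votes if item in trusted]
        if not common:
            weights[rater] = 0.0  # no overlap with trusted rater: ignore
            continue
        agree = sum(votes[item] == trusted[item] for item in common) / len(common)
        weights[rater] = 2.0 * agree - 1.0
    # Weighted vote per item; the sign of the score is the estimated label.
    items = {item for votes in ratings.values() for item in votes}
    labels = {}
    for item in items:
        score = sum(w * ratings[r][item]
                    for r, w in weights.items() if item in ratings[r])
        labels[item] = 1 if score >= 0 else -1
    return labels
```

Note that a rater who systematically disagrees with the trusted rater receives a negative weight, so their votes are usefully inverted rather than discarded, mirroring the abstract's point that only one known-good anchor is needed to calibrate everyone else.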