Semi-supervised correction of biased comment ratings

Authors:
Abhinav Mishra; Rajeev Rastogi
Affiliations:
Yahoo! Labs Bangalore, Bangalore, India; Yahoo! Labs Bangalore, Bangalore, India
Venue:
Proceedings of the 21st international conference on World Wide Web
Year:
2012

Abstract

In many instances, offensive comments on the internet attract a disproportionate number of positive ratings from highly biased users. This results in an undesirable scenario where these offensive comments are the top rated ones. In this paper, we develop semi-supervised learning techniques to correct the bias in user ratings of comments. Our scheme uses a small number of comment labels in conjunction with user rating information to iteratively compute user bias and unbiased ratings for unlabeled comments. We show that the running time of each iteration is linear in the number of ratings, and the system converges to a unique fixed point. To select the comments to label, we devise an active learning algorithm based on empirical risk minimization. Our active learning method incrementally updates the risk for neighboring comments each time a comment is labeled, and thus can easily scale to large comment datasets. On real-life comments from Yahoo! News, our semi-supervised and active learning algorithms achieve higher accuracy than simple baselines, with few labeled examples.
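
The iterative scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's update equations, so everything below is an assumption made for illustration: ratings are numeric values in [-1, 1], a user's bias is taken to be their mean deviation from the current comment scores, and an unlabeled comment's score is the mean of its bias-corrected ratings. The function name `correct_ratings` and its parameters are hypothetical. Labeled comments stay pinned to their labels, and each iteration makes two passes over the ratings, so its cost is linear in the number of ratings.

```python
from collections import defaultdict


def correct_ratings(ratings, labels, iterations=50, tol=1e-9):
    """Illustrative fixed-point iteration (not the paper's exact update rules).

    ratings: list of (user, comment, rating) tuples, rating assumed in [-1, 1].
    labels:  dict mapping a few comments to their known (unbiased) score.
    Returns (scores, bias) as dicts keyed by comment and user respectively.
    """
    raw = defaultdict(list)
    for _user, comment, rating in ratings:
        raw[comment].append(rating)

    # Labeled comments are pinned; unlabeled ones start at their raw mean.
    scores = {c: labels.get(c, sum(rs) / len(rs)) for c, rs in raw.items()}
    bias = {}

    for _ in range(iterations):
        # Pass 1: bias of a user = mean deviation from the current scores.
        dev_sum, dev_cnt = defaultdict(float), defaultdict(int)
        for user, comment, rating in ratings:
            dev_sum[user] += rating - scores[comment]
            dev_cnt[user] += 1
        bias = {u: dev_sum[u] / dev_cnt[u] for u in dev_sum}

        # Pass 2: unlabeled score = mean of bias-corrected ratings.
        adj_sum, adj_cnt = defaultdict(float), defaultdict(int)
        for user, comment, rating in ratings:
            adj_sum[comment] += rating - bias[user]
            adj_cnt[comment] += 1
        new_scores = {c: labels.get(c, adj_sum[c] / adj_cnt[c]) for c in adj_sum}

        # Stop once no score moves by more than the tolerance.
        delta = max((abs(new_scores[c] - scores[c]) for c in scores), default=0.0)
        scores = new_scores
        if delta < tol:
            break

    return scores, bias


# Toy data: user "B" up-votes everything, user "A" rates fairly.
toy_ratings = [("A", "c1", -1.0), ("A", "c2", -1.0),
               ("B", "c1", 1.0), ("B", "c2", 1.0)]
toy_labels = {"c1": -1.0}  # a single labeled (offensive) comment
toy_scores, toy_bias = correct_ratings(toy_ratings, toy_labels)
```

On this toy input the raw mean rating of the unlabeled comment `c2` is 0, but the single label on `c1` is enough for the sketch to attribute a large positive bias to user `B` and pull the score of `c2` toward -1. The convergence and uniqueness guarantees stated in the abstract apply to the authors' formulation and are not claimed for this simplified version.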