Trust, but verify: predicting contribution quality for knowledge base construction and curation

Authors:
Chun How Tan; Eugene Agichtein; Panos Ipeirotis; Evgeniy Gabrilovich
Affiliations:
Google, Mountain View, USA; Emory University, Google, Atlanta, USA; New York University, Google, New York, USA; Google, Mountain View, USA
Venue:
Proceedings of the 7th ACM International Conference on Web Search and Data Mining
Year:
2014

Abstract

The largest publicly available knowledge repositories, such as Wikipedia and Freebase, owe their existence and growth to volunteer contributors around the globe. While the majority of contributions are correct, errors can still creep in due to editors' carelessness, misunderstanding of the schema, malice, or even the lack of an accepted ground truth. If left undetected, inaccuracies often degrade the experience of users and the performance of applications that rely on these knowledge repositories. We present a new method, CQUAL, for automatically predicting the quality of contributions submitted to a knowledge base. Significantly expanding upon previous work, our method holistically exploits a variety of signals, including the user's domains of expertise as reflected in her prior contribution history, and the historical accuracy rates of different types of facts. In a large-scale human evaluation, our method exhibits precision of 91% at 80% recall. Our model verifies whether a contribution is correct immediately after it is submitted, significantly reducing the need for human review after submission.
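The abstract names two history-based signals: a user's track record within a domain and the historical accuracy of a fact type. The following is a minimal, hypothetical sketch of how such signals could be estimated and combined. It is not CQUAL's actual model: the Beta-prior smoothing, the logit combination, the hand-set weights, and the names `HistoryStats`, `quality_score`, and `precision_at_recall` are all illustrative assumptions. The last function shows how a figure like "precision of 91% at 80% recall" is read off a ranked list, treating correct contributions as the positive class (also an assumption).

```python
"""Hypothetical sketch of history-based quality signals for KB contributions.

Not the CQUAL implementation: priors, weights, and feature keys are assumptions.
"""
import math
from collections import defaultdict
from typing import Dict, List, Tuple


class HistoryStats:
    """Counts (correct, total) per key, smoothed with a Beta(a, b) prior."""

    def __init__(self, a: float = 1.0, b: float = 1.0):
        self.a, self.b = a, b
        self.counts: Dict[Tuple[str, ...], List[int]] = defaultdict(lambda: [0, 0])

    def update(self, key: Tuple[str, ...], correct: bool) -> None:
        # Record one past contribution whose correctness is known.
        c = self.counts[key]
        c[0] += int(correct)
        c[1] += 1

    def accuracy(self, key: Tuple[str, ...]) -> float:
        # Posterior mean; unseen keys fall back to the prior mean a / (a + b).
        correct, total = self.counts.get(key, (0, 0))
        return (correct + self.a) / (total + self.a + self.b)


def logit(p: float) -> float:
    # Safe here because smoothing keeps every estimate strictly inside (0, 1).
    return math.log(p / (1.0 - p))


def quality_score(user_domain_acc: float, fact_type_acc: float,
                  weights: Tuple[float, float, float] = (0.0, 1.0, 1.0)) -> float:
    """Logistic combination of two signals; weights are hypothetical, not learned."""
    z = weights[0] + weights[1] * logit(user_domain_acc) + weights[2] * logit(fact_type_acc)
    return 1.0 / (1.0 + math.exp(-z))


def precision_at_recall(scores: List[float], labels: List[bool],
                        target_recall: float) -> float:
    """Precision of the shortest top-scored prefix that reaches target_recall."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    # Sort by descending score; ties keep input order because sorted() is stable.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += int(label)
        if tp / positives >= target_recall:
            return tp / k
    return 0.0


if __name__ == "__main__":
    users, fact_types = HistoryStats(), HistoryStats()
    # Hypothetical history rows: (user, domain, fact_type, was_correct).
    history = [("alice", "film", "film.director", True),
               ("alice", "film", "film.director", True),
               ("bob", "sports", "film.director", False)]
    for user, domain, ftype, ok in history:
        users.update((user, domain), ok)
        fact_types.update((ftype,), ok)
    new = ("alice", "film", "film.director")
    print(quality_score(users.accuracy(new[:2]), fact_types.accuracy(new[2:])))
```

On the toy history, Alice's new film fact gets a smoothed user-domain accuracy of 3/4 and a fact-type accuracy of 3/5, which combine to a score of 9/11 (about 0.82). The Beta prior keeps a brand-new user or fact type from receiving an extreme estimate from one or two observations; in a real system the weights would be fit by a classifier rather than set by hand.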