Internet-scale collection of human-reviewed data

Authors:
Qi Su; Dmitry Pavlov; Jyh-Herng Chow; Wendell C. Baker
Affiliations:
Yahoo! Inc, Sunnyvale, CA; Yahoo! Inc, Sunnyvale, CA; Yahoo! Inc, Sunnyvale, CA; Yahoo! Inc, Sunnyvale, CA
Venue:
Proceedings of the 16th international conference on World Wide Web
Year:
2007

Abstract

Enterprise and web data processing and content aggregation systems often require extensive use of human-reviewed data (e.g., for training and monitoring machine learning-based applications). Today these needs are often met by in-house efforts or outsourced offshore contracting. Emerging applications attempt to provide automated collection of human-reviewed data at Internet scale. We conduct extensive experiments to study the effectiveness of one such application. We also study the feasibility of using Yahoo! Answers, a general question-answering forum, for human-reviewed data collection.