The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors or, occasionally, participating groups to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories and traditional relevance assessments for blog posts. We conclude that crowdsourcing appears to be not only a feasible, but also a cheap and fast, means to generate relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices.
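To make the crowdsourced-assessment workflow concrete, the sketch below shows one common way such redundant worker judgements are consolidated into TREC-style qrels: collect several labels per topic-document pair and take a majority vote. This is a minimal illustration of the general technique, not the authors' actual pipeline; the label scale, identifiers, and data layout are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Each crowdsourced judgement: (topic_id, doc_id, worker_id, label).
# Labels here are assumed graded relevance values in {0, 1, 2};
# all identifiers below are illustrative, not from the paper.
judgements = [
    ("MB01", "blog-4711", "w1", 2),
    ("MB01", "blog-4711", "w2", 2),
    ("MB01", "blog-4711", "w3", 1),
]

def aggregate_by_majority(judgements):
    """Collapse multiple worker labels per (topic, doc) pair into one
    label via majority vote, a simple crowdsourcing quality-control step."""
    by_pair = defaultdict(list)
    for topic, doc, _worker, label in judgements:
        by_pair[(topic, doc)].append(label)
    qrels = {}
    for pair, labels in by_pair.items():
        # most_common(1) yields the modal label; ties resolve arbitrarily
        qrels[pair] = Counter(labels).most_common(1)[0][0]
    return qrels

if __name__ == "__main__":
    for (topic, doc), label in sorted(aggregate_by_majority(judgements).items()):
        # Emit in TREC qrels format: topic iteration docno label
        print(f"{topic} 0 {doc} {label}")
```

In practice a majority vote like this is usually combined with worker screening and honey-pot questions, since agreement alone cannot distinguish careful assessors from workers who guess consistently.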