Repeatable and reliable semantic search evaluation
Web Semantics: Science, Services and Agents on the World Wide Web
The primary problem confronting any new kind of search task is how to bootstrap a reliable and repeatable evaluation campaign, and crowdsourcing offers many advantages here. However, can crowdsourced evaluations be repeated reliably over long periods of time? To answer this question, we create an evaluation campaign for the semantic search task of keyword-based ad-hoc object retrieval. In contrast to traditional search over web pages with textual descriptions, object retrieval targets factual assertions about real-world objects. Using the first large-scale evaluation campaign that specifically targets ad-hoc Web object retrieval over a number of deployed systems, we demonstrate that crowdsourced evaluation campaigns can be repeated over time while maintaining reliable results. Furthermore, we show that these results are comparable to those of expert judges when ranking systems, and that they hold across different evaluation and relevance metrics. This work provides empirical support for scalable, reliable, and repeatable search system evaluation using crowdsourcing.
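As a rough, hypothetical illustration of the kind of comparison the abstract describes, the sketch below measures how closely a system ranking induced by crowd judgments agrees with one induced by expert judgments, using Kendall's tau over per-system effectiveness scores. All system names and scores are invented for the example and are not the campaign's actual data or protocol.

```python
# Hypothetical sketch: agreement between crowd-based and expert-based system rankings.
# All system names and effectiveness scores below are invented for illustration.

from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall tau-a over two score lists for the same systems (assumes no ties)."""
    pairs = list(combinations(range(len(scores_a)), 2))
    concordant = sum(
        1 for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) > 0
    )
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)

# Per-system effectiveness (e.g., MAP or NDCG) computed once from crowd judgments
# and once from expert judgments.
systems       = ["sys_A", "sys_B", "sys_C", "sys_D", "sys_E"]
crowd_scores  = [0.41, 0.35, 0.52, 0.28, 0.47]
expert_scores = [0.39, 0.40, 0.55, 0.25, 0.44]

tau = kendall_tau(crowd_scores, expert_scores)
print(f"Kendall tau between crowd and expert system rankings: {tau:.2f}")
```

A tau close to 1 would indicate that the two assessment sources order the systems almost identically, which is the sense in which crowd and expert judgments can be said to be comparable for ranking systems.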