Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems

Authors:
Guido Zuccon;Teerapong Leelanupab;Stewart Whiting;Emine Yilmaz;Joemon M. Jose;Leif Azzopardi
Affiliations:
Australian e-Health Research Centre, CSIRO, Brisbane, Australia;King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand;School of Computing Science, University of Glasgow, Glasgow, UK;Microsoft Research, Cambridge, UK;School of Computing Science, University of Glasgow, Glasgow, UK;School of Computing Science, University of Glasgow, Glasgow, UK
Venue:
Information Retrieval
Year:
2013

Abstract

In the field of information retrieval (IR), researchers and practitioners often need valid approaches for evaluating the performance of retrieval systems. The Cranfield experiment paradigm has been dominant for the in-vitro evaluation of IR systems. As an alternative to this paradigm, laboratory-based user studies have been widely used to evaluate interactive information retrieval (IIR) systems and, at the same time, to investigate users' information searching behaviours. Major drawbacks of laboratory-based user studies for evaluating IIR systems include the high monetary and temporal costs of setting up and running the experiments, the lack of heterogeneity amongst the user population, and the limited scale of the experiments, which usually involve a relatively restricted set of users. In this article, we propose an alternative experimental methodology to laboratory-based user studies. Our novel experimental methodology uses a crowdsourcing platform as a means of engaging study participants. Through crowdsourcing, it can capture user interactions and searching behaviours at a lower cost, with more data, and within a shorter period than traditional laboratory-based user studies, and can therefore be used to assess the performance of IIR systems. We show how our approach differs from traditional IIR experimental and evaluation procedures. We also perform a use case study comparing crowdsourcing-based evaluation with laboratory-based evaluation of IIR systems, which can serve as a tutorial for setting up crowdsourcing-based IIR evaluations.