MapReduce for information retrieval evaluation: "let's quickly test this on 12 TB of data"

  • Authors: Djoerd Hiemstra; Claudia Hauff

  • Affiliations: University of Twente, The Netherlands (both authors)

  • Venue: CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
  • Year: 2010

Abstract

We propose using MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net
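
The idea of sequential scanning with MapReduce can be illustrated with a minimal single-machine sketch in Python. This is not the authors' code (their implementation is the one at the URL above); the function names `map_document`, `reduce_query`, and `run`, the toy scoring function, and the in-memory grouping that stands in for the framework's shuffle phase are all assumptions made for illustration. The map step scores each document against every query without any index, and the reduce step keeps the top-k documents per query.

```python
import heapq
from collections import defaultdict


def map_document(doc_id, text, queries):
    """Map step (hypothetical): score one document against every query.

    Emits (query_id, (score, doc_id)) for each query with at least one match.
    The score is a toy measure, assumed here: the fraction of document
    tokens that are query terms.
    """
    tokens = text.lower().split()
    if not tokens:
        return
    for query_id, terms in queries.items():
        matches = sum(1 for token in tokens if token in terms)
        if matches:
            yield query_id, (matches / len(tokens), doc_id)


def reduce_query(query_id, scored_docs, k=10):
    """Reduce step (hypothetical): keep the k highest-scoring documents."""
    return query_id, heapq.nlargest(k, scored_docs)


def run(docs, queries, k=10):
    """Simulate the job locally: scan all documents, group by query, reduce."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for doc_id, text in docs:
        for query_id, scored in map_document(doc_id, text, queries):
            grouped[query_id].append(scored)
    return dict(reduce_query(q, scored, k) for q, scored in grouped.items())


if __name__ == "__main__":
    docs = [
        ("d1", "mapreduce cluster scan"),
        ("d2", "web crawl pages"),
        ("d3", "cluster of machines"),
    ]
    queries = {"q1": {"cluster"}, "q2": {"crawl", "web"}}
    for query_id, ranking in sorted(run(docs, queries, k=2).items()):
        print(query_id, ranking)
```

Because every document is read on every run, trying a new retrieval approach in this sketch only requires changing the scoring logic in `map_document`; no index has to be rebuilt.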