ChatNoir: a search engine for the ClueWeb09 corpus

  • Authors:
  • Martin Potthast; Matthias Hagen; Benno Stein; Jan Graßegger; Maximilian Michel; Martin Tippmann; Clement Welsch

  • Affiliations:
  • Bauhaus-Universität Weimar, Germany (all authors)

  • Venue:
  • SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2012

Abstract

We present ChatNoir, a search engine that indexes the entire English portion of the ClueWeb09 corpus. After Carnegie Mellon's Indri system, ChatNoir is the second publicly available search engine for this corpus. It implements the classic BM25F information retrieval model and additionally incorporates PageRank and spam likelihood. The search engine is scalable and returns the first results within three seconds, which is significantly faster than Indri. A convenient API supports reproducible experiments that retrieve documents from the ClueWeb09 corpus. The search engine has passed a load test involving 100,000 queries.
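
To illustrate the BM25F model named in the abstract, the following is a minimal sketch of textbook BM25F scoring for a single document. It is not ChatNoir's implementation: the function name, the field names, the weights, and the smoothed IDF variant are all assumptions chosen for illustration. The abstract does not say how PageRank and spam likelihood enter the ranking, so they are left out here.

```python
import math


def bm25f_score(query_terms, doc_fields, field_weights, field_b,
                avg_field_len, doc_freq, num_docs, k1=1.2):
    """Score one document against a query with a textbook BM25F formula.

    All parameter names and defaults are illustrative assumptions.
    doc_fields maps a field name (e.g. "title") to its list of tokens.
    """
    score = 0.0
    for term in query_terms:
        # Combine per-field term frequencies into one weighted pseudo-frequency,
        # normalizing each field by its length relative to the field's average.
        pseudo_tf = 0.0
        for field, tokens in doc_fields.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            b = field_b[field]
            length_norm = 1.0 - b + b * len(tokens) / avg_field_len[field]
            pseudo_tf += field_weights[field] * tf / length_norm
        if pseudo_tf == 0.0:
            continue
        # Smoothed IDF (assumed variant; it stays positive for frequent terms).
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturate the combined frequency once, after merging the fields.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score
```

The saturation step is applied once to the combined pseudo-frequency rather than per field, which is what distinguishes BM25F from summing independent per-field BM25 scores. With a larger weight on a hypothetical "title" field than on "body", a query term that appears in the title contributes more than the same term appearing only in the body.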