TopX 2.0 at the INEX 2008 Efficiency Track

  • Authors:
  • Martin Theobald;Mohammed Abujarour;Ralf Schenkel

  • Affiliations:
  • Max Planck Institute for Informatics, Saarbrücken, Germany;Hasso Plattner Institute, Potsdam, Germany;Max Planck Institute for Informatics, Saarbrücken, Germany and Saarland University, Saarbrücken, Germany

  • Venue:
  • Advances in Focused Retrieval
  • Year:
  • 2009

Abstract

For the INEX 2008 Efficiency Track, we were just in time to finish and evaluate our brand-new TopX 2.0 prototype. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we have now switched to a compressed, object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 compared to the previous Java/JDBC-based TopX 1.0 implementation over a relational back-end. Using a top-15, focused (i.e., non-overlapping) retrieval mode, TopX 2.0 achieves overall runtimes of less than 51 seconds for the entire batch of 568 Efficiency Track topics in their content-and-structure (CAS) version and less than 29 seconds for their content-only (CO) version. This corresponds to an average of merely 89 ms per CAS query and 49 ms per CO query.
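The abstract mentions a top-15, focused (i.e., non-overlapping) retrieval mode. The sketch below illustrates one generic way to enforce non-overlap: walk a score-sorted candidate list and keep an element only if it is neither an ancestor nor a descendant of an element already kept. It is not TopX 2.0's actual code. The `ScoredElement` struct, its pre-/post-order ranks, and the functions `overlaps` and `focusedTopK` are all hypothetical, and candidates are assumed to arrive already sorted by descending score.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element descriptor: an XML element identified by its document id
// and its pre-order/post-order ranks within that document (an assumed encoding).
struct ScoredElement {
    int doc;      // document identifier
    int pre;      // pre-order rank of the element in its document
    int post;     // post-order rank of the element in its document
    double score; // relevance score from some (assumed) ranking function
};

// True if a and b are the same element, or one is an ancestor of the other.
// With pre/post-order ranks, a is an ancestor of b iff a.pre < b.pre && a.post > b.post.
bool overlaps(const ScoredElement& a, const ScoredElement& b) {
    if (a.doc != b.doc) return false;
    if (a.pre == b.pre) return true;
    const bool aAncestorOfB = a.pre < b.pre && a.post > b.post;
    const bool bAncestorOfA = b.pre < a.pre && b.post > a.post;
    return aAncestorOfB || bAncestorOfA;
}

// Greedy focused top-k: scan candidates in descending score order and keep a
// candidate only if it does not overlap any element already in the result.
std::vector<ScoredElement> focusedTopK(const std::vector<ScoredElement>& ranked,
                                       std::size_t k) {
    // Precondition (assumed): input is sorted by descending score.
    assert(std::is_sorted(ranked.begin(), ranked.end(),
                          [](const ScoredElement& x, const ScoredElement& y) {
                              return x.score > y.score;
                          }));
    std::vector<ScoredElement> result;
    for (const auto& cand : ranked) {
        if (result.size() >= k) break;
        const bool clash = std::any_of(result.begin(), result.end(),
                                       [&](const ScoredElement& kept) {
                                           return overlaps(kept, cand);
                                       });
        if (!clash) result.push_back(cand);
    }
    return result;
}
```

Calling `focusedTopK(ranked, 15)` would mirror the top-15 setting described above. The greedy scan is simple but performs a quadratic number of overlap checks in k. For small k such as 15 this cost is negligible, whereas a real engine may prefer to integrate the overlap test into its top-k threshold logic.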