TopX 2.0 at the INEX 2008 Efficiency Track

Authors:
Martin Theobald;Mohammed Abujarour;Ralf Schenkel
Affiliations:
Max Planck Institute for Informatics, Saarbrücken, Germany;Hasso Plattner Institute, Potsdam, Germany;Max Planck Institute for Informatics, Saarbrücken, Germany and Saarland University, Saarbrücken, Germany
Venue:
Advances in Focused Retrieval
Year:
2009

Abstract

For the INEX 2008 Efficiency Track, we finished and evaluated our new TopX 2.0 prototype just in time. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we have now switched to compressed, object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 over the previous Java/JDBC-based TopX 1.0 implementation, which ran on a relational back-end. Using a top-15, focused (i.e., non-overlapping) retrieval mode, TopX 2.0 achieves overall runtimes of less than 51 seconds for the entire batch of 568 Efficiency Track topics in their content-and-structure (CAS) version and less than 29 seconds for the content-only (CO) version, an average of merely 89 ms per CAS query and 49 ms per CO query.
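
The per-query averages and the batch runtimes quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes the averages are taken over all 568 topics, which the abstract implies but does not state outright.

```python
# Figures as reported in the abstract; all names here are illustrative.
NUM_TOPICS = 568
AVG_MS = {"CAS": 89, "CO": 49}          # reported per-query averages (ms)
BATCH_BOUND_S = {"CAS": 51, "CO": 29}   # reported upper bounds on batch runtime (s)


def batch_seconds(avg_ms: float, num_queries: int) -> float:
    """Total batch runtime in seconds implied by a per-query average in ms."""
    return avg_ms * num_queries / 1000.0


# Assumption: each average is computed over the full batch of topics.
implied = {mode: batch_seconds(ms, NUM_TOPICS) for mode, ms in AVG_MS.items()}

for mode, total in implied.items():
    print(f"{mode}: {total:.3f} s implied, reported bound {BATCH_BOUND_S[mode]} s")
```

Under that assumption the averages imply roughly 50.6 s for the CAS batch and 27.8 s for the CO batch, both consistent with the stated bounds of less than 51 s and less than 29 s.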