A time machine for text search

  • Authors: Klaus Berberich; Srikanta Bedathur; Thomas Neumann; Gerhard Weikum

  • Affiliations: Max-Planck Institute for Informatics, Saarbruecken, Germany (all authors)

  • Venue: SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year: 2007

Abstract

Text search over temporally versioned document collections such as web archives has received little attention as a research problem. As a consequence, there is no scalable and principled solution for searching such a collection as of a specified time. In this work, we address this shortcoming and propose an efficient solution for time-travel text search by extending the inverted file index to make it ready for temporal search. We introduce approximate temporal coalescing as a tunable method to reduce the index size without significantly affecting the quality of results. In order to further improve the performance of time-travel queries, we introduce two principled techniques to trade off index size against query performance. These techniques can be formulated as optimization problems that can be solved to near-optimality. Finally, our approach is evaluated in a comprehensive series of experiments on two large-scale real-world datasets. Results unequivocally show that our methods make it possible to build an efficient "time machine" scalable to large versioned text collections.
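
To make the two central ideas of the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each posting carries a validity interval [begin, end) and a score, and that approximate temporal coalescing merges temporally adjacent postings of the same document when their scores differ by at most a relative tolerance eps. The names Posting, TimeTravelIndex, coalesce, and eps are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Posting:
    doc_id: str
    score: float
    begin: int  # first time point at which the posting is valid (inclusive)
    end: int    # first time point at which it is no longer valid (exclusive)


def coalesce(postings, eps):
    """Merge adjacent postings of one document with nearly equal scores.

    Assumption: a posting joins the current run if it starts exactly where
    the run ends and its score is within a relative error eps of the score
    kept for the run (the score of the run's first posting).
    """
    merged = []
    for p in sorted(postings, key=lambda p: (p.doc_id, p.begin)):
        last = merged[-1] if merged else None
        if (last is not None
                and last.doc_id == p.doc_id
                and last.end == p.begin
                and abs(p.score - last.score) <= eps * abs(last.score)):
            # Extend the run in time; keep the representative score.
            merged[-1] = Posting(last.doc_id, last.score, last.begin, p.end)
        else:
            merged.append(p)
    return merged


class TimeTravelIndex:
    """Inverted index whose postings are restricted to validity intervals."""

    def __init__(self):
        self.lists = defaultdict(list)  # term -> list of Posting

    def add(self, term, doc_id, score, begin, end):
        self.lists[term].append(Posting(doc_id, score, begin, end))

    def compact(self, eps):
        # Shrink every posting list by approximate temporal coalescing.
        for term in self.lists:
            self.lists[term] = coalesce(self.lists[term], eps)

    def query(self, term, t):
        # Return (doc_id, score) pairs valid at time t, best score first.
        hits = [(p.doc_id, p.score)
                for p in self.lists.get(term, [])
                if p.begin <= t < p.end]
        return sorted(hits, key=lambda hit: -hit[1])


index = TimeTravelIndex()
index.add("search", "d1", 1.00, 0, 10)
index.add("search", "d1", 1.02, 10, 20)  # near-identical score: coalesced
index.add("search", "d1", 2.00, 20, 30)  # large change: kept separate
index.add("search", "d2", 1.50, 5, 25)
index.compact(eps=0.05)
print(len(index.lists["search"]))
print(index.query("search", 15))
```

In this sketch a larger eps yields shorter posting lists at the cost of less exact scores, which mirrors the tunable size and quality trade-off the abstract describes. A query scans the whole posting list of a term; the further techniques mentioned in the abstract for trading index size against query performance are not modeled here.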