Efficient temporal keyword search over versioned text

Authors:
Avishek Anand;Srikanta Bedathur;Klaus Berberich;Ralf Schenkel
Affiliations:
Max-Planck Institute for Informatics, Saarbrücken, Germany;Max-Planck Institute for Informatics, Saarbrücken, Germany;Max-Planck Institute for Informatics, Saarbrücken, Germany;Saarland University, Saarbrücken, Germany
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

Modern text analytics applications operate on large volumes of temporal text data such as Web archives, newspaper archives, blogs, wikis, and micro-blogs. In these settings, searching and mining need to apply constraints on the time dimension in addition to keyword constraints. A natural approach to answering such queries is to use an inverted index whose entries are enriched with valid-time intervals. It has been shown that these indexes have to be partitioned along time in order to be efficient. However, when the temporal predicate corresponds to a long time range that requires processing multiple partitions, naive query processing incurs a high cost from reading redundant entries across partitions. We present a framework for efficient approximate processing of keyword queries over a temporally partitioned inverted index that minimizes this overhead and thus speeds up query processing. Using a small synopsis for each partition, we identify the partitions that maximize the number of final non-redundant results and schedule them for early processing. Our approach aims to balance the estimated gain in final result recall against the cost of the index reads required. We present practical algorithms for the resulting optimization problem of index partition selection. Our experiments with three diverse, large-scale text archives show that the proposed approach can provide close to 80% result recall even when only about half of the index may be read.
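
To make the partition selection problem concrete, here is a minimal sketch of a greedy cost-benefit heuristic of the kind commonly used for budgeted coverage problems. It is not taken from the paper and may differ from the authors' algorithms. Everything named here is an assumption for illustration: the function `select_partitions`, the partition labels, the use of exact sets of document identifiers in place of the small per-partition synopses, and a single positive read cost per partition.

```python
from typing import Dict, List, Set, Tuple


def select_partitions(
    partitions: Dict[str, Tuple[Set[int], float]],
    budget: float,
) -> List[str]:
    """Greedily pick partitions with the best (new results / read cost) ratio.

    `partitions` maps a hypothetical partition label to a pair of
    (document ids matching the query in that partition, read cost > 0).
    Exact id sets stand in for the synopsis-based estimates described in
    the abstract; this is an illustrative assumption.
    """
    covered: Set[int] = set()      # results already obtained
    chosen: List[str] = []         # partitions in processing order
    spent = 0.0                    # read cost used so far
    remaining = dict(partitions)

    while remaining:
        best_name = None
        best_ratio = 0.0
        for name, (ids, cost) in remaining.items():
            if spent + cost > budget:
                continue           # reading this partition would exceed the budget
            gain = len(ids - covered)   # non-redundant results only
            ratio = gain / cost
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break                  # nothing affordable adds new results
        ids, cost = remaining.pop(best_name)
        covered |= ids
        spent += cost
        chosen.append(best_name)
    return chosen


# Hypothetical example: three overlapping partitions and a read budget of 4.
example = {
    "p1": ({1, 2, 3, 4}, 2.0),
    "p2": ({3, 4, 5}, 3.0),
    "p3": ({5, 6}, 2.0),
}
print(select_partitions(example, budget=4.0))
```

In this example the sketch schedules `p1` first because it yields the most new results per unit of read cost, then `p3`, and skips `p2` because reading it would exceed the budget. A real system would replace the exact set difference with an estimate computed from the synopses.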