A time machine for text search

Authors:
Klaus Berberich;Srikanta Bedathur;Thomas Neumann;Gerhard Weikum
Affiliations:
Max-Planck Institute for Informatics, Saarbruecken, Germany;Max-Planck Institute for Informatics, Saarbruecken, Germany;Max-Planck Institute for Informatics, Saarbruecken, Germany;Max-Planck Institute for Informatics, Saarbruecken, Germany
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007

Abstract

Text search over temporally versioned document collections such as web archives has received little attention as a research problem. As a consequence, there is no scalable and principled solution for searching such a collection as of a specified point in time. In this work, we address this shortcoming and propose an efficient solution for time-travel text search by extending the inverted file index to make it ready for temporal search. We introduce approximate temporal coalescing as a tunable method to reduce the index size without significantly affecting the quality of results. To further improve the performance of time-travel queries, we introduce two principled techniques that trade off index size against query performance. These techniques can be formulated as optimization problems that can be solved to near-optimality. Finally, our approach is evaluated in a comprehensive series of experiments on two large-scale real-world datasets. Results unequivocally show that our methods make it possible to build an efficient "time machine" scalable to large versioned text collections.
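To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each posting in the inverted index carries a validity interval, and that approximate temporal coalescing can be illustrated by greedily merging temporally adjacent postings of one document whose scores stay within a relative error bound. The names `Posting`, `coalesce`, `query_at` and the parameter `eps` are hypothetical, and the greedy rule is a simplification chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Posting:
    """One index entry: a document's score for a term during [t_begin, t_end)."""
    doc_id: int
    score: float
    t_begin: int  # inclusive
    t_end: int    # exclusive


def coalesce(postings: List[Posting], eps: float) -> List[Posting]:
    """Greedily merge temporally adjacent postings of ONE document.

    Assumption (illustrative only): a run keeps the score of its first
    posting, and a posting joins the run if that score deviates from the
    posting's own score by at most a relative error of eps.
    """
    merged: List[Posting] = []
    for p in sorted(postings, key=lambda x: x.t_begin):
        last = merged[-1] if merged else None
        if (last is not None
                and last.t_end == p.t_begin
                and abs(last.score - p.score) <= eps * abs(p.score)):
            # Extend the validity interval, keep the representative score.
            merged[-1] = Posting(last.doc_id, last.score, last.t_begin, p.t_end)
        else:
            merged.append(p)
    return merged


def query_at(index: Dict[str, List[Posting]], term: str, t: int) -> List[Tuple[int, float]]:
    """Return (doc_id, score) pairs for postings valid at time t, best first."""
    hits = [(p.doc_id, p.score)
            for p in index.get(term, [])
            if p.t_begin <= t < p.t_end]
    return sorted(hits, key=lambda h: -h[1])


# Three versions of document 1; the first two have nearly identical scores.
versions = [
    Posting(1, 1.00, 0, 10),
    Posting(1, 1.04, 10, 20),
    Posting(1, 2.00, 20, 30),
]
index = {"archive": coalesce(versions, eps=0.05)}
```

With `eps=0.05`, the first two versions collapse into a single posting covering `[0, 20)`, so the list for the term shrinks from three entries to two, while a query such as `query_at(index, "archive", 15)` still finds the document with a score close to the exact one. Raising `eps` shrinks the index further at the cost of larger score deviations, which is the tunable trade-off the abstract describes.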