Versioning a full-text information retrieval system
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Wave-indices: indexing evolving databases
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The LHAM log-structured history data access method
The VLDB Journal — The International Journal on Very Large Data Bases
An asymptotically optimal multiversion B-tree
The VLDB Journal — The International Journal on Very Large Data Bases
A large-scale study of the evolution of web pages
WWW '03 Proceedings of the 12th international conference on World Wide Web
What's new on the web?: the evolution of the web from a search engine perspective
Proceedings of the 13th international conference on World Wide Web
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Rate of change and other metrics: a live study of the world wide web
USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems on USENIX Symposium on Internet Technologies and Systems
A time machine for text search
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
On the value of temporal information in information retrieval
ACM SIGIR Forum
Introduction to Information Retrieval
Introduction to Information Retrieval
The web changes everything: understanding the dynamics of web content
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Compact full-text indexing of versioned document collections
Proceedings of the 18th ACM conference on Information and knowledge management
Efficient indexing of versioned document sequences
ECIR'07 Proceedings of the 29th European conference on IR research
Information Retrieval: Implementing and Evaluating Search Engines
Information Retrieval: Implementing and Evaluating Search Engines
Efficient temporal keyword search over versioned text
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Improved index compression techniques for versioned document collections
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Index maintenance for time-travel text search
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
Time-travel queries that couple temporal constraints with keyword queries are useful in searching large-scale archives of time-evolving content such as the web archives or wikis. Typical approaches for efficient evaluation of these queries involve slicing either the entire collection [20] or individual index lists [10] along the time-axis. Both these methods are not satisfactory since they sacrifice compactness of index for processing efficiency making them either too big or, otherwise, too slow. We present a novel index organization scheme that shards each index list with almost zero increase in index size but still minimizes the cost of reading index entries during query processing. Based on the optimal sharding thus btained, we develop a practically efficient sharding that takes into account the different costs of random and sequential accesses. Our algorithm merges shards from the optimal solution to allow for a few extra sequential accesses while gaining significantly by reducing the number of random accesses. We empirically establish the effectiveness of our sharding scheme with experiments over the revision history of the English Wikipedia between 2001-2005 (approx 700 GB) and an archive of U.K. governmental web sites (approx 400 GB). Our results demonstrate the feasibility of faster time-travel query processing with no space overhead.