Time-travel text search enriches standard text search with temporal predicates, so that users of web archives can retrieve document versions that are relevant to a given keyword query and existed during a given time interval. Several index structures have been proposed to support time-travel text search efficiently. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.
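To make the query semantics concrete, the following is a minimal illustrative sketch, not the index structure proposed in the paper. It assumes each posting carries a half-open validity interval `[begin, end)`, that a version's interval is known when it is indexed, and that a small in-memory buffer is flushed by appending an immutable segment. The names `Posting`, `TimeTravelIndex`, `add_version`, and `buffer_limit` are hypothetical, and the sharding and the bound on spuriously read entries are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Posting:
    """One index entry: a document version and its validity interval."""
    doc_id: str
    begin: int  # time the version appeared (inclusive); assumed integer timestamps
    end: int    # time the version was superseded (exclusive)


class TimeTravelIndex:
    """Toy append-only index: an in-memory buffer plus immutable segments."""

    def __init__(self, buffer_limit: int = 2) -> None:
        self.buffer_limit = buffer_limit  # hypothetical flush threshold (in versions)
        self.buffer: Dict[str, List[Posting]] = {}
        self.segments: List[Dict[str, List[Posting]]] = []
        self._buffered_versions = 0

    def add_version(self, doc_id: str, terms: List[str], begin: int, end: int) -> None:
        """Index a new document version; existing segments are never rewritten."""
        for term in set(terms):
            self.buffer.setdefault(term, []).append(Posting(doc_id, begin, end))
        self._buffered_versions += 1
        if self._buffered_versions >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        """Append the buffer as a new segment and start an empty buffer."""
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = {}
        self._buffered_versions = 0

    def search(self, term: str, t_begin: int, t_end: int) -> List[Posting]:
        """Return postings for `term` whose interval overlaps [t_begin, t_end)."""
        hits: List[Posting] = []
        for source in self.segments + [self.buffer]:
            for posting in source.get(term, []):
                # Two half-open intervals overlap iff each starts before the other ends.
                if posting.begin < t_end and t_begin < posting.end:
                    hits.append(posting)
        return hits


index = TimeTravelIndex(buffer_limit=2)
index.add_version("a@v1", ["web", "archive"], 0, 10)
index.add_version("a@v2", ["web"], 10, 20)   # second version triggers a flush
index.add_version("b@v1", ["web"], 5, 15)    # stays in the in-memory buffer
result = [p.doc_id for p in index.search("web", 12, 14)]
```

In this sketch every posting for the term is scanned and those outside the query interval are discarded. Entries read but discarded in this way are the kind of spurious reads that the sharded organization described above is meant to bound.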