Hybrid index maintenance for contiguous inverted lists

Authors:
Stefan Büttcher; Charles L. Clarke
Affiliations:
Google Inc., Mountain View, USA; University of Waterloo, Waterloo, Canada
Venue:
Information Retrieval
Year:
2008

Abstract

Index maintenance strategies employed by dynamic text retrieval systems based on inverted files can be divided into two categories: merge-based and in-place update strategies. Within each category, individual update policies can be distinguished based on whether they store their on-disk posting lists in a contiguous or in a discontiguous fashion. Contiguous inverted lists, in general, lead to higher query performance, by minimizing the disk seek overhead at query time, while discontiguous inverted lists lead to higher update performance, requiring less effort during index maintenance operations. In this paper, we focus on retrieval systems with high query load, where the on-disk posting lists have to be stored in a contiguous fashion at all times. We discuss a combination of re-merge and in-place index update, called Hybrid Immediate Merge. The method performs strictly better than the re-merge baseline policy used in our experiments, as it leads to the same query performance, but substantially better update performance. The actual time savings achievable depend on the size of the text collection being indexed; a larger collection results in greater savings. In our experiments, variations of Hybrid Immediate Merge were able to reduce the total index update overhead by up to 73% compared to the re-merge baseline.
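
The abstract does not spell out how Hybrid Immediate Merge divides work between the two strategies. As a rough illustration only, the sketch below assumes a length threshold: posting lists at or below it are rewritten contiguously on every flush (re-merge), while lists above it are moved to a separate area and extended in place. The class `HybridIndex`, its parameters `long_list_threshold` and `memory_limit`, and the counter `postings_rewritten` are hypothetical names, and plain Python dicts stand in for on-disk structures.

```python
from collections import defaultdict


class HybridIndex:
    """Toy in-memory model of a hybrid re-merge / in-place update policy.

    Assumption: a length threshold decides which strategy a list uses.
    Dicts stand in for on-disk index files; nothing here touches a disk.
    """

    def __init__(self, long_list_threshold=4, memory_limit=3):
        self.threshold = long_list_threshold  # hypothetical cutoff for "long" lists
        self.memory_limit = memory_limit      # buffered postings that trigger a flush
        self.buffer = defaultdict(list)       # in-memory postings awaiting a flush
        self.buffered = 0
        self.merged = {}                      # short lists, rewritten on every flush
        self.in_place = {}                    # long lists, only ever appended to
        self.postings_rewritten = 0           # old postings copied during re-merge

    def add(self, term, doc_id):
        """Buffer one posting and flush when the memory limit is reached."""
        self.buffer[term].append(doc_id)
        self.buffered += 1
        if self.buffered >= self.memory_limit:
            self.flush()

    def flush(self):
        """Combine buffered postings with the on-disk lists."""
        new_merged = {}
        for term in set(self.merged) | set(self.buffer):
            fresh = self.buffer.get(term, [])
            if term in self.in_place:
                # Long list: append the new postings, leave the old ones alone.
                self.in_place[term].extend(fresh)
                continue
            old = self.merged.get(term, [])
            combined = old + fresh
            # Re-merge copies the old postings into the new index.
            self.postings_rewritten += len(old)
            if len(combined) > self.threshold:
                self.in_place[term] = combined  # promote to in-place maintenance
            else:
                new_merged[term] = combined
        self.merged = new_merged
        self.buffer.clear()
        self.buffered = 0

    def postings(self, term):
        """Return the full posting list: on-disk part plus buffered part."""
        on_disk = self.in_place.get(term) or self.merged.get(term, [])
        return on_disk + self.buffer.get(term, [])


index = HybridIndex(long_list_threshold=4, memory_limit=3)
for term, doc_id in [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("a", 5),
                     ("a", 6), ("a", 7), ("b", 8), ("a", 9)]:
    index.add(term, doc_id)
```

In this toy model each term's postings stay in one contiguous list, so a lookup reads a single list per term, while postings of a promoted list are no longer copied on later flushes. The real system's cost model, threshold choice, and on-disk layout are not described in the abstract and are not represented here.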