Index maintenance strategies employed by dynamic text retrieval systems based on inverted files can be divided into two categories: merge-based and in-place update strategies. Within each category, individual update policies can be distinguished by whether they store their on-disk posting lists contiguously or discontiguously. In general, contiguous inverted lists lead to higher query performance because they minimize disk seek overhead at query time. Discontiguous inverted lists lead to higher update performance because they require less effort during index maintenance operations. In this paper, we focus on retrieval systems with a high query load, where the on-disk posting lists must be stored contiguously at all times. We discuss a combination of re-merge and in-place index update, called Hybrid Immediate Merge. The method performs strictly better than the re-merge baseline policy used in our experiments: it achieves the same query performance with substantially better update performance. The actual time savings depend on the size of the text collection being indexed; the larger the collection, the greater the savings. In our experiments, variations of Hybrid Immediate Merge reduced the total index update overhead by up to 73% compared with the re-merge baseline.
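The abstract describes Hybrid Immediate Merge only as a combination of re-merge and in-place update. The Python sketch below is a toy cost model of that idea. Its parameters are our assumptions, not the paper's:

- A posting list moves to in-place storage once it exceeds a hypothetical `threshold` number of postings.
- In-place lists are pre-allocated with doubling and are relocated in full when they outgrow their space.
- Every remaining short list is rewritten on each flush, as in Immediate Merge.
- Disk cost is approximated by the number of postings written.

In this model every list stays a single contiguous array. The names `HybridIndex`, `simulate`, `threshold`, `buffer_limit` and `postings_written` are all hypothetical.

```python
from collections import defaultdict


class HybridIndex:
    """Toy cost model of a hybrid re-merge / in-place index (illustrative sketch only).

    Assumptions (hypothetical): cost = postings written to disk; lists longer than
    `threshold` postings are moved to in-place storage with doubling pre-allocation;
    all other lists live in one segment that is rewritten in full on every flush.
    """

    def __init__(self, threshold=None, buffer_limit=8):
        self.threshold = threshold        # None gives the pure re-merge baseline
        self.buffer_limit = buffer_limit  # postings held in memory before a flush
        self.buffer = defaultdict(list)   # in-memory postings not yet on disk
        self.buffered = 0
        self.merged = {}                  # term -> postings (short lists, re-merged)
        self.in_place = {}                # term -> (postings, allocated capacity)
        self.postings_written = 0

    def add(self, term, doc_id):
        self.buffer[term].append(doc_id)
        self.buffered += 1
        if self.buffered >= self.buffer_limit:
            self.flush()

    def flush(self):
        for term, new in self.buffer.items():
            if term in self.in_place:
                postings, capacity = self.in_place[term]
                postings = postings + new
                if len(postings) > capacity:
                    # Out of pre-allocated space: relocate the whole list to keep it contiguous.
                    capacity = 2 * len(postings)
                    self.postings_written += len(postings)
                else:
                    # Enough free space after the list: write only the new postings.
                    self.postings_written += len(new)
                self.in_place[term] = (postings, capacity)
            else:
                self.merged[term] = self.merged.get(term, []) + new
        # Lists that have grown past the threshold leave the merge segment for good.
        if self.threshold is not None:
            for term in [t for t, p in self.merged.items() if len(p) > self.threshold]:
                postings = self.merged.pop(term)
                self.in_place[term] = (postings, 2 * len(postings))
                self.postings_written += len(postings)
        # Immediate Merge: the short-list segment is rewritten in full on every flush.
        self.postings_written += sum(len(p) for p in self.merged.values())
        self.buffer.clear()
        self.buffered = 0

    def postings(self, term):
        stored = self.in_place[term][0] if term in self.in_place else self.merged.get(term, [])
        return stored + self.buffer.get(term, [])


def simulate(threshold):
    index = HybridIndex(threshold=threshold)
    for doc_id in range(200):
        # Hypothetical skewed workload: one very frequent term plus one rare term per document.
        index.add("the", doc_id)
        index.add(f"rare{doc_id}", doc_id)
    if index.buffered:
        index.flush()
    return index


baseline = simulate(threshold=None)
hybrid = simulate(threshold=4)
print(baseline.postings_written, hybrid.postings_written)
```

With `threshold=None` the class reduces to the pure re-merge baseline. On this made-up workload, the hybrid variant writes fewer postings because the frequent term is no longer copied on every merge. The ratio depends entirely on the assumed parameters and workload, so it is not comparable to the figures reported in the experiments.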