Index compression is good, especially for random access

Authors:
Stefan Büttcher; Charles L. A. Clarke
Affiliations:
University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada
Venue:
Proceedings of the sixteenth ACM conference on information and knowledge management
Year:
2007

Abstract

Index compression techniques are known to substantially decrease the storage requirements of a text retrieval system. As a side effect, they may also improve retrieval performance by reducing disk I/O overhead. Despite this advantage, developers sometimes choose to store index data in uncompressed form so as not to obstruct random access into each index term's postings list. In this paper, we show that index compression does not harm random access performance. In fact, we demonstrate that, in some cases, random access into a term's postings list can be realized more efficiently if the list is stored in compressed rather than uncompressed form. This holds regardless of whether the index is stored on disk or in main memory, since neither type of storage, hard drive or RAM, supports efficient random access in the first place.
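
To make "random access into a compressed postings list" concrete, here is a minimal Python sketch of one common way to support it. It is not taken from the paper: the block layout, the variable-byte code, the class name `CompressedPostings`, the method `next_geq`, and the block size of 4 are all assumptions chosen for illustration. The list is split into fixed-size blocks, the first document ID of each block is kept uncompressed as a small lookup table, and the remaining IDs in each block are stored as variable-byte-encoded gaps. A lookup binary-searches the table and then decodes a single block.

```python
from bisect import bisect_right


def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers, 7 data bits per byte.

    The last byte of each integer has its high bit set (an assumed convention).
    """
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)   # low 7 bits; more bytes follow
            n >>= 7
        out.append(n | 0x80)       # final byte of this integer
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode back into integers."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:               # final byte: finish the current integer
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


class CompressedPostings:
    """Hypothetical block-compressed postings list with a first-ID lookup table."""

    def __init__(self, doc_ids, block_size=4):
        # doc_ids must be strictly increasing.
        self.first_ids = []        # uncompressed first doc ID of each block
        self.blocks = []           # vbyte-encoded gaps for the rest of each block
        for i in range(0, len(doc_ids), block_size):
            chunk = doc_ids[i:i + block_size]
            gaps = [b - a for a, b in zip(chunk, chunk[1:])]
            self.first_ids.append(chunk[0])
            self.blocks.append(vbyte_encode(gaps))

    def next_geq(self, target):
        """Return the smallest doc ID >= target, or None if there is none."""
        if not self.first_ids:
            return None
        # Binary search for the last block whose first ID is <= target.
        k = bisect_right(self.first_ids, target) - 1
        if k < 0:
            return self.first_ids[0]
        doc = self.first_ids[k]
        if doc >= target:
            return doc
        # Decode only this one block, accumulating gaps.
        for gap in vbyte_decode(self.blocks[k]):
            doc += gap
            if doc >= target:
                return doc
        # Target lies past this block; the answer starts the next block.
        return self.first_ids[k + 1] if k + 1 < len(self.first_ids) else None


postings = CompressedPostings([3, 8, 15, 16, 42, 77, 130, 131, 900, 1000])
print(postings.next_geq(100))
```

In this sketch a lookup touches the small table plus one compressed block, so fewer bytes are read per access than a binary search over the full uncompressed array would touch. Whether that translates into a speedup depends on block size and hardware, which the sketch does not measure.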