Quasi-succinct indices

Authors: Sebastiano Vigna
Affiliations: Università degli Studi di Milano, Milano, Italy
Venue: Proceedings of the sixth ACM international conference on Web search and data mining
Year: 2013

Abstract

Compressed inverted indices in use today are based on the idea of gap compression: document pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes that represent smaller gaps using fewer bits. Additional data such as counts and positions is stored using similar techniques. A large body of research has been built in the last 30 years around gap compression, including theoretical modeling of the gap distribution, specialized instantaneous codes suitable for gap encoding, and ad hoc document reorderings that increase the efficiency of instantaneous codes. This paper proposes to represent an index using a different architecture based on quasi-succinct representation of monotone sequences. We show that, besides being theoretically elegant and simple, the new index provides expected constant-time operations, space savings, and, in practice, significant performance improvements on conjunctive, phrasal and proximity queries.
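
To make the contrast in the abstract concrete, the following minimal Python sketch shows both ideas on a toy posting list. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not spell out the construction, so the sketch assumes that the quasi-succinct representation is the Elias-Fano encoding of monotone sequences, and it uses Elias gamma as one example of an instantaneous code for gaps. All function names (to_gaps, from_gaps, gamma, ef_encode, ef_access) are hypothetical, and the linear scan in ef_access stands in for the constant-time select structure a real index would need.

```python
from math import floor, log2


def to_gaps(pointers):
    """Turn strictly increasing document pointers into gaps (each gap >= 1)."""
    gaps, prev = [], -1
    for p in pointers:
        gaps.append(p - prev)
        prev = p
    return gaps


def from_gaps(gaps):
    """Invert to_gaps; note that reaching element i requires decoding 0..i."""
    pointers, prev = [], -1
    for g in gaps:
        prev += g
        pointers.append(prev)
    return pointers


def gamma(n):
    """Elias gamma code of n >= 1 as a bit string; smaller n uses fewer bits."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b


def ef_encode(values, universe):
    """Elias-Fano-style split of a non-empty monotone sequence (assumed layout).

    Each value < universe is split into l low bits, stored verbatim, and a
    high part, stored as unary-coded differences in a single bit string.
    """
    n = len(values)
    l = max(0, floor(log2(universe / n)))
    lower = [v & ((1 << l) - 1) for v in values]
    upper, prev_high = [], 0
    for v in values:
        high = v >> l
        upper.append("0" * (high - prev_high) + "1")
        prev_high = high
    return lower, "".join(upper), l


def ef_access(lower, upper, l, i):
    """Return the i-th value without decoding the preceding ones.

    The i-th 1 in the upper bits sits at position high_i + i. The scan below
    is a stand-in for a select operation supported by an auxiliary structure.
    """
    ones = -1
    for pos, bit in enumerate(upper):
        if bit == "1":
            ones += 1
            if ones == i:
                return ((pos - i) << l) | lower[i]
    raise IndexError(i)
```

With gaps, recovering a pointer means summing all preceding gaps, whereas in the split representation any element can be addressed directly once select on the upper bits is available. That direct addressing is the property that makes skipping attractive for conjunctive, phrasal and proximity queries.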