VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming

Authors:
Fabrizio Silvestri; Rossano Venturini
Affiliations:
ISTI-CNR, Pisa, Italy; ISTI-CNR, Pisa, Italy
Venue:
CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management
Year:
2010

Abstract

Efficiently encoding lists of integers is important in many applications across different fields. Adjacency lists of large graphs are usually encoded to save space and to improve decoding speed. Inverted indexes of Information Retrieval (IR) systems keep their lists of postings compressed in order to exploit the memory hierarchy. Secondary indexes of DBMSs are stored similarly to inverted indexes in IR systems. In this paper we propose Vector of Splits Encoding (VSEncoding), a novel class of encoders that work by optimally partitioning a list of integers into blocks, which are then efficiently compressed using simple encoders. Previous work applied heuristics during the partitioning step; instead, we find the optimal solution with a dynamic programming approach. Experiments show that our class of encoders outperforms all existing methods in the literature by more than 10% (with the exception of Binary Interpolative Coding, with which they roughly tie) while still retaining a very fast decompression algorithm.
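
The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's actual encoder: the cost model (a fixed per-block header plus every value in a block written with the bit width of the block's largest value), the names `bits_needed` and `optimal_partition`, and the parameters `max_block` and `header_bits` are all assumptions chosen for illustration. Under that assumed model, dynamic programming finds the partition with the minimum total number of bits.

```python
def bits_needed(x: int) -> int:
    """Bits needed to write a non-negative integer in binary (at least 1)."""
    return max(1, x.bit_length())


def optimal_partition(values, max_block=16, header_bits=8):
    """Return (total_bits, blocks) for a minimum-cost partition.

    Assumed cost model (hypothetical, for illustration only): a block
    covering values[j:i] costs header_bits + (i - j) * w, where w is the
    bit width of the largest value in the block.
    """
    n = len(values)
    best = [0] + [float("inf")] * n  # best[i]: min bits to encode values[:i]
    prev = [0] * (n + 1)             # prev[i]: start of the last block for best[i]

    for i in range(1, n + 1):
        width = 0
        # Grow the last block backwards so that it covers values[j:i].
        for j in range(i - 1, max(i - max_block, 0) - 1, -1):
            width = max(width, bits_needed(values[j]))
            cost = best[j] + header_bits + (i - j) * width
            if cost < best[i]:
                best[i] = cost
                prev[i] = j

    # Walk the prev pointers back from the end to recover block boundaries.
    blocks = []
    i = n
    while i > 0:
        blocks.append((prev[i], i))
        i = prev[i]
    blocks.reverse()
    return best[n], blocks


if __name__ == "__main__":
    total, blocks = optimal_partition([1, 1, 1, 1, 1000, 1, 1, 1, 1])
    print(total, blocks)
```

With these assumed parameters, the list `[1, 1, 1, 1, 1000, 1, 1, 1, 1]` is split into three blocks that isolate the outlier, for a total of 42 bits, whereas a single block would cost 98 bits. A greedy or fixed-size split offers no such guarantee, which is the motivation for computing the partition exactly.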