Modeling word occurrences for the compression of concordances
ACM Transactions on Information Systems (TOIS)
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Generating a canonical prefix encoding
Communications of the ACM
Binary Interpolative Coding for Effective Index Compression
Information Retrieval
Information Retrieval
Performance of Hardware Compressed Main Memory
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Assigning identifiers to documents to enhance the clustering property of fulltext indexes
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Inverted Index Compression Using Word-Aligned Binary Codes
Information Retrieval
Super-Scalar RAM-CPU Cache Compression
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Improved Word-Aligned Binary Compression for Text Indexing
IEEE Transactions on Knowledge and Data Engineering
Algorithms
Variable-length Codes for Data Compression
Variable-length Codes for Data Compression
On placing skips optimally in expectation
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Inverted index compression and query processing with optimized document ordering
Proceedings of the 18th international conference on World wide web
Sorting out the document identifier assignment problem
ECIR'07 Proceedings of the 29th European conference on IR research
Mining Query Logs: Turning Search Usage Data into Knowledge
Foundations and Trends in Information Retrieval
Composite hashing with multiple information sources
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Faster top-k document retrieval using block-max indexes
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
SIMD-based decoding of posting lists
Proceedings of the 20th ACM international conference on Information and knowledge management
Searching web data: An entity retrieval and high-performance indexing model
Web Semantics: Science, Services and Agents on the World Wide Web
VAST-Tree: a vector-advanced and compressed structure for massive data tree traversal
Proceedings of the 15th International Conference on Extending Database Technology
Efficient query recommendations in the long tail via center-piece subgraphs
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Optimizing top-k document retrieval strategies for block-max indexes
Proceedings of the sixth ACM international conference on Web search and data mining
Semantic hashing using tags and topic modeling
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A candidate filtering mechanism for fast top-k query processing on modern cpus
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Re-Ordered FEGC and Block Based FEGC for Inverted File Compression
International Journal of Information Retrieval Research
Hi-index | 0.00 |
Encoding lists of integers efficiently is important for many applications in different fields. Adjacency lists of large graphs are usually encoded to save space and to improve decoding speed. Inverted indexes of Information Retrieval systems keep the lists of postings compressed in order to exploit the memory hierarchy. Secondary indexes of DBMSs are stored similarly to inverted indexes in IR systems. In this paper we propose Vector of Splits Encoding (VSEncoding), a novel class of encoders that work by optimally partitioning a list of integers into blocks which are efficiently compressed by using simple encoders. In previous works heuristics were applied during the partitioning step. Instead, we find the optimal solution by using a dynamic programming approach. Experiments show that our class of encoders outperform all the existing methods in literature by more than 10% (with the exception of Binary Interpolative Coding with which they, roughly, tie) still retaining a very fast decompression algorithm.