Sequences of integers are common data types, occurring either as primary data or as ancillary structures. The sequences can be large, which makes compression an attractive option. Effective compression presupposes variable-length coding, which destroys the regular alignment of values. Yet it is often desirable to access only a small subset of the entries, either by position (ordinal number) or by content (element value), without having to decode most of the sequence from the start. Here such a random access technique for compressed integers is described, with the special feature that no auxiliary index is needed. The solution applies a method called interpolative coding, which is one of the most efficient non-statistical codes for integers. Indexing is avoided by address calculation, which guarantees sufficient space for the codes even in the worst case. The additional redundancy, compared to regular interpolative coding, is only about 1 bit per source integer for a uniform distribution. The time complexity of random access is logarithmic with respect to the source size for both position-based and content-based retrieval. According to experiments, random access is faster than full decoding when the number of accessed integers is at most approximately 0.75 · n / log₂ n for sequence length n. The tests also confirm that the method is quite competitive with other approaches to random access coding suggested in the literature.
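For readers unfamiliar with the underlying code, plain binary interpolative coding of a strictly increasing sequence can be sketched as follows. This is only an illustration of the base technique, not the address-calculation variant described above: it offers no random access, the function names `encode` and `decode` are hypothetical, and each middle value is written with a simple fixed-width binary code (an assumption made for brevity) rather than a minimal centered code.

```python
def _width(size):
    # Bits needed to distinguish `size` alternatives (0 bits when size == 1).
    return (size - 1).bit_length()


def encode(values, lo, hi):
    """Encode a strictly increasing list with lo <= values[k] <= hi as a bit string."""
    bits = []

    def rec(i, j, lo, hi):
        # Encode values[i:j], all of which lie in [lo, hi].
        n = j - i
        if n == 0:
            return
        m = i + n // 2
        left, right = m - i, j - m - 1
        # The middle value must leave room for `left` smaller and `right` larger values.
        low, high = lo + left, hi - right
        w = _width(high - low + 1)
        if w:
            bits.append(format(values[m] - low, "0{}b".format(w)))
        rec(i, m, lo, values[m] - 1)
        rec(m + 1, j, values[m] + 1, hi)

    rec(0, len(values), lo, hi)
    return "".join(bits)


def decode(bits, n, lo, hi):
    """Inverse of encode: rebuild the n values from the bit string."""
    out = [0] * n
    pos = 0

    def rec(i, j, lo, hi):
        nonlocal pos
        cnt = j - i
        if cnt == 0:
            return
        m = i + cnt // 2
        left, right = m - i, j - m - 1
        low, high = lo + left, hi - right
        w = _width(high - low + 1)
        v = low + (int(bits[pos:pos + w], 2) if w else 0)
        pos += w
        out[m] = v
        rec(i, m, lo, v - 1)
        rec(m + 1, j, v + 1, hi)

    rec(0, n, lo, hi)
    return out


if __name__ == "__main__":
    seq = [3, 8, 9, 11, 12, 13, 17]
    code = encode(seq, 1, 20)
    print(len(code), "bits;", decode(code, len(seq), 1, 20) == seq)
```

Because each code width depends on the values already decoded in enclosing ranges, the position of a given entry in the bit string is not known in advance. That dependency is what the address calculation in the abstract is said to remove, by reserving worst-case space so that positions can be computed rather than looked up.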