Interpolative coding of integer sequences supporting log-time random access

Author: J. Teuhola
Affiliation: Department of Information Technology, University of Turku, Finland
Venue: Information Processing and Management: an International Journal
Year: 2011

Abstract

Sequences of integers are common data types, occurring either as primary data or as ancillary structures. Such sequences can be large, which makes compression attractive. Effective compression requires variable-length coding, which destroys the regular alignment of values. Yet it is often desirable to access only a small subset of the entries, either by position (ordinal number) or by content (element value), without decoding most of the sequence from the start. This paper describes such a random access technique for compressed integers, with the special feature that no auxiliary index is needed. The solution applies interpolative coding, one of the most efficient non-statistical codes for integers. Indexing is avoided by address calculation that guarantees sufficient space for the codes even in the worst case. Compared with regular interpolative coding, the additional redundancy is only about 1 bit per source integer for a uniform distribution. The time complexity of random access is logarithmic in the source size for both position-based and content-based retrieval. In the experiments, random access is faster than full decoding when the number of accessed integers is at most approximately 0.75·n/log₂ n for a sequence of length n. The tests also confirm that the method is quite competitive with other random access coding approaches suggested in the literature.
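The addressing idea can be illustrated with a small Python sketch: reserve a worst-case number of bits for each half of the interpolative recursion, so that the start of any subrange is computed rather than looked up in an index. This is a simplified illustration under stated assumptions, not the paper's exact layout. The middle value of each subrange is written with a fixed-width binary code (standard interpolative coding uses minimal binary codes). The left half gets a slot sized for the largest range it could possibly have. The helper names width, cap, encode, access and find are hypothetical. Because each slot size depends only on the element count and the range bounds known at that node, both position-based access and content-based search decode one code per recursion level, about log₂ n levels in total. The padding of this naive layout is not claimed to match the roughly 1 bit per integer reported for the paper's method. The memoized bound cap is also recomputed along each path, which the paper's own address calculation presumably avoids.

```python
from functools import lru_cache
from typing import Optional


def width(r: int) -> int:
    """Bits of a fixed-width code for a value in [0, r - 1]; 0 bits when r == 1."""
    return (r - 1).bit_length()


@lru_cache(maxsize=None)
def cap(k: int, n: int) -> int:
    """Hypothetical worst-case bit budget for k strictly increasing values in a range of n integers."""
    if k == 0:
        return 0
    h = k // 2  # elements to the left of the middle one
    # middle code + left slot (largest possible left range) + right slot (largest possible right range)
    return width(n - k + 1) + cap(h, n - (k - h)) + cap(k - 1 - h, n - h - 1)


def encode(values: list[int], lo: int, hi: int) -> list[int]:
    """Interpolative-style encoding into a bit list whose subtree slots have fixed sizes."""
    if any(b <= a for a, b in zip(values, values[1:])) or (
        values and (values[0] < lo or values[-1] > hi)
    ):
        raise ValueError("values must be strictly increasing within [lo, hi]")
    bits = [0] * cap(len(values), hi - lo + 1)

    def put(i: int, k: int, lo: int, hi: int, off: int) -> None:
        if k == 0:
            return
        h, n = k // 2, hi - lo + 1
        v = values[i + h]
        w = width(n - k + 1)
        x = v - (lo + h)  # the middle value lies in [lo + h, hi - (k - 1 - h)]
        for b in range(w):
            bits[off + b] = (x >> (w - 1 - b)) & 1  # most significant bit first
        put(i, h, lo, v - 1, off + w)  # left half starts right after the middle code
        # right half starts after the left half's worst-case slot
        put(i + h + 1, k - 1 - h, v + 1, hi, off + w + cap(h, n - (k - h)))

    put(0, len(values), lo, hi, 0)
    return bits


def read(bits: list[int], off: int, w: int) -> int:
    """Read a w-bit big-endian integer starting at bit offset off."""
    x = 0
    for b in range(w):
        x = (x << 1) | bits[off + b]
    return x


def access(bits: list[int], k: int, lo: int, hi: int, p: int) -> int:
    """Return the p-th value (0-based), decoding one code per recursion level."""
    if not 0 <= p < k:
        raise IndexError(p)
    base = off = 0  # position of the first element and bit offset of the current subrange
    while True:
        h, n = k // 2, hi - lo + 1
        w = width(n - k + 1)
        v = lo + h + read(bits, off, w)
        if p == base + h:
            return v
        if p < base + h:
            k, hi, off = h, v - 1, off + w
        else:
            base, off = base + h + 1, off + w + cap(h, n - (k - h))
            k, lo = k - 1 - h, v + 1


def find(bits: list[int], k: int, lo: int, hi: int, x: int) -> Optional[int]:
    """Return the position of value x, or None if absent, by the same descent."""
    base = off = 0
    while k > 0 and lo <= x <= hi:
        h, n = k // 2, hi - lo + 1
        w = width(n - k + 1)
        v = lo + h + read(bits, off, w)
        if x == v:
            return base + h
        if x < v:
            k, hi, off = h, v - 1, off + w
        else:
            base, off = base + h + 1, off + w + cap(h, n - (k - h))
            k, lo = k - 1 - h, v + 1
    return None
```

For example, after bits = encode([3, 8, 9, 15, 20, 21, 30, 41], 0, 63), the call access(bits, 8, 0, 63, 4) returns 20 and find(bits, 8, 0, 63, 20) returns 4, in both cases without decoding the rest of the sequence.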