Compressed Data Structures: Dictionaries and Data-Aware Measures

Authors:
Ankur Gupta; Wing-Kai Hon; Rahul Shah; Scott Vitter
Affiliations:
Purdue University; Purdue University; Purdue University; Purdue University
Venue:
DCC '06 Proceedings of the Data Compression Conference
Year:
2006


Abstract

We propose measures for compressed data structures in which space usage is measured in a data-aware manner. In particular, we consider the fundamental dictionary problem on set data: construct a data structure that represents a set S of n items drawn from a universe U = {0, ..., u - 1} and supports various queries on S. We use a well-known data-aware measure for set data, called gap, to bound the space of our data structures. We describe a novel dictionary structure taking gap + O(n log(u/n) / log n) + O(n log log(u/n)) bits. Under the RAM model, our dictionary supports membership, rank, select, and predecessor queries in nearly optimal time, matching the time bound of Andersson and Thorup's predecessor structure [AT00] while improving upon its space usage. The leading term of our space bound is exactly gap bits (i.e., its constant factor is 1). From the worst-case perspective, we present the first O(n log(u/n))-bit dictionary structure that supports these queries in near-optimal time under the RAM model. We also build a dictionary that requires the same space and supports membership, select, and partial rank queries even more quickly, in O(log log n) time. To the best of our knowledge, this is the first result of its kind to achieve data-aware space usage while retaining near-optimal query time.
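The abstract does not spell out the gap measure or the exact query semantics. The following is a minimal, hedged sketch that uses one common formulation of gap: the sum of ceil(log2(g_i)) over the gaps between consecutive elements, with the first gap measured from -1. The paper's boundary conventions may differ. The sketch also shows why a gap-based bound implies the worst-case O(n log(u/n)) figure. The gaps sum to at most u, so by concavity of log their log-sum is at most n log2(u/n), and the ceilings add fewer than n bits. The query helpers (`member`, `rank`, `select`, `predecessor`, `partial_rank`) are hypothetical names run on a plain sorted array. Their conventions are also assumptions: rank counts elements <= x, select is 1-indexed, and predecessor is non-strict. They illustrate what each query returns, not the paper's compressed structure or its time bounds.

```python
import math
from bisect import bisect_right
from typing import List, Optional


def gap_bits(s: List[int]) -> int:
    """Gap measure of a strictly increasing list s over U = {0, ..., u-1}.

    Assumed formulation (the paper's boundary conventions may differ):
    gap = sum of ceil(log2(g_i)), with g_1 = s_1 + 1 and g_i = s_i - s_{i-1}.
    """
    total, prev = 0, -1
    for x in s:
        g = x - prev                    # g >= 1 because s is strictly increasing
        total += (g - 1).bit_length()   # exact integer ceil(log2(g)) for g >= 1
        prev = x
    return total


def worst_case_bound(n: int, u: int) -> float:
    """Upper bound n*log2(u/n) + n on gap_bits for any n-element set in U.

    The g_i telescope to s_n + 1 <= u, so by concavity of log,
    sum(log2 g_i) <= n*log2(u/n); each ceiling adds less than one bit.
    """
    return n * math.log2(u / n) + n if n > 0 else 0.0


# Plain sorted-array reference for the query semantics (hypothetical helpers,
# NOT the compressed dictionary from the paper).
def member(s: List[int], x: int) -> bool:
    i = bisect_right(s, x)
    return i > 0 and s[i - 1] == x


def rank(s: List[int], x: int) -> int:
    """Number of elements of s that are <= x."""
    return bisect_right(s, x)


def select(s: List[int], i: int) -> int:
    """The i-th smallest element of s (1-indexed)."""
    return s[i - 1]


def predecessor(s: List[int], x: int) -> Optional[int]:
    """Largest element of s that is <= x, or None if no such element exists."""
    i = bisect_right(s, x)
    return s[i - 1] if i > 0 else None


def partial_rank(s: List[int], x: int) -> Optional[int]:
    """Rank defined only for members of s; None for non-members."""
    return rank(s, x) if member(s, x) else None


if __name__ == "__main__":
    S, u = [2, 3, 7, 15, 16], 32
    print(gap_bits(S), round(worst_case_bound(len(S), u), 2))  # 7 18.39
    print(rank(S, 10), select(S, 4), predecessor(S, 10), partial_rank(S, 16))  # 3 15 7 5
```

In this example the gaps are 3, 1, 4, 8, 1, giving 2 + 0 + 2 + 3 + 0 = 7 bits. The worst-case bound for n = 5 and u = 32 is about 18.4 bits, so a clustered set can sit well below it. That difference is what a data-aware measure is meant to capture.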