New trie data structures which support very fast search operations
Journal of Computer and System Sciences
Surpassing the information theoretic bound with fusion trees
Journal of Computer and System Sciences - Special issue: papers from the 22nd ACM symposium on the theory of computing, May 14–16, 1990
Data compression in full-text retrieval systems
Journal of the American Society for Information Science
Optimal bounds for the predecessor problem
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Tight(er) worst-case bounds on dynamic searching and priority queues
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Membership in Constant Time and Almost-Minimum Space
SIAM Journal on Computing
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Mathematics for the Analysis of Algorithms
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
IP Address Lookup Made Fast and Simple
ESA '99 Proceedings of the 7th Annual European Symposium on Algorithms
Searching in Compressed Dictionaries
DCC '02 Proceedings of the Data Compression Conference
Compact representations of ordered sets
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Dictionaries using variable-length keys and data, with applications
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Squeezing succinct data structures into entropy bounds
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Time-space trade-offs for predecessor search
Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
Rank and select revisited and extended
Theoretical Computer Science
Monotone minimal perfect hashing: searching a sorted table with O(1) accesses
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Note: On compact representations of All-Pairs-Shortest-Path-Distance matrices
Theoretical Computer Science
Sampled longest common prefix array
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Fast prefix search in little space, with applications
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part I
E=I+T: The internal extent formula for compacted tries
Information Processing Letters
Proceedings of the 20th international conference on World wide web
Inverted indexes for phrases and strings
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Theory and practice of monotone minimal perfect hashing
Journal of Experimental Algorithmics (JEA)
Improved address-calculation coding of integer arrays
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
In this paper, we propose measures for compressed data structures in which space usage is measured in a data-aware manner. In particular, we consider the fundamental dictionary problem on set data, where the task is to construct a data structure that represents a set S of n items drawn from a universe U = {0, ..., u-1} and supports various queries on S. We use a well-known data-aware measure for set data, called gap, to bound the space of our data structures. We describe a novel dictionary structure that requires gap + O(n log(u/n) / log n) + O(n log log(u/n)) bits. Under the RAM model, our dictionary supports membership, rank, and predecessor queries in nearly optimal time, matching the time bound of Andersson and Thorup's predecessor structure [A. Andersson, M. Thorup, Tight(er) worst-case bounds on dynamic searching and priority queues, in: ACM Symposium on Theory of Computing, STOC, 2000], while simultaneously improving upon their space usage. Select queries are supported even faster, in O(log log n) time. The leading term of the space bound is exactly gap bits (i.e., the constant factor is 1). From the worst-case perspective, this is the first O(n log(u/n))-bit dictionary structure that supports these queries in near-optimal time under the RAM model. We also build a dictionary that requires the same space and supports membership, select, and partial rank queries even more quickly, in O(log log n) time. We go on to show that for many real-world datasets, data-aware methods yield worthwhile compression over combinatorial methods. To the best of our knowledge, these are the first results that achieve data-aware space usage while retaining near-optimal time.
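
To make the quantities in the abstract concrete, the following minimal Python sketch computes a gap-style measure for a small set and compares it with the combinatorial bound of ceil(log2 C(u, n)) bits. It also spells out the four queries on a plain sorted list. Several details are assumptions for illustration and are not taken from the paper: gap is taken here as the sum of the bit lengths of the differences between consecutive items (the first difference measured from 0), rank(x) counts the items less than or equal to x, select is 1-indexed, and predecessor(x) returns the largest item less than or equal to x. The sorted list is only a reference implementation of the query semantics; it is not the compressed structure the abstract describes and does not achieve its space or time bounds.

```python
from bisect import bisect_left, bisect_right
from math import comb


def gap_bits(items):
    """Gap-style measure (assumed definition): sum of the bit lengths
    of consecutive differences, with the first difference taken from 0."""
    total, prev = 0, 0
    for x in items:  # items must be sorted and distinct
        total += (x - prev).bit_length()  # equals ceil(log2(g + 1)) for g >= 0
        prev = x
    return total


def combinatorial_bits(u, n):
    """ceil(log2 C(u, n)): bits needed to distinguish all n-subsets of a u-universe."""
    return (comb(u, n) - 1).bit_length()


def member(items, x):
    i = bisect_left(items, x)
    return i < len(items) and items[i] == x


def rank(items, x):
    """Number of items <= x (assumed convention)."""
    return bisect_right(items, x)


def select(items, i):
    """The i-th smallest item, 1-indexed (assumed convention)."""
    return items[i - 1]


def predecessor(items, x):
    """Largest item <= x, or None if there is none (assumed convention)."""
    r = bisect_right(items, x)
    return items[r - 1] if r > 0 else None


# Hypothetical example data: a clustered set in a universe of size 16.
S = [3, 4, 5, 9, 10]
u = 16
print("gap bits:", gap_bits(S))
print("combinatorial bits:", combinatorial_bits(u, len(S)))
print("rank(5):", rank(S, 5), "select(4):", select(S, 4),
      "predecessor(8):", predecessor(S, 8))
```

Clustered sets such as the example have many small differences, which is the situation in which a data-aware measure like gap can fall below the combinatorial bound.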