Monotone minimal perfect hashing: searching a sorted table with O(1) accesses

Authors: Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna
Affiliations: Institut National d'Informatique, Oued Smar, Algiers, Algeria (Belazzougui); Università degli Studi di Milano, Italy (Boldi, Vigna); IT University of Copenhagen, Denmark (Pagh)
Venue: SODA '09, Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Year: 2009

Abstract

A minimal perfect hash function maps a set S of n keys into the set {0, 1, ..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of monotone minimal perfect hashing, in which the bijection is required to preserve the lexicographical ordering of the keys. A monotone minimal perfect hash function can be seen as a very weak form of index that provides ranking just on the set S (and answers randomly outside of S). Our goal is to minimise the description size of the hash function: we show that, for a set S of n elements out of a universe of 2^w elements, O(n log log w) bits are sufficient to hash monotonically with evaluation time O(log w). Alternatively, we can get space O(n log w) bits with O(1) query time. Both of these data structures improve on a straightforward construction with O(n log w) space and O(log w) query time. As a consequence, it is possible to search a sorted table with O(1) accesses to the table (using an additional O(n log log w) bits). Our results are based on a structure (of independent interest) that represents a trie in a very compact way but admits errors. As a further application of the same structure, we show how to compute the predecessor (in the sorted order of S) of an arbitrary element using O(1) accesses in expectation and an index of O(n log w) bits, improving on the trivial result of O(nw) bits. This implies an efficient index for searching a blocked memory.
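The claim that a monotone minimal perfect hash function lets a sorted table be searched with a single access can be illustrated with a small sketch. It assumes keys are distinct w-bit integers (for which lexicographic order of the bit strings coincides with numeric order) and uses a simple bucketing scheme: the sorted keys are cut into buckets of a fixed size, and each bucket is identified by the longest common prefix (LCP) of its keys. Plain Python dicts stand in for the compact minimal perfect hash and retrieval structures a real implementation would need, so this sketch does not achieve the space bounds stated above and is not claimed to be the paper's construction. The names `MonotoneHash`, `rank`, `search`, `lcp_len` and `bucket_size` are hypothetical.

```python
from __future__ import annotations


def lcp_len(a: int, b: int, w: int) -> int:
    """Length of the longest common prefix of a and b read as w-bit strings."""
    diff = a ^ b
    return w if diff == 0 else w - diff.bit_length()


class MonotoneHash:
    """Toy monotone minimal perfect hash via bucketing by longest common prefix.

    Assumption: plain dicts replace succinct MPHF/retrieval structures,
    so the space used is far from the paper's bounds; only the evaluation
    logic is illustrated.
    """

    def __init__(self, keys: list[int], w: int, bucket_size: int = 4):
        assert keys == sorted(set(keys)), "keys must be sorted and distinct"
        assert all(0 <= k < (1 << w) for k in keys), "keys must fit in w bits"
        self.w = w
        self.b = bucket_size
        self.prefix_len: dict[int, int] = {}  # key -> LCP length of its bucket
        self.offset: dict[int, int] = {}  # key -> position inside its bucket
        self.bucket_of: dict[tuple[int, int], int] = {}  # (prefix, length) -> bucket
        for start in range(0, len(keys), bucket_size):
            bucket = keys[start:start + bucket_size]
            # In a sorted bucket, the LCP of all keys is the LCP of first and last.
            ell = lcp_len(bucket[0], bucket[-1], w)
            # Sortedness makes (prefix, length) distinct across buckets.
            self.bucket_of[(bucket[0] >> (w - ell), ell)] = start // bucket_size
            for i, k in enumerate(bucket):
                self.prefix_len[k] = ell
                self.offset[k] = i

    def rank(self, x: int) -> int:
        """Rank of x if x is a key; an arbitrary position otherwise."""
        # Defaults of 0 mimic a real retrieval structure returning garbage off S.
        ell = self.prefix_len.get(x, 0)
        bucket = self.bucket_of.get((x >> (self.w - ell), ell), 0)
        return bucket * self.b + self.offset.get(x, 0)


def search(table: list[int], h: MonotoneHash, x: int) -> int | None:
    """Look up x in the sorted table with exactly one table access."""
    i = h.rank(x)
    if 0 <= i < len(table) and table[i] == x:
        return i
    return None


keys = [3, 9, 17, 18, 40, 41, 100, 200, 255]
h = MonotoneHash(keys, w=8)
print([h.rank(k) for k in keys])  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(search(keys, h, 40), search(keys, h, 42))  # 4 None
```

Because `rank` returns some position for every input, a query outside S costs exactly one table access and one comparison to reject; arbitrary answers outside S are harmless once the table itself is consulted.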