Practical perfect hashing in nearly optimal space

Authors: Fabiano C. Botelho; Rasmus Pagh; Nivio Ziviani
Affiliations: Data Domain, an EMC Company, Santa Clara, USA; IT University of Copenhagen, Denmark; Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil
Venue: Information Systems
Year: 2013

Cited by: 2

References (35):

The input/output complexity of sorting and related problems. Communications of the ACM.
The spatial complexity of oblivious k-probe Hash functions. SIAM Journal on Computing.
Practical minimal perfect hash functions for large databases. Communications of the ACM.
A faster algorithm for constructing minimal perfect hash functions. SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval.
Improved bounds for covering complete uniform hypergraphs. Information Processing Letters.
An optimal algorithm for generating minimal perfect hash functions. Information Processing Letters.
Randomized algorithms.
A versatile data structure for edge-oriented graph algorithms. Communications of the ACM.
Perfect hashing. Theoretical Computer Science.
Memory management during run generation in external sorting. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Linear hash functions. Journal of the ACM (JACM).
External memory algorithms and data structures. External memory algorithms.
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms.
Low Redundancy in Static Dictionaries with Constant Query Time. SIAM Journal on Computing.
Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions. WADS '99 Proceedings of the 6th International Workshop on Algorithms and Data Structures.
Sorting and Searching on the Word RAM. STACS '98 Proceedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science.
Efficient Minimal Perfect Hashing in Nearly Minimal Space. STACS '01 Proceedings of the 18th Annual Symposium on Theoretical Aspects of Computer Science.
Polynomial Hash Functions Are Reliable (Extended Abstract). ICALP '92 Proceedings of the 19th International Colloquium on Automata, Languages and Programming.
Simple Minimal Perfect Hashing in Less Space. ESA '01 Proceedings of the 9th Annual European Symposium on Algorithms.
Almost random graphs with simple hash functions. Proceedings of the thirty-fifth annual ACM symposium on Theory of computing.
The Bloomier filter: an efficient data structure for static support lookup tables. SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms.
The pure literal rule threshold and cores in random hypergraphs. SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms.
Perfect spatial hashing. ACM SIGGRAPH 2006 Papers.
Perfect Hashing Schemes for Mining Association Rules. The Computer Journal.
External perfect hashing for very large key sets. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management.
Avoiding the disk bottleneck in the data domain deduplication file system. FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies.
Distributed perfect hashing for very large key sets. Proceedings of the 3rd international conference on Scalable information systems.
Applications of a Splitting Trick. ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I.
Minimal perfect hashing: A competitive method for indexing internal memory. Information Sciences: an International Journal.
Balanced allocation and dictionaries with tightly packed constant size bins. ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming.
Cores of random r-partite hypergraphs. Information Processing Letters.
A practical minimal perfect hashing method. WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms.
Maintaining external memory efficient hash tables. APPROX'06/RANDOM'06 Proceedings of the 9th international conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th international conference on Randomization and Computation.
Perfect Hashing Schemes for Mining Traversal Patterns. Fundamenta Informaticae.
Simple and space-efficient minimal perfect hash functions. WADS'07 Proceedings of the 10th international conference on Algorithms and Data Structures.

Citing works:
Cores of random r-partite hypergraphs. Information Processing Letters.
Memory efficient sanitization of a deduplicated storage system. FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies.

Abstract

A hash function is a mapping from a key universe $U$ to a range of integers, i.e., $h: U \to \{0, 1, \ldots, m-1\}$, where $m$ is the size of the range. A perfect hash function for a set $S \subseteq U$ is a hash function that is one-to-one on $S$, which requires $m \ge |S|$. A minimal perfect hash function for $S \subseteq U$ is a perfect hash function with a range of minimum size, i.e., $m = |S|$. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time, and nearly optimal space consumption for the data structure. For $n$ keys and $m = n$, the space consumption ranges from $2.62n + o(n)$ to $3.3n + o(n)$ bits; for $m = 1.23n$, it ranges from $1.95n + o(n)$ to $2.7n + o(n)$ bits. This is within a small constant factor of the theoretical lower bounds of $1.44n$ bits for $m = n$ and $0.89n$ bits for $m = 1.23n$. We combine several theoretical results into a practical solution that turns perfect hashing into a very compact data structure for the membership problem when the key set $S$ is static and known in advance. By taking the memory hierarchy into account, we can construct (minimal) perfect hash functions for over a billion keys in 46 minutes on a commodity PC. An open source implementation of the algorithms is available at http://cmph.sf.net under the GNU Lesser General Public License (LGPL).
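The abstract gives the definitions and the space bounds but not the construction. The Python sketch below illustrates one technique consistent with the cited work on cores of random hypergraphs: map each key to an edge of a random 3-partite, 3-uniform hypergraph, peel the hypergraph, then give each vertex a value $g[v] \in \{0,1,2\}$ so that $(g[v_0] + g[v_1] + g[v_2]) \bmod 3$ selects a distinct vertex per key. Treat it as an illustration under stated assumptions, not the paper's algorithm or the cmph code. The helper names (`edge`, `peel`, `build`, `phf`, `rank_table`, `mphf`, `lower_bound_bits_per_key`), the keyed BLAKE2b hashing, and the retry limit are all hypothetical choices. The range $m = 3\lceil 1.23n/3 \rceil$ follows the $m = 1.23n$ setting above, just above the threshold of about $1.22n$ below which a random 3-uniform hypergraph with $n$ edges almost surely has a non-empty 2-core and cannot be peeled. The sketch also evaluates a counting lower bound. The probability that a random function is one-to-one on $n$ keys is $m!/((m-n)!\,m^n)$, and with Stirling's approximation $-\log_2$ of that is about $n \log_2 e + (m-n)\log_2(1 - n/m)$ bits. This reproduces the $1.44n$ and $0.89n$ figures quoted above, assuming that is the argument behind them.

```python
import hashlib
import math
from collections import deque

UNASSIGNED = 3  # g value of vertices never chosen; 3 % 3 == 0, so it adds nothing to the sums


def edge(key, seed, part):
    """Hypothetical helper: map a key to 3 vertices, one in each part of size `part`."""
    d = hashlib.blake2b(key.encode(), digest_size=12,
                        salt=seed.to_bytes(16, "little")).digest()
    return tuple(j * part + int.from_bytes(d[4 * j:4 * j + 4], "little") % part
                 for j in range(3))


def peel(edges, m):
    """Repeatedly remove an edge that owns a degree-1 vertex.
    Returns [(edge_index, position_of_that_vertex)] in removal order,
    or None if a 2-core remains (the hypergraph is not peelable)."""
    incident = [[] for _ in range(m)]
    for e, vs in enumerate(edges):
        for v in vs:
            incident[v].append(e)
    degree = [len(lst) for lst in incident]
    removed = [False] * len(edges)
    queue = deque(v for v in range(m) if degree[v] == 1)
    order = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:
            continue  # its last edge was already removed through another vertex
        e = next(x for x in incident[v] if not removed[x])
        removed[e] = True
        order.append((e, edges[e].index(v)))
        for u in edges[e]:
            degree[u] -= 1
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None


def build(keys, c=1.23, max_tries=100):
    """Return (seed, part, g) such that phf() is one-to-one on keys; m = 3 * part, roughly c * n."""
    part = max(1, math.ceil(c * len(keys) / 3))
    m = 3 * part
    for seed in range(max_tries):
        edges = [edge(k, seed, part) for k in keys]
        order = peel(edges, m)
        if order is None:
            continue  # not peelable: retry with fresh hash functions
        g = [UNASSIGNED] * m
        # In reverse removal order, each edge's free vertex is still unassigned,
        # so it can be set to make the g-sum of the edge select that vertex.
        for e, i in reversed(order):
            vs = edges[e]
            g[vs[i]] = (i - sum(g[vs[j]] for j in range(3) if j != i)) % 3
        return seed, part, g
    raise RuntimeError("no peelable hypergraph found (duplicate keys?)")


def phf(key, seed, part, g):
    """Perfect hash: a value in [0, 3 * part), distinct for every key used in build()."""
    vs = edge(key, seed, part)
    return vs[sum(g[v] for v in vs) % 3]


def rank_table(g):
    """ranks[v] = number of assigned vertices before v (a plain, non-succinct rank)."""
    ranks, count = [], 0
    for x in g:
        ranks.append(count)
        count += 1 if x != UNASSIGNED else 0
    return ranks


def mphf(key, seed, part, g, ranks):
    """Minimal perfect hash: a value in [0, n)."""
    return ranks[phf(key, seed, part, g)]


def lower_bound_bits_per_key(c):
    """(n*log2(e) + (m - n)*log2(1 - n/m)) / n for c = m/n >= 1 (Stirling approximation)."""
    if c == 1.0:
        return math.log2(math.e)  # the (m - n) term vanishes when m = n
    return math.log2(math.e) + (c - 1.0) * math.log2(1.0 - 1.0 / c)


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(10_000)]
    seed, part, g = build(keys)
    ranks = rank_table(g)
    values = {mphf(k, seed, part, g, ranks) for k in keys}
    print(values == set(range(len(keys))))  # True: a bijection onto [0, n)
    print(f"{lower_bound_bits_per_key(1.0):.2f} {lower_bound_bits_per_key(1.23):.2f}")  # 1.44 0.89
```

The data layout here is deliberately naive. `g` is a Python list and `rank_table` stores one integer per vertex, so the sketch does not reach the space bounds quoted in the abstract. A compact version would pack each `g` value into 2 bits (values 0 to 3) and replace the prefix table with a succinct rank structure.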