A hash function is a mapping from a key universe U to a range of integers, i.e., h: U → {0, 1, ..., m-1}, where m is the size of the range. A perfect hash function for a set S ⊆ U is a hash function that is one-to-one on S, where m ≥ |S|. A minimal perfect hash function for a set S ⊆ U is a perfect hash function with a range of minimum size, i.e., m = |S|. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time, and nearly optimal space consumption for the data structure. For n keys and m = n, the space consumption ranges from 2.62n + o(n) to 3.3n + o(n) bits, and for m = 1.23n it ranges from 1.95n + o(n) to 2.7n + o(n) bits. This is within a small constant factor of the theoretical lower bounds of 1.44n bits for m = n and 0.89n bits for m = 1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure for solving the membership problem when the key set S is static and known in advance. By taking the memory hierarchy into account, we can construct (minimal) perfect hash functions for over a billion keys in 46 minutes on a commodity PC. An open source implementation of the algorithms is available at http://cmph.sf.net under the GNU Lesser General Public License (LGPL).
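To make the definitions and the quoted lower bounds concrete, here is a minimal Python sketch. It is not the construction presented in the paper: it finds a (minimal) perfect hash function for a tiny key set by brute-force search over seeds, which is only feasible for a handful of keys. The names `h`, `is_perfect`, `find_seed`, and `lower_bound_bits_per_key` are hypothetical, the seeded SHA-256 hash is a stand-in chosen for determinism, and the lower-bound formula is an assumption: the usual asymptotic estimate of log2(e) + (c - 1) * log2(1 - 1/c) bits per key for m = c * n, which does not appear in the abstract itself.

```python
import hashlib
import math


def h(key, seed, m):
    """Seeded hash of a string key into {0, 1, ..., m-1} (illustrative stand-in)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def is_perfect(keys, seed, m):
    """True if h(., seed, m) is one-to-one on keys (keys assumed distinct)."""
    return len({h(k, seed, m) for k in keys}) == len(keys)


def find_seed(keys, m, max_tries=100_000):
    """Brute-force search for a seed that makes h perfect on keys.

    With m == len(keys) the result is a minimal perfect hash function.
    This is exponential in len(keys); it only illustrates the definition.
    """
    for seed in range(max_tries):
        if is_perfect(keys, seed, m):
            return seed
    raise RuntimeError("no perfect seed found within max_tries")


def lower_bound_bits_per_key(c):
    """Assumed asymptotic lower bound in bits per key for a range m = c * n, c >= 1."""
    bound = math.log2(math.e)
    if c > 1.0:
        bound += (c - 1.0) * math.log2(1.0 - 1.0 / c)
    return bound


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date", "elderberry"]
    seed = find_seed(keys, len(keys))
    print("seed:", seed)
    print({k: h(k, seed, len(keys)) for k in keys})
    print("bits/key, m = n:     %.2f" % lower_bound_bits_per_key(1.0))
    print("bits/key, m = 1.23n: %.2f" % lower_bound_bits_per_key(1.23))
```

Under that assumed formula, the two printed bounds round to the 1.44 and 0.89 bits per key quoted in the abstract. The brute-force search also hints at why the problem is hard: for m = n a random function is perfect with probability n!/n^n, so the expected number of seeds tried grows roughly like e^n.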