Minimal perfect hashing: A competitive method for indexing internal memory

Authors:
Fabiano C. Botelho;Anísio Lacerda;Guilherme Vale Menezes;Nivio Ziviani
Affiliations:
Data Domain an EMC Company, Santa Clara, USA;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil
Venue:
Information Sciences: an International Journal
Year:
2011

Abstract

A perfect hash function (PHF) is an injective function that maps the keys of a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. MPHFs are widely used for memory-efficient storage and fast retrieval of items from static sets. Unlike other hashing schemes, MPHFs completely avoid the space and time wasted in dealing with collisions. Until recently, practical implementations in the literature needed O(log n) bits per key to store an MPHF description, where n is the number of keys in S, a space overhead similar to that of other hashing schemes. Recent results changed this scenario: an MPHF can now be described by approximately 2.6 bits per key. The objective of this paper is to show that, given these results, MPHFs are a good option for indexing internal memory when the key set is static and both successful and unsuccessful searches are allowed. We show that MPHFs provide the best tradeoff between space usage and lookup time when compared with other open addressing and chaining hash schemes such as linear hashing, quadratic hashing, double hashing, dense hashing, cuckoo hashing, sparse hashing, hopscotch hashing, chaining with the move-to-front heuristic, and exact fit. We consider lookup time for successful and unsuccessful searches in two scenarios: (i) the MPHF description fits in the CPU cache and (ii) the MPHF description does not fit entirely in the CPU cache. Considering lookup time, minimal perfect hashing outperforms the other hashing schemes in both scenarios and, in the first scenario, performs better even when the compared methods leave more than 80% of the hash table entries free. Considering space overhead (the space used beyond the key-value pairs), minimal perfect hashing is lower than the other hashing schemes by a factor of O(log n) bits in both scenarios.
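To make the definitions above concrete, the following is a minimal sketch of a static table indexed by an MPHF. It is not the construction evaluated in the paper: it assumes a simple hash-and-displace scheme that stores one integer seed per bucket, so it does not reach the roughly 2.6 bits per key quoted above. The names `seeded_hash`, `build_mphf` and `StaticTable` are hypothetical, and the key set is assumed to be non-empty. Keys are stored next to their values so that an unsuccessful search can be detected with a single comparison.

```python
import hashlib


def seeded_hash(key: str, seed: int, n: int) -> int:
    """Hash `key` into 0..n-1 using BLAKE2b salted with `seed` (illustrative choice)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % n


def build_mphf(keys):
    """Return one seed per bucket so that the n distinct keys map onto slots 0..n-1."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for key in keys:
        buckets[seeded_hash(key, 0, n)].append(key)

    seeds = [0] * n          # empty buckets keep seed 0
    taken = [False] * n
    # Place the largest buckets first; they are the hardest to fit.
    for b in sorted(range(n), key=lambda i: len(buckets[i]), reverse=True):
        bucket = buckets[b]
        if not bucket:
            break            # all remaining buckets are empty
        seed = 1
        while True:
            slots = [seeded_hash(k, seed, n) for k in bucket]
            collision_free = len(set(slots)) == len(slots)
            if collision_free and not any(taken[s] for s in slots):
                break
            seed += 1        # try the next seed until the bucket fits
        seeds[b] = seed
        for s in slots:
            taken[s] = True
    return seeds


class StaticTable:
    """Static key-value table with exactly one slot per key (assumes a non-empty dict)."""

    def __init__(self, items: dict):
        self.n = len(items)
        self.seeds = build_mphf(list(items))
        self.slots = [None] * self.n
        for key, value in items.items():
            self.slots[self.index(key)] = (key, value)

    def index(self, key: str) -> int:
        bucket = seeded_hash(key, 0, self.n)
        return seeded_hash(key, self.seeds[bucket], self.n)

    def get(self, key: str, default=None):
        # Single probe: a key outside the set lands on some slot but fails the comparison.
        stored_key, value = self.slots[self.index(key)]
        return value if stored_key == key else default


table = StaticTable({"apple": 1, "banana": 2, "cherry": 3, "date": 4, "elder": 5})
print(table.get("cherry"), table.get("durian"))
```

Because the range equals the number of keys, every slot is occupied and no space is left empty, which is the property the abstract contrasts with open addressing schemes that keep a fraction of the table free.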