Minimal perfect hashing: A competitive method for indexing internal memory

  • Authors:
  • Fabiano C. Botelho;Anísio Lacerda;Guilherme Vale Menezes;Nivio Ziviani

  • Affiliations:
  • Data Domain an EMC Company, Santa Clara, USA;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 0.07

Visualization

Abstract

A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. MPHFs are widely used for memory efficient storage and fast retrieval of items from static sets. Differently from other hashing schemes, MPHFs completely avoid the problem of wasted space and wasted time to deal with collisions. Until recently, the amount of space to store an MPHF description for practical implementations found in the literature was O(logn) bits per key and therefore similar to the overhead of space of other hashing schemes. Recent results on MPHFs presented in the literature changed this scenario: an MPHF can now be described by approximately 2.6 bits per key. The objective of this paper is to show that MPHFs are, after the new recent results, a good option to index internal memory when static key sets are involved and both successful and unsuccessful searches are allowed. We have shown that MPHFs provide the best tradeoff between space usage and lookup time when compared with other open addressing and chaining hash schemes such as linear hashing, quadratic hashing, double hashing, dense hashing, cuckoo hashing, sparse hashing, hopscotch hashing, chaining with move to front heuristic and exact fit. We considered lookup time for successful and unsuccessful searches in two scenarios: (i) the MPHF description fits in the CPU cache and (ii) the MPHF description does not fit entirely in the CPU cache. Considering lookup time, the minimal perfect hashing outperforms the other hashing schemes in the two scenarios and, in the first scenario, the performance is better even when the compared methods leave more than 80% of the hash table entries free. Considering space overhead (the amount of used space other than the key-value pairs), the minimal perfect hashing is within a factor of O(logn) bits lower than the other hashing schemes for both scenarios.