Distributed perfect hashing for very large key sets

Authors:
Fabiano C. Botelho;Daniel Galinkin;Wagner Meira, Jr.;Nivio Ziviani
Affiliations:
Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil
Venue:
Proceedings of the 3rd international conference on Scalable information systems
Year:
2008

Citing 10
Cited 1

Parallel computing (2nd ed.): theory and practice

Parallel computing (2nd ed.): theory and practice
Memory management during run generation in external sorting

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The webgraph framework I: compression techniques

Proceedings of the 13th international conference on World Wide Web
Perfect hashing schemes for mining traversal patterns

Fundamenta Informaticae
Perfect spatial hashing

ACM SIGGRAPH 2006 Papers
Perfect Hashing Schemes for Mining Association Rules

The Computer Journal
External perfect hashing for very large key sets

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
A practical minimal perfect hashing method

WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
Simple and space-efficient minimal perfect hash functions

WADS'07 Proceedings of the 10th international conference on Algorithms and Data Structures

Practical perfect hashing in nearly optimal space

Information Systems

Abstract

A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near space-optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs an MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here achieves the following performance using 14 commodity PCs: (i) it constructs an MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs an MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines, and its evaluation is done in a distributed way, faster than the centralized function.
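
To make the PHF and MPHF definitions concrete, the following minimal Python sketch checks injectivity over a key set and finds an MPHF for a handful of keys by brute-force seed search. It is not the algorithm of [4] or the distributed construction described in the abstract; the helper names (seeded_hash, is_perfect, find_mphf_seed), the SHA-256-based hash, and the sample URLs are all assumptions made for illustration only.

```python
import hashlib


def seeded_hash(key: str, seed: int, m: int) -> int:
    """Map key into [0, m - 1] using a seed (hypothetical helper, not from the paper)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % m


def is_perfect(keys, h, m: int) -> bool:
    """True if h is injective on keys and every value lies in [0, m - 1]."""
    values = [h(k) for k in keys]
    in_range = all(0 <= v < m for v in values)
    return in_range and len(set(values)) == len(keys)


def find_mphf_seed(keys, max_seed: int = 1_000_000) -> int:
    """Brute-force search for a seed that makes seeded_hash minimal perfect (m = n).

    Illustration only: the expected number of trials is n^n / n!, which grows
    roughly like e^n, so this is usable only for very small key sets.
    """
    n = len(keys)
    for seed in range(max_seed):
        if is_perfect(keys, lambda k: seeded_hash(k, seed, n), n):
            return seed
    raise RuntimeError("no suitable seed found within max_seed trials")


# Hypothetical sample keys; any small set of distinct strings works.
keys = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.com/d",
    "http://example.com/e",
]
seed = find_mphf_seed(keys)
slots = {k: seeded_hash(k, seed, len(keys)) for k in keys}
# Because m = n and the mapping is injective, the slots cover 0 .. n - 1 exactly once.
print(sorted(slots.values()))
```

The exponential cost of this naive search is the reason practical constructions, such as the one the abstract refers to, use more elaborate techniques to reach billions of keys.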