Parallel computing (2nd ed.): theory and practice
Memory management during run generation in external sorting. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine. WWW7 Proceedings of the seventh international conference on World Wide Web 7
The webgraph framework I: compression techniques. Proceedings of the 13th international conference on World Wide Web
Perfect hashing schemes for mining traversal patterns. Fundamenta Informaticae
ACM SIGGRAPH 2006 Papers
Perfect hashing schemes for mining association rules. The Computer Journal
External perfect hashing for very large key sets. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
A practical minimal perfect hashing method. WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
Simple and space-efficient minimal perfect hash functions. WADS'07 Proceedings of the 10th international conference on Algorithms and Data Structures
Practical perfect hashing in nearly optimal space. Information Systems
A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function on S: it maps distinct keys of S to distinct values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable, near space-optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs an MPHF for a set of 1.024 billion URLs of average length 64 bytes, collected from the Web, in approximately 50 minutes on a commodity PC. Using 14 commodity PCs, the parallel implementation proposed here achieves the following: (i) it constructs an MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs an MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes, with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and evaluates it in a distributed way, which is faster than evaluating the centralized function.
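To make the PHF/MPHF definitions concrete, the Python sketch below checks both properties for a given function and key set, and builds a toy MPHF with a generic two-level "partition into buckets, then solve each bucket" scheme. This is an illustrative assumption, not the algorithm of [4] or its distributed version: the names `seeded_hash`, `is_perfect`, `is_minimal_perfect` and `build_bucketed_mphf` are hypothetical, the hash family (SHA-256 over a seed prefix) is an arbitrary choice, and the per-bucket brute-force seed search is far from near space-optimal, since it stores one seed and one offset per bucket.

```python
import hashlib
from typing import Callable, List, Sequence


def seeded_hash(key: str, seed: int, m: int) -> int:
    """Map `key` into [0, m - 1] using a hash function selected by `seed` (assumed family)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % m


def is_perfect(h: Callable[[str], int], keys: Sequence[str], m: int) -> bool:
    """True if h maps the (distinct) keys injectively into [0, m - 1]."""
    values = [h(k) for k in keys]
    return all(0 <= v < m for v in values) and len(set(values)) == len(keys)


def is_minimal_perfect(h: Callable[[str], int], keys: Sequence[str]) -> bool:
    """An MPHF is a PHF whose range size m equals n = |S|."""
    return is_perfect(h, keys, len(keys))


def build_bucketed_mphf(keys: Sequence[str], num_buckets: int,
                        max_seed: int = 1_000_000) -> Callable[[str], int]:
    """Toy two-level MPHF (hypothetical scheme, not the algorithm of [4]).

    Level 1 splits the keys into small buckets; level 2 searches, per bucket,
    for a seed whose hash is a bijection onto [0, size - 1]. A prefix sum of
    bucket sizes (the offsets) glues the per-bucket ranges into [0, n - 1].
    """
    buckets: List[List[str]] = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[seeded_hash(k, 0, num_buckets)].append(k)

    sizes = [len(b) for b in buckets]
    offsets = [0] * num_buckets
    for i in range(1, num_buckets):
        offsets[i] = offsets[i - 1] + sizes[i - 1]

    # Each bucket is solved independently of every other bucket.
    seeds = [0] * num_buckets
    for i, bucket in enumerate(buckets):
        if not bucket:
            continue
        for seed in range(1, max_seed):
            if len({seeded_hash(k, seed, len(bucket)) for k in bucket}) == len(bucket):
                seeds[i] = seed
                break
        else:
            raise RuntimeError(f"no seed found for bucket {i}; use more buckets")

    def mphf(key: str) -> int:
        # Only meaningful for keys in S; other keys get an arbitrary integer.
        b = seeded_hash(key, 0, num_buckets)
        return offsets[b] + seeded_hash(key, seeds[b], max(sizes[b], 1))

    return mphf


if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(500)]
    h = build_bucketed_mphf(urls, num_buckets=250)
    print(is_minimal_perfect(h, urls))
```

Whenever `build_bucketed_mphf` returns without raising, each bucket maps bijectively onto its own slice `[offsets[b], offsets[b] + sizes[b] - 1]`, so the combined function is an MPHF on the input keys. Because buckets are solved independently, one can imagine assigning ranges of buckets to different machines and keeping each machine's seeds local; that is only an intuition for why partitioning suits parallel construction and distributed evaluation, not a description of the method in this paper.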