External perfect hashing for very large key sets

Authors:
Fabiano C. Botelho;Nivio Ziviani
Affiliations:
Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil
Venue:
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Year:
2007

Abstract

We present a simple and efficient external perfect hashing scheme (referred to as the EPH algorithm) for very large static key sets. We combine a number of techniques from the literature into a novel scheme that is theoretically well understood and, at the same time, increases by an order of magnitude the size of the problems that can be solved compared with previous "practical" methods. We demonstrate the scalability of our algorithm by constructing a minimal perfect hash function for a set of 1.024 billion URLs from the World Wide Web, with an average length of 64 characters, in approximately 62 minutes on a commodity PC. Our scheme produces minimal perfect hash functions using approximately 3.8 bits per key. For perfect hash functions with range {0, ..., 2n - 1}, where n is the number of keys, the space usage drops to approximately 2.7 bits per key. The main contribution is the first algorithm whose practicality has been demonstrated experimentally on sets of the order of billions of keys and whose time and space usage has been carefully analyzed without unrealistic assumptions.
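To make the quoted figures concrete, the short Python sketch below turns the bits-per-key numbers into total function sizes for the 1.024-billion-key URL set and derives the average construction throughput. It is back-of-the-envelope arithmetic only. It treats the abstract's approximate values as exact inputs, reports sizes in decimal megabytes (10^6 bytes), and covers only the final function description, not the memory used during construction. The constant and function names are hypothetical and are not part of the EPH algorithm.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Inputs are the abstract's approximate values, so the outputs are approximate too.
# All names here are hypothetical, chosen only for this illustration.

N_KEYS = 1_024_000_000      # 1.024 billion URLs
BUILD_MINUTES = 62          # approximate construction time on a commodity PC
MPHF_BITS_PER_KEY = 3.8     # minimal perfect hash function, range {0, ..., n - 1}
PHF_BITS_PER_KEY = 2.7      # perfect hash function, range {0, ..., 2n - 1}


def description_size_mb(n_keys: int, bits_per_key: float) -> float:
    """Size of the stored function description in decimal megabytes (10**6 bytes)."""
    return n_keys * bits_per_key / 8 / 1e6


def keys_per_second(n_keys: int, minutes: float) -> float:
    """Average construction throughput over the whole build."""
    return n_keys / (minutes * 60)


if __name__ == "__main__":
    print(f"MPHF size: {description_size_mb(N_KEYS, MPHF_BITS_PER_KEY):.1f} MB")
    print(f"PHF size (range 2n): {description_size_mb(N_KEYS, PHF_BITS_PER_KEY):.1f} MB")
    print(f"Throughput: {keys_per_second(N_KEYS, BUILD_MINUTES):,.0f} keys/s")
```

With these inputs the sketch reports about 486.4 MB for the minimal function, about 345.6 MB for the function with range {0, ..., 2n - 1}, and an average of roughly 275,000 keys per second. Both descriptions are far smaller than the key set itself: 1.024 billion URLs of 64 characters each occupy on the order of 65 GB.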