We present a simple and efficient external perfect hashing scheme (referred to as the EPH algorithm) for very large static key sets. We use a number of techniques from the literature to obtain a novel scheme that is theoretically well understood and at the same time achieves an order-of-magnitude increase in the size of the problem to be solved compared to previous "practical" methods. We demonstrate the scalability of our algorithm by constructing minimal perfect hash functions for a set of 1.024 billion URLs from the World Wide Web, of average length 64 characters, in approximately 62 minutes on a commodity PC. Our scheme produces minimal perfect hash functions using approximately 3.8 bits per key. For perfect hash functions in the range {0, ..., 2n - 1}, the space usage drops to approximately 2.7 bits per key. The main contribution is the first algorithm that has experimentally proven practicality for sets on the order of billions of keys and whose time and space usage are carefully analyzed without unrealistic assumptions.
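
To put the reported figures in perspective, the following back-of-the-envelope sketch converts the bits-per-key numbers above into a total function size and an average construction rate. It is not part of the EPH algorithm; the helper names `function_size_mb` and `keys_per_second` are hypothetical, and megabytes are assumed to be decimal (10^6 bytes).

```python
# Back-of-the-envelope arithmetic for the figures quoted in the abstract.
# Helper names are illustrative only; MB is assumed to mean 10**6 bytes.

N_KEYS = 1_024_000_000      # 1.024 billion URLs
BUILD_MINUTES = 62          # approximate construction time reported


def function_size_mb(n_keys: int, bits_per_key: float) -> float:
    """Total size of the hash function description in decimal megabytes."""
    return n_keys * bits_per_key / 8 / 1e6


def keys_per_second(n_keys: int, minutes: float) -> float:
    """Average number of keys processed per second during construction."""
    return n_keys / (minutes * 60)


if __name__ == "__main__":
    mphf = function_size_mb(N_KEYS, 3.8)   # minimal perfect hash function
    phf = function_size_mb(N_KEYS, 2.7)    # perfect hash, range {0, ..., 2n - 1}
    rate = keys_per_second(N_KEYS, BUILD_MINUTES)
    print(f"MPHF size: {mphf:.1f} MB")
    print(f"PHF size:  {phf:.1f} MB")
    print(f"Build rate: {rate:,.0f} keys/s")
```

Under these assumptions the minimal perfect hash function occupies roughly 486 MB and the non-minimal variant roughly 346 MB, small next to the approximately 64 GB of raw URL text implied by 1.024 billion keys of average length 64 characters.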