Less Hashing, Same Performance: Building a Better Bloom Filter

Authors:
Adam Kirsch; Michael Mitzenmacher
Affiliations:
Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA (both authors)
Venue:
ESA '06: Proceedings of the 14th Annual European Symposium on Algorithms
Year:
2006


Abstract

A standard technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i\,h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
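The following minimal Python sketch illustrates the idea in the abstract. It shows a Bloom filter whose k bit positions all come from just two base hash values, using $g_i(x) = (h_1(x) + i\,h_2(x)) \bmod m$ to address an m-bit array. Several details are illustrative assumptions, not necessarily the paper's exact construction:

- $h_1$ and $h_2$ are derived by splitting a single SHA-256 digest.
- $m$ is chosen prime.
- $h_2$ is forced into $[1, m-1]$, so that the k indices are distinct whenever $k \le m$.

The names `DoubleHashBloomFilter` and `estimated_fp_rate` are hypothetical. The latter computes the standard asymptotic Bloom filter false positive approximation $(1 - e^{-kn/m})^k$, which is the rate the abstract says two-hash simulation preserves.

```python
import hashlib
import math
from typing import Iterator, Tuple


class DoubleHashBloomFilter:
    """Bloom filter whose k indices are g_i(x) = (h1(x) + i*h2(x)) mod m.

    Assumption (illustrative only): h1 and h2 are two 64-bit slices of one
    SHA-256 digest, and m is prime so distinct i give distinct indices.
    """

    def __init__(self, m: int, k: int) -> None:
        if m < 2 or k < 1:
            raise ValueError("need m >= 2 and k >= 1")
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits packed 8 per byte

    def _base_hashes(self, item: bytes) -> Tuple[int, int]:
        # Only one digest is computed per item; both base hashes come from it.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        # Keep the step h2 in [1, m-1] so it is never 0 mod m.
        return h1 % self.m, 1 + h2 % (self.m - 1)

    def _indices(self, item: bytes) -> Iterator[int]:
        # Simulate k hash functions from the two base hashes.
        h1, h2 = self._base_hashes(item)
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for idx in self._indices(item):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, item: bytes) -> bool:
        # True if every simulated hash position is set (may be a false positive).
        return all(self.bits[idx >> 3] & (1 << (idx & 7)) for idx in self._indices(item))


def estimated_fp_rate(m: int, n: int, k: int) -> float:
    """Standard asymptotic false positive estimate (1 - e^{-kn/m})^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    bf = DoubleHashBloomFilter(m=1009, k=7)  # 1009 is prime
    for word in [b"apple", b"banana", b"cherry"]:
        bf.add(word)
    print(b"apple" in bf)  # True: Bloom filters have no false negatives
```

Compared with computing k independent hashes, each lookup here computes a single digest and then does k cheap multiply-add-mod steps. That reduction in computation is the practical saving the abstract describes.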