A new analysis of the false positive rate of a Bloom filter

Authors:
Ken Christensen; Allen Roginsky; Miguel Jimeno
Affiliations:
Department of Computer Science and Engineering, University of South Florida, 4202 East Fowler Avenue, ENB 118, Tampa, FL 33620, USA (Ken Christensen, Miguel Jimeno)
Computer Security Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA (Allen Roginsky)
Venue:
Information Processing Letters
Year:
2010

Abstract

A Bloom filter is a space-efficient data structure used for probabilistic set membership testing. When an object is tested for set membership, a Bloom filter may return a false positive. Analyzing the false positive rate is key to understanding the Bloom filter and the applications that use it. We show experimentally that the classic analysis of the false positive rate is wrong. We formally derive a correct formula using a balls-and-bins model and show how to compute it in a numerically stable manner. We also prove that the new formula always predicts a higher false positive rate than the classic formula. Finally, we numerically compare the correct formula with the classic formula in terms of relative error: for a small Bloom filter, the classic formula gives an inaccurate prediction of the false positive rate.
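The abstract contrasts two formulas without stating them. For a filter of m bits, n inserted elements and k hash functions, the classic estimate is (1 - (1 - 1/m)^(kn))^k. In a balls-and-bins view, the kn hash positions are treated as kn balls thrown independently and uniformly into m bins. If X is the number of occupied bins (set bits), a query for a non-member is a false positive when all k of its independently chosen positions land on set bits, so the false positive rate is the sum over i of P(X = i)(i/m)^k. The Python sketch below computes both values under exactly these independence and uniformity assumptions; it is an illustration, not the authors' code. To obtain P(X = i) stably it uses a simple occupancy recurrence (a technique swapped in here, which may differ from the computation in the paper). The recurrence only adds and multiplies non-negative probabilities, so it avoids the overflow and cancellation that direct evaluation of the closed-form occupancy probability, with its very large binomial and Stirling-number terms, can cause. The function names and example parameters are hypothetical.

```python
"""Sketch: classic vs. balls-and-bins Bloom filter false positive rate.

Assumption: each of the k hash positions of every inserted element (and of
the query) is independent and uniform over the m bit positions; positions
may coincide.
"""


def classic_fpr(m: int, n: int, k: int) -> float:
    """Classic estimate (1 - (1 - 1/m)^(k*n))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def occupancy_distribution(m: int, balls: int) -> list:
    """Return dist where dist[i] = P(exactly i of m bins are occupied)
    after throwing `balls` balls independently and uniformly into m bins.

    Hypothetical helper using the recurrence
      P_{t+1}(i) = P_t(i) * i/m + P_t(i-1) * (m - i + 1)/m,
    which only adds and multiplies non-negative numbers.
    """
    dist = [0.0] * (m + 1)
    dist[0] = 1.0  # before any insertion, every bit is zero
    for _ in range(balls):
        nxt = [0.0] * (m + 1)
        for i, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[i] += p * i / m  # hash position hits an already-set bit
            if i < m:
                nxt[i + 1] += p * (m - i) / m  # hash position sets a new bit
        dist = nxt
    return dist


def balls_and_bins_fpr(m: int, n: int, k: int) -> float:
    """False positive rate: sum over i of P(X = i) * (i/m)^k."""
    dist = occupancy_distribution(m, k * n)
    return sum(p * (i / m) ** k for i, p in enumerate(dist))


if __name__ == "__main__":
    # Hypothetical small filter: m = 16 bits, n = 2 elements, k = 3 hashes.
    m, n, k = 16, 2, 3
    exact = balls_and_bins_fpr(m, n, k)
    approx = classic_fpr(m, n, k)
    print(f"balls-and-bins: {exact:.6f}  classic: {approx:.6f}")
    print(f"relative error of classic formula: {(exact - approx) / exact:.2%}")
```

Because the expected number of set bits is E[X] = m(1 - (1 - 1/m)^(kn)), the classic formula equals (E[X]/m)^k, while the balls-and-bins value is E[(X/m)^k]. Jensen's inequality for the convex function x^k therefore gives a quick consistency check that the classic value can never exceed the balls-and-bins value, in line with the abstract's claim (this is not necessarily the authors' proof). The two coincide for k = 1. The recurrence costs O(knm) operations, which is practical for the small filters where the difference matters.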