A new analysis of the false positive rate of a Bloom filter

  • Authors:
  • Ken Christensen;Allen Roginsky;Miguel Jimeno

  • Affiliations:
  • Department of Computer Science and Engineering, University of South Florida, 4202 East Fowler Avenue, ENB 118, Tampa, FL 33620, USA;Computer Security Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA;Department of Computer Science and Engineering, University of South Florida, 4202 East Fowler Avenue, ENB 118, Tampa, FL 33620, USA

  • Venue:
  • Information Processing Letters
  • Year:
  • 2010

Quantified Score

Hi-index 0.89

Visualization

Abstract

A Bloom filter is a space-efficient data structure used for probabilistic set membership testing. When testing an object for set membership, a Bloom filter may give a false positive. The analysis of the false positive rate is a key to understanding the Bloom filter and applications that use it. We show experimentally that the classic analysis for false positive rate is wrong. We formally derive a correct formula using a balls-and-bins model and show how to numerically compute the new, correct formula in a stable manner. We also prove that the new formula always results in a predicted greater false positive rate than the classic formula. This correct formula is numerically compared to the classic formula for relative error - for a small Bloom filter the prediction of false positive rate will be in error when the classic formula is used.