Understanding bloom filter intersection for lazy address-set disambiguation

  • Authors:
  • Mark C. Jeffrey;J. Gregory Steffan

  • Affiliations:
  • University of Toronto, Toronto, ON, Canada;University of Toronto, Toronto, ON, Canada

  • Venue:
  • Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

A Bloom filter is a probabilistic bit-array-based set representation that has recently been applied to address-set disambiguation in systems that ease the burden of parallel programming. However, many of these systems intersect the Bloom filter bit-arrays to approximate address-set intersection and decide set disjointness. This is in contrast with the conventional and well-studied approach of making individual membership queries into the Bloom filter. In this paper we present much-needed probabilistic models for the unconventional application of testing set disjointness using Bloom filters. Consequently, we demonstrate that intersecting Bloom filters requires substantially larger bit-arrays to provide the same probability of false set-overlap as querying into the bit-array. For when intersection is unavoidable, we prove that partitioned Bloom filters require less space than unpartitioned. Finally, we show that for Bloom filters with a single hash function, surprisingly, intersection and querying share the same probability of false set-overlap.