Bloofi: a hierarchical Bloom filter index with applications to distributed data provenance

Authors: Adina Crainiceanu
Affiliations: United States Naval Academy
Venue: Proceedings of the 2nd International Workshop on Cloud Intelligence
Year: 2013

Abstract

Bloom filters are probabilistic data structures that have been used successfully for approximate membership problems in many areas of computer science (networking, distributed systems, databases, etc.). With the huge increase in data size and in the distribution of data, problems arise in which a large number of Bloom filters are available and all of them must be searched for potential matches. For example, in a federated cloud environment with hundreds of geographically distributed clouds participating in the federation, information needs to be shared by the semi-autonomous cloud providers. Each cloud provider could encode its information using Bloom filters and share the Bloom filters with a central coordinator. The problem of interest is not only whether a given object is in any of the sets represented by the Bloom filters, but which of the existing sets contain the given object. This problem cannot be solved by simply constructing a Bloom filter on the union of all the sets, because such a filter can only indicate that some set may contain the object, not which one. We propose Bloofi, a hierarchical index structure for Bloom filters that speeds up the search process and can be efficiently constructed and maintained. We apply our index structure to the problem of determining the complete data provenance graph in a geographically distributed setting. Our theoretical and experimental results show that Bloofi provides a scalable and efficient solution for searching through a large number of Bloom filters.
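
The idea of a hierarchical index over Bloom filters can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes that each internal node stores the bitwise OR of its children's filters, that all filters share the same size and hash functions, and that the tree is built once, bottom-up, from a non-empty list of leaves. The names `BloomFilter`, `Node`, `build_index`, `search`, and the `cloud-*` identifiers are hypothetical, as are the parameters `m`, `k`, and `fanout`.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; the bit array is stored in a Python int."""

    def __init__(self, m=256, k=3, bits=0):
        self.m, self.k, self.bits = m, k, bits

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def union(self, other):
        # OR-ing two filters with identical m and k represents the union of their sets.
        return BloomFilter(self.m, self.k, self.bits | other.bits)


class Node:
    def __init__(self, bf, children=(), set_id=None):
        self.bf, self.children, self.set_id = bf, list(children), set_id


def build_index(leaves, fanout=2):
    """Build the tree bottom-up; each parent filter is the OR of its children."""
    level = list(leaves)  # assumed non-empty
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            bf = BloomFilter(group[0].bf.m, group[0].bf.k)
            for child in group:
                bf = bf.union(child.bf)
            parents.append(Node(bf, group))
        level = parents
    return level[0]


def search(node, item):
    """Return ids of leaf sets whose filters match item (false positives possible)."""
    if not node.bf.might_contain(item):
        return []  # no filter below this node can match, so skip the subtree
    if not node.children:
        return [node.set_id]
    hits = []
    for child in node.children:
        hits.extend(search(child, item))
    return hits


# Hypothetical federation of four clouds, one Bloom filter per cloud.
sites = {
    "cloud-a": ["obj1", "obj2"],
    "cloud-b": ["obj3"],
    "cloud-c": ["obj2", "obj4"],
    "cloud-d": ["obj5"],
}
leaves = []
for name, items in sites.items():
    bf = BloomFilter()
    for it in items:
        bf.add(it)
    leaves.append(Node(bf, set_id=name))

root = build_index(leaves)
print(search(root, "obj2"))  # includes cloud-a and cloud-c; extra ids would be false positives
```

Because a Bloom filter never produces false negatives, the search always reports every set that actually contains the object. The benefit of the hierarchy is that a non-matching internal node rules out all the filters beneath it with a single check, instead of one check per filter.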