Bloom filters are probabilistic data structures that have been used successfully for approximate membership problems in many areas of computer science (networking, distributed systems, databases, etc.). With the huge increase in data size and distribution, problems arise in which a large number of Bloom filters are available and all of them need to be searched for potential matches. For example, in a federated cloud environment with hundreds of geographically distributed clouds participating in the federation, the semi-autonomous cloud providers need to share information. Each cloud provider could encode its information using Bloom filters and share those filters with a central coordinator. The problem of interest is not only whether a given object is in any of the sets represented by the Bloom filters, but which of the existing sets contain it. This problem cannot be solved simply by constructing a Bloom filter on the union of all the sets, because such a filter can only report that the object may be present somewhere, not which set holds it. We propose Bloofi, a hierarchical index structure for Bloom filters that speeds up the search process and can be efficiently constructed and maintained. We apply our index structure to the problem of determining the complete data provenance graph in a geographically distributed setting. Our theoretical and experimental results show that Bloofi provides a scalable and efficient solution for searching through a large number of Bloom filters.
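The abstract does not spell out how Bloofi is organized internally, so the following is only a minimal sketch of the general idea of a hierarchical index over Bloom filters. It assumes that every internal node stores the bitwise OR of its children's filters, so a negative answer at a node rules out every provider beneath it. The names `BloomFilter`, `Node`, `union_filter`, `build_index` and `search`, the SHA-256-based hashing, the fanout of 2, the toy provider data, and the static bulk load that simply groups consecutive filters are all hypothetical choices made for illustration; the efficient maintenance mentioned in the abstract is not modeled here. The sketch also shows the union-filter limitation: the root filter is exactly the filter of the union of all sets, and on its own it can only answer yes or no.

```python
import hashlib
from typing import Iterator, List, Optional


class BloomFilter:
    """Plain Bloom filter: an m-bit bitmap with k hash positions per item."""

    def __init__(self, m: int = 1024, k: int = 4) -> None:
        self.m, self.k = m, k
        self.bits = 0  # Python int used as an m-bit bitmap

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k positions by hashing the item with a per-index salt (assumed scheme).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: possibly present (false positives allowed).
        return all((self.bits >> p) & 1 for p in self._positions(item))


class Node:
    """Index node: a leaf holds one provider's filter, an inner node the OR of its children."""

    def __init__(self, bf: BloomFilter, children: Optional[List["Node"]] = None,
                 label: Optional[str] = None) -> None:
        self.bf = bf
        self.children = children or []
        self.label = label  # provider name, set only on leaves


def union_filter(filters: List[BloomFilter]) -> BloomFilter:
    # OR-ing filters that share m, k and hashes yields the filter of the union of the sets.
    out = BloomFilter(filters[0].m, filters[0].k)
    for f in filters:
        out.bits |= f.bits
    return out


def build_index(leaves: List[Node], fanout: int = 2) -> Node:
    # Hypothetical bottom-up bulk load: group consecutive nodes until one root remains.
    level = leaves
    while len(level) > 1:
        level = [
            Node(union_filter([n.bf for n in level[i:i + fanout]]), level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def search(node: Node, item: str) -> List[str]:
    # A negative answer at an inner node prunes its whole subtree.
    if not node.bf.might_contain(item):
        return []
    if not node.children:
        return [node.label]
    matches: List[str] = []
    for child in node.children:
        matches.extend(search(child, item))
    return matches


# Toy data (hypothetical): each provider encodes its object IDs in its own filter.
providers = {
    "cloud-a": ["x1", "x2"],
    "cloud-b": ["x3"],
    "cloud-c": ["x2", "x4"],
    "cloud-d": ["x5"],
}
leaves = []
for name, objects in providers.items():
    bf = BloomFilter()
    for obj in objects:
        bf.add(obj)
    leaves.append(Node(bf, label=name))

root = build_index(leaves, fanout=2)
print(root.bf.might_contain("x2"))  # the root alone only says "possibly somewhere"
print(search(root, "x2"))           # the tree names the candidate providers
```

In this sketch a query descends only into subtrees whose combined filter reports a possible match, so an object held by few providers touches few nodes. The trade-off is that filters higher in the tree accumulate more set bits and therefore report false positives more often, which costs extra descents but never causes a true match to be missed.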