Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features, we provide a thorough probabilistic analysis and an extensive experimental evaluation, which confirm the effectiveness of our approaches.
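To make the two ideas concrete, the following is a minimal sketch and not the encoding or the estimator analyzed in the paper. It assumes a plain Bloom filter with m bits and k hash functions, and it swaps in two widely known techniques: the standard fill-ratio estimate n ≈ -(m/k) ln(1 - X/m), where X is the number of set bits, and length reduction by "folding" (OR-ing the two halves of the bit array), which preserves membership answers when positions are computed as a hash value modulo the filter length. The class name, the salted SHA-256 hashing scheme, and all parameters are hypothetical choices made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Assumed hashing scheme: k salted SHA-256 digests, each reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

    def estimate_cardinality(self) -> float:
        """Estimate the number of distinct inserted items from the fill ratio."""
        set_bits = sum(self.bits)
        if set_bits == self.m:
            # A saturated filter carries no usable cardinality information.
            return math.inf
        return -(self.m / self.k) * math.log(1 - set_bits / self.m)

    def fold(self) -> "BloomFilter":
        """Return a filter of half the length by OR-ing the two halves.

        Because (h mod m) mod (m/2) == h mod (m/2) for even m, every item
        maps to the same positions it would have in a filter built directly
        with m/2 bits, so no false negatives are introduced.
        """
        if self.m % 2 != 0:
            raise ValueError("folding requires an even filter length")
        half = self.m // 2
        folded = BloomFilter(half, self.k)
        folded.bits = [a or b for a, b in zip(self.bits[:half], self.bits[half:])]
        return folded


if __name__ == "__main__":
    bf = BloomFilter(m=8192, k=4)
    for i in range(500):
        bf.add(f"key-{i}")

    print("estimated cardinality:", round(bf.estimate_cardinality()))

    # A node could shrink the filter before shipping it, e.g. for a Bloom join.
    small = bf.fold()
    print("folded length:", small.m)
    print("estimate after folding:", round(small.estimate_cardinality()))
```

Folding trades a higher false positive probability for a shorter message, so a sender would typically fold only while the estimated cardinality keeps the fill ratio of the shorter filter acceptable.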