Optimal Semijoins for Distributed Database Systems
IEEE Transactions on Software Engineering
Design and validation of computer protocols
Design and validation of computer protocols
Summary cache: a scalable wide-area web cache sharing protocol
IEEE/ACM Transactions on Networking (TON)
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Cache-oblivious streaming B-trees
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An improved construction for counting bloom filters
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Compact Hash Tables Using Bidirectional Linear Probing
IEEE Transactions on Computers
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Counting Data Stream Based on Improved Counting Bloom Filter
WAIM '08 Proceedings of the 2008 The Ninth International Conference on Web-Age Information Management
Using Bloom Filters for Large Scale Gene Sequence Analysis in Haskell
PADL '09 Proceedings of the 11th International Symposium on Practical Aspects of Declarative Languages
A Reconfigurable Bloom Filter Architecture for BLASTN
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
LazyBase: freshness vs. performance in information management
ACM SIGOPS Operating Systems Review
On the energy consumption and performance of systems software
Proceedings of the 4th Annual International Conference on Systems and Storage
BloomFlash: Bloom Filter on Flash-Based Storage
ICDCS '11 Proceedings of the 2011 31st International Conference on Distributed Computing Systems
A Forest-structured Bloom Filter with flash memory
MSST '11 Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies
Streaming quotient filter: a near optimal approximate duplicate detection approach for data streams
Proceedings of the VLDB Endowment
Memory efficient sanitization of a deduplicated storage system
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclusively in RAM, limiting the size of the sets it can represent. This paper first describes the quotient filter, which supports the basic operations of the Bloom filter, achieving roughly comparable performance in terms of space and time, but with better data locality. Operations on the quotient filter require only a small number of contiguous accesses. The quotient filter has other advantages over the Bloom filter: it supports deletions, it can be dynamically resized, and two quotient filters can be efficiently merged. The paper then gives two data structures, the buffered quotient filter and the cascade filter, which exploit the quotient filter advantages and thus serve as SSD-optimized alternatives to the Bloom filter. The cascade filter has better asymptotic I/O performance than the buffered quotient filter, but the buffered quotient filter outperforms the cascade filter on small to medium data sets. Both data structures significantly outperform recently-proposed SSD-optimized Bloom filter variants, such as the elevator Bloom filter, buffered Bloom filter, and forest-structured Bloom filter. In experiments, the cascade filter and buffered quotient filter performed insertions 8.6--11 times faster than the fastest Bloom filter variant and performed lookups 0.94--2.56 times faster.