For big data, the set of data chunk fingerprints is too large to be held entirely in memory. We therefore propose a new query mechanism, the Two-stage Bloom Filter. First, each bit of the second-stage Bloom filter represents the chunks that have identical fingerprints, which reduces the false-positive rate. Second, a two-dimensional list corresponding to the two-stage Bloom filter is created to collect the absolute addresses of the data chunks with identical fingerprints. Finally, we propose a new class of hash functions with strong global randomness. The Two-stage Bloom Filter decreases the number of disk accesses, speeds up the detection of redundant data chunks, and reduces the false-positive rate. Our experiments indicate that, for the same first-stage Bloom filter length, the Two-stage Bloom Filter reduces the storage accesses caused by false positives by about 30-40%.
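The abstract does not give the construction in detail, so the following is only a minimal sketch of one way such a two-stage lookup could be arranged, not the authors' design. It assumes that a fingerprint must pass a conventional first-stage Bloom filter, then a second stage in which each fingerprint maps to a single bit, and that each second-stage bit owns one row of a two-dimensional list of chunk addresses. The names `BloomFilter`, `TwoStageIndex`, `insert`, and `lookup`, the filter sizes, and the use of SHA-256 with double hashing are all illustrative assumptions; in particular, SHA-256 stands in for the paper's new hash function class, which the abstract does not specify.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; one byte per bit is used for readability, not compactness."""

    def __init__(self, num_bits, num_hashes, salt):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt  # keeps the two stages' hash positions independent
        self.bits = bytearray(num_bits)  # zero-initialised

    def positions(self, fingerprint):
        # Double hashing derived from SHA-256 (an assumption, not the paper's hash class).
        digest = hashlib.sha256(self.salt + fingerprint).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, fingerprint):
        for p in self.positions(fingerprint):
            self.bits[p] = 1

    def might_contain(self, fingerprint):
        return all(self.bits[p] for p in self.positions(fingerprint))


class TwoStageIndex:
    """Hypothetical two-stage index: stage 1 filters, stage 2 selects an address row."""

    def __init__(self, bits1=1024, bits2=1024, hashes1=3):
        self.stage1 = BloomFilter(bits1, hashes1, b"stage1")
        # Stage 2 uses a single hash so each fingerprint maps to exactly one bit.
        self.stage2 = BloomFilter(bits2, 1, b"stage2")
        # Two-dimensional list: one row of chunk addresses per stage-2 bit.
        self.addresses = [[] for _ in range(bits2)]

    def insert(self, fingerprint, address):
        self.stage1.add(fingerprint)
        (slot,) = self.stage2.positions(fingerprint)
        self.stage2.bits[slot] = 1
        self.addresses[slot].append(address)

    def lookup(self, fingerprint):
        """Return candidate addresses; an empty list means no disk access is needed."""
        if not self.stage1.might_contain(fingerprint):
            return []
        (slot,) = self.stage2.positions(fingerprint)
        if not self.stage2.bits[slot]:
            return []
        # Candidates may still include chunks whose fingerprints collide on this
        # slot, so the caller must verify them against the on-disk index.
        return list(self.addresses[slot])


if __name__ == "__main__":
    index = TwoStageIndex()
    index.insert(b"fp-a", 100)
    index.insert(b"fp-a", 200)
    print(index.lookup(b"fp-a"))  # [100, 200]
```

In this sketch a query reaches the disk only when both stages report a possible match, which is one way the reduction in false-positive-driven storage accesses could arise. The returned addresses are candidates only and still require verification.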