TBF: a high-efficient query mechanism in de-duplication backup system

Authors:
Bin Zhou;Hai Jin;Xia Xie;PingPeng Yuan
Affiliations:
Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China,School of Compu ...;Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China;Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China;Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Venue:
GPC'12 Proceedings of the 7th international conference on Advances in Grid and Pervasive Computing
Year:
2012


Abstract

For big data, the fingerprints of the data chunks are too numerous to be stored entirely in memory. Accordingly, a new query mechanism, the Two-stage Bloom Filter (TBF), is proposed. First, each bit of the second-grade Bloom filter represents the chunks that have identical fingerprints, which reduces the rate of false positives. Second, a two-dimensional list corresponding to the two-grade Bloom filter is created to gather the absolute addresses of the data chunks with identical fingerprints. Finally, we suggest a new class of hash functions with a strong global random characteristic. The Two-stage Bloom Filter decreases the number of disk accesses, speeds up the detection of redundant data chunks, and reduces the false positive rate. Our experiments indicate that, for the same length of the first-grade Bloom filter, the Two-stage Bloom Filter reduces the storage accesses caused by false positives by about 30% to 40%.
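
The lookup path described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `BloomFilter` and `TwoStageIndex`, the salted SHA-256 bit positions (a stand-in for the paper's proposed hash function class), the filter sizes, and the in-memory dictionary that stands in for the on-disk index and address list are all assumptions made for illustration. The sketch only shows the general idea that a fingerprint must pass two in-memory filters before a costly disk lookup is issued.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int, salt: bytes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Assumption: derive k positions from salted SHA-256 digests.
        # The paper proposes its own hash function class instead.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(self.salt + bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class TwoStageIndex:
    """Hypothetical two-stage filter in front of a simulated disk index."""

    def __init__(self, bits1: int, bits2: int, num_hashes: int = 3):
        self.stage1 = BloomFilter(bits1, num_hashes, b"stage1")
        self.stage2 = BloomFilter(bits2, num_hashes, b"stage2")
        # Stand-in for on-disk data: fingerprint -> list of chunk addresses.
        self.on_disk = {}
        self.disk_lookups = 0

    def insert(self, fingerprint: bytes, address: int) -> None:
        self.stage1.add(fingerprint)
        self.stage2.add(fingerprint)
        self.on_disk.setdefault(fingerprint, []).append(address)

    def lookup(self, fingerprint: bytes):
        # A miss in either filter proves the fingerprint is new: no disk access.
        if not self.stage1.might_contain(fingerprint):
            return None
        if not self.stage2.might_contain(fingerprint):
            return None
        # Both filters say "maybe": pay for one (simulated) disk lookup.
        self.disk_lookups += 1
        return self.on_disk.get(fingerprint)


index = TwoStageIndex(bits1=1 << 12, bits2=1 << 12)
fp = hashlib.sha1(b"chunk A").digest()
index.insert(fp, address=4096)
index.insert(fp, address=8192)  # same content stored at a second address
print(index.lookup(fp))         # [4096, 8192]
```

In this sketch a false positive in the first filter reaches the disk only if the second, independently salted filter also reports a match, which is one plausible reading of how a second stage can cut false-positive disk accesses.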