TBF: a high-efficient query mechanism in de-duplication backup system

  • Authors:
  • Bin Zhou; Hai Jin; Xia Xie; PingPeng Yuan

  • Affiliations:
  • Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China, School of Compu ...; Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

  • Venue:
  • GPC'12: Proceedings of the 7th International Conference on Advances in Grid and Pervasive Computing
  • Year:
  • 2012

Abstract

For big data, the fingerprints of the data chunks are too numerous to be stored entirely in memory. To address this, we propose a new query mechanism, the Two-stage Bloom Filter (TBF). First, each bit of the second-grade Bloom filter represents the chunks that have identical fingerprints, which reduces the false positive rate. Second, a two-dimensional list corresponding to the two-grade Bloom filter is created to gather the absolute addresses of the data chunks with identical fingerprints. Finally, we propose a new class of hash functions with strong global randomness. TBF decreases the number of disk accesses, speeds up the detection of redundant data chunks, and reduces the false positive rate. Our experiments indicate that, with the same first-grade Bloom filter length, TBF reduces the storage accesses caused by false positives by about 30% to 40%.
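The abstract does not spell out the exact layout of the second-grade filter or of the two-dimensional list, so the following is only a minimal Python sketch, under stated assumptions, of how a two-stage lookup can avoid disk reads for new chunks. It assumes (hypothetically) that each fingerprint is routed to one of `groups` small second-stage Bloom filters, that the two-dimensional list `addrs[group]` holds the chunk addresses belonging to each group, and that SHA-256 double hashing stands in for the paper's proposed hash function class. The names `TwoStageIndex`, `indices`, `disk_reads` and all parameter values are illustrative, not the authors' implementation.

```python
import hashlib
from typing import Optional


def indices(fp: bytes, k: int, m: int, salt: bytes) -> list[int]:
    """Derive k bit positions in [0, m) by double hashing a SHA-256 digest.
    Assumption: a stand-in for the paper's own hash function class."""
    d = hashlib.sha256(salt + fp).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1  # odd step size
    return [(h1 + i * h2) % m for i in range(k)]


class BloomFilter:
    def __init__(self, m: int, k: int, salt: bytes):
        self.m, self.k, self.salt = m, k, salt
        self.bits = bytearray((m + 7) // 8)

    def add(self, fp: bytes) -> None:
        for i in indices(fp, self.k, self.m, self.salt):
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, fp: bytes) -> bool:
        return all((self.bits[i // 8] >> (i % 8)) & 1
                   for i in indices(fp, self.k, self.m, self.salt))


class TwoStageIndex:
    """Hypothetical two-stage fingerprint index (illustrative only)."""

    def __init__(self, m1: int, m2: int, k: int, groups: int):
        self.stage1 = BloomFilter(m1, k, b"stage1")
        self.groups = groups
        # Assumption: one small second-stage filter per group of fingerprints.
        self.stage2 = [BloomFilter(m2, k, b"stage2") for _ in range(groups)]
        # Two-dimensional list: group -> absolute addresses of its chunks.
        self.addrs: list[list[int]] = [[] for _ in range(groups)]
        self.disk: dict[int, bytes] = {}  # simulated on-disk fingerprint store
        self.disk_reads = 0

    def _group(self, fp: bytes) -> int:
        d = hashlib.sha256(b"group" + fp).digest()
        return int.from_bytes(d[:4], "big") % self.groups

    def insert(self, fp: bytes, addr: int) -> None:
        g = self._group(fp)
        self.stage1.add(fp)
        self.stage2[g].add(fp)
        self.addrs[g].append(addr)
        self.disk[addr] = fp

    def lookup(self, fp: bytes) -> Optional[int]:
        """Return the address of a duplicate chunk, or None if fp is new."""
        if fp not in self.stage1:
            return None  # definitely new: no disk access
        g = self._group(fp)
        if fp not in self.stage2[g]:
            return None  # first-stage false positive caught in memory
        for addr in self.addrs[g]:
            self.disk_reads += 1  # each verification costs one simulated read
            if self.disk[addr] == fp:
                return addr
        return None  # false positive in both stages


if __name__ == "__main__":
    idx = TwoStageIndex(m1=8192, m2=1024, k=4, groups=16)
    for n in range(100):
        idx.insert(hashlib.sha1(b"chunk%d" % n).digest(), 4096 * n)
    print(idx.lookup(hashlib.sha1(b"chunk7").digest()))  # prints 28672
    print(idx.lookup(b"not-stored"))  # prints None
```

In this sketch a fingerprint that fails either filter is declared new without touching the simulated disk. Only fingerprints that pass both stages cost reads, and those reads are confined to a single group's address list rather than the whole index.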