Implementation and performance evaluation of fuzzy file block matching

  • Authors:
  • Bo Han;Pete Keleher

  • Affiliations:
  • Department of Computer Science, University of Maryland, College Park, MD;Department of Computer Science, University of Maryland, College Park, MD

  • Venue:
  • ATC'07: Proceedings of the 2007 USENIX Annual Technical Conference
  • Year:
  • 2007

Abstract

The fuzzy file block matching technique (fuzzy matching for short) was first proposed for opportunistic use of Content Addressable Storage (CAS). Fuzzy matching aims to increase the hit ratio at content-addressable storage providers, and can thereby improve the performance of the underlying distributed file storage systems by potentially saving significant network bandwidth and reducing file transmission costs. Fuzzy matching uses shingling to compute fuzzy hashes of file blocks for similarity detection, and uses error-correcting information to reconstruct the canonical content of a file block from similar blocks. In this paper, we present the implementation details of fuzzy matching and a basic evaluation of its performance. In particular, we show that fuzzy matching can recover new versions of GNU Emacs source from older versions.
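
To illustrate the similarity-detection step named in the abstract, the following is a minimal sketch of generic shingling in Python. It is not the authors' implementation: the function names, the 8-byte shingle width, the 16-entry sketch size, and the use of SHA-1 are all assumptions chosen for illustration, and the error-correcting reconstruction step is not shown.

```python
import hashlib


def shingles(block: bytes, width: int = 8) -> set:
    """Return the set of all contiguous `width`-byte substrings of a block.

    `width` is an illustrative assumption, not a value from the paper.
    """
    if len(block) < width:
        return {block} if block else set()
    return {block[i:i + width] for i in range(len(block) - width + 1)}


def sketch(block: bytes, width: int = 8, size: int = 16) -> frozenset:
    """Keep the `size` smallest 64-bit shingle hashes as a compact fingerprint.

    Two blocks that share many shingles tend to share many sketch entries,
    so sketches can be compared instead of full blocks.
    """
    hashes = {
        int.from_bytes(hashlib.sha1(s).digest()[:8], "big")
        for s in shingles(block, width)
    }
    return frozenset(sorted(hashes)[:size])


def resemblance(a: bytes, b: bytes, width: int = 8) -> float:
    """Exact Jaccard similarity of the two blocks' shingle sets."""
    sa, sb = shingles(a, width), shingles(b, width)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


old = b"The quick brown fox jumps over the lazy dog near the river bank."
new = b"The quick brown fox leaps over the lazy dog near the river bank."
other = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!!"

# A small edit leaves most shingles intact; unrelated content shares none.
print(resemblance(old, new) > resemblance(old, other))
```

A block whose sketch closely matches a locally stored block would be a candidate for reconstruction from that block plus a small amount of correction data, rather than a full transfer.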