The fuzzy file block matching technique (fuzzy matching for short) was first proposed for the opportunistic use of content-addressable storage (CAS). Fuzzy matching aims to increase the hit ratio at content-addressable storage providers. A higher hit ratio can improve the performance of the underlying distributed file storage system, potentially saving significant network bandwidth and reducing file transmission costs. The technique has two parts: it uses shingling to compute a fuzzy hash of each file block for similarity detection, and it uses error-correcting information to reconstruct the canonical content of a file block from similar blocks. In this paper, we present the implementation details of fuzzy matching and a preliminary evaluation of its performance. In particular, we show that fuzzy matching can recover new versions of the GNU Emacs source from older versions.
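To make the similarity-detection step concrete, the following is a minimal sketch of shingling, not the implementation described in the paper. It assumes byte-level shingles of a fixed width `k` and compares two blocks by the Jaccard resemblance of their shingle sets; the function names `shingles` and `resemblance` and the default `k = 8` are illustrative choices, and the error-correcting reconstruction step is not shown.

```python
def shingles(block: bytes, k: int = 8) -> set:
    """Return the set of all contiguous k-byte substrings (shingles) of a block.

    Assumption: byte-level shingles of fixed width k. A block shorter than k
    is treated as a single shingle.
    """
    if len(block) < k:
        return {block}
    return {block[i:i + k] for i in range(len(block) - k + 1)}


def resemblance(a: bytes, b: bytes, k: int = 8) -> float:
    """Jaccard resemblance of two blocks: |S(a) & S(b)| / |S(a) | S(b)|."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


# Hypothetical blocks: an "old" block, a "new" block that differs from it in
# one byte, and an unrelated block that shares no 8-byte run with either.
old_block = bytes(range(200))
new_block = old_block[:100] + b"X" + old_block[101:]
unrelated = bytes(reversed(range(200)))

print(resemblance(old_block, new_block))  # high: the blocks are near-duplicates
print(resemblance(old_block, unrelated))  # low: the blocks share no shingles
```

A block whose resemblance to a stored block exceeds some threshold would be a candidate source for reconstruction. In practice, comparing full shingle sets is expensive, so a compact sketch of the set (for example, a fixed number of its smallest shingle hashes) is typically compared instead; that refinement is omitted here.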