File recipe compression in data deduplication systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Data de-duplication is a simple compression method that has become very popular in storage archival and backup. It has the advantage of direct, random access to any piece ("chunk") of a file in one table lookup; that is not the case with differential file compression, the other common storage archival method. The compression efficiency (chunk matching) of de-duplication improves for smaller chunk sizes; however, the sequence of hashes that replaces the de-duplicated object (file) then grows significantly. We propose a simple scheme to shrink the list of hashes generated during de-duplication of an object. The resulting list is orders of magnitude smaller than what a customary compression algorithm (gzip) achieves, and this shrinkage has a significant impact on overall de-duplication efficiency.
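To make the setting concrete, here is a minimal sketch of chunk-based de-duplication: a file is split into chunks, each unique chunk is stored once under its hash, and the file itself is replaced by the list of chunk hashes (its "recipe"), which allows one-lookup random access to any chunk. This is an illustrative toy with fixed-size chunks, SHA-256 fingerprints, and a dict-based chunk store; it is not the paper's compression scheme, and all names are hypothetical.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int, store: dict) -> list:
    """Split data into fixed-size chunks, keep each unique chunk once,
    and return the file recipe: the list of chunk hashes."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)  # store the chunk only if unseen
        recipe.append(h)            # the recipe grows by one hash per chunk
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    # Direct, random access: any chunk is one table lookup away.
    return b"".join(store[h] for h in recipe)

store = {}
data = b"abcdabcdabcdxyz"
recipe = dedup_store(data, 4, store)
assert restore(recipe, store) == data
assert len(store) == 2   # "abcd" stored once, plus "xyz"
assert len(recipe) == 4  # but the recipe still lists every chunk
```

The toy illustrates the trade-off the abstract describes: smaller chunks expose more duplicates (a smaller `store`), but the recipe, one hash per chunk, grows proportionally, which is the part the proposed scheme compresses.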