File recipe compression in data deduplication systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Data de-duplication is a simple compression method that has become very popular in storage archival and backup. It has the advantage of direct, random access to any piece ("chunk") of a file in one table lookup; that is not the case with differential file compression, the other common storage archival method. The compression efficiency (chunk matching) of de-duplication improves for smaller chunk sizes; however, the sequence of hashes that replaces the de-duplicated object (file) then grows significantly. We propose a simple scheme to shrink the list of hashes generated during de-duplication of an object. The resulting list is orders of magnitude smaller than what a customary compression algorithm (gzip) achieves, and this shrinkage has a significant impact on overall de-duplication efficiency.
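To make the setting concrete, here is a minimal sketch of chunk-based de-duplication: a file is split into chunks, each unique chunk is stored once under its hash, and the file itself is replaced by the list of chunk hashes (its "recipe"), which allows one-lookup random access to any chunk. This is an illustrative toy with fixed-size chunks, SHA-256 fingerprints, and a dict-based chunk store; it is not the paper's compression scheme, and all names are hypothetical.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int, store: dict) -> list:
    """Split data into fixed-size chunks, keep each unique chunk once,
    and return the file recipe: the list of chunk hashes."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)  # store the chunk only if unseen
        recipe.append(h)            # the recipe grows by one hash per chunk
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    # Direct, random access: any chunk is one table lookup away.
    return b"".join(store[h] for h in recipe)

store = {}
data = b"abcdabcdabcdxyz"
recipe = dedup_store(data, 4, store)
assert restore(recipe, store) == data
assert len(store) == 2   # "abcd" stored once, plus "xyz"
assert len(recipe) == 4  # but the recipe still lists every chunk
```

The toy illustrates the trade-off the abstract describes: smaller chunks expose more duplicates (a smaller `store`), but the recipe, one hash per chunk, grows proportionally, which is the part the proposed scheme compresses.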