A study of practical deduplication

  • Authors:
  • Dutch T. Meyer;William J. Bolosky

  • Affiliations:
  • Microsoft Research and The University of British Columbia;Microsoft Research

  • Venue:
  • FAST'11 Proceedings of the 9th USENIX conference on File and storage technologies
  • Year:
  • 2011

Abstract

We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. We also studied file fragmentation, finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.
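
To make the whole-file versus block-level comparison concrete, the following is a minimal sketch, not the authors' tooling. It assumes fixed-size 4 KiB blocks and SHA-256 content hashes, both chosen here for illustration; the function names `whole_file_savings` and `block_savings` are hypothetical. Each function returns the fraction of bytes saved when every distinct file (or block) is stored only once.

```python
import hashlib


def whole_file_savings(files):
    """Fraction of bytes saved by storing each distinct file once."""
    total = sum(len(data) for data in files)
    seen = set()
    stored = 0
    for data in files:
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            seen.add(digest)
            stored += len(data)
    return 1 - stored / total if total else 0.0


def block_savings(files, block_size=4096):
    """Fraction of bytes saved by storing each distinct fixed-size block once.

    block_size is an illustrative assumption, not a value from the paper.
    """
    total = sum(len(data) for data in files)
    seen = set()
    stored = 0
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(block)
    return 1 - stored / total if total else 0.0


if __name__ == "__main__":
    # Synthetic data: three two-block files, two of them identical.
    x, y, z = b"x" * 4096, b"y" * 4096, b"z" * 4096
    files = [x + y, x + y, x + z]
    print(whole_file_savings(files))
    print(block_savings(files))
```

On this synthetic input, whole-file deduplication removes only the exact duplicate file, while block-level deduplication also removes the shared first block of the third file, so it saves more. The study's reported ratios come from measured file systems, not from a toy input like this one.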