Rethinking deduplication scalability

  • Authors:
  • Petros Efstathopoulos; Fanglu Guo

  • Affiliations:
  • Symantec Research Labs, Symantec Corporation, Culver City, CA (both authors)

  • Venue:
  • HotStorage'10 Proceedings of the 2nd USENIX conference on Hot topics in storage and file systems
  • Year:
  • 2010


Abstract

Deduplication, a form of compression that aims to eliminate duplicates in data, has become an important feature of most commercial and research backup systems. Since the advent of deduplication, most research efforts have focused on maximizing deduplication efficiency--i.e., the offered compression ratio--and have achieved near-optimal usage of raw storage. However, the capacity goals of next-generation Petabyte systems require a highly scalable design, able to overcome the current scalability limitations of deduplication. We advocate a shift towards scalability-centric design principles for deduplication systems, and present some of the mechanisms used in our prototype, which aims for high scalability, good deduplication efficiency, and high throughput.
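To make the core idea concrete, here is a minimal sketch of content-addressed deduplication: each incoming chunk is fingerprinted (SHA-256 here) and stored only if its fingerprint has not been seen before. This is an illustrative toy, not the paper's prototype; the `DedupStore` name and its interface are invented for this example. Note that the in-memory fingerprint index in this sketch is precisely the kind of structure whose growth limits scalability at Petabyte capacities.

```python
import hashlib

class DedupStore:
    """Toy content-addressed chunk store: identical chunks are kept once."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> chunk bytes (the dedup index)
        self.logical_bytes = 0  # total bytes written by clients

    def write(self, chunk: bytes) -> str:
        """Store a chunk, deduplicating by fingerprint; return the fingerprint."""
        fp = hashlib.sha256(chunk).hexdigest()
        self.logical_bytes += len(chunk)
        # Keep only the first physical copy of each unique chunk.
        self.chunks.setdefault(fp, chunk)
        return fp

    @property
    def physical_bytes(self) -> int:
        """Raw storage actually consumed after deduplication."""
        return sum(len(c) for c in self.chunks.values())

    @property
    def dedup_ratio(self) -> float:
        """Compression ratio offered by deduplication (logical / physical)."""
        return self.logical_bytes / max(self.physical_bytes, 1)

# Example: three 5-byte writes, one of which is a duplicate.
store = DedupStore()
store.write(b"data1")
store.write(b"data2")
store.write(b"data1")  # duplicate: no new physical storage consumed
```

After these writes the store holds 15 logical bytes in 10 physical bytes, a deduplication ratio of 1.5. Real systems must keep such an index for billions of chunks, which is why index memory footprint and lookup throughput dominate the scalability discussion.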