Improving duplicate elimination in storage systems

  • Authors:
  • Deepak R. Bobbarjung; Suresh Jagannathan; Cezary Dubnicki

  • Affiliations:
  • Purdue University, West Lafayette, IN; Purdue University, West Lafayette, IN; NEC Laboratories America, Princeton, NJ

  • Venue:
  • ACM Transactions on Storage (TOS)
  • Year:
  • 2006

Abstract

Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source, by not writing data that has already been stored, not only reduces storage overheads but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems.

Intelligent object partitioning techniques divide an object into chunks, identify the chunks that are new when the object is updated, and transfer only those chunks to a storage server. In this article, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably, fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff and other existing object partitioning schemes using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25%, and bandwidth utilization on average by 40%, over comparable techniques.
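The object-partitioning idea in the abstract can be illustrated with a minimal sketch of the general technique, not fingerdiff itself: content-defined chunking, where chunk boundaries are chosen by a rolling hash over the bytes, plus a store indexed by chunk digest that accepts only chunks it has not seen before. Everything here is an assumption for illustration: a polynomial rolling hash stands in for the Rabin fingerprints such systems typically use, and the window size, boundary mask, chunk-size limits, SHA-256 digests, and the hypothetical `ChunkStore` class are not taken from the paper. Fingerdiff's similarity-driven choice of partitioning strategy is not modeled.

```python
import hashlib
import random

# Hypothetical parameters, chosen for illustration only.
WINDOW = 48            # bytes covered by the rolling hash
BASE = 257             # polynomial base of the rolling hash
MOD = (1 << 61) - 1    # large prime modulus
MASK = (1 << 12) - 1   # about 1 in 4096 positions is a candidate boundary
MIN_CHUNK = 1024       # never cut a chunk shorter than this
MAX_CHUNK = 16384      # always cut a chunk once it reaches this length


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offset of each content-defined chunk in `data`."""
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    ends, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Slide the window: remove the byte that is now WINDOW positions back.
            h = (h - data[i - WINDOW] * drop) % MOD
        length = i + 1 - start
        # Boundaries depend only on the last WINDOW bytes, so an edit shifts
        # at most a few boundaries instead of every fixed-size block after it.
        if (length >= MIN_CHUNK and (h & MASK) == MASK) or length >= MAX_CHUNK:
            ends.append(i + 1)
            start = i + 1
    if start < len(data):
        ends.append(len(data))
    return ends


def split(data: bytes) -> list[bytes]:
    """Cut `data` into chunks at the content-defined boundaries."""
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks


class ChunkStore:
    """Toy server-side store keeping one copy of each distinct chunk."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def write(self, data: bytes) -> tuple[list[str], int]:
        """Store an object; return its digest recipe and the bytes actually sent."""
        recipe, sent = [], 0
        for chunk in split(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk  # only new chunks are "transferred"
                sent += len(chunk)
            recipe.append(digest)
        return recipe, sent

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble an object from its recipe of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)


if __name__ == "__main__":
    rng = random.Random(0)
    original = rng.randbytes(200_000)
    # Simulate an update by inserting a few bytes near the middle.
    updated = original[:100_000] + b"new bytes" + original[100_000:]
    store = ChunkStore()
    _, first = store.write(original)
    recipe, second = store.write(updated)
    assert store.read(recipe) == updated
    print(f"first write sent {first} bytes, update sent {second} bytes")
```

On the second write only the chunks around the insertion point are new, so far fewer bytes are sent than the object's full size. With fixed-size blocks, the same insertion would shift every block boundary after it and defeat duplicate detection for the rest of the object, which is why content-defined boundaries are the usual starting point for schemes of this kind.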