Low-cost data deduplication for virtual machine backup in cloud storage

  • Authors:
  • Wei Zhang;Tao Yang;Gautham Narayanasamy;Hong Tang

  • Affiliations:
  • University of California at Santa Barbara;University of California at Santa Barbara;University of California at Santa Barbara;Alibaba Inc.

  • Venue:
  • HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
  • Year:
  • 2013

Abstract

In a virtualized cloud cluster, frequent snapshot backup of virtual disks improves hosting reliability; however, detecting and removing duplicate content blocks among snapshots consumes significant memory. This paper presents a low-cost deduplication solution that scales to a large number of virtual machines. The key idea is to separate duplicate detection from the actual storage backup, rather than using inline deduplication, and to partition the global index and the detection requests among machines by fingerprint value. Each machine then conducts duplicate detection independently, one partition at a time, with minimal memory usage. A further optimization allocates and controls the buffer space used to exchange detection requests and duplicate summaries among machines. Our evaluation shows that the proposed multi-stage scheme uses a small amount of memory while delivering satisfactory backup throughput.
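
The partitioning idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names, the use of SHA-1 as the fingerprint, the modulo partitioning rule, and the in-memory sets standing in for per-machine index partitions are all assumptions made for illustration. The sketch only shows the two decoupled stages: routing detection requests by fingerprint, then checking one partition at a time so that only that partition's index needs to be resident.

```python
import hashlib
from collections import defaultdict


def fingerprint(block: bytes) -> bytes:
    # Assumption: SHA-1 content hash as the block fingerprint.
    return hashlib.sha1(block).digest()


def partition_of(fp: bytes, num_partitions: int) -> int:
    # Assumption: the partition is chosen from the fingerprint value by modulo.
    return int.from_bytes(fp[:8], "big") % num_partitions


def route_requests(blocks, num_partitions):
    """Stage 1: group detection requests by fingerprint partition."""
    requests = defaultdict(list)
    for block_id, data in blocks:
        fp = fingerprint(data)
        requests[partition_of(fp, num_partitions)].append((block_id, fp))
    return requests


def detect_partition(index_partition, requests):
    """Stage 2: check one partition's requests against that partition's index."""
    duplicates, new_blocks = [], []
    for block_id, fp in requests:
        if fp in index_partition:
            duplicates.append(block_id)
        else:
            index_partition.add(fp)
            new_blocks.append(block_id)
    return duplicates, new_blocks


def deduplicate(blocks, num_partitions=4):
    # Hypothetical stand-in: one in-memory set per index partition. In a
    # distributed setting each partition would belong to a different machine.
    index = defaultdict(set)
    requests = route_requests(blocks, num_partitions)
    duplicates, new_blocks = [], []
    for p in sorted(requests):
        # Only partition p's index and requests are touched in this step.
        dups, new = detect_partition(index[p], requests[p])
        duplicates.extend(dups)
        new_blocks.extend(new)
    return sorted(duplicates), sorted(new_blocks)


if __name__ == "__main__":
    sample = [(0, b"a"), (1, b"b"), (2, b"a"), (3, b"c"), (4, b"b")]
    print(deduplicate(sample))
```

Because identical blocks always share a fingerprint, they always land in the same partition, so each partition can be checked without consulting any other. The actual backup of the blocks reported as new would happen in a later stage, separate from detection.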