Low-cost data deduplication for virtual machine backup in cloud storage

Authors:
Wei Zhang;Tao Yang;Gautham Narayanasamy;Hong Tang
Affiliations:
University of California at Santa Barbara;University of California at Santa Barbara;University of California at Santa Barbara;Alibaba Inc.
Venue:
HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
Year:
2013

Abstract

In a virtualized cloud cluster, frequent snapshot backup of virtual disks improves hosting reliability; however, detecting and removing duplicate content blocks across snapshots consumes significant memory. This paper presents a low-cost deduplication solution that scales to a large number of virtual machines. The key idea is to separate duplicate detection from the actual storage backup, rather than deduplicating inline, and to partition the global index and the detection requests among machines by fingerprint value. Each machine then conducts duplicate detection independently, one partition at a time, with minimal memory usage. A further optimization allocates and controls the buffer space used to exchange detection requests and duplicate summaries among machines. Our evaluation shows that the proposed multi-stage scheme uses a small amount of memory while delivering satisfactory backup throughput.
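
The partitioning idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the SHA-1 fingerprint, the modulo partitioning rule, and the names fingerprint, partition_of, route_requests, and detect_duplicates are all assumptions made for illustration, and the buffer management and cross-machine exchange the abstract mentions are left out. The sketch only shows that, once requests are grouped by fingerprint value, each index partition can be checked on its own, so only one partition needs to be in memory at a time.

```python
import hashlib
from collections import defaultdict


def fingerprint(block: bytes) -> bytes:
    # Assumption: SHA-1 content hash as the block fingerprint.
    return hashlib.sha1(block).digest()


def partition_of(fp: bytes, num_partitions: int) -> int:
    # Assumption: partition by the first 8 bytes of the fingerprint, modulo
    # the partition count. Equal fingerprints always map to the same partition.
    return int.from_bytes(fp[:8], "big") % num_partitions


def route_requests(blocks, num_partitions):
    """Stage 1: group detection requests by fingerprint partition."""
    requests = defaultdict(list)
    for block_id, data in blocks:
        fp = fingerprint(data)
        requests[partition_of(fp, num_partitions)].append((block_id, fp))
    return requests


def detect_duplicates(requests, index_partitions):
    """Stage 2: check one partition at a time against its slice of the index.

    index_partitions maps a partition id to the set of fingerprints already
    stored. Only the slice for the current partition is consulted in each
    iteration of the outer loop.
    """
    duplicates, new_blocks = [], []
    for pid in sorted(requests):
        index = index_partitions.setdefault(pid, set())
        for block_id, fp in requests[pid]:
            if fp in index:
                duplicates.append(block_id)
            else:
                index.add(fp)
                new_blocks.append(block_id)
    return duplicates, new_blocks


# Hypothetical snapshot blocks: "a" and "c" have identical content.
blocks = [("a", b"x" * 4096), ("b", b"y" * 4096), ("c", b"x" * 4096)]
requests = route_requests(blocks, num_partitions=4)
duplicates, new_blocks = detect_duplicates(requests, index_partitions={})
print(duplicates)          # blocks whose content was already seen
print(sorted(new_blocks))  # blocks that must actually be written to storage
```

In a real cluster each partition would live on a different machine and the duplicate list would be sent back as a summary before the backup stage writes only the new blocks; here both stages run in one process to keep the example self-contained.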