WAN-optimized replication of backup datasets using stream-informed delta compression

  • Authors:
  • Philip Shilane;Mark Huang;Grant Wallace;Windsor Hsu

  • Affiliations:
  • Backup Recovery Systems Division, EMC Corporation;Backup Recovery Systems Division, EMC Corporation;Backup Recovery Systems Division, EMC Corporation;Backup Recovery Systems Division, EMC Corporation

  • Venue:
  • ACM Transactions on Storage (TOS)
  • Year:
  • 2012

Abstract

Replicating data off site is critical for disaster recovery, but the current approach of transferring tapes is cumbersome and error-prone. Replicating across a wide area network (WAN) is a promising alternative, but fast network connections are expensive or impractical in many remote locations, so improved compression is needed to make WAN replication truly practical. We present a new technique for replicating backup datasets across a WAN that not only eliminates duplicate regions of files (deduplication) but also compresses similar regions of files with delta compression, which is available as a feature of EMC Data Domain systems. Our main contribution is an architecture that adds stream-informed delta compression to existing deduplication systems and eliminates the need for new, persistent indexes. Unlike techniques that rely on knowing a file's version or that use a memory cache, our approach achieves delta compression across all data replicated to a server at any time in the past. Based on a detailed analysis of datasets and statistics from hundreds of customers using our product, we find that delta compression achieves an additional 2X compression beyond deduplication and local compression, which enables customers to replicate data that would otherwise fail to complete within their backup window.
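
To make the distinction between deduplication and delta compression concrete, the following is a minimal Python sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the fixed `CHUNK_SIZE`, the toy `sketch` function (maximum CRC32 over sliding windows), the in-memory `Replica` dictionaries, and the use of zlib's preset dictionary (`zdict`) as a stand-in for a real delta encoder. In particular, the in-memory sketch lookup does not model the stream-informed architecture described in the abstract, which avoids new persistent indexes.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed size; real systems often use content-defined chunking
WINDOW = 16        # hypothetical window width for the toy resemblance sketch


def fingerprint(chunk: bytes) -> bytes:
    """Exact-match identity of a chunk, used for deduplication."""
    return hashlib.sha1(chunk).digest()


def sketch(chunk: bytes) -> int:
    """Toy resemblance sketch: the maximum CRC32 over all sliding windows.
    Chunks that share most of their content tend to share this value."""
    if len(chunk) < WINDOW:
        return zlib.crc32(chunk)
    return max(zlib.crc32(chunk[i:i + WINDOW])
               for i in range(len(chunk) - WINDOW + 1))


def delta_encode(target: bytes, base: bytes) -> bytes:
    """Stand-in delta encoder: deflate `target` with `base` as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(target) + comp.flush()


def delta_decode(delta: bytes, base: bytes) -> bytes:
    """Inverse of delta_encode; needs the same base chunk."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


class Replica:
    """Hypothetical in-memory model of what the remote server already holds."""

    def __init__(self) -> None:
        self.chunks = {}     # fingerprint -> chunk bytes
        self.by_sketch = {}  # sketch -> fingerprint of a chunk with that sketch

    def store(self, chunk: bytes) -> None:
        fp = fingerprint(chunk)
        self.chunks[fp] = chunk
        self.by_sketch[sketch(chunk)] = fp


def replicate(data: bytes, replica: Replica) -> dict:
    """Return the bytes that would cross the WAN, split by how each chunk was sent."""
    sent = {"dup": 0, "delta": 0, "full": 0}
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp in replica.chunks:
            sent["dup"] += len(fp)  # duplicate: only the fingerprint is sent
            continue
        base_fp = replica.by_sketch.get(sketch(chunk))
        if base_fp is not None:
            # Similar chunk already on the replica: send base reference plus delta.
            delta = delta_encode(chunk, replica.chunks[base_fp])
            sent["delta"] += len(base_fp) + len(delta)
        else:
            # Nothing similar: fall back to local compression only.
            sent["full"] += len(zlib.compress(chunk))
        replica.store(chunk)
    return sent
```

Calling `replicate` twice with the same data sends only fingerprints the second time, while a chunk with a small edit is sent as a short delta against its earlier version, provided the edit leaves the toy sketch unchanged.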