Characterization of incremental data changes for efficient data protection

  • Authors:
  • Hyong Shim; Philip Shilane; Windsor Hsu

  • Affiliations:
  • Backup Recovery Systems Division, EMC Corporation; Backup Recovery Systems Division, EMC Corporation; Backup Recovery Systems Division, EMC Corporation

  • Venue:
  • USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
  • Year:
  • 2013

Abstract

Protecting data on primary storage often requires creating secondary copies by periodically replicating the data to external target systems. We analyze over 100,000 traces from 125 customer block-based primary storage systems to gain a high-level understanding of I/O characteristics and then perform an in-depth analysis of over 500 traces from 13 systems that span at least 24 hours. Our analysis has the twin goals of minimizing overheads on primary systems and improving data replication efficiency. We compare our results with a study from a decade ago [20] and provide fresh insights into patterns of incremental changes on primary systems over time. Primary storage systems often create snapshots as point-in-time copies in order to support host I/O while replicating changed data to target systems. However, creating standard snapshots on a primary storage system incurs overheads in terms of capacity and I/O, and we present a new snapshot technique called a replication snapshot that reduces these overheads. Replicated data also requires capacity and I/O on the target system, and we investigate techniques to significantly reduce these overheads. We also find that highly sequential and highly random I/O patterns have different incremental change characteristics. Where applicable, we present our findings as advice to storage engineers and administrators.