SnapMirror®: file system based asynchronous mirroring for disaster recovery

Authors:
Hugo Patterson;Stephen Manley;Mike Federwisch;Dave Hitz;Steve Kleiman;Shane Owara
Affiliations:
Network Appliance Inc., Sunnyvale, CA (all authors)
Venue:
FAST'02 Proceedings of the 1st USENIX conference on File and storage technologies
Year:
2002

Abstract

Computerized data has become critical to the survival of an enterprise. Companies must have a strategy for recovering their data should a disaster such as a fire destroy the primary data center. Current mechanisms offer data managers a stark choice: rely on affordable tape but risk the loss of a full day of data and face many hours or even days to recover, or have the benefits of a fully synchronized on-line remote mirror, but pay steep costs in both write latency and network bandwidth to maintain the mirror. In this paper, we argue that asynchronous mirroring, in which batches of updates are periodically sent to the remote mirror, can let data managers find a balance between these extremes. First, by eliminating the write latency issue, asynchrony greatly reduces the performance cost of a remote mirror. Second, by storing up batches of writes, asynchronous mirroring can avoid sending deleted or overwritten data and thereby reduce network bandwidth requirements. Data managers can tune the update frequency to trade network bandwidth against the potential loss of more data. We present SnapMirror, an asynchronous mirroring technology that leverages file system snapshots to ensure the consistency of the remote mirror and optimize data transfer. We use traces of production filers to show that even updating an asynchronous mirror every 15 minutes can reduce data transferred by 30% to 80%. We find that exploiting file system knowledge of deletions is critical to achieving any reduction for no-overwrite file systems such as WAFL and LFS. Experiments on a running system show that using file system metadata can reduce the time to identify changed blocks from minutes to seconds compared to purely logical approaches. Finally, we show that using SnapMirror to update every 30 minutes increases the response time of a heavily loaded system by only 22%.
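
The bandwidth argument in the abstract (a batch need not carry data that was overwritten or deleted before the next update) can be illustrated with a minimal sketch. This is not SnapMirror's implementation, which works from file system snapshots; the trace format, `coalesce_batch`, and `transfer_counts` are hypothetical names introduced here purely for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical trace record: (block_number, data).
# data=None marks a block freed by a deletion during the batch interval.
Op = Tuple[int, Optional[bytes]]


def coalesce_batch(ops: Iterable[Op]) -> Dict[int, bytes]:
    """Return only the blocks that are still live at the end of the batch."""
    live: Dict[int, bytes] = {}
    for block, data in ops:
        if data is None:
            # Block was freed: no data needs to cross the network for it.
            live.pop(block, None)
        else:
            # A later write to the same block supersedes the earlier one.
            live[block] = data
    return live


def transfer_counts(ops: Iterable[Op]) -> Tuple[int, int]:
    """Compare blocks sent by write-by-write mirroring vs. one batched update."""
    ops = list(ops)
    per_write = sum(1 for _, data in ops if data is not None)
    batched = len(coalesce_batch(ops))
    return per_write, batched


if __name__ == "__main__":
    trace = [(1, b"a"), (2, b"b"), (1, b"c"), (2, None), (3, b"d")]
    print(transfer_counts(trace))
```

In this toy trace, block 1 is overwritten and block 2 is deleted within the interval, so the batched update carries only the final contents of blocks 1 and 3. A longer interval gives more chances for such cancellation, at the cost of more data at risk if the primary is lost, which is the trade-off the abstract describes.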