Copy detection mechanisms for digital documents
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Potential benefits of delta encoding and data compression for HTTP
SIGCOMM '97 Proceedings of the ACM SIGCOMM '97 conference on Applications, technologies, architectures, and protocols for computer communication
Efficient distributed backup with delta compression
Proceedings of the fifth workshop on I/O in parallel and distributed systems
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Delta algorithms: an empirical analysis
ACM Transactions on Software Engineering and Methodology (TOSEM)
A protocol-independent technique for eliminating redundant network traffic
Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Identifying and Filtering Near-Duplicate Documents
COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Data Redundancy and Compression Methods for a Disk-Based Network Backup System
ITCC '04 Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04) Volume 2 - Volume 2
Improving duplicate elimination in storage systems
ACM Transactions on Storage (TOS)
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
TAPER: tiered approach for eliminating redundancy in replica synchronization
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Finding similar files in a large file system
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Jumbo store: providing efficient incremental upload and versioning for a utility rendering service
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Supporting practical content-addressable caching with CZIP compression
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proccedings of the 7th conference on File and storage technologies
The design of a similarity based deduplication system
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
ChunkStash: speeding up inline storage deduplication using flash memory
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Characterizing datasets for data deduplication in backup applications
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)
Tradeoffs in scalable data routing for deduplication clusters
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
PRESIDIO: A Framework for Efficient Archival Data Storage
ACM Transactions on Storage (TOS)
Venti: a new approach to archival storage
FAST'02 Proceedings of the 1st USENIX conference on File and storage technologies
SnapMirror®: file system based asynchronous mirroring for disaster recovery
FAST'02 Proceedings of the 1st USENIX conference on File and storage technologies
Efficient Deduplication Techniques for Modern Backup Operation
IEEE Transactions on Computers
Building a high-performance deduplication system
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Characteristics of backup workloads in production systems
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
WAN optimized replication of backup datasets using stream-informed delta compression
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Delta compressed and deduplicated storage using stream-informed locality
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems
Block locality caching for data deduplication
Proceedings of the 6th International Systems and Storage Conference
Hi-index | 0.00 |
Replicating data off site is critical for disaster recovery reasons, but the current approach of transferring tapes is cumbersome and error prone. Replicating across a wide area network (WAN) is a promising alternative, but fast network connections are expensive or impractical in many remote locations, so improved compression is needed to make WAN replication truly practical. We present a new technique for replicating backup datasets across a WAN that not only eliminates duplicate regions of files (deduplication) but also compresses similar regions of files with delta compression, which is available as a feature of EMC Data Domain systems. Our main contribution is an architecture that adds stream-informed delta compression to already existing deduplication systems and eliminates the need for new, persistent indexes. Unlike techniques based on knowing a file's version or that use a memory cache, our approach achieves delta compression across all data replicated to a server at any time in the past. From a detailed analysis of datasets and statistics from hundreds of customers using our product, we achieve an additional 2X compression from delta compression beyond deduplication and local compression, which enables customers to replicate data that would otherwise fail to complete within their backup window.