For backup storage, increasing compression allows users to protect more data without increasing their costs or storage footprint. Although removing duplicate regions (deduplication) and traditional compression have become widespread, further compression is attainable. We demonstrate how to efficiently add delta compression to deduplicated storage in order to compress similar (nonduplicate) regions. A challenge when adding delta compression is the large number of data regions that must be indexed. We observed that stream-informed locality is effective for delta compression, which makes a dedicated index for delta compression unnecessary. Using this technique, we built the first storage system prototype to combine delta compression and deduplication. Beyond demonstrating additional compression of 1.4X to 3.5X, we also investigate the throughput and data integrity challenges that arise.
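The idea of replacing a delta-compression index with stream-informed locality can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the names `DeltaDedupStore`, `sketch`, and `sketch_fn`, the max-hash resemblance sketch, the container layout, the restriction to one-level deltas against raw chunks, and the use of `difflib` as a stand-in for a real delta encoder. The point it tries to show is that sketches are loaded into a cache only when a duplicate chunk is found, on the expectation that similar chunks sit near that duplicate in an earlier backup stream.

```python
import hashlib
from difflib import SequenceMatcher


def fingerprint(chunk: bytes) -> str:
    """Exact-match fingerprint used for deduplication."""
    return hashlib.sha1(chunk).hexdigest()


def sketch(chunk: bytes, window: int = 8) -> int:
    """Hypothetical resemblance sketch: the maximum hash over sliding windows.
    Chunks that share most of their content are likely to share this value."""
    last = max(1, len(chunk) - window + 1)
    return max(
        int.from_bytes(hashlib.blake2b(chunk[i:i + window], digest_size=8).digest(), "big")
        for i in range(last)
    )


def delta_encode(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert operations against base (difflib stand-in)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
    return ops


def delta_decode(base: bytes, ops: list) -> bytes:
    """Rebuild the target by replaying copy/insert operations against base."""
    return b"".join(base[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)


class DeltaDedupStore:
    """Toy store: a fingerprint index for dedup, but no global sketch index."""

    def __init__(self, sketch_fn=sketch):
        self.sketch_fn = sketch_fn
        self.fp_index = {}      # fingerprint -> container id
        self.containers = []    # per stream: (fingerprint, sketch) of raw chunks
        self.chunks = {}        # fingerprint -> ("raw", bytes) or ("delta", base_fp, ops)
        self.sketch_cache = {}  # sketch -> base fingerprint, filled on duplicate hits

    def write_stream(self, chunks):
        container = []
        self.containers.append(container)
        cid = len(self.containers) - 1
        kinds = []
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp in self.fp_index:
                # Duplicate: load the sketches stored alongside it (stream locality).
                for base_fp, sk in self.containers[self.fp_index[fp]]:
                    self.sketch_cache.setdefault(sk, base_fp)
                kinds.append("dup")
                continue
            self.fp_index[fp] = cid
            sk = self.sketch_fn(chunk)
            base_fp = self.sketch_cache.get(sk)
            if base_fp is not None:
                # Similar chunk found in the cache: store only the delta.
                base = self.chunks[base_fp][1]
                self.chunks[fp] = ("delta", base_fp, delta_encode(base, chunk))
                kinds.append("delta")
            else:
                # Assumption: only raw chunks serve as bases (one-level deltas).
                self.chunks[fp] = ("raw", chunk)
                container.append((fp, sk))
                kinds.append("new")
        return kinds

    def read(self, fp: str) -> bytes:
        entry = self.chunks[fp]
        if entry[0] == "raw":
            return entry[1]
        return delta_decode(self.chunks[entry[1]][1], entry[2])
```

In this sketch, writing a second backup stream that repeats a chunk from the first causes the first stream's sketches to be cached, so a later chunk with a matching sketch can be stored as a delta without ever consulting a global sketch index. A chunk whose similar base was never brought into the cache is simply stored raw, which is the trade-off of relying on locality instead of an index.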