Multi-level comparison of data deduplication in a backup scenario

Authors:
Dirk Meister; André Brinkmann
Affiliations:
Paderborn Center for Parallel Computing; Paderborn Center for Parallel Computing
Venue:
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Year:
2009

Abstract

Data deduplication systems detect redundancies between data blocks to reduce either storage needs or network traffic. One class of deduplication systems splits the data stream into data blocks (chunks) and then finds exact duplicates of these blocks. This paper compares the influence of different chunking approaches on multiple levels. On a macroscopic level, we compare the chunking approaches on real-life user data in a weekly full backup scenario, both at a single point in time and over several weeks. On a microscopic level, we analyze how small changes affect the deduplication ratio of different file types, for both chunking approaches and delta encoding. An intuitive assumption is that small semantic changes to documents cause only small modifications in the binary representation of files, which would imply a high deduplication ratio. We show that this assumption does not hold for many important file types and that application-specific chunking can further decrease storage capacity demands.
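The sketch below illustrates, under stated assumptions, why the choice of chunking approach matters even for a tiny binary change. It is not the authors' implementation. It contrasts two common approaches: static (fixed-size) chunking and content-defined chunking. The content-defined chunker uses a simple polynomial rolling hash as a stand-in for the Rabin fingerprints often used in practice. All names and parameters (`fixed_size_chunks`, `content_defined_chunks`, `redundant_fraction`, window 16, mask `0x3F`, minimum and maximum chunk sizes of 16 and 256 bytes, SHA-1 as the chunk fingerprint) are hypothetical choices made for illustration. The metric `redundant_fraction` is a simple proxy for the deduplication ratio and is not the paper's definition.

```python
import hashlib
import random

BASE = 257      # polynomial base of the rolling hash (hypothetical choice)
MOD = 1 << 32   # keep hash values within 32 bits


def fixed_size_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Static chunking: cut every `size` bytes, independent of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Content-defined chunking: cut where the rolling hash of the last
    `window` bytes satisfies (h & mask) == 0 (about 1 position in 64 on
    random data); min_size and max_size bound the chunk length."""
    drop = pow(BASE, window, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= window:
            h = (h - data[i - window] * drop) % MOD
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def redundant_fraction(old_chunks: list[bytes], new_chunks: list[bytes]) -> float:
    """Share of the new version's bytes whose chunk fingerprint (SHA-1 here)
    already occurs among the old version's chunks."""
    known = {hashlib.sha1(c).digest() for c in old_chunks}
    dup = sum(len(c) for c in new_chunks if hashlib.sha1(c).digest() in known)
    return dup / sum(len(c) for c in new_chunks)


# Hypothetical workload: 64 KiB of random bytes, then the same data with one
# byte inserted at the front (a small change that shifts all later offsets).
rng = random.Random(42)
old = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
new = b"X" + old

for name, chunker in [("fixed-size", fixed_size_chunks),
                      ("content-defined", content_defined_chunks)]:
    frac = redundant_fraction(chunker(old), chunker(new))
    print(f"{name:>15}: {frac:.1%} of the new version is already stored")
```

With fixed-size chunks, every block boundary shifts by one byte, so essentially no chunk of the new version matches a stored one. The content-defined chunker places boundaries based only on local content, so it resynchronizes after the first chunk and finds almost all chunks again. The intuition that a small edit changes only a few chunks is exactly the assumption that, for many real file types, breaks down once a small semantic change rewrites much of the binary representation.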