Demystifying data deduplication

Authors: Nagapramod Mandagere (University of Minnesota); Pin Zhou, Mark A Smith, Sandeep Uttamchandani (IBM Almaden Research Center)
Venue: Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion
Year: 2008

Abstract

The effectiveness and tradeoffs of deduplication technologies are not well understood; vendors tout deduplication as a "silver bullet" that can help any enterprise optimize its deployed storage capacity. This paper aims to provide a comprehensive taxonomy and an experimental evaluation of deduplication techniques using real-world data. Although the day-to-day rate of change of data has the greatest influence on duplication in backup data, we investigate the duplication inherent in the data itself, independent of the rate of change, the backup schedule, or the backup algorithm used. Our experimental results show that, across deduplication techniques, space savings vary by about 30%, CPU usage differs by a factor of almost 6, and the time to reconstruct a deduplicated file can vary by more than a factor of 15.
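To make "space savings" concrete, the following is a minimal sketch, not the paper's implementation, of how two commonly discussed techniques can yield different savings on the same data: whole-file hashing (store each distinct file once) and fixed-size chunking (store each distinct chunk once). The function names `unique_bytes` and `space_savings`, the SHA-256 fingerprint, the 4 KiB chunk size, and the toy input files are all illustrative assumptions; like any compare-by-hash scheme, it assumes no hash collisions.

```python
import hashlib


def unique_bytes(files, chunk_size=None):
    """Bytes that must be stored after deduplicating `files` (a list of bytes).

    chunk_size=None: whole-file hashing, each distinct file is stored once.
    chunk_size=N:    fixed-size chunking, each distinct N-byte chunk is stored once.
    Assumes SHA-256 digests never collide (compare-by-hash).
    """
    stored = {}  # digest -> length of the single stored copy
    for data in files:
        if chunk_size is None:
            chunks = [data]
        else:
            chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        for chunk in chunks:
            stored.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return sum(stored.values())


def space_savings(files, chunk_size=None):
    """Fraction of the original bytes eliminated by deduplication."""
    total = sum(len(data) for data in files)
    return 1 - unique_bytes(files, chunk_size) / total


# Hypothetical toy data: three 8 KiB "files" sharing a common 4 KiB prefix.
files = [
    b"A" * 4096 + b"B" * 4096,
    b"A" * 4096 + b"C" * 4096,
    b"A" * 4096 + b"B" * 4096,  # exact copy of the first file
]
print(f"whole-file hashing: {space_savings(files):.1%}")        # 33.3%
print(f"4 KiB fixed chunks: {space_savings(files, 4096):.1%}")  # 50.0%
```

Whole-file hashing only catches the exact duplicate, while chunking also catches the shared prefix, so finer granularity finds more redundancy. The price, which the sketch does not measure, is more hashing work and more pieces to reassemble on read, which is the kind of CPU and reconstruction-time tradeoff the abstract reports.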