The effectiveness and tradeoffs of deduplication technologies are not well understood; vendors tout deduplication as a "silver bullet" that can help any enterprise optimize its deployed storage capacity. This paper aims to provide a comprehensive taxonomy and an experimental evaluation using real-world data. Although the day-to-day rate of change of data has the greatest influence on the duplication found in backup data, we investigate the duplication inherent in the data itself, independent of the rate of change, the backup schedule, or the backup algorithm used. Our experimental results show that, across different deduplication techniques, space savings vary by about 30%, CPU usage differs by a factor of almost 6, and the time to reconstruct a deduplicated file can vary by a factor of more than 15.