A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Venti: A New Approach to Archival Storage
FAST '02 Proceedings of the Conference on File and Storage Technologies
Pastiche: making backup cheap and easy
ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
Deep Store: An Archival Storage System Architecture
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Optimizing the migration of virtual computers
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
TAPER: tiered approach for eliminating redundancy in replica synchronization
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
An analysis of compare-by-hash
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Single instance storage in Windows® 2000
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
Compare-by-hash: a reasoned analysis
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Demystifying data deduplication
Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion
ADMAD: Application-Driven Metadata Aware De-duplication Archival Storage System
SNAPI '08 Proceedings of the 2008 Fifth IEEE International Workshop on Storage Network Architecture and Parallel I/Os
Anchor-driven subchunk deduplication
Proceedings of the 4th Annual International Conference on Systems and Storage
Secure deduplication on mobile devices
Proceedings of the 2011 Workshop on Open Source and Design of Communication
A study on data deduplication in HPC storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Space savings and design considerations in variable length deduplication
ACM SIGOPS Operating Systems Review
Fuzzy adaptive control for heterogeneous tasks in high-performance storage systems
Proceedings of the 6th International Systems and Storage Conference
Block locality caching for data deduplication
Proceedings of the 6th International Systems and Storage Conference
A scalable deduplication and garbage collection engine for incremental backup
Proceedings of the 6th International Systems and Storage Conference
Proceedings of the 8th International Conference on Network and Service Management
SAFE: A Source Deduplication Framework for Efficient Cloud Backup Services
Journal of Signal Processing Systems
File recipe compression in data deduplication systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
Data deduplication systems detect redundancies between data blocks to either reduce storage needs or to reduce network traffic. A class of deduplication systems splits the data stream into data blocks (chunks) and then finds exact duplicates of these blocks. This paper compares the influence of different chunking approaches on multiple levels. On a macroscopic level, we compare the chunking approaches based on real-life user data in a weekly full backup scenario, both at a single point in time as well as over several weeks. In addition, we analyze how small changes affect the deduplication ratio for different file types on a microscopic level for chunking approaches and delta encoding. An intuitive assumption is that small semantic changes on documents cause only small modifications in the binary representation of files, which would imply a high ratio of deduplication. We will show that this assumption is not valid for many important file types and that application-specific chunking can help to further decrease storage capacity demands.