Data deduplication systems discover and exploit redundancies between different data blocks. The most common approach divides the data into chunks and identifies redundant chunks by their fingerprints. A file's content can be rebuilt from the sequence of chunk fingerprints, which is stored in a file recipe. This recipe data can occupy a significant fraction of the total disk space, especially when the deduplication ratio is very high. We propose a combination of efficient and scalable compression schemes to shrink the size of the file recipes. A trace-based simulation shows that these methods can compress file recipes by up to 93%.
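To make the setting concrete, the following is a minimal sketch of chunk-based deduplication with a file recipe, plus a simple dictionary-coding pass over the recipe. It is an illustration only, not the paper's proposed schemes: it assumes fixed-size chunking (real systems often use content-defined chunking), SHA-1 fingerprints, and a hypothetical per-file code-word dictionary standing in for the compression methods the paper evaluates.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking for simplicity

def build_recipe(data: bytes):
    """Split data into chunks, store each unique chunk once, and
    record the ordered chunk fingerprints as the file recipe."""
    store = {}   # fingerprint -> chunk payload (the deduplicated chunk store)
    recipe = []  # sequence of fingerprints; concatenating their chunks rebuilds the file
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha1(chunk).digest()  # 20-byte fingerprint per recipe entry
        store.setdefault(fp, chunk)
        recipe.append(fp)
    return store, recipe

def compress_recipe(recipe):
    """Shrink the recipe by replacing each 20-byte fingerprint with a
    short integer code word from a per-file dictionary (a sketch of
    dictionary coding, not the paper's exact scheme)."""
    codes = {}       # fingerprint -> code word
    compressed = []  # code-word sequence; small integers encode in few bytes
    for fp in recipe:
        compressed.append(codes.setdefault(fp, len(codes)))
    return codes, compressed

def rebuild(store, recipe):
    """Reconstruct the file content by following the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

For example, a 16 KiB file consisting of three identical 4 KiB chunks and one distinct chunk yields a chunk store with two entries but a recipe of four fingerprints; dictionary coding maps those 80 bytes of fingerprints to four small integers, which is where the recipe-size savings come from when the deduplication ratio is high.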