Data deduplication systems discover and exploit redundancies between different data blocks. The most common approach divides the data into chunks and identifies redundant chunks by their fingerprints. A file's content can be rebuilt from the sequence of chunk fingerprints, which is stored in a file recipe. This recipe data can occupy a significant fraction of the total disk space, especially when the deduplication ratio is very high. We propose a combination of efficient and scalable compression schemes to shrink the size of the file recipes. A trace-based simulation shows that these methods can compress file recipes by up to 93%.
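To make the setting concrete, the following is a minimal sketch of chunk-based deduplication with a file recipe, plus a simple dictionary-coding pass over the recipe. It is an illustration only, not the paper's proposed schemes: it assumes fixed-size chunking (real systems often use content-defined chunking), SHA-1 fingerprints, and a hypothetical per-file code-word dictionary standing in for the compression methods the paper evaluates.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking for simplicity

def build_recipe(data: bytes):
    """Split data into chunks, store each unique chunk once, and
    record the ordered chunk fingerprints as the file recipe."""
    store = {}   # fingerprint -> chunk payload (the deduplicated chunk store)
    recipe = []  # sequence of fingerprints; concatenating their chunks rebuilds the file
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha1(chunk).digest()  # 20-byte fingerprint per recipe entry
        store.setdefault(fp, chunk)
        recipe.append(fp)
    return store, recipe

def compress_recipe(recipe):
    """Shrink the recipe by replacing each 20-byte fingerprint with a
    short integer code word from a per-file dictionary (a sketch of
    dictionary coding, not the paper's exact scheme)."""
    codes = {}       # fingerprint -> code word
    compressed = []  # code-word sequence; small integers encode in few bytes
    for fp in recipe:
        compressed.append(codes.setdefault(fp, len(codes)))
    return codes, compressed

def rebuild(store, recipe):
    """Reconstruct the file content by following the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

For example, a 16 KiB file consisting of three identical 4 KiB chunks and one distinct chunk yields a chunk store with two entries but a recipe of four fingerprints; dictionary coding maps those 80 bytes of fingerprints to four small integers, which is where the recipe-size savings come from when the deduplication ratio is high.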