Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source, by not writing data that has already been stored, not only reduces storage overheads but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems.

Intelligent object partitioning techniques identify the data that is new when objects are updated, and transfer only these chunks to a storage server. In this article, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably, fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects, in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff and other existing object partitioning schemes using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25%, and bandwidth utilization on average by 40%, over comparable techniques.