An analysis of compare-by-hash

Authors:
Val Henson
Affiliations:
Sun Microsystems
Venue:
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Year:
2003

References: 7 (works this paper cites, listed first)
Cited by: 24 (later works that cite this paper, listed second)

A low-bandwidth network file system

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Differential Collisions in SHA-0

CRYPTO '98 Proceedings of the 18th Annual International Cryptology Conference on Advances in Cryptology
CPCMS: A Configuration Management System Based on Cryptographic Names

Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference
Pastiche: making backup cheap and easy

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Optimizing the migration of virtual computers

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Awarded Best Paper! - Venti: A New Approach to Archival Data Storage

FAST '02 Proceedings of the 1st USENIX Conference on File and Storage Technologies
On the security of two MAC algorithms

EUROCRYPT'96 Proceedings of the 15th annual international conference on Theory and application of cryptographic techniques

Consistency-preserving caching of dynamic database content

Proceedings of the 16th international conference on World Wide Web
Slinky: static linking reloaded

ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Redundancy elimination within large collections of files

ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data

ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Improving mobile database access over wide-area networks without degrading consistency

Proceedings of the 5th international conference on Mobile systems, applications and services
TAPER: tiered approach for eliminating redundancy in replica synchronization

FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Design, implementation, and evaluation of duplicate transfer detection in HTTP

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Evaluating the usefulness of content addressable storage for high-performance data intensive applications

HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Fast, inexpensive content-addressed storage in foundation

ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
A secure peer-to-peer backup system

NOTERE '08 Proceedings of the 8th international conference on New technologies in distributed systems
Cumulus: filesystem backup to the cloud

FAST '09 Proceedings of the 7th conference on File and storage technologies
The effectiveness of deduplication on virtual machine disk images

SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Multi-level comparison of data deduplication in a backup scenario

SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Applying syntactic similarity algorithms for enterprise information management

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Cumulus: Filesystem backup to the cloud

ACM Transactions on Storage (TOS)
Efficient locally trackable deduplication in replicated systems

Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Efficient locally trackable deduplication in replicated systems

Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
Constructing and managing appliances for cloud deployments from repositories of reusable components

HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Tolerating file-system mistakes with EnvyFS

USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
PRESIDIO: A Framework for Efficient Archival Data Storage

ACM Transactions on Storage (TOS)
VMFlock: virtual machine co-migration for the cloud

Proceedings of the 20th international symposium on High performance distributed computing
Enhancing the performance of high availability lightweight live migration

OPODIS'11 Proceedings of the 15th international conference on Principles of Distributed Systems
A study on data deduplication in HPC storage systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A novel approach to data deduplication over the engineering-oriented cloud systems

Integrated Computer-Aided Engineering

Abstract

Recent research has produced a new and perhaps dangerous technique for uniquely identifying blocks that I will call compare-by-hash. Using this technique, we decide whether two blocks are identical to each other by comparing their hash values, using a collision-resistant hash such as SHA-1[5]. If the hash values match, we assume the blocks are identical without further ado. Users of compare-by-hash argue that this assumption is warranted because the chance of a hash collision between any two randomly generated blocks is estimated to be many orders of magnitude smaller than the chance of many kinds of hardware errors. Further analysis shows that this approach is not as risk-free as it seems at first glance.
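
The technique described in the abstract can be illustrated with a minimal Python sketch. This is not code from the paper: the function names (`block_digest`, `same_by_hash`, `same_by_hash_verified`, `birthday_bound`) are hypothetical, and the probability estimate assumes hash outputs are uniformly random, which is exactly the kind of assumption the paper questions.

```python
import hashlib


def block_digest(block: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of a block."""
    return hashlib.sha1(block).digest()


def same_by_hash(a: bytes, b: bytes) -> bool:
    """Compare-by-hash: treat two blocks as identical if their digests match.

    No byte-by-byte comparison is done, so a hash collision would make
    two different blocks look identical.
    """
    return block_digest(a) == block_digest(b)


def same_by_hash_verified(a: bytes, b: bytes) -> bool:
    """Conservative variant (an assumption, not the paper's proposal):
    use the digest as a cheap filter, then confirm with a full comparison.
    """
    return block_digest(a) == block_digest(b) and a == b


def birthday_bound(n_blocks: int, hash_bits: int = 160) -> float:
    """Approximate upper bound on the chance that any two of n_blocks
    collide, ASSUMING digests are uniformly random: n(n-1)/2 / 2^bits.
    """
    return n_blocks * (n_blocks - 1) / 2 / 2.0 ** hash_bits


if __name__ == "__main__":
    print(same_by_hash(b"block A", b"block A"))
    print(same_by_hash(b"block A", b"block B"))
    print(birthday_bound(2 ** 40))
```

The verified variant only helps when both blocks are available locally. In the remote-synchronization settings where compare-by-hash is typically used, only the digest crosses the network, so the full comparison is not possible without giving up the bandwidth savings.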