We propose a technique, called hash challenges, for reducing communication overhead when sending data across a network. Hash challenges build on existing deduplication solutions based on compare-by-hash, identifying redundant data chunks while exchanging substantially less meta-data. The technique can be layered directly on any existing compare-by-hash protocol with no significant additional computational cost. Using real data from reference workloads, we show that hash challenges save up to 64% of the meta-data exchanged across the network relative to plain compare-by-hash. This translates into reductions of up to 7% in overall transferred volume and performance gains of up to 16% over typical asymmetric broadband connections.
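To make the trade-off concrete, the following is a minimal sketch of the idea, not the paper's actual protocol: plain compare-by-hash ships one full digest per chunk, whereas a challenge-style variant first ships only a short hash prefix and falls back to the full digest just for chunks whose prefix matches on the receiver's side. All names, the fixed-size chunking, and the 4-byte prefix length are illustrative assumptions.

```python
import hashlib

CHUNK_SIZE = 4096      # assumed fixed-size chunking, for simplicity
PREFIX_BYTES = 4       # hypothetical short "challenge" instead of a full digest

def chunk_hashes(data):
    """Split data into fixed-size chunks and hash each one."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return chunks, [hashlib.sha256(c).digest() for c in chunks]

def plain_compare_by_hash(hashes, receiver_index):
    """Baseline: send every full digest; receiver reports which it lacks."""
    meta_bytes = sum(len(h) for h in hashes)
    missing = [i for i, h in enumerate(hashes) if h not in receiver_index]
    return missing, meta_bytes

def hash_challenge(hashes, receiver_index):
    """Sketch: send short prefixes first; full digests only on prefix hits."""
    meta_bytes = len(hashes) * PREFIX_BYTES
    prefix_index = {h[:PREFIX_BYTES] for h in receiver_index}
    missing = []
    for i, h in enumerate(hashes):
        if h[:PREFIX_BYTES] in prefix_index:
            meta_bytes += len(h)       # resolve a possible prefix collision
            if h not in receiver_index:
                missing.append(i)
        else:
            missing.append(i)          # prefix unknown: chunk is definitely new
    return missing, meta_bytes
```

Both paths return the same set of missing chunks, but when most chunks are already held by the receiver (the deduplication-friendly case), the challenge variant exchanges far fewer meta-data bytes, since only prefix hits trigger a full-digest exchange.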