Hash challenges: Stretching the limits of compare-by-hash in distributed data deduplication

  • Authors:
  • João Barreto; Luís Veiga; Paulo Ferreira

  • Affiliations:
  • INESC-ID and Technical University of Lisbon, Portugal; INESC-ID and Technical University of Lisbon, Portugal; INESC-ID and Technical University of Lisbon, Portugal

  • Venue:
  • Information Processing Letters
  • Year:
  • 2012

Abstract

We propose a technique for reducing communication overheads when sending data across a network. Our technique, called hash challenges, builds on existing deduplication solutions based on compare-by-hash, but identifies redundant data chunks while exchanging substantially less meta-data. Hash challenges can be applied directly to any existing compare-by-hash protocol, with negligible additional computational complexity. Using real data from reference workloads, we show that hash challenges can reduce the meta-data exchanged across the network by as much as 64% relative to plain compare-by-hash. This translates into reductions of up to 7% in overall transferred volume and performance gains of up to 16% on typical asymmetric broadband connections.
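For context, the plain compare-by-hash baseline that hash challenges builds on can be sketched as follows. This is a minimal illustration of the generic technique, not the authors' hash-challenge protocol. The fixed 4 KiB chunk size, the choice of SHA-1, and all function names are assumptions made for illustration. The per-chunk digests sent in the first round are the meta-data whose volume hash challenges aims to reduce.

```python
import hashlib

# Assumption: fixed-size chunks; real systems often use content-defined chunking.
CHUNK_SIZE = 4096
DIGEST_SIZE = 20  # SHA-1 digest length in bytes (assumed hash function)


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(c: bytes) -> bytes:
    """Per-chunk hash; these digests are the meta-data sent over the network."""
    return hashlib.sha1(c).digest()


def sender_offer(data: bytes) -> tuple[list[bytes], list[bytes]]:
    """Round 1 (sender): chunk the data and announce one digest per chunk."""
    chunks = chunk(data)
    return chunks, [digest(c) for c in chunks]


def receiver_request(hashes: list[bytes], store: dict[bytes, bytes]) -> list[int]:
    """Round 2 (receiver): request the indices whose digest it does not hold."""
    return [i for i, h in enumerate(hashes) if h not in store]


def sender_send(chunks: list[bytes], missing: list[int]) -> dict[int, bytes]:
    """Round 3 (sender): transmit only the requested chunks."""
    return {i: chunks[i] for i in missing}


def receiver_rebuild(hashes: list[bytes], sent: dict[int, bytes],
                     store: dict[bytes, bytes]) -> bytes:
    """Receiver: reassemble the data from received and locally stored chunks."""
    out = []
    for i, h in enumerate(hashes):
        c = sent[i] if i in sent else store[h]
        store[h] = c  # keep new chunks for future deduplication
        out.append(c)
    return b"".join(out)


if __name__ == "__main__":
    # The receiver already holds an old version; the new one changes one chunk.
    old = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    new = b"A" * 4096 + b"X" * 4096 + b"C" * 4096
    store = {digest(c): c for c in chunk(old)}

    chunks, hashes = sender_offer(new)
    missing = receiver_request(hashes, store)
    sent = sender_send(chunks, missing)
    assert receiver_rebuild(hashes, sent, store) == new
    print("missing chunks:", missing)                          # [1]
    print("meta-data bytes:", len(hashes) * DIGEST_SIZE)       # 60
```

In this baseline every chunk costs a full digest on the wire whether or not it turns out to be redundant. That fixed per-chunk overhead is the cost the abstract reports hash challenges cutting by up to 64%.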