TAPER: tiered approach for eliminating redundancy in replica synchronization

  • Authors:
  • Navendu Jain; Mike Dahlin; Renu Tewari

  • Affiliations:
  • Department of Computer Sciences, University of Texas at Austin, Austin, TX; Department of Computer Sciences, University of Texas at Austin, Austin, TX; IBM Almaden Research Center, San Jose, CA

  • Venue:
  • FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
  • Year:
  • 2005

Abstract

We present TAPER, a scalable data replication protocol that synchronizes a large collection of data across multiple geographically distributed replica locations. TAPER can be applied to a broad range of systems, such as software distribution mirrors, content distribution networks, backup and recovery, and federated file systems. TAPER is designed to be bandwidth-efficient, scalable, and content-based, and it does not require prior knowledge of the replica state. To achieve these properties, TAPER provides: i) four pluggable redundancy elimination phases that balance the trade-off between bandwidth savings and computation overhead, ii) a hierarchical hash-tree-based directory pruning phase that quickly matches identical data at granularities ranging from directory trees down to individual files, iii) a content-based similarity detection technique that uses Bloom filters to identify similar files, and iv) a combination of coarse-grained chunk matching with finer-grained block matching to achieve bandwidth efficiency. Through extensive experiments on various datasets, we observe that, in comparison with rsync, a widely used directory synchronization tool, TAPER reduces bandwidth by 15% to 71%, performs faster matching, and scales to a larger number of replicas.
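
The directory pruning idea in item ii) can be illustrated with a minimal Python sketch. It is not TAPER's actual implementation: the in-memory tree representation (a dict for a directory, bytes for a file), the use of SHA-1, and the names tree_hash and unmatched_paths are all assumptions made for this example. Each directory is hashed over the names and hashes of its children, so two directories with equal hashes can be treated as identical and skipped without descending into them.

```python
import hashlib

def tree_hash(node):
    """Hash a file (bytes) or a directory (dict mapping name -> node).

    Hypothetical helper: a directory's hash covers its children's names
    and hashes, so equal hashes imply equal subtrees (barring collisions).
    """
    if isinstance(node, bytes):
        return hashlib.sha1(b"F" + node).hexdigest()
    h = hashlib.sha1(b"D")
    for name in sorted(node):
        h.update(name.encode() + b"\0" + tree_hash(node[name]).encode())
    return h.hexdigest()

def unmatched_paths(src, dst, path=""):
    """Return source paths that have no identical counterpart in dst.

    Hashes are recomputed at every level for brevity; a real system
    would compute them once and cache them. Deletions on the target
    side are not handled in this sketch.
    """
    # Equal hashes: prune the whole subtree, nothing needs to be sent.
    if dst is not None and tree_hash(src) == tree_hash(dst):
        return []
    # A differing file, or a directory with no directory counterpart,
    # is reported as a whole for later (finer-grained) phases.
    if isinstance(src, bytes) or not isinstance(dst, dict):
        return [path or "/"]
    out = []
    for name in sorted(src):
        out += unmatched_paths(src[name], dst.get(name), f"{path}/{name}")
    return out

# Made-up example data: only src/util.c differs between the replicas.
source = {
    "docs": {"a.txt": b"alpha", "b.txt": b"beta"},
    "src": {"main.c": b"int main(){}", "util.c": b"v2"},
}
target = {
    "docs": {"a.txt": b"alpha", "b.txt": b"beta"},
    "src": {"main.c": b"int main(){}", "util.c": b"v1"},
}
print(unmatched_paths(source, target))  # ['/src/util.c']
```

In this example the docs directory is matched by a single hash comparison, and only the one differing file is left over for the similarity detection and chunk or block matching described in items iii) and iv).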