A two-phase differential synchronization algorithm for remote files

  • Authors:
  • Yonghong Sheng;Dan Xu;Dongsheng Wang

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China;School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing, P.R. China;Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China

  • Venue:
  • ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
  • Year:
  • 2010

Abstract

This paper presents tpsync, a two-phase synchronization algorithm that combines content-defined chunking (CDC) with sliding-block duplicate data detection. tpsync first partitions the files being synchronized into coarse-grained, variable-sized chunks with the CDC method, then locates the unmatched chunks using the edit distance algorithm, and finally generates fine-grained delta data with the fixed-sized sliding-block duplicate data detection method. In the first phase, tpsync can quickly locate the partially changed chunks between two files through the fingerprint characteristics of similar files. Building on the first phase's results, the small fixed-sized sliding-block method can then produce better fine-grained delta data between the corresponding unmatched chunks. Extensive experiments on ASCII, binary and database files demonstrate that tpsync achieves better synchronization time and less total transferred data than rsync, the traditional fixed-sized sliding-block method. Compared to rsync, tpsync reduces synchronization time by 12% and bandwidth by 18.9% on average when optimized parameters are applied to both. When the signature-cached synchronization method is adopted, tpsync can yield better performance.
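The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names and parameter values (`cdc_chunks`, `block_delta`, `two_phase_delta`, `apply_delta`, the window, mask and block sizes) are hypothetical and are not the optimized parameters from the paper. As simplifying assumptions, the chunker recomputes an Adler-32 checksum per window instead of using a true rolling fingerprint, `difflib.SequenceMatcher` stands in for the edit distance alignment of chunk fingerprints, and the sliding-block phase looks up raw blocks rather than weak and strong checksums.

```python
import hashlib
import zlib
from difflib import SequenceMatcher


def cdc_chunks(data, window=16, mask=0x3F, min_size=32, max_size=1024):
    """Phase 1a: cut a chunk where a checksum of the last `window` bytes hits a pattern."""
    chunks, start = [], 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue
        # Assumption: recomputed per position for clarity, not a rolling hash.
        fp = zlib.adler32(data[max(0, i - window + 1):i + 1])
        if (fp & mask) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def block_delta(old, old_base, new, block=8):
    """Phase 2: fixed-size sliding-block matching inside one unmatched region."""
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], off)
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        off = index.get(new[pos:pos + block])
        if off is None:
            literal.append(new[pos])  # no match: emit one literal byte, slide by 1
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", old_base + off, block))
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def offsets(chunks):
    """Return cumulative byte offsets of chunk boundaries."""
    out = [0]
    for c in chunks:
        out.append(out[-1] + len(c))
    return out


def two_phase_delta(old, new):
    """Phase 1b: align chunk fingerprints, then refine unmatched regions with phase 2."""
    old_chunks, new_chunks = cdc_chunks(old), cdc_chunks(new)
    old_fp = [hashlib.sha1(c).digest() for c in old_chunks]
    new_fp = [hashlib.sha1(c).digest() for c in new_chunks]
    old_off, new_off = offsets(old_chunks), offsets(new_chunks)
    ops = []
    matcher = SequenceMatcher(None, old_fp, new_fp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", old_off[i1], old_off[i2] - old_off[i1]))
        else:
            ops += block_delta(old[old_off[i1]:old_off[i2]], old_off[i1],
                               new[new_off[j1]:new_off[j2]])
    return ops


def apply_delta(old, ops):
    """Rebuild the new file from the old file and the delta operations."""
    out = bytearray()
    for op in ops:
        out += old[op[1]:op[1] + op[2]] if op[0] == "copy" else op[1]
    return bytes(out)
```

In this sketch, calling `two_phase_delta(old, new)` on the sender side and `apply_delta(old, ops)` on the receiver side reproduces `new`. Only the `("data", ...)` operations carry literal bytes, which is the quantity a delta scheme of this kind tries to keep small.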