CABdedupe: A Causality-Based Deduplication Performance Booster for Cloud Backup Services

  • Authors:
  • Yujuan Tan;Hong Jiang;Dan Feng;Lei Tian;Zhichao Yan

  • Affiliations:
  • -;-;-;-;-

  • Venue:
  • IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Due to the relatively low bandwidth of WAN (Wide Area Network) that supports cloud backup services, both the backup time and restore time in the cloud backup environment are in desperate need for reduction to make cloud backup a practical and affordable service for small businesses and telecommuters alike. Existing solutions that employ the deduplication technology for cloud backup services only focus on removing redundant data from transmission during backup operations to reduce the backup time, while paying little attention to the restore time that we argue is an important aspect and affects the overall quality of service of the cloud backup services. In this paper, we propose a \emph{CAusality-Based deduplication performance booster for both cloud backup and restore operations}, called CABdedupe, which captures the causal relationship among chronological versions of datasets that are processed in multiple backups/restores, to remove the unmodified data from transmission during not only backup operations but also restore operations, thus to improve both the backup and restore performances. CABdedupe is a middleware that is orthogonal to and can be integrated into any existing backup system. Our extensive experiments, where we integrate CABdedupe into two existing backup systems and feed real world datasets, show that both the backup time and restore time are significantly reduced, with a reduction ratio of up to $103:1$.