SAM: A Semantic-Aware Multi-tiered Source De-duplication Framework for Cloud Backup

  • Authors:
  • Yujuan Tan;Hong Jiang;Dan Feng;Lei Tian;Zhichao Yan;Guohui Zhou

  • Venue:
  • ICPP '10: Proceedings of the 2010 39th International Conference on Parallel Processing
  • Year:
  • 2010

Abstract

Existing de-duplication solutions for cloud backup environments either achieve high compression ratios at the cost of heavy de-duplication overhead, in the form of increased latency and reduced throughput, or keep de-duplication overhead small at the cost of low compression ratios, which raises data transmission costs. Either way, the result is a large backup window. In this paper, we present SAM, a Semantic-Aware Multi-tiered source de-duplication framework that first combines global file-level de-duplication with local chunk-level de-duplication, and then exploits file semantics at each stage of the framework. This yields an optimal tradeoff between de-duplication efficiency and de-duplication overhead, and ultimately a shorter backup window than existing approaches. Our experimental results with real-world datasets show that SAM not only has a higher de-duplication efficiency/overhead ratio than existing solutions, but also shortens the backup window by an average of 38.7%.
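
The two-tier idea in the abstract, a global file-level check followed by a local chunk-level check, can be illustrated with a minimal sketch. This is not the SAM implementation: the function names, the use of SHA-256 fingerprints, fixed-size chunking, and in-memory sets standing in for the global and local indexes are all assumptions made for illustration, and the semantic-aware parts of the framework are not modeled.

```python
import hashlib

# Hypothetical chunk size; kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def fingerprint(data: bytes) -> str:
    """Return a content fingerprint (SHA-256 is an assumption, not SAM's choice)."""
    return hashlib.sha256(data).hexdigest()


def backup(files, global_file_index, local_chunk_index, chunk_size=CHUNK_SIZE):
    """Return the (file name, chunk) pairs that still need to be transmitted.

    files: dict mapping file name to file contents (bytes).
    global_file_index: set of file fingerprints already stored (tier 1).
    local_chunk_index: set of chunk fingerprints already stored (tier 2).
    """
    to_send = []
    for name, data in files.items():
        # Tier 1: skip the whole file if an identical file is already stored.
        file_fp = fingerprint(data)
        if file_fp in global_file_index:
            continue
        global_file_index.add(file_fp)

        # Tier 2: split the file into fixed-size chunks and send only new ones.
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            chunk_fp = fingerprint(chunk)
            if chunk_fp not in local_chunk_index:
                local_chunk_index.add(chunk_fp)
                to_send.append((name, chunk))
    return to_send


if __name__ == "__main__":
    files = {
        "a.txt": b"AAAABBBB",
        "b.txt": b"AAAACCCC",  # shares its first chunk with a.txt
        "c.txt": b"AAAABBBB",  # identical to a.txt, removed at file level
    }
    for name, chunk in backup(files, set(), set()):
        print(name, chunk)
```

In this sketch the file-level check is the cheap filter: an identical file costs one fingerprint lookup and no chunking. The chunk-level pass runs only on files that survive the first tier, which is one plausible way to trade de-duplication overhead against efficiency.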