CloudDT: efficient tape resource management using deduplication in cloud backup and archival services

  • Authors: Abdullah Gharaibeh; Cornel Constantinescu; Maohua Lu; Anurag Sharma; Ramani R. Routray; Prasenjit Sarkar; David Pease; Matei Ripeanu
  • Affiliations: The University of British Columbia; IBM Research - Almaden; IBM Research - Almaden; IBM Research - Almaden; IBM Research - Almaden; IBM Research - Almaden; IBM Research - Almaden; The University of British Columbia
  • Venue: Proceedings of the 8th International Conference on Network and Service Management
  • Year: 2012

Abstract

Cloud-based backup and archival services today use large tape libraries as a cost-effective cold tier in their online storage hierarchy. These services leverage deduplication to reduce the disk storage capacity required by their customer data sets, but they usually re-duplicate the data when moving it from disk to tape. Deduplication does not add significant I/O overhead when performed on disk storage pools. However, when deduplicated data is naively placed on tape storage, the high degree of data fragmentation caused by deduplication, combined with the high seek and mount times of today's tape technology, leads to high retrieval times. This negatively impacts the recovery time objectives (RTO) that the service provider has to meet as part of the service level agreement (SLA). This work proposes CloudDT, an extension to cloud backup and archival services that efficiently supports deduplication on tape pools. This paper (i) details the main challenges in enabling efficient deduplication on tape libraries, (ii) introduces a class of solutions based on graph modeling of similarity between data items that enables efficient placement on tapes, and (iii) presents the design and initial evaluation of algorithms that alleviate tape mount time overhead and reduce on-tape data fragmentation. Using 4.5 TB of real-world workloads, our initial evaluations show that our algorithms retain at least 95% of the deduplication storage efficiency and offer up to 40% faster restore performance than restoring non-deduplicated data. Therefore, our techniques allow the backup service provider to increase tape resource utilization using deduplication, while also improving restore time for the end user.
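
The graph-based placement idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the CloudDT algorithm, which the abstract does not specify; it is a simple greedy heuristic written under stated assumptions. Each file is assumed to be a mapping from chunk hash to chunk size, an edge between two files is weighted by the bytes of chunks they share, and files joined by heavy edges are grouped so that each group's deduplicated size fits on one tape. The names build_similarity_graph, place_on_tapes, and tape_capacity are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_similarity_graph(files):
    """Return {(file_a, file_b): shared_bytes} for every pair sharing a chunk.

    `files` maps a file name to {chunk_hash: chunk_size} (an assumed layout).
    """
    owners = defaultdict(set)
    chunk_size = {}
    for name, chunks in files.items():
        for digest, size in chunks.items():
            owners[digest].add(name)
            chunk_size[digest] = size

    weights = defaultdict(int)
    for digest, names in owners.items():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += chunk_size[digest]
    return dict(weights)


def place_on_tapes(files, tape_capacity):
    """Greedily merge similar files into groups, one group per tape.

    Edges are visited from heaviest to lightest; two groups are merged only
    if the union of their chunks (stored once) fits within tape_capacity.
    Assumes every single file fits on a tape by itself.
    """
    chunk_size = {d: s for chunks in files.values() for d, s in chunks.items()}
    group_of = {name: name for name in files}
    members = {name: {name} for name in files}
    chunks_in = {name: set(chunks) for name, chunks in files.items()}

    graph = build_similarity_graph(files)
    for (a, b), _weight in sorted(graph.items(), key=lambda kv: (-kv[1], kv[0])):
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue
        merged = chunks_in[ga] | chunks_in[gb]
        if sum(chunk_size[d] for d in merged) > tape_capacity:
            continue
        # Merge group gb into ga; shared chunks are kept only once.
        members[ga] |= members.pop(gb)
        chunks_in[ga] = merged
        del chunks_in[gb]
        for name in members[ga]:
            group_of[name] = ga
    return [sorted(group) for group in members.values()]
```

Keeping files that share chunks on the same tape means a restore of any one file needs a single mount, while the shared chunks are still stored only once. A production design would also have to order chunks within a tape to limit seeks, which this sketch ignores.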