A Novel Optimization Method to Improve De-duplication Storage System Performance

  • Authors:
  • Chuanyi Liu; Yibo Xue; Dapeng Ju; Dongsheng Wang

  • Venue:
  • ICPADS '09 Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems
  • Year:
  • 2009

Abstract

Data de-duplication has become a commodity component in data-intensive storage systems. Compared with traditional storage paradigms, however, a de-duplication system eliminates duplicate or redundant data at the cost of inserting several additional layers or functional components into the I/O path; these components are either CPU-intensive or I/O-intensive and can significantly hinder overall system performance. Targeting these potential bottlenecks, this paper quantitatively analyzes the overhead of each major component introduced by de-duplication and then proposes two performance optimizations. The first is parallel computation of content-aware chunk identifiers, which exploits both inter-chunk and intra-chunk parallelism through a task-partitioning and chunk-content distribution algorithm; experiments demonstrate that it improves system throughput by up to 150% while making much better use of multiprocessor resources. The second is storage pipelining, which overlaps CPU-bound, I/O-bound, and network-communication tasks; with a dedicated five-stage storage pipeline design for file archival operations, experimental results show that system throughput increases by up to 25% on our workloads.
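
To make the first optimization concrete, below is a minimal Go sketch of inter-chunk parallelism: a worker pool spread across the available CPUs computes SHA-1 fingerprints of fixed-size chunks concurrently. The chunk size, the choice of SHA-1, and the pool structure are illustrative assumptions; the paper's actual task-partitioning and chunk-content distribution algorithm (including its intra-chunk parallelism) is not reproduced here.

```go
// parallel_fingerprint.go: hash chunks concurrently with a worker pool
// (illustrative sketch, not the paper's algorithm).
package main

import (
	"crypto/sha1"
	"fmt"
	"runtime"
	"sync"
)

// chunk pairs an index with its data so results can be reordered later.
type chunk struct {
	index int
	data  []byte
}

// fingerprint is the content-derived identifier of one chunk.
type fingerprint struct {
	index int
	sum   [sha1.Size]byte
}

// fingerprintChunks spreads chunk hashing across all available CPUs.
func fingerprintChunks(chunks []chunk) []fingerprint {
	in := make(chan chunk)
	out := make(chan fingerprint, len(chunks))

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range in {
				out <- fingerprint{index: c.index, sum: sha1.Sum(c.data)}
			}
		}()
	}

	// Feed chunks to the workers, then close the result channel once done.
	go func() {
		for _, c := range chunks {
			in <- c
		}
		close(in)
		wg.Wait()
		close(out)
	}()

	results := make([]fingerprint, len(chunks))
	for f := range out {
		results[f.index] = f
	}
	return results
}

func main() {
	// Toy workload: eight 1 MiB chunks filled with a repeating byte.
	chunks := make([]chunk, 8)
	for i := range chunks {
		buf := make([]byte, 1<<20)
		for j := range buf {
			buf[j] = byte(i)
		}
		chunks[i] = chunk{index: i, data: buf}
	}
	for _, f := range fingerprintChunks(chunks) {
		fmt.Printf("chunk %d -> %x\n", f.index, f.sum[:4])
	}
}
```

The second optimization, storage pipelining, can be sketched in a similar way with goroutines connected by channels, so that CPU-bound fingerprinting overlaps with I/O-bound and network stages. The stage names and the in-memory duplicate index below are hypothetical stand-ins, and the sketch uses fewer stages than the paper's dedicated five-stage archival pipeline.

```go
// pipeline.go: a minimal staged archival path where each stage runs in its
// own goroutine and hands work to the next stage over a channel, so that
// hashing overlaps with (simulated) read and store stages.
package main

import (
	"crypto/sha1"
	"fmt"
)

type block struct {
	id   int
	data []byte
	sum  [sha1.Size]byte
	dup  bool
}

// stage wires a worker function between an input and an output channel.
func stage(in <-chan block, fn func(block) block) <-chan block {
	out := make(chan block)
	go func() {
		defer close(out)
		for b := range in {
			out <- fn(b)
		}
	}()
	return out
}

func main() {
	// Stage 1: produce blocks (stands in for reading and chunking a file).
	src := make(chan block)
	go func() {
		defer close(src)
		for i := 0; i < 4; i++ {
			src <- block{id: i, data: []byte(fmt.Sprintf("payload-%d", i%2))}
		}
	}()

	seen := map[[sha1.Size]byte]bool{}

	// Stage 2: fingerprint each block (CPU-bound).
	hashed := stage(src, func(b block) block {
		b.sum = sha1.Sum(b.data)
		return b
	})
	// Stage 3: duplicate lookup against an in-memory index.
	checked := stage(hashed, func(b block) block {
		b.dup = seen[b.sum]
		seen[b.sum] = true
		return b
	})
	// Stage 4: store or skip (stands in for the I/O / network stage).
	for b := range checked {
		if b.dup {
			fmt.Printf("block %d: duplicate, skipped\n", b.id)
		} else {
			fmt.Printf("block %d: stored %x\n", b.id, b.sum[:4])
		}
	}
}
```

Because each stage runs in its own goroutine, a slow store stage does not immediately stall the hashing stage, which is the kind of overlap between CPU-bound and I/O-bound work that the pipelining method exploits.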