I/O deduplication: utilizing content similarity to improve I/O performance

  • Authors:
  • Ricardo Koller; Raju Rangaswami

  • Affiliation (both authors):
  • School of Computing and Information Sciences, Florida International University

  • Venue:
  • FAST'10: Proceedings of the 8th USENIX Conference on File and Storage Technologies
  • Year:
  • 2010

Abstract

Duplication of data in storage systems is becoming increasingly common. We introduce I/O Deduplication, a storage optimization that uses content similarity to improve I/O performance by eliminating I/O operations and reducing the mechanical delays during I/O operations. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval, and selective duplication. Each technique is motivated by our observations from I/O workload traces obtained from actively used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data. Evaluation of a prototype implementation using these workloads showed an overall improvement in disk I/O performance of 28-47%. A further breakdown showed that each of the three techniques contributed significantly to the overall improvement.
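
To make the first of the three techniques concrete, the following is a minimal sketch of a content-addressed block cache. It is not the authors' prototype: the class name `ContentCache`, the use of SHA-256, the LRU eviction policy, and the `read_from_disk` callback are all assumptions chosen for illustration. The idea it shows is that cached blocks are indexed by a digest of their content rather than by sector number, so sectors holding identical data share a single cache entry.

```python
import hashlib
from collections import OrderedDict


class ContentCache:
    """Toy content-addressed block cache (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of distinct block contents kept
        self.sector_to_digest = {}    # sector number -> digest of its last known content
        self.blocks = OrderedDict()   # digest -> block bytes, kept in LRU order
        self.hits = 0
        self.misses = 0

    def _store(self, sector, data):
        # Index the block by its content digest (SHA-256 is an assumption here).
        digest = hashlib.sha256(data).hexdigest()
        self.sector_to_digest[sector] = digest
        self.blocks[digest] = data
        self.blocks.move_to_end(digest)
        # Evict least recently used contents once over capacity.
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def read(self, sector, read_from_disk):
        digest = self.sector_to_digest.get(sector)
        if digest is not None and digest in self.blocks:
            # Hit: some sector with identical content is already cached.
            self.hits += 1
            self.blocks.move_to_end(digest)
            return self.blocks[digest]
        # Miss: fetch from the backing store, then cache by content.
        self.misses += 1
        data = read_from_disk(sector)
        self._store(sector, data)
        return data

    def write(self, sector, data):
        # Write-through to disk is omitted; only the cache state is updated.
        self._store(sector, data)
```

With a capacity of one block, writing the same 512 bytes to two different sectors and then reading both back produces two hits, whereas a sector-addressed cache of the same size could hold only one of them. The sketch leaves out details a real system would need, such as bounding the sector-to-digest map and handling digest collisions.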