Decentralized deduplication in SAN cluster file systems

  • Authors:
  • Austin T. Clements;Irfan Ahmad;Murali Vilayannur;Jinyuan Li

  • Affiliations:
  • MIT CSAIL;VMware, Inc.;VMware, Inc.;VMware, Inc.

  • Venue:
  • USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

File systems hosting virtual machines typically contain many duplicated blocks of data resulting in wasted storage space and increased storage array cache footprint. Deduplication addresses these problems by storing a single instance of each unique data block and sharing it between all original sources of that data. While deduplication is well understood for file systems with a centralized component, we investigate it in a decentralized cluster file system, specifically in the context of VM storage. We propose DEDE, a block-level deduplication system for live cluster file systems that does not require any central coordination, tolerates host failures, and takes advantage of the block layout policies of an existing cluster file system. In DEDE, hosts keep summaries of their own writes to the cluster file system in shared on-disk logs. Each host periodically and independently processes the summaries of its locked files, merges them with a shared index of blocks, and reclaims any duplicate blocks. DEDE manipulates metadata using general file system interfaces without knowledge of the file system implementation. We present the design, implementation, and evaluation of our techniques in the context of VMware ESX Server. Our results show an 80% reduction in space with minor performance overhead for realistic workloads.