Rangoli: space management in deduplication environments

  • Authors:
  • P. C. Nagesh;Atish Kathpal

  • Affiliations:
  • NetApp Inc.;NetApp Inc.

  • Venue:
  • Proceedings of the 6th International Systems and Storage Conference
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Space management is the activity of monitoring and ensuring adequate free space on all volumes in a clustered storage system. Volumes that exceed used space limits are typically relieved by migrating a part of their data to other under utilized volumes. Without deduplication, space reclamation is simple as one has to just migrate as much data as the desired space reclamation. However, in deduped volumes there is no direct relation between the logical size of the file and the physical space occupied by it. Therefore, optimal space reclamation is hard as: a)migrating few files may produce little or zero bytes of free space, but still incur significant network costs. b)migrating a heavily shared file destroys the disk sharing relationships in that volume and increases the physical space consumption of that dataset. In this work, we have designed and built a fast and efficient tool Rangoli, that identifies the optimal set of files for space reclamation in a deduped environment. It can scale to millions of files and terabytes of data, running in tens of minutes. We show by experimenting on real world datasets, that alternate strategies such as those based on finding unique files or using MinHash, impact physical space consumption by a wide margin (up to 35 times) as compared to Rangoli.