Insights for data reduction in primary storage: a practical analysis

  • Authors:
  • Maohua Lu; David Chambliss; Joseph Glider; Cornel Constantinescu

  • Affiliations:
  • IBM Almaden Research Center; IBM Almaden Research Center; IBM Almaden Research Center; IBM Almaden Research Center

  • Venue:
  • Proceedings of the 5th Annual International Systems and Storage Conference
  • Year:
  • 2012

Abstract

There has been increasing interest in deploying data reduction techniques in primary storage systems. This paper analyzes large datasets from four typical enterprise data environments to find patterns that can suggest good design choices for such systems. The overall data reduction opportunity is evaluated for deduplication and compression, both separately and combined; an in-depth analysis then examines frequency, clustering, and other patterns in the collected data. The results suggest ways to improve performance and reduce resource requirements and system cost while maintaining data reduction effectiveness. These techniques include deciding which files to compress based on file type and size, using duplication affinity to guide deployment decisions, and adaptively optimizing the detection and mapping of duplicate content when large segments account for most of the opportunity.
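To make "evaluated for deduplication and compression, separately and combined" concrete, here is a minimal sketch of how such a reduction estimate could be computed over a byte stream. It is not the authors' method: the fixed 4 KiB chunk size, SHA-256 fingerprints, zlib level 6, and the helper names `stored_size` and `reduction_ratios` are all assumptions made for illustration.

```python
import hashlib
import zlib


def stored_size(chunk: bytes) -> int:
    # Assumption: keep the compressed form only when it is actually smaller,
    # so incompressible chunks are stored as-is.
    return min(len(chunk), len(zlib.compress(chunk, 6)))


def reduction_ratios(data: bytes, chunk_size: int = 4096) -> dict:
    """Return original/reduced size ratios for dedup, compression, and both."""
    if not data or chunk_size <= 0:
        raise ValueError("need non-empty data and a positive chunk_size")
    # Assumption: fixed-size chunking (content-defined chunking is another option).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    original = len(data)

    # Deduplication only: keep one copy per distinct SHA-256 fingerprint.
    unique = {}
    for chunk in chunks:
        unique.setdefault(hashlib.sha256(chunk).digest(), chunk)
    dedup_bytes = sum(len(c) for c in unique.values())

    # Compression only: every chunk is compressed, duplicates included.
    compressed_bytes = sum(stored_size(c) for c in chunks)

    # Combined: deduplicate first, then compress each surviving chunk.
    combined_bytes = sum(stored_size(c) for c in unique.values())

    return {
        "dedup": original / dedup_bytes,
        "compression": original / compressed_bytes,
        "combined": original / combined_bytes,
    }


if __name__ == "__main__":
    sample = (b"hello world " * 400)[:4096] * 3  # three identical 4 KiB chunks
    print(reduction_ratios(sample))
```

For the sample of three identical chunks, the deduplication ratio is exactly 3.0, and the combined ratio is three times the compression-only ratio, because the repeated chunk is compressed once instead of three times. Ordering deduplication before compression is one design choice among several; it avoids spending CPU on compressing content that will be discarded as a duplicate.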