Mass Storage System Performance Prediction Using a Trace-Driven Simulator

  • Authors:
  • Bill Anderson

  • Affiliations:
  • National Center for Atmospheric Research (NCAR), Boulder, CO

  • Venue:
  • MSST '05 Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies
  • Year:
  • 2005

Abstract

Predicting the performance of Mass Storage Systems can be difficult because the systems are complex and their components are interdependent. This difficulty can lead to over- or under-provisioned systems and can limit the ability to identify software algorithms (such as cache management or request ordering algorithms) that scale well and yield high performance. Moreover, as Mass Storage Systems scale to a petabyte and beyond, the ability to predict performance could become increasingly important, since errors in capacity planning could grow as well. This paper discusses a trace-driven discrete-event simulator that we developed to help rank design and configuration alternatives. The simulator reads a configuration file, ingests a workload, and estimates multiple metrics, including average user response times, cache hit ratios, and device utilization. Simulated components include tape drives, disk systems, and software components. The simulator has been used to help determine the size of a disk cache that offloads reads from tape; we found that, for files smaller than 50 MB, a cache of around 8 TB can provide a read hit ratio of approximately 67%. The simulator has also been used to estimate the number of STK 9940B tape drives needed to replace our 9840A and 9940A drives. Based on validation runs, we have found that metrics predicted by the simulator are within approximately 20% of the actual values.
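
To illustrate the kind of estimate described above, the following is a minimal sketch of replaying a read trace through a disk cache and reporting the read hit ratio. It is not the simulator from the paper: the LRU eviction policy, the `(file_id, size_bytes)` trace format, and the names `simulate_read_cache`, `cache_bytes`, and `max_file_bytes` are all assumptions made for illustration, and the sketch ignores tape drives, queuing, and response times entirely.

```python
from collections import OrderedDict


def simulate_read_cache(trace, cache_bytes, max_file_bytes):
    """Replay read requests through an LRU cache and return the hit ratio.

    trace: iterable of (file_id, size_bytes) read requests (hypothetical format).
    cache_bytes: total cache capacity in bytes.
    max_file_bytes: only files strictly smaller than this are cache-eligible.
    LRU is an assumed policy; the source does not name the one it used.
    """
    cache = OrderedDict()  # file_id -> size_bytes, least recently used first
    used = 0               # bytes currently held in the cache
    hits = 0
    eligible = 0

    for file_id, size in trace:
        # Files at or above the size cutoff, or larger than the whole cache,
        # bypass the cache and are not counted in the ratio.
        if size >= max_file_bytes or size > cache_bytes:
            continue
        eligible += 1

        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)  # mark as most recently used
            continue

        # Miss: evict least recently used files until the new file fits.
        while used + size > cache_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[file_id] = size
        used += size

    return hits / eligible if eligible else 0.0


if __name__ == "__main__":
    MB = 10**6
    # Made-up toy trace, not data from the paper.
    toy_trace = [("a", 10 * MB), ("b", 20 * MB), ("a", 10 * MB),
                 ("c", 30 * MB), ("b", 20 * MB), ("big", 80 * MB)]
    ratio = simulate_read_cache(toy_trace, cache_bytes=50 * MB,
                                max_file_bytes=50 * MB)
    print(f"read hit ratio: {ratio:.2f}")
```

Running such a replay over a real trace for a range of `cache_bytes` values would give a hit-ratio curve of the sort that could inform a cache-sizing decision, under whatever eviction policy is assumed.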