Analysis of file I/O traces in commercial computing environments

  • Authors:
  • K. K. Ramakrishnan; Prabuddha Biswas; Ramakrishna Karedla

  • Affiliations:
  • Distributed Systems Architecture and Performance, Digital Equipment Corporation, 550 King Street, Littleton, MA; Business and Office Systems Engg., Digital Equipment Corporation, 110 Spitbrook Road, Nashua, NH; Storage Systems Architecture, Digital Equipment Corporation, 333 South Street, Shrewsbury, MA

  • Venue:
  • SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
  • Year:
  • 1992

Abstract

Improving the performance of the file system is becoming increasingly important to alleviate the effect of I/O bottlenecks in computer systems. To design changes to an existing file system or to architect a new file system, it is important to understand current usage patterns. In this paper we analyze file I/O traces of several existing production computer systems to understand file access behavior.

Our analysis suggests that a relatively small percentage of the files are active. The total amount of active data is also quite small for interactive environments. An average file encounters a relatively small number of file opens while receiving an order of magnitude more reads. An average process opens quite a large number of files over a typical prime-time period. More significantly, outliers dominate many of the characteristics we studied: a relatively small number of processes dominate the activity, and a very small number of files receive most of these operations.

In addition, we provide a comprehensive analysis of the dynamic sharing of files in each of these environments, addressing both the simultaneous and sequential sharing aspects, and the activity to these shared files. We observe that although only a third of the active files are sequentially shared, they receive a very large proportion of the total operations. We analyze the traces from a given environment across different lengths of time, such as one-hour, three-hour and whole work-day intervals, and do this for three different environments. This gives us an idea of the shortest trace length needed to have confidence in the estimation of the parameters.
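
The quantities named in the abstract (opens and reads per file, sequential and simultaneous sharing, and the share of operations going to shared files) could be computed from a trace roughly as in the sketch below. This is an illustration only, not the authors' method. The record layout (timestamp, process_id, file_id, operation), the function name summarize_trace, and the sharing definitions are all assumptions: a file is treated here as simultaneously shared if two processes hold it open at the same time, and as sequentially shared if more than one process opens it but never concurrently.

```python
from collections import defaultdict

def summarize_trace(trace):
    """Summarize a file I/O trace.

    `trace` is an iterable of (timestamp, process_id, file_id, operation)
    tuples, where operation is "open", "read", or "close". This record
    layout is hypothetical, not the format used in the paper.
    """
    opens = defaultdict(int)       # open count per file
    reads = defaultdict(int)       # read count per file
    openers = defaultdict(set)     # every process that ever opened the file
    open_now = defaultdict(set)    # processes currently holding the file open
    simultaneous = set()           # files seen open by 2+ processes at once

    # Process events in timestamp order.
    for _, pid, fid, op in sorted(trace):
        if op == "open":
            opens[fid] += 1
            openers[fid].add(pid)
            open_now[fid].add(pid)
            if len(open_now[fid]) > 1:
                simultaneous.add(fid)
        elif op == "read":
            reads[fid] += 1
        elif op == "close":
            open_now[fid].discard(pid)

    active = set(opens) | set(reads)
    shared = {fid for fid, pids in openers.items() if len(pids) > 1}
    # Assumed definition: shared, but never open concurrently.
    sequential = shared - simultaneous

    total_ops = sum(opens.values()) + sum(reads.values())
    seq_ops = sum(opens[f] + reads[f] for f in sequential)

    return {
        "active_files": len(active),
        "opens_per_file": dict(opens),
        "reads_per_file": dict(reads),
        "simultaneously_shared": simultaneous,
        "sequentially_shared": sequential,
        "sequential_op_share": seq_ops / total_ops if total_ops else 0.0,
    }
```

Running the same function over one-hour, three-hour and whole work-day slices of a trace would give the kind of interval comparison the abstract describes.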