Using Disk Throughput Data in Predictions of End-to-End Grid Data Transfers

  • Authors:
  • Sudharshan Vazhkudai; Jennifer M. Schopf

  • Venue:
  • GRID '02 Proceedings of the Third International Workshop on Grid Computing
  • Year:
  • 2002

Abstract

Data grids provide an environment for communities of researchers to share, replicate, and manage access to copies of large datasets. In such environments, fetching data from one of several replica locations requires accurate predictions of end-to-end transfer times. Predicting transfer time is complicated by the several shared components in the end-to-end data path, including networks and disks, each of which experiences load variations that can significantly affect throughput. Of these, disk accesses are rapidly growing in cost and have not previously been considered, although on some machines they can account for up to 30% of the transfer time. In this paper, we present techniques to combine observations of end-to-end application behavior with disk I/O throughput load data. We develop a set of regression models to derive predictions that characterize the effect of disk load variations on file transfer times. We also include network component variations and apply these techniques to logs of transfer data from the GridFTP server, part of the Globus Toolkit. We observe up to a 9% improvement in prediction accuracy compared with approaches based on past system behavior in isolation.
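As a rough illustration of the regression idea described in the abstract, the sketch below fits a simple least-squares line relating observed disk throughput to observed end-to-end transfer bandwidth, then uses it to estimate a transfer time. It is only a minimal sketch under stated assumptions: the single-predictor linear form, the function names `fit_linear` and `predict_transfer_time`, and the sample numbers are all hypothetical and are not taken from the paper, whose actual models may differ.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b).

    Here xs are assumed to be disk throughput samples (MB/s) and ys the
    end-to-end transfer bandwidths (MB/s) logged at about the same times.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("disk throughput samples are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_transfer_time(file_size_mb, disk_mb_s, a, b):
    """Estimate transfer time (s) from the fitted line and current disk load."""
    bandwidth = a + b * disk_mb_s
    if bandwidth <= 0:
        raise ValueError("model predicts non-positive bandwidth")
    return file_size_mb / bandwidth


if __name__ == "__main__":
    # Hypothetical paired observations, for illustration only.
    disk = [10.0, 20.0, 30.0, 40.0]
    transfer = [4.0, 6.5, 8.0, 10.5]
    a, b = fit_linear(disk, transfer)
    print(predict_transfer_time(500.0, 25.0, a, b))
```

A replica selector could run such an estimate for each candidate source, using that source's most recent disk measurement, and pick the replica with the smallest predicted time.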