Trace-based evaluation of job runtime and queue wait time predictions in grids

  • Authors:
  • Ozan Sonmez;Nezih Yigitbasi;Alexandru Iosup;Dick Epema

  • Affiliations:
  • Delft Univesity of Technology, Delft, Netherlands;Delft Univesity of Technology, Delft, Netherlands;Delft Univesity of Technology, Delft, Netherlands;Delft Univesity of Technology, Delft, Netherlands

  • Venue:
  • Proceedings of the 18th ACM international symposium on High performance distributed computing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Large-scale distributed computing systems such as grids are serving a growing number of scientists. These environments bring about not only the advantages of an economy of scale, but also the challenges of resource and workload heterogeneity. A consequence of these two forms of heterogeneity is that job runtimes and queue wait times are highly variable, which generally reduces system performance and makes grids difficult to use by the common scientist. Predicting job runtimes and queue wait times have been widely studied for parallel environments. However, there is no detailed investigation on how the proposed prediction methods perform in grids, whose resource structure and workload characteristics are very different from those in parallel systems. In this paper, we assess the performance and benefit of predicting job runtimes and queue wait times in grids based on traces gathered from various research and production grid environments. First, we evaluate the performance of simple yet widely used time series prediction methods and the effect of applying them to different types of job classes (e.g., all jobs submitted by single users or to single sites). Then, we investigate the performance of two kinds of queue wait time prediction methods for grids. Last, we investigate whether prediction-based grid-level scheduling policies can have better performance than policies that do not use predictions.