Instability in parallel job scheduling simulation: the role of workload flurries

  • Authors:
  • Dan Tsafrir;Dror G. Feitelson

  • Affiliations:
  • School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel;School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel

  • Venue:
  • IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

The performance of computer systems depends, among other things, on the workload. This motivates the use of real workloads (as recorded in activity logs) to drive simulations of new designs. Unfortunately, real workloads may contain various anomalies that contaminate the data. A previously unrecognized type of anomaly is workload flurries: rare surges of activity with a repetitive nature, caused by a single user, that dominate the workload for a relatively short period. We find that long workloads often include at least one such event. We show that in the context of parallel job scheduling these events can have a significant effect on performance evaluation results, e.g. a very small perturbation of the simulation conditions might lead to a large and disproportional change in the outcome. This instability is due to jobs in the flurry being effected in unison, a consequence of the flurry's repetitive nature. We therefore advocate that flurries be filtered out before the workload is used, in order to achieve stable and more reliable evaluation results (analogously to the removal of outliers in statistical analysis). At the same time, we note that more research is needed on the possible effects of flurries.