Modeling correlated workloads by combining model based clustering and a localized sampling algorithm

  • Authors:
  • Hui Li; Michael Muskulus; Lex Wolters

  • Affiliations:
  • Leiden University, Leiden, The Netherlands; Leiden University, Leiden, The Netherlands; Leiden University, Leiden, The Netherlands

  • Venue:
  • Proceedings of the 21st annual international conference on Supercomputing
  • Year:
  • 2007

Abstract

We propose a new model for workload attributes on space-shared computer systems that fits both the marginal distributions and second-order statistics such as the autocorrelation function (ACF). The model is built in two stages. First, a mixture of Gaussians is fitted to the probability density function (PDF), with its parameters estimated via a framework called model-based clustering (MBC). MBC also clusters the data according to the Gaussian components, and this clustering plays an important role in creating correlations in the second stage. Second, a novel localized sampling algorithm is proposed to generate correlations in the synthetic data series. We find that the number of repetitions of the cluster labels obtained via MBC empirically follows a Zipf-like (power-law) distribution, and sampling repeatedly from a given cluster according to this Zipf law creates correlations in the series. Furthermore, a cluster permutation procedure is introduced so that the autocorrelations in the synthetic data can be controlled via a single parameter to match those in the real trace. Our approach generalizes to more than one dimension, so multiple correlated workload attributes can be modeled simultaneously. Experimental studies evaluate the proposed algorithm using real workload traces from production systems such as Grids and supercomputers.
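The abstract only outlines the localized sampling idea. The following is a minimal one-dimensional sketch of that idea as we read it, not the authors' algorithm. Its assumptions are that the mixture parameters have already been fitted (e.g. by MBC) and are given as hypothetical `(weight, mean, std)` tuples, that each run picks a cluster by its mixture weight, and that the run length is drawn from a truncated Zipf law P(k) ∝ k^(-alpha) with hypothetical `alpha` and `max_len`. The cluster permutation procedure is not modeled. All function names are illustrative. An i.i.d. mixture sampler is included as a baseline so the effect on the lag-1 ACF can be compared.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def assign_clusters(series, components):
    """Hard (MAP) assignment of each value to a mixture component."""
    labels = []
    for x in series:
        scores = [w * gauss_pdf(x, mu, s) for w, mu, s in components]
        labels.append(max(range(len(components)), key=scores.__getitem__))
    return labels


def run_lengths(labels):
    """Lengths of maximal runs of identical consecutive labels."""
    if not labels:
        return []
    runs, current = [], 1
    for prev, cur in zip(labels, labels[1:]):
        if cur == prev:
            current += 1
        else:
            runs.append(current)
            current = 1
    runs.append(current)
    return runs


def iid_sample(components, n, seed=0):
    """Baseline: independent draws from the mixture (no correlation)."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    out = []
    for _ in range(n):
        _, mu, sigma = components[rng.choices(range(len(components)), weights=weights, k=1)[0]]
        out.append(rng.gauss(mu, sigma))
    return out


def localized_sample(components, n, alpha=1.5, max_len=100, seed=0):
    """Hypothetical localized sampler: pick a cluster by mixture weight,
    draw a run length from a truncated Zipf law, then emit that many
    consecutive values from the chosen Gaussian component."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    lengths = list(range(1, max_len + 1))
    zipf_weights = [k ** (-alpha) for k in lengths]
    series = []
    while len(series) < n:
        c = rng.choices(range(len(components)), weights=weights, k=1)[0]
        run = rng.choices(lengths, weights=zipf_weights, k=1)[0]
        _, mu, sigma = components[c]
        for _ in range(min(run, n - len(series))):
            series.append(rng.gauss(mu, sigma))
    return series


def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    cov = sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag))
    return cov / var


if __name__ == "__main__":
    comps = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]  # hypothetical fitted mixture
    loc = localized_sample(comps, 5000, alpha=1.5, seed=1)
    iid = iid_sample(comps, 5000, seed=1)
    print("lag-1 ACF, localized:", round(acf(loc, 1), 3))
    print("lag-1 ACF, i.i.d.:   ", round(acf(iid, 1), 3))
```

Because run lengths are drawn independently of the chosen cluster, each cluster's expected share of the output still equals its mixture weight. The marginal distribution is therefore preserved in expectation, while consecutive values tend to share a component, which yields positive autocorrelation. In this sketch a larger `alpha` concentrates the Zipf law on short runs and weakens the correlation.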