Service time estimation with a refinement enhanced hybrid clustering algorithm

  • Authors:
  • Paolo Cremonesi; Kanika Dhyani; Andrea Sansottera

  • Affiliations:
  • Politecnico di Milano, Milan, Italy; Neptuny, s.r.l., Milan, Italy; Neptuny, s.r.l., Milan, Italy

  • Venue:
  • ASMTA'10 Proceedings of the 17th international conference on Analytical and stochastic modeling techniques and applications
  • Year:
  • 2010

Abstract

Inferring service times from workload and utilization data is important for predicting the performance of computer systems. Although the utilization law expresses a linear relationship between the workload submitted to a computing system and its utilization, the automated analysis of real-world datasets is far from trivial. Hardware and software upgrades modify the service time, and periodic activities affect the utilization law. Therefore, multiple regression lines must be found in the datasets to explain the different behaviours of the system. In this paper, we propose a new methodology that works in three main phases: clustering based on the density of points; splitting of clusters and estimation of regression lines, obtained from our extension of a clusterwise regression algorithm; and a refinement procedure to remove and merge clusters. The cumulative effect of these phases is the simultaneous determination of the number of clusters, the correct point-to-cluster membership, the underlying regression lines and the outliers. A novel feature of our approach is that the selection of the number of clusters exploits the structure of the data and is not based on model complexity, as in most previous methods. A computational comparison of our method with suitable existing approaches on real-world data as well as challenging synthetic "realistic" data shows the efficiency of our algorithm.
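
To make the setting concrete, the following is a minimal Python sketch of the basic idea behind clusterwise regression on utilization data. It is not the algorithm proposed in the paper. It assumes the number of regimes k is known in advance, performs no outlier detection, and includes none of the density-based clustering or refinement phases. The function names (fit_line, clusterwise_regression), the initialization by implied service time, and the sample data are all hypothetical and chosen only for illustration. Each sample is a pair (workload, utilization) with a strictly positive workload, and the slope of each fitted line plays the role of a service time in the utilization law.

```python
def fit_line(points):
    """Ordinary least squares fit of u = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_u = sum(u for _, u in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        # All workloads identical: the slope is undefined, fall back to a flat line.
        return 0.0, mean_u
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in points)
    slope = sxu / sxx
    return slope, mean_u - slope * mean_x


def clusterwise_regression(points, k, max_iter=50):
    """Alternate between fitting k lines and reassigning points to the closest line.

    Assumption: k is fixed by the caller and every workload x is > 0.
    """
    n = len(points)
    # Hypothetical initialization: order points by implied service time u / x
    # and cut the ordering into k contiguous chunks.
    order = sorted(range(n), key=lambda i: points[i][1] / points[i][0])
    labels = [0] * n
    for c in range(k):
        for i in order[c * n // k:(c + 1) * n // k]:
            labels[i] = c

    lines = [(0.0, 0.0)] * k
    for _ in range(max_iter):
        # Regression step: refit each cluster that has enough points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if len(members) >= 2:
                lines[c] = fit_line(members)
        # Assignment step: move each point to the line with the smallest residual.
        new_labels = [
            min(range(k), key=lambda c: abs(u - (lines[c][0] * x + lines[c][1])))
            for x, u in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return lines, labels


# Synthetic, noise-free data: two regimes with service times 0.02 s and 0.05 s.
samples = [(x, 0.02 * x) for x in (10, 20, 30, 40)]
samples += [(x, 0.05 * x) for x in (2, 4, 6, 8)]

lines, labels = clusterwise_regression(samples, k=2)
for slope, intercept in lines:
    print(f"estimated service time: {slope:.4f} s, intercept: {intercept:.4f}")
```

On real data this kind of alternating scheme is sensitive to initialization, noise and outliers, and it requires k as input. Those are the limitations the abstract says the proposed three-phase methodology is designed to address.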