Short term performance forecasting in enterprise systems

  • Authors:
  • Rob Powers;Moises Goldszmidt;Ira Cohen

  • Affiliations:
  • Stanford University, Stanford, CA;Hewlett Packard Research Labs, Palo Alto, CA;Hewlett Packard Research Labs, Palo Alto, CA

  • Venue:
  • Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We use data mining and machine learning techniques to predict upcoming periods of high utilization or poor performance in enterprise systems. The abundant data available and complexity of these systems defies human characterization or static models and makes the task suitable for data mining techniques. We formulate the problem as one of classification: given current and past information about the system's behavior, can we forecast whether the system will meet its performance targets over the next hour? Using real data gathered from several enterprise systems in Hewlett-Packard, we compare several approaches ranging from time series to Bayesian networks. Besides establishing the predictive power of these approaches our study analyzes three dimensions that are important for their application as a stand alone tool. First, it quantifies the gain in accuracy of multivariate prediction methods over simple statistical univariate methods. Second, it quantifies the variations in accuracy when using different classes of system and workload features. Third, it establishes that models induced using combined data from various systems generalize well and are applicable to new systems, enabling accurate predictions on systems with insufficient historical data. Together this analysis offers a promising outlook on the development of tools to automate assignment of resources to stabilize performance, (e.g., adding servers to a cluster) and allow opportunistic job scheduling (e.g., backups or virus scans).