Analysis of application heartbeats: learning structural and temporal features in time series data for identification of performance problems

  • Authors: Emma S. Buneci; Daniel A. Reed
  • Affiliations: Duke University, Durham, NC; Microsoft Research, Redmond, WA
  • Venue: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing
  • Year: 2008

Abstract

Grids promote new modes of scientific collaboration and discovery by connecting distributed instruments, data, and computing facilities. Because many resources are shared, application performance can vary widely and unexpectedly. We describe a novel performance analysis framework that reasons temporally and qualitatively about performance data from multiple monitoring levels and sources. The framework periodically analyzes application performance states by generating and interpreting signatures that contain structural and temporal features extracted from time-series data. Signatures are compared to expected behaviors. When they do not match, the framework hints at causes of degraded performance, based on characteristics of unexpected behavior that were learned earlier by exposing the application to known performance stress factors. Experiments with two scientific applications reveal signatures with distinct characteristics during well-performing versus poorly performing executions. The ability to generate, automatically and compactly, signatures that capture fundamental differences between good and poor application performance states is essential to improving the quality of service for Grid applications.
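
The abstract does not give the signature format or the matching procedure, so the following is only a minimal sketch of the general idea: reduce a window of time-series samples to a small qualitative signature, compare it with the expected one, and look up a hint when they differ. The function names (`signature`, `diagnose`), the two features (trend and variability), the thresholds, and the cause table are all hypothetical choices made for illustration; they are not the authors' method.

```python
from statistics import mean, pvariance


def signature(samples, flat_tol=0.05, cv_tol=0.1):
    """Reduce a window of metric samples to a qualitative signature.

    Hypothetical features and thresholds, chosen for illustration only.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    avg = mean(samples)
    scale = abs(avg) if avg else 1.0  # avoid dividing by zero

    # Temporal feature: net change across the window, relative to the mean.
    delta = samples[-1] - samples[0]
    if abs(delta) <= flat_tol * scale:
        trend = "flat"
    elif delta > 0:
        trend = "rising"
    else:
        trend = "falling"

    # Structural feature: coefficient of variation (std. dev. / mean level).
    cv = (pvariance(samples) ** 0.5) / scale
    variability = "low" if cv < cv_tol else "high"
    return (trend, variability)


def diagnose(observed, expected, known_causes):
    """Return None on a match, otherwise a previously learned hint."""
    if observed == expected:
        return None
    return known_causes.get(observed, "unknown cause")


# Hypothetical throughput samples for a healthy and a degraded run.
good_run = [100, 101, 99, 100, 100, 101]
poor_run = [100, 80, 95, 60, 70, 40]

# Hypothetical table, standing in for behavior learned under known stress.
known_causes = {("falling", "high"): "possible resource contention"}

expected = signature(good_run)
hint = diagnose(signature(poor_run), expected, known_causes)
print(expected, signature(poor_run), hint)
```

In this sketch the cause table plays the role of the behavior learned under known stress factors. A real system would build that table from monitored runs rather than hard-coding it.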