Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems

  • Authors:
  • Jeffrey J. Evans; Charles E. Lucas

  • Affiliations:
  • Purdue University, West Lafayette, USA 47907; PC Krause and Associates, Inc., West Lafayette, USA 47906

  • Venue:
  • Cluster Computing
  • Year:
  • 2013

Abstract

Run time variability of parallel applications continues to present significant challenges to their performance and energy efficiency in high-performance computing (HPC) systems. When run times are extended and unpredictable, application developers perceive this as a degradation of system (or subsystem) performance. Extended run times directly contribute to proportionally higher energy consumption, potentially negating efforts by applications, or the HPC system, to optimize energy consumption using low-level control techniques such as dynamic voltage and frequency scaling (DVFS). Therefore, successful systemic management of application run time performance can result in less wasted energy, or even energy savings.

We have been studying run time variability in terms of communication time, from the perspective of the application, focusing on the interconnection network. More recently, our focus has shifted to developing a more complete understanding of the effects of HPC subsystem interactions on parallel applications. In this context, the set of executing applications on the HPC system is treated as a subsystem, along with more traditional subsystems such as the communication subsystem and the storage subsystem.

To gain insight into the run time variability problem, our earlier work developed a framework to emulate parallel applications (PACE) that stresses the communication subsystem. The run time sensitivity of real applications to network performance is evaluated with a tool called PARSE, which uses PACE. In this paper, we propose a model that defines application-level behavioral attributes, which collectively describe how applications behave in terms of their run time performance as functions of their process distribution on the system (spatial locality) and subsystem interactions (communication subsystem degradation). These subsystem interactions are produced when multiple applications execute concurrently on the same HPC system.

We also revisit our evaluation framework and tools to demonstrate the flexibility of our application characterization techniques and the ease with which attributes can be quantified. The validity of the model is demonstrated using our tools with several parallel benchmarks and application fragments. Results suggest that it is possible to articulate application-level behavioral attributes as a tuple of numeric values that describe coarse-grained performance behavior.