Automatic Search for Performance Problems in Parallel and Distributed Programs by Using Multi-experiment Analysis

  • Authors:
  • Thomas Fahringer; Clovis Seragiotto, Jr.

  • Venue:
  • HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
  • Year:
  • 2002

Abstract

We introduce Aksum, a novel performance analysis system that helps programmers locate and understand performance problems in message-passing, shared-memory, and mixed parallel programs. The user must provide the set of problem and machine sizes for which the analysis is to be conducted. The search for performance problems (properties) is user-controllable: the user can restrict the analysis to specific code regions, create new property specifications and property hierarchies or customize existing ones, set the maximum search time and the maximum time a single experiment may take, provide thresholds that define whether a property is critical, and specify conditions under which the search stops. Aksum automatically selects and instruments code regions to collect the raw performance data from which performance properties are computed. Heuristics are incorporated to prune the search for performance properties. We have implemented Aksum as a portable, Java-based distributed system that displays all properties detected during the search together with the code regions that cause them. A filtering mechanism allows properties to be examined at various levels of detail. We present an experiment with a financial modeling application to demonstrate the usefulness and effectiveness of our approach.
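The user-controlled search described in the abstract can be illustrated with a small sketch. This is not Aksum's implementation or API: the names (`Property`, `search`, `regions`, `roots`), the severity formulas, and the sample timings are all hypothetical. It also assumes one particular pruning heuristic, namely that the children of a property are examined only when the parent exceeds the threshold; the abstract does not say which heuristics Aksum actually uses.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Property:
    """A hypothetical performance property with a severity function."""
    name: str
    # Maps the raw timing data of one code region to a severity value.
    severity: Callable[[Dict[str, float]], float]
    # More specific properties, refined only if this one is critical.
    children: List["Property"] = field(default_factory=list)


def search(roots: List[Property],
           regions: Dict[str, Dict[str, float]],
           threshold: float,
           max_seconds: float) -> List[Tuple[str, str, float]]:
    """Return (property, region, severity) for every critical property found."""
    deadline = time.monotonic() + max_seconds  # user-supplied search time limit
    critical = []
    stack = [(prop, region) for region in regions for prop in roots]
    while stack and time.monotonic() < deadline:
        prop, region = stack.pop()
        sev = prop.severity(regions[region])
        if sev <= threshold:
            continue  # assumed heuristic: skip the whole subtree
        critical.append((prop.name, region, sev))
        stack.extend((child, region) for child in prop.children)
    # Most severe properties first.
    return sorted(critical, key=lambda item: -item[2])


# Made-up raw data: seconds spent per code region.
regions = {
    "main": {"total": 10.0, "comm": 4.0, "sync": 0.5},
    "init": {"total": 10.0, "comm": 0.1, "sync": 0.0},
}

# A two-level hypothetical property hierarchy.
roots = [
    Property(
        "overhead",
        lambda d: (d["comm"] + d["sync"]) / d["total"],
        children=[
            Property("communication", lambda d: d["comm"] / d["total"]),
            Property("synchronization", lambda d: d["sync"] / d["total"]),
        ],
    )
]

for name, region, sev in search(roots, regions, threshold=0.1, max_seconds=5.0):
    print(f"{name} in {region}: severity {sev:.2f}")
```

With this sample data only the region `main` exceeds the threshold, so the refinements of `overhead` are never evaluated for `init`. Raising `threshold` or lowering `max_seconds` narrows the search further, which mirrors the kind of user control the abstract describes.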