The Search for Lost Cycles: A New Approach to Parallel Program Performance Evaluation

  • Authors:
  • Mark E. Crovella;Thomas J. LeBlanc

  • Affiliations:
  • -;-

  • Venue:
  • The Search for Lost Cycles: A New Approach to Parallel Program Performance Evaluation
  • Year:
  • 1993

Quantified Score

Hi-index 0.00

Visualization

Abstract

Traditional performance debugging and tuning of parallel programs is based on the "measure-modify" approach, in which detailed measurements of program executions are used to guide incremental changes to the program that result in better performance. Unfortunately, the performance of a parallel algorithm is often related to its implementation, input data, and machine characteristics in surprising ways, and the "measure-modify" approach is unsuited to exploring these relationships fully: it is too heavily dependent on experimentation and measurement, which is impractical for studying the large number of variables that can affect parallel program performance. In this paper we argue that the problem of selecting the best implementation of a parallel algorithm requires a new approach to parallel program performance evaluation, one with a greater balance between measurement and modeling. We first present examples that demonstrate that different parallelizations of a program may be necessary to achieve the best possible performance as one varies the input data, machine architecture, or number of processors used. We then present an approach to performance evaluation based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. We describe a measurement tool for lost cycles analysis that we have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1, and use this tool to analyze the performance tradeoffs among implementations of 2D FFT and parallel subgraph isomorphism. Using these examples, we show how lost cycles analysis can be used to solve the problems associated with selecting the best implementation in a variable environment. In addition, we show that this approach can capture large amounts of performance data using only a small number of measurements, and that it is flexible enough to allow conclusions to be drawn from empirical data in some cases, and analytic results in other cases.