Experiment management support for performance tuning

  • Authors:
  • Karen L. Karavanic; Barton P. Miller

  • Affiliations:
  • University of Wisconsin, Madison, WI; University of Wisconsin, Madison, WI

  • Venue:
  • SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
  • Year:
  • 1997

Abstract

The development of a high-performance parallel system or application is an evolutionary process. It may begin with models or simulations, followed by an initial implementation of the program. The code is then incrementally modified to tune its performance and continues to evolve throughout the application's life span. At each step, the key question for developers is: how and how much did the performance change? This question arises when comparing an implementation to models or simulations; when considering versions of an implementation that use a different algorithm, communication or numeric library, or language; when studying code behavior by varying the number or type of processors, type of network, type of processes, input data set or workload, or scheduling algorithm; and in benchmarking or regression testing. Despite the broad utility of this type of comparison, no existing performance tool provides the functionality needed to answer this question; even state-of-the-art research tools such as Paradyn [2] and Pablo [3] focus instead on measuring the performance of a single program execution.

We describe an infrastructure for answering this question at all stages of the life of an application. We view each program run, simulation result, or program model as an experiment, and provide this functionality in an Experiment Management system. Our project has three parts: (1) a representation for the space of executions, (2) techniques for quantitatively and automatically comparing two or more executions, and (3) enhanced performance diagnosis abilities based on historic performance data. In this paper we present initial results on the first two parts. The measure of success of this project is that we can automate an activity that was complex and cumbersome to do manually.

The first part is a concise representation for the set of executions collected over the life of an application. We store information about each experiment in a Program Event, which enumerates the components of the code executed and the execution environment, and stores the performance data collected. The possible combinations of code and execution environment form the multi-dimensional Program Space, with one dimension for each axis of variation and one point for each Program Event. We enable exploration of this space with a simple naming mechanism, a selection and query facility, and a set of interactive visualizations. Queries on a Program Space may be made both on the contents of the performance data and on the metadata that describes the multi-dimensional program space. A graphical representation of the Program Space serves as the user interface to the Experiment Management system.
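As a rough illustration only (the class and method names below, such as `ProgramEvent`, `ProgramSpace`, and `select`, are hypothetical and not the authors' actual interface), a Program Event can be thought of as a record of code components, execution environment, and collected performance data, with the Program Space supporting queries over both the metadata and the data:

```python
# Hypothetical sketch of a Program Event / Program Space representation;
# names and structure are illustrative, not the paper's implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProgramEvent:
    """One experiment: a program run, simulation result, or program model."""
    code: Dict[str, str]          # code components, e.g. {"solver": "v2"}
    environment: Dict[str, str]   # execution environment, e.g. {"nodes": "16"}
    metrics: Dict[str, float]     # collected performance data, e.g. {"cpu_time": 41.7}


@dataclass
class ProgramSpace:
    """Multi-dimensional space of experiments: one point per Program Event."""
    events: List[ProgramEvent] = field(default_factory=list)

    def add(self, event: ProgramEvent) -> None:
        self.events.append(event)

    def select(self, predicate: Callable[[ProgramEvent], bool]) -> List[ProgramEvent]:
        """Query on metadata (code, environment) and/or performance data."""
        return [e for e in self.events if predicate(e)]


# Example query: all 16-node runs whose total CPU time exceeded 40 seconds.
space = ProgramSpace()
space.add(ProgramEvent({"solver": "v2"}, {"nodes": "16"}, {"cpu_time": 41.7}))
space.add(ProgramEvent({"solver": "v1"}, {"nodes": "8"}, {"cpu_time": 55.0}))
slow_16_node = space.select(
    lambda e: e.environment.get("nodes") == "16" and e.metrics["cpu_time"] > 40.0
)
```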
The second part of the project is to develop techniques for automating comparison between experiments. Performance tuning across multiple executions must answer the deceptively simple question: what changed in this run of the program? We have developed techniques for determining the "difference" between two or more program runs, automatically describing both the structural differences (differences in program execution structure and resources used) and the performance variation (how the resources were used and how this changed from one run to the next). We can apply our technique to compare an actual execution with a predicted or desired performance measure for the application, and to compare distinct time intervals of a single program execution. Uses for this include performance tuning efforts, automated scalability studies, resource allocation for metacomputing [4], performance model validation studies, and dynamic execution models where processes are created, destroyed, or migrated [5], where communication patterns and use of distributed shared memory may be optimized [6,9], or where data values or code may be changed by steering [7,8]. The difference information is not necessarily a simple measure such as total execution time; it may be a more complex measure derived from details of the program structure, an analytical performance prediction, an actual previous execution of the code, a set of performance thresholds that the application is required to meet or exceed, or an incomplete set of data from selected intervals of an execution.

The third part of this research is to investigate the use of the predicted, summary, and historical data contained in the Program Events and Program Space for performance diagnosis. We are exploring novel opportunities for exploiting this collection of data to focus data gathering and analysis efforts on the critical sections of a large application, and for isolating spurious effects from interesting performance variations. Details of this part are outside the scope of this paper.
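To make the comparison idea from the second part concrete, here is a minimal, hypothetical sketch under assumed inputs (the function name, data layout, and threshold-based flagging are illustrative, not the paper's technique): it reports the structural difference as the resources present in only one of two runs, and the performance variation as the relative change in each shared metric above a threshold.

```python
# Minimal, hypothetical sketch of comparing two experiments; the function name
# and the threshold-based flagging are illustrative, not the paper's method.
from typing import Dict, Tuple


def diff_experiments(
    run_a: Dict[str, float],
    run_b: Dict[str, float],
    threshold: float = 0.10,
) -> Tuple[set, Dict[str, float]]:
    """Return (structural difference, performance variation).

    run_a and run_b map resource names (e.g. "proc1/send", "loop_main") to a
    metric value such as CPU seconds.  The structural difference is the set of
    resources used in only one run; the performance variation is the relative
    change for resources present in both runs, keeping changes above threshold.
    """
    structural = set(run_a) ^ set(run_b)   # resources in exactly one run
    variation = {}
    for resource in set(run_a) & set(run_b):
        base = run_a[resource]
        if base == 0.0:
            continue                        # avoid division by zero
        change = (run_b[resource] - base) / base
        if abs(change) >= threshold:
            variation[resource] = change
    return structural, variation


# Example: the second run adds a process and slows the main loop by 25%.
before = {"loop_main": 40.0, "proc1/send": 5.0}
after = {"loop_main": 50.0, "proc1/send": 5.1, "proc2/send": 4.8}
print(diff_experiments(before, after))
# ({'proc2/send'}, {'loop_main': 0.25})
```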