Performance debugging shared memory parallel programs using run-time dependence analysis

  • Authors:
  • Ramakrishnan Rajamony; Alan L. Cox

  • Affiliations:
  • Departments of Electrical & Computer Engineering, Rice University, Houston, TX; Departments of Electrical & Computer Science, Rice University, Houston, TX

  • Venue:
  • SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
  • Year:
  • 1997

Abstract

We describe a new approach to performance debugging that focuses on automatically identifying computation transformations that reduce synchronization and communication. By grouping writes into equivalence classes, we can tractably collect information from long-running programs. Our performance debugger analyzes this information and suggests computation transformations expressed in terms of the source code. We present the transformations the debugger suggested for a suite of four applications. For Barnes-Hut and Shallow, implementing the debugger's suggestions improved performance by factors of 1.32 and 34, respectively, on an 8-processor IBM SP2. For Ocean, our debugger identified excess synchronization that did not significantly affect performance. ILINK, a genetic linkage analysis program widely used by geneticists, is already well optimized; we use it only to demonstrate that our approach is feasible for long-running applications.

We also describe how our approach can be implemented. We use novel techniques to convert control dependences into data dependences and to compute the source operands of stores. We report the impact of our instrumentation on the same application suite used for performance debugging: it slows execution by a factor of between 4 and 169, and every log file produced during execution was smaller than 2.5 Mbytes.
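The abstract does not define how writes are grouped or how control dependences become data dependences, so the Python sketch below is only a hypothetical illustration of those ideas under our own assumptions. We assume a write's equivalence class is keyed by its static store site (a label such as `"body.c:42"`), the processor, and the synchronization interval in which it executes, so the log grows with the number of distinct classes rather than with the number of executed stores. Control dependences are folded in by treating the variables tested by enclosing branch predicates as extra source operands of every store inside the branch. The names `WriteLog`, `WriteClass`, `branch`, `record_store`, and `dependence_spans` are invented for this example and are not the authors' implementation.

```python
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class WriteClass:
    """Summary of every dynamic write sharing one (hypothetical) class key."""
    count: int = 0
    addresses: set = field(default_factory=set)
    sources: set = field(default_factory=set)  # data and control source operands


class WriteLog:
    """Hypothetical run-time log keyed by (store site, processor, sync interval)."""

    def __init__(self):
        self.classes = defaultdict(WriteClass)
        self._control = []  # operand sets read by the enclosing branch predicates

    @contextmanager
    def branch(self, *predicate_operands):
        # Control dependence -> data dependence (our assumption): while this
        # branch is active, the variables its predicate tested are added as
        # source operands of every store executed inside it.
        self._control.append(set(predicate_operands))
        try:
            yield
        finally:
            self._control.pop()

    def record_store(self, site, proc, interval, address, data_operands):
        # All dynamic stores with the same key collapse into one class.
        cls = self.classes[(site, proc, interval)]
        cls.count += 1
        cls.addresses.add(address)
        cls.sources.update(data_operands)
        for ops in self._control:
            cls.sources.update(ops)

    def dependence_spans(self, barrier, reads):
        """Return True if some cross-processor read-after-write dependence
        spans the barrier that ends interval `barrier`.

        `reads` holds (proc, interval, address) tuples. A barrier spanned by
        no such dependence is a candidate for removal. Simplification: other
        dependence kinds and intervening barriers are ignored.
        """
        for rproc, rinterval, addr in reads:
            if rinterval <= barrier:
                continue  # read happens before the barrier
            for (_site, wproc, winterval), cls in self.classes.items():
                if wproc != rproc and winterval <= barrier and addr in cls.addresses:
                    return True
        return False


# Usage: processor 0 runs `if (flag) a[i] = b[i] + c;` for i = 0..3 in interval 0.
log = WriteLog()
for i in range(4):
    with log.branch("flag"):
        log.record_store("body.c:42", proc=0, interval=0,
                         address=("a", i), data_operands={"b", "c"})

cls = log.classes[("body.c:42", 0, 0)]
print(len(log.classes), cls.count, sorted(cls.sources))  # → 1 4 ['b', 'c', 'flag']
print(log.dependence_spans(0, [(1, 1, ("a", 2))]))  # processor 1 reads a[2] → True
print(log.dependence_spans(0, [(0, 1, ("a", 2))]))  # same processor → False
```

Four dynamic stores collapse into a single class whose source set includes the branch variable `flag`. Keying classes on the synchronization interval is what lets a dependence query reason about barriers; a finer or coarser key would trade log size against precision.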