A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1

  • Authors:
  • Eric L. Boyd; Waqar Azeem; Hsien-Hsin Lee; Tien-Pao Shih; Shih-Hao Hung; Edward S. Davidson

  • Affiliations:
  • University of Michigan, USA (all authors)

  • Venue:
  • ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 03
  • Year:
  • 1994

Abstract

We have developed a hierarchical performance bounding methodology that attempts to explain the performance of loop-dominated scientific applications on particular systems. The Kendall Square Research KSR1 is used as a running example. We model the throughput of key hardware units that are common bottlenecks in concurrent machines. The four units currently used are the memory port, floating-point, instruction issue, and a loop-carried dependence pseudo-unit. We propose a workload characterization and derive upper bounds on the performance of specific machine-workload pairs. Comparing delivered performance with bounds focuses attention on areas for improvement and indicates how much improvement might be attainable. We delineate a comprehensive approach to modeling and improving application performance on the KSR1. Application of this approach is being automated for the KSR1 with a series of tools including K-MA and K-MACSTAT (which enable the calculation of the MACS hierarchy of performance bounds), K-Trace (which allows parallel code to be instrumented to produce a memory reference trace), and K-Cache (which simulates inter-cache communications based on a memory reference trace).
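
The idea of bounding performance by the busiest modeled unit can be illustrated with a minimal sketch. It is not the authors' K-MA or K-MACSTAT tooling, and it does not reproduce the MACS hierarchy. It assumes a simple bottleneck model in which each unit's time per loop iteration is its operation count divided by its throughput, and the largest of those times is a lower bound on cycles per iteration (equivalently, an upper bound on performance). The class names, function names, and all throughput and workload figures are hypothetical and are not KSR1 parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopWorkload:
    """Hypothetical per-iteration workload characterization."""
    memory_ops: int           # loads and stores per iteration
    fp_ops: int               # floating-point operations per iteration
    instructions: int         # total instructions issued per iteration
    dependence_cycles: float  # latency of the longest loop-carried dependence chain


@dataclass(frozen=True)
class MachineModel:
    """Hypothetical peak throughputs, in operations per cycle."""
    memory_ops_per_cycle: float
    fp_ops_per_cycle: float
    issue_per_cycle: float


def unit_times(w: LoopWorkload, m: MachineModel) -> dict:
    """Cycles per iteration each modeled unit needs if it runs at peak rate."""
    return {
        "memory_port": w.memory_ops / m.memory_ops_per_cycle,
        "floating_point": w.fp_ops / m.fp_ops_per_cycle,
        "instruction_issue": w.instructions / m.issue_per_cycle,
        # The dependence pseudo-unit is expressed directly in cycles.
        "loop_carried_dependence": w.dependence_cycles,
    }


def bottleneck_bound(w: LoopWorkload, m: MachineModel) -> tuple:
    """Return (limiting unit, lower bound on cycles per iteration)."""
    times = unit_times(w, m)
    unit = max(times, key=times.get)
    return unit, times[unit]


def fraction_of_bound(bound_cycles: float, measured_cycles: float) -> float:
    """Share of the bounded performance that a measured run achieves."""
    return bound_cycles / measured_cycles


# Illustrative numbers only (assumed, not measured on any machine).
machine = MachineModel(memory_ops_per_cycle=1.0, fp_ops_per_cycle=1.0, issue_per_cycle=2.0)
loop = LoopWorkload(memory_ops=3, fp_ops=2, instructions=8, dependence_cycles=2.0)

unit, bound = bottleneck_bound(loop, machine)
print(unit, bound)
print(fraction_of_bound(bound, measured_cycles=6.0))
```

In this sketch the gap between the bound and a measured run is what would direct attention: a fraction well below 1 suggests that time is being lost to effects the four-unit model does not capture.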