A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1

Authors:
Eric L. Boyd;Waqar Azeem;Hsien-Hsin Lee;Tien-Pao Shih;Shih-Hao Hung;Edward S. Davidson
Affiliations:
University of Michigan, USA;University of Michigan, USA;University of Michigan, USA;University of Michigan, USA;University of Michigan, USA;University of Michigan, USA
Venue:
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 03
Year:
1994

Abstract

We have developed a hierarchical performance bounding methodology that attempts to explain the performance of loop-dominated scientific applications on particular systems. The Kendall Square Research KSR1 is used as a running example. We model the throughput of key hardware units that are common bottlenecks in concurrent machines. The four units currently used are: memory port, floating-point, instruction issue, and a loop-carried dependence pseudo-unit. We propose a workload characterization, and derive upper bounds on the performance of specific machine-workload pairs. Comparing delivered performance with bounds focuses attention on areas for improvement and indicates how much improvement might be attainable. We delineate a comprehensive approach to modeling and improving application performance on the KSR1. Application of this approach is being automated for the KSR1 with a series of tools including K-MA and K-MACSTAT (which enable the calculation of the MACS hierarchy of performance bounds), K-Trace (which allows parallel code to be instrumented to produce a memory reference trace), and K-Cache (which simulates inter-cache communications based on a memory reference trace).
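
The bottleneck idea in the abstract can be illustrated with a minimal sketch. It is not the K-MA or K-MACSTAT implementation, and it does not reproduce the MACS hierarchy. It only assumes that each of the four units needs some number of cycles per loop iteration and that the slowest unit sets a lower bound on time, which is the same as an upper bound on performance. The class names, function names, operation counts, and unit throughputs below are all hypothetical and are not KSR1 parameters.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-iteration characterization of an inner loop."""
    mem_ops: float            # memory references per iteration
    fp_ops: float             # floating-point operations per iteration
    instructions: float       # total instructions per iteration
    dependence_cycles: float  # loop-carried dependence chain length, in cycles


@dataclass
class Machine:
    """Hypothetical peak throughput of each modeled unit (operations per cycle)."""
    mem_per_cycle: float
    fp_per_cycle: float
    issue_per_cycle: float


def cycles_per_iteration_bound(w: Workload, m: Machine):
    """Return (lower bound on cycles per iteration, name of the limiting unit)."""
    per_unit = {
        "memory port": w.mem_ops / m.mem_per_cycle,
        "floating-point": w.fp_ops / m.fp_per_cycle,
        "instruction issue": w.instructions / m.issue_per_cycle,
        # The dependence pseudo-unit has no throughput; its cost is the chain length.
        "loop-carried dependence": w.dependence_cycles,
    }
    bottleneck = max(per_unit, key=per_unit.get)
    return per_unit[bottleneck], bottleneck


def gap_fraction(measured_cycles: float, bound_cycles: float) -> float:
    """Fraction of measured time per iteration that the bound leaves unexplained."""
    return (measured_cycles - bound_cycles) / measured_cycles


if __name__ == "__main__":
    # Illustrative numbers only.
    loop = Workload(mem_ops=3, fp_ops=2, instructions=8, dependence_cycles=2)
    machine = Machine(mem_per_cycle=1, fp_per_cycle=1, issue_per_cycle=2)
    bound, unit = cycles_per_iteration_bound(loop, machine)
    print(f"bound: {bound} cycles/iteration, limited by {unit}")
    print(f"gap at 5 measured cycles: {gap_fraction(5.0, bound):.0%}")
```

In this sketch, comparing a measured cycle count against the bound plays the role the abstract describes: the limiting unit suggests where to look, and the gap suggests how much improvement might be attainable.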