Performance coupling: case studies for improving the performance of scientific applications

Authors:
Jonathan Geisler;Valerie Taylor
Affiliations:
Department of Electrical and Computer Engineering, Northwestern University, Evanston, Illinois;Department of Electrical and Computer Engineering, Northwestern University, Evanston, Illinois
Venue:
Journal of Parallel and Distributed Computing
Year:
2002

Abstract

Traditional performance optimization techniques have focused on finding the most time-consuming kernel in an application and attempting to optimize it. In this paper, we focus on an optimization technique that takes a more global view of the application. In particular, we present a methodology for measuring the interaction, or coupling, between kernels within an application and describe how the measurements can be used to improve the performance of scientific applications. We discuss four case studies to demonstrate the use of this methodology. The first study involves the Conjugate Gradient Benchmark from the NAS Parallel Benchmarks. The coupling measurement aided in the development of a new hybrid data structure and corresponding algorithm that slightly increased the performance of the program. The second study involves the Block Tridiagonal NAS Parallel Benchmark, for which the coupling parameter aided in revising the program to reduce level-two cache misses by 14%. Next, we introduce improvements to an application in the SpecJVM benchmark suite, resulting in a 41% reduction in level-one cache misses. Lastly, we present results from the Seis application from the SPEChpc Benchmarks to illustrate the coupling parameters that may result from large-scale scientific applications.
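
The abstract does not define the coupling measurement, so the following is only an illustrative sketch. It assumes that coupling is the ratio of the measured cost of a group of kernels run together to the sum of their costs when each is run in isolation, so that a value below 1 suggests the kernels help each other (for example, by reusing cached data) and a value above 1 suggests they interfere. The function names, the tolerance, and the timings in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def coupling(together: float, isolated: Sequence[float]) -> float:
    """Ratio of the cost of kernels run together to the sum of their isolated costs.

    Assumed definition for illustration; `together` and `isolated` can be
    execution times or any additive metric such as cache-miss counts.
    """
    total = sum(isolated)
    if total <= 0:
        raise ValueError("isolated measurements must sum to a positive value")
    return together / total


def classify(c: float, tol: float = 0.05) -> str:
    """Label a coupling value; `tol` is a hypothetical measurement-noise band."""
    if c < 1.0 - tol:
        return "constructive"  # kernels run better together than apart
    if c > 1.0 + tol:
        return "destructive"   # kernels interfere with each other
    return "neutral"           # no measurable interaction
```

With made-up timings, `coupling(9.0, [4.0, 6.0])` gives a ratio below 1, which `classify` labels as constructive. Under this assumed definition, a pair of kernels labeled destructive would be a candidate for the kind of data-structure or loop revision the case studies describe.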