Direct methods for sparse matrices
Measuring parallel processor performance
Communications of the ACM
ACM Transactions on Mathematical Software (TOMS)
Performance coupling: case studies for measuring the interactions of kernels in modern applications
Performance evaluation and benchmarking with realistic applications
Quantifying the Multi-Level Nature of Tiling Interactions
International Journal of Parallel Programming
SPAR: A New Architecture for Large Finite Element Computations
IEEE Transactions on Computers
Analysis of Benchmark Characteristics and Benchmark Performance
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
PerWiz: a what-if prediction tool for tuning message passing programs
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Traditional performance optimization techniques focus on finding the most time-consuming kernel in an application and optimizing it. In this paper, we take a more global view of the application. In particular, we present a methodology for measuring the interaction, or coupling, between kernels within an application, and we describe how these measurements can be used to improve the performance of scientific applications. We discuss four case studies that demonstrate the methodology. The first involves the Conjugate Gradient benchmark from the NAS Parallel Benchmarks, where the coupling measurement aided the development of a new hybrid data structure and corresponding algorithm that slightly increased the performance of the program. The second involves the Block Tridiagonal NAS Parallel Benchmark, where the coupling parameter guided a revision of the program that reduced level-two cache misses by 14%. Third, we introduce improvements to an application in the SpecJVM benchmark suite that yield a 41% reduction in level-one cache misses. Lastly, we present results from the Seis application in the SPEChpc benchmarks to illustrate the coupling parameters that can arise in large-scale scientific applications.
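The abstract does not define the coupling parameter, so the following is only a minimal sketch under an assumed definition: the time measured when a group of kernels runs together, divided by the sum of the times measured when each kernel runs in isolation. Under that assumption, a value below 1 would suggest the kernels help each other (for example through shared cache contents), and a value above 1 would suggest they interfere. The names `best_time` and `coupling` are hypothetical and are not taken from the paper.

```python
import time


def best_time(fn, repeats=5):
    """Return the smallest wall-clock time over several runs of fn (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def coupling(isolated_times, combined_time):
    """Assumed coupling value: combined time over the sum of isolated times.

    This ratio is an illustrative assumption, not a definition quoted
    from the abstract.
    """
    total = sum(isolated_times)
    if total <= 0:
        raise ValueError("isolated times must sum to a positive value")
    return combined_time / total


if __name__ == "__main__":
    data = list(range(200_000))

    # Two toy "kernels" that touch the same data.
    def kernel_a():
        return sum(x * 2 for x in data)

    def kernel_b():
        return sum(x + 1 for x in data)

    def both():
        kernel_a()
        kernel_b()

    t_a = best_time(kernel_a)
    t_b = best_time(kernel_b)
    t_ab = best_time(both)
    print(f"coupling value: {coupling([t_a, t_b], t_ab):.3f}")
```

In a real study the isolated and combined measurements would come from instrumented application kernels or hardware counters (such as cache misses) rather than from toy functions timed with a wall clock.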