An experimental approach to performance measurement of heterogeneous parallel applications using CUDA

Authors:
Allen D. Malony; Scott Biersdorff; Wyatt Spear; Shangkar Mayanglambam
Affiliations:
University of Oregon, Eugene, OR; University of Oregon, Eugene, OR; University of Oregon, Eugene, OR; Qualcomm Corporation, Santa Clara, CA
Venue:
Proceedings of the 24th ACM International Conference on Supercomputing
Year:
2010

Abstract

Heterogeneous parallel systems using GPU devices for application acceleration have garnered significant attention in the supercomputing community. However, to realize the full potential of GPU computing, application developers will require tools to measure and analyze accelerator performance with respect to the parallel execution as a whole. A performance measurement technology for the NVIDIA CUDA platform has been developed and integrated with the TAU parallel performance system. The design of the TAUcuda package is based on an experimental NVIDIA CUDA driver and associated runtime and device libraries. In any environment where the CUDA experimental driver is installed, TAUcuda can provide detailed performance information regarding the execution of GPU kernels and the interactions with the parallel program without any modification to the program source or executable code. The paper describes the TAUcuda technology and how it is integrated with the TAU measurement framework to provide integrated performance views. Various examples of TAUcuda use are presented, including CUDA SDK examples, a GPU version of the Linpack benchmark, and a scalable molecular dynamics application, NAMD.