Break down GPU execution time with an analytical method

Authors:
Junjie Lai;André Seznec
Affiliations:
Project ALF, INRIA, Rennes, France;Project ALF, INRIA, Rennes, France
Venue:
Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Year:
2012



Abstract

Because modern GPGPUs provide significant computing power and very high memory bandwidth, and because developer-friendly programming interfaces such as CUDA have been introduced, GPGPU computing is increasingly accepted in the HPC research area. Much research has been done to help developers better optimize GPU applications, but fully understanding GPU performance behavior remains a hot research topic. We developed an analytical tool called TEG (Timing Estimation tool for GPU) to estimate GPU performance. Previous work shows that TEG achieves a good approximation and can help us quantify the performance effects of bottlenecks. We have made some improvements to the tool, and in this paper we use TEG to analyze GPU performance scaling behavior. TEG takes the disassembly output of the CUDA kernel binary code and an instruction trace as input. It does not execute the code, but tries to model the execution of CUDA code with timing information. Because TEG takes native GPU assembly code as input, it can estimate the execution time with a small error, and it gives us more insight into GPU performance results.