Experiences with Mapping Non-linear Memory Access Patterns into GPUs

Authors:
Eladio Gutierrez;Sergio Romero;Maria A. Trenas;Oscar Plata
Affiliations:
Department of Computer Architecture, University of Malaga, Spain;Department of Computer Architecture, University of Malaga, Spain;Department of Computer Architecture, University of Malaga, Spain;Department of Computer Architecture, University of Malaga, Spain
Venue:
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Year:
2009

Abstract

Modern Graphics Processing Units (GPUs) are very powerful computational systems on a chip. For this reason, there is growing interest in using them as general-purpose hardware accelerators (GPGPU). To facilitate the programming of general-purpose applications, NVIDIA introduced the CUDA programming environment. CUDA provides a simplified abstraction of the underlying complex GPU architecture, so a number of critical optimizations must be applied to the code in order to obtain maximum performance. In this paper we discuss our experience in porting an application kernel to the GPU, and the classes of design decisions we adopted in order to obtain maximum performance.