Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs

Authors:
J. C. Juega;J. I. Gomez;C. Tenllado;F. Catthoor
Affiliations:
ArTeCS, UCM;ArTeCS, UCM;ArTeCS, UCM;IMEC
Venue:
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2014

Citing 16
Cited 0

The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
A tile selection algorithm for data locality and cache interference

ICS '99 Proceedings of the 13th international conference on Supercomputing
A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Benchmarking GPUs to tune dense linear algebra

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
CUDA-Lite: Reducing GPU Programming Complexity

Languages and Compilers for Parallel Computing
OpenMP to GPGPU: a compiler framework for automatic translation and optimization

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A scalable auto-tuning framework for compiler optimization

IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
isl: an integer set library for the polyhedral model

ICMS'10 Proceedings of the Third international congress conference on Mathematical software
Automatic C-to-CUDA code generation for affine programs

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Dynamic selection of tile sizes

HIPC '11 Proceedings of the 2011 18th International Conference on High Performance Computing
A script-based autotuning compiler system to generate high-performance CUDA code

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Polyhedral parallel code generation for CUDA

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

Quantified Score

Hi-index	0.00

Visualization

Abstract

Graphics Processing Units (GPUs) are today's most powerful coprocessors for accelerating massive data-parallel algorithms. However, programmers are forced to adopt new programming paradigms to take full advantage of their computing capabilities; this requires significant programming and maintenance effort. As a result, there is an increasing interest in the development of tools for automatic mapping of sequential code to GPUs. Current automatic tools require both a deep knowledge on the GPU architecture and the algorithm being mapped, which makes the mapping process a labor-intensive task. This paper proposes a technique that improves the code mapping of one of these tools, PPCG, removing the need for any user interaction. It relies on data reuse estimations to explore the mapping space and compute appropriate values for the number of threads per threadblock and tile sizes. Our results show speedups of 3x on average compared to the default code generated by PPCG.