Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons

Authors:
Cedric Nugteren; Henk Corporaal
Affiliations:
Eindhoven University of Technology, The Netherlands;Eindhoven University of Technology, The Netherlands
Venue:
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Year:
2012

Abstract

Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism in their applications. Data-parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, but still demand considerable effort from programmers. A number of parallelizing source-to-source compilers have recently been developed to ease the programming of multi-core and many-core processors. This work presents and evaluates several such tools, focusing in particular on C-to-CUDA transformations targeting GPUs. We compare these tools with each other both qualitatively and quantitatively and identify their strengths and weaknesses. We address the weaknesses by presenting a new classification of algorithms. This classification is used in a new source-to-source compiler based on the algorithmic skeletons technique. The compiler generates target code from skeletons of parallel structures, which can be seen as parameterisable library implementations for a set of algorithm classes. We furthermore demonstrate that the presented compiler requires few modifications to the original sequential source code, generates readable code for further fine-tuning, and delivers superior performance compared to other tools for a set of 8 image processing kernels.
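
As a rough illustration of what "parameterisable library implementations for a set of algorithm classes" could look like, the sketch below fills a code template for one assumed algorithm class (element-wise: each output element depends only on the input element at the same index). It is not the skeleton format used by the compiler in the paper; the template text, the placeholder names, and the `instantiate` helper are all hypothetical and chosen only for this example.

```python
from string import Template

# Hypothetical skeleton for an assumed "element-wise" algorithm class.
# The placeholders (name, type, body) are illustrative, not taken from the paper.
ELEMENTWISE_SKELETON = Template("""\
__global__ void ${name}(const ${type} *in, ${type} *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = ${body};
    }
}
""")


def instantiate(name, ctype, body):
    """Fill the skeleton's placeholders and return CUDA kernel source text."""
    return ELEMENTWISE_SKELETON.substitute(name=name, type=ctype, body=body)


# Example instantiation: a simple image threshold as the per-element computation.
kernel_source = instantiate("threshold", "float", "in[i] > 0.5f ? 1.0f : 0.0f")
print(kernel_source)
```

The point of the sketch is the separation of concerns: the parallel structure (thread indexing, bounds check) lives in the skeleton once, while the per-element computation is supplied as a parameter. The generated text stays readable, which matches the abstract's stated goal of producing code suitable for further fine-tuning.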