Implementing the PGI Accelerator model

Authors: Michael Wolfe
Affiliations: The Portland Group, Inc.
Venue: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Year: 2010

Abstract

The PGI Accelerator model is a high-level programming model for accelerators, such as GPUs, similar in design and scope to the widely used OpenMP directives. This paper presents details of the design of the compiler that implements the model, focusing on the Planner, the component that maps the program's parallelism onto the hardware's parallelism.
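To illustrate what a directive-based model of this kind looks like in practice, the following is a minimal sketch and not code from the paper. It assumes the `#pragma acc region` and `#pragma acc for` directive spellings of the PGI Accelerator model; the function name `saxpy` and its parameters are hypothetical. As with OpenMP, a compiler that does not recognize the directives ignores them and runs the loop sequentially on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: computes y = a*x + y over n elements. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* Assumed directive spellings (PGI Accelerator model).
       A compiler without accelerator support skips these pragmas,
       so the loop below still runs correctly on the host. */
#pragma acc region
    {
#pragma acc for
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

In a sketch like this, the programmer only marks the region and the loop; deciding how the loop iterations are distributed across the accelerator's parallel hardware is the kind of mapping the abstract attributes to the Planner. A real accelerator build would likely also need data clauses describing the array extents, which are omitted here.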