Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
hiCUDA: a high-level directive-based language for GPU programming
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Breaking the GPU programming barrier with the auto-parallelising SAC compiler
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Copperhead: compiling an embedded data parallel language
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Reducing branch divergence in GPU programs
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
A static task partitioning approach for heterogeneous systems using OpenCL
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Mint: realizing CUDA performance in 3D stencil methods with annotated C
Proceedings of the international conference on Supercomputing
OpenMP extensions for heterogeneous architectures
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
GROPHECY: GPU performance projection from CPU code skeletons
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Poster: determining code segments that can benefit from execution on GPUs
Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Paragon: collaborative speculative loop execution on GPU and CPU
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Proceedings of the 9th conference on Computing Frontiers
Parallelizing SOR for GPGPUs using alternate loop tiling
Parallel Computing
Adaptive input-aware compilation for graphics engines
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Designing a unified programming model for heterogeneous machines
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Dataflow-driven GPU performance projection for multi-kernel transformations
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Algorithmic species: A classification of affine loop nests for parallel programming
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
High-level support for pipeline parallelism on many-core architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
accULL: an OpenACC implementation with CUDA and OpenCL support
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Input-aware auto-tuning for directive-based GPU programming
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Hybrid scheduling for event-driven simulation over heterogeneous computers
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
A preliminary evaluation of OpenACC implementations
The Journal of Supercomputing
Generating efficient data movement code for heterogeneous architectures with distributed-memory
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Automatic data allocation and buffer management for multi-GPU machines
ACM Transactions on Architecture and Code Optimization (TACO)
GPU code generation for ODE-based applications with phased shared-data access patterns
ACM Transactions on Architecture and Code Optimization (TACO)
On Expressing Strategies for Directive-Driven Multicore Programing Models
Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms
Leveraging GPUs using cooperative loop speculation
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient implementation of data flow graphs on multi-gpu clusters
Journal of Real-Time Image Processing
Hi-index | 0.00 |
The PGI Accelerator model is a high-level programming model for accelerators, such as GPUs, similar in design and scope to the widely-used OpenMP directives. This paper presents some details of the design of the compiler that implements the model, focusing on the Planner, the element that maps the program parallelism onto the hardware parallelism.