Code generation and programming have become ever more challenging over the last decade due to the shift towards parallel processing. Emerging processor architectures such as multi-cores and GPUs increasingly exploit parallelism, requiring programmers and compilers to deal with aspects such as threading, concurrency, synchronization, and complex memory partitioning. We advocate that programmers and compilers can greatly benefit from a structured classification of program code. Such a classification can help programmers find opportunities for parallelization, reason about their code, and communicate with other programmers. Similarly, parallelizing compilers and source-to-source compilers can make threading and optimization decisions based on the same classification. In this work, we introduce algorithmic species, a classification of affine loop nests based on the polyhedral model and targeted at both automatic and manual use. Individual classes capture information such as the structure of parallelism and the data reuse. To make the classification applicable for manual use, a basic vocabulary forms the basis for a set of intuitive classes. To demonstrate the use of algorithmic species, we identify 115 classes in a benchmark set. Additionally, we demonstrate the suitability of algorithmic species for automated use by presenting a tool that automatically extracts species from program code, a species-based source-to-source compiler, and a species-based performance prediction model.
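To give a feel for what classifying an affine loop nest by its array accesses might look like, the following is a minimal Python sketch. It is an illustration only and not the classification or the extraction tool described in the abstract: the `Access` type, the `classify` function, and the labels `one-to-one`, `window(k)`, and `invariant` are all hypothetical names invented here, and the sketch handles only one-dimensional accesses of the form `array[coeff * i + offset]` in a single loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One reference `array[coeff * i + offset]` inside `for i in range(n)`."""
    array: str
    coeff: int
    offset: int


def classify(accesses):
    """Map each array name to a hypothetical access-pattern label."""
    by_array = {}
    for acc in accesses:
        by_array.setdefault(acc.array, []).append(acc)

    labels = {}
    for name, refs in by_array.items():
        coeffs = {r.coeff for r in refs}
        offsets = sorted({r.offset for r in refs})
        if coeffs == {0}:
            # Index does not depend on i: every iteration touches the same cells.
            labels[name] = "invariant"
        elif coeffs == {1} and len(offsets) == 1:
            # Each iteration touches exactly one private cell: no reuse.
            labels[name] = "one-to-one"
        elif coeffs == {1}:
            # Neighbouring iterations overlap: a sliding window with data reuse.
            width = offsets[-1] - offsets[0] + 1
            labels[name] = f"window({width})"
        else:
            # Strided or mixed accesses are out of scope for this sketch.
            labels[name] = "other"
    return labels


# Loop body being described: out[i] = (a[i - 1] + a[i] + a[i + 1]) * scale[0]
stencil = [
    Access("a", 1, -1), Access("a", 1, 0), Access("a", 1, 1),
    Access("scale", 0, 0),
    Access("out", 1, 0),
]
result = classify(stencil)
print(result)
```

For this three-point stencil the sketch labels `a` as a width-3 window (a data-reuse opportunity), `scale` as loop-invariant, and `out` as one-to-one (iterations write disjoint cells, so the loop is parallel over `i`). A polyhedral analysis would derive such facts from the access functions and iteration domain rather than from hand-written tuples.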