Towards metaprogramming for parallel systems on a chip

Authors:
Lee Howes; Anton Lokhmotov; Alastair F. Donaldson; Paul H. J. Kelly
Affiliations:
Department of Computing, Imperial College London, London, UK; Department of Computing, Imperial College London, London, UK; Computing Laboratory, University of Oxford, Oxford, UK; Department of Computing, Imperial College London, London, UK
Venue:
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Year:
2009

Abstract

We demonstrate that the performance of commodity parallel systems depends significantly on low-level details such as storage layout and iteration space mapping. This motivates tools and techniques that separate a high-level algorithm description from low-level mapping and tuning. We propose to build a tool based on the concept of decoupled Access/Execute metadata, which allow the programmer to specify both the execution constraints and the memory access pattern of a computation kernel.
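
The separation described in the abstract can be illustrated with a small sketch. It is not the authors' tool or its interface: the names `Kernel`, `run_tiled` and `make_blur` are hypothetical, and Python is used only for brevity. The kernel carries execute metadata (its iteration space) and access metadata (which input elements each iteration point reads), while a separate runtime decides the iteration space mapping (here, a tile size) and stages the data each tile needs into a local buffer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # (row, column) in the iteration space


@dataclass
class Kernel:
    """Hypothetical kernel description with decoupled metadata."""
    shape: Tuple[int, int]                 # execute metadata: iteration space extent
    reads: Callable[[Point], List[Point]]  # access metadata: inputs read at one point
    body: Callable[[List[float]], float]   # computation on the values handed to it


def run_tiled(kernel: Kernel, src: List[List[float]], tile: int) -> List[List[float]]:
    """Hypothetical runtime: picks the mapping (tile size) and stages data per tile."""
    rows, cols = kernel.shape
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            points = [(r, c)
                      for r in range(r0, min(r0 + tile, rows))
                      for c in range(c0, min(c0 + tile, cols))]
            # The access metadata gives the tile's input footprint,
            # which is copied into a local buffer before any computation.
            footprint = {p for pt in points for p in kernel.reads(pt)}
            local = {(r, c): src[r][c] for (r, c) in footprint}
            for pt in points:
                out[pt[0]][pt[1]] = kernel.body([local[p] for p in kernel.reads(pt)])
    return out


def make_blur(rows: int, cols: int) -> Kernel:
    """Example kernel: 3-point horizontal average, clamped at the row ends."""
    def reads(pt: Point) -> List[Point]:
        r, c = pt
        return [(r, max(c - 1, 0)), (r, c), (r, min(c + 1, cols - 1))]
    return Kernel(shape=(rows, cols), reads=reads, body=lambda v: sum(v) / len(v))


src = [[0.0, 3.0, 6.0], [9.0, 12.0, 15.0]]
blur = make_blur(2, 3)
result = run_tiled(blur, src, tile=2)
```

Because the kernel body never indexes `src` directly, the tile size can be changed (for example `tile=1` or `tile=4`) without touching the algorithm description, and the result stays the same. Only the mapping and the data staging change, which is the kind of tuning the abstract argues should be kept apart from the algorithm.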