Compile-Time and Run-Time Issues in an Auto-Parallelisation System for the Cell BE Processor

Authors:
Alastair F. Donaldson;Paul Keir;Anton Lokhmotov
Affiliations:
Codeplay Software, Edinburgh, UK EH1 3HP;Department of Computing Science, University of Glasgow, Glasgow, UK G12 8QQ;Department of Computing, Imperial College London, London, UK SW7 2AZ
Venue:
Euro-Par 2008 Workshops - Parallel Processing
Year:
2009

Abstract

We describe compiler and run-time optimisations for effective auto-parallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by annotating sieve scopes, which abstract the "read in, compute in parallel, write out" processing paradigm. We show that the semantics of sieve scopes enables data movement optimisations, such as re-organising global memory reads to minimise DMA transfers and streaming reads from uniformly accessed arrays. We also describe run-time optimisations for committing side-effects to main memory. We provide experimental results showing the benefits of our optimisations, and compare the Sieve-Cell system with IBM's OpenMP implementation for Cell.
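
The "read in, compute in parallel, write out" paradigm can be illustrated with a minimal sketch in standard C++. This is not the Sieve C++ syntax or the Sieve-Cell run-time; the `DelayedWrites` class and the `shift_right_delayed` function are hypothetical names introduced only for illustration. The sketch assumes that, inside a scope, reads see the values that global data held on entry, and that writes are queued and applied together on exit.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Sieve C++): emulates delayed
// side-effects by queueing writes and applying them on commit().
template <typename T>
class DelayedWrites {
public:
    // Record a write instead of performing it immediately.
    void store(T* target, const T& value) {
        queue_.emplace_back(target, value);
    }

    // Apply the queued writes in the order they were recorded,
    // then empty the queue. This models the "write out" phase.
    void commit() {
        for (auto& entry : queue_) {
            *entry.first = entry.second;
        }
        queue_.clear();
    }

private:
    std::vector<std::pair<T*, T>> queue_;
};

// Hypothetical example: each iteration reads a[i - 1] and writes a[i].
// Because writes are delayed, every read sees the value held on entry,
// so the iterations do not depend on one another.
std::vector<int> shift_right_delayed(std::vector<int> a) {
    DelayedWrites<int> writes;
    for (std::size_t i = 1; i < a.size(); ++i) {
        writes.store(&a[i], a[i - 1]);  // "compute" phase: queue only
    }
    writes.commit();                    // "write out" phase
    return a;
}
```

Called with `{1, 2, 3, 4}`, this function returns `{1, 1, 2, 3}`, whereas performing each write immediately in the same loop would give `{1, 1, 1, 1}`. Removing the dependence between a write and later reads is what lets the loop body be split across cores in this sketch; how the actual system partitions work and commits side-effects is described in the paper itself.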