A bridging model for parallel computation
Communications of the ACM
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Auto-parallelisation of sieve C++ programs
Euro-Par'07 Proceedings of the 2007 conference on Parallel processing
Delayed side-effects ease multi-core programming
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Automatic analysis of DMA races using model checking and k-induction
Formal Methods in System Design
Hi-index | 0.00 |
We describe compiler and run-time optimisations for effective auto-parallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by annotating sieve scopes , which abstract the "read in, compute in parallel, write out" processing paradigm. We show that the semantics of sieve scopes enables data movement optimisations, such as re-organising global memory reads to minimise DMA transfers and streaming reads from uniformly accessed arrays. We also describe run-time optimisations for committing side-effects to main memory. We provide experimental results showing the benefits of our optimisations, and compare the Sieve-Cell system with IBM's OpenMP implementation for Cell.