BlockLib: a skeleton library for Cell Broadband Engine

Authors:
Markus Ålind; Mattias V. Eriksson; Christoph W. Kessler
Affiliations:
Linköping University, Linköping, Sweden; Linköping University, Linköping, Sweden; Linköping University, Linköping, Sweden
Venue:
Proceedings of the 1st international workshop on Multicore software engineering
Year:
2008

Abstract

The Cell Broadband Engine is a heterogeneous multicore processor for high-performance computing and gaming. Its architecture allows for an impressive peak performance but, at the same time, makes it very hard to write efficient code. The need to simultaneously exploit SIMD instructions, coordinate the parallel execution of the slave processors, overlap DMA memory traffic with computation, keep data properly aligned in memory, and explicitly manage the very small on-chip memory buffers of the slave processors leads to very complex code. In this work, we adopt the skeleton programming approach to abstract away much of the complexity of Cell programming while maintaining high performance. The abstraction is achieved through a library of parallel generic building blocks, called BlockLib. Macro-based generative programming is used to reduce the overhead of genericity in skeleton functions and to control code size expansion. We demonstrate the library's usage with a parallel ODE solver application. Our experimental results show that BlockLib code achieves performance close to hand-written code and even outperforms the native IBM BLAS library in cases where several slave processors are used.