Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System

Authors:
Cédric Augonnet;Samuel Thibault;Raymond Namyst;Maik Nijhuis
Affiliations:
INRIA Bordeaux Sud-Ouest, LaBRI, University of Bordeaux;INRIA Bordeaux Sud-Ouest, LaBRI, University of Bordeaux;INRIA Bordeaux Sud-Ouest, LaBRI, University of Bordeaux;Vrije Universiteit Amsterdam
Venue:
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Year:
2009

Abstract

Core specialization is currently one of the most promising ways to design power-efficient multicore chips. However, approaching the theoretical peak performance of such heterogeneous multicore architectures with specialized accelerators is a complex issue. While substantial effort has been devoted to efficiently offloading parts of the computation, designing an execution model that unifies all computing units is the main challenge. We therefore designed the StarPU runtime system to provide portable support for heterogeneous multicore processors to high-performance applications and compiler environments. StarPU provides a high-level, unified execution model that is tightly coupled to an expressive data management library. In addition to our previous results on using multicore processors alongside graphics processors, we show that StarPU is flexible enough to efficiently exploit the heterogeneous resources in the Cell processor. We present a scalable design that supports multiple different accelerators while minimizing the overhead on the overall system. Using experiments with classical linear algebra algorithms, we show that StarPU improves programmability and provides performance portability.