MPI Microtask for programming the cell broadband engineTM processor
IBM Systems Journal
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Accelerating computing with the cell broadband engine processor
Proceedings of the 5th conference on Computing frontiers
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
IEEE Transactions on Parallel and Distributed Systems
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Mapping and Synchronizing Streaming Applications on Cell Processors
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Programming heterogeneous clusters with accelerators using object-based programming
Scientific Programming
PTask: operating system abstractions to manage GPUs as compute devices
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Hi-index | 0.00 |
Core specialization is currently one of the most promising ways for designing power-efficient multicore chips. However, approaching the theoretical peak performance of such heterogeneous multicore architectures with specialized accelerators, is a complex issue. While substantial effort has been devoted to efficiently offloading parts of the computation, designing an execution model that unifies all computing units is the main challenge. We therefore designed the StarPU runtime system for providing portable support for heterogeneous multicore processors to high performance applications and compiler environments. StarPU provides a high-level, unified execution model which is tightly coupled to an expressive data management library. In addition to our previous results on using multicore processors alongside with graphic processors, we show that StarPU is flexible enough to efficiently exploit the heterogeneous resources in the Cell processor. We present a scalable design supporting multiple different accelerators while minimizing the overhead on the overall system. Using experiments with classical linear algebra algorithms, we show that StarPU improves programmability and provides performance portability.