Optimized on-chip pipelining of memory-intensive computations on the Cell BE

Authors:
Christoph W. Kessler; Jörg Keller
Affiliations:
Linköpings Universitet, Linköping, Sweden; FernUniversität in Hagen, Hagen, Germany
Venue:
ACM SIGARCH Computer Architecture News
Year:
2009

Abstract

Multiprocessors-on-chip, such as the Cell BE processor, frequently suffer from restricted bandwidth to off-chip main memory. We propose to reduce memory bandwidth requirements, and thus increase performance, by expressing the application as a task graph, running dependent tasks concurrently, and pipelining results directly from task to task where possible instead of buffering them in off-chip memory. To maximize bandwidth savings and balance load simultaneously, we solve a mapping problem of tasks to SPEs on the Cell BE. We present three approaches: an integer linear programming formulation that computes Pareto-optimal mappings for smaller task graphs, general heuristics, and a problem-specific approximation algorithm. We validate the mappings on data-parallel computations and sorting.
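To illustrate what "Pareto-optimal mappings" of a task graph onto SPEs means, here is a minimal Python sketch. It is not the authors' ILP formulation: it swaps in brute-force enumeration, which is only feasible for tiny graphs, and it assumes a simple two-objective cost model (maximum per-SPE computational load and total data volume on edges whose producer and consumer sit on different SPEs). The function names `evaluate`, `dominates`, and `pareto_mappings` and the three-task example graph are hypothetical.

```python
from itertools import product


def evaluate(mapping, work, edges, num_spes):
    """Return (max SPE load, inter-SPE data volume) for one task-to-SPE mapping.

    Assumed, illustrative cost model: an SPE's load is the sum of the work of
    the tasks mapped to it, and every edge whose producer and consumer are on
    different SPEs contributes its data volume to the communication cost.
    """
    loads = [0] * num_spes
    for task, spe in enumerate(mapping):
        loads[spe] += work[task]
    comm = sum(vol for src, dst, vol in edges if mapping[src] != mapping[dst])
    return max(loads), comm


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and differs from it."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_mappings(work, edges, num_spes):
    """Enumerate all num_spes ** len(work) mappings and keep the non-dominated ones.

    Returns a dict from each Pareto-optimal objective vector to the first
    mapping (in enumeration order) that achieves it.
    """
    best = {}
    for mapping in product(range(num_spes), repeat=len(work)):
        obj = evaluate(mapping, work, edges, num_spes)
        best.setdefault(obj, mapping)  # keep one representative per objective vector
    return {
        obj: m for obj, m in best.items()
        if not any(dominates(other, obj) for other in best)
    }


# Hypothetical 3-task merge tree: task 0 consumes the output streams of tasks 1 and 2.
work = [2, 1, 1]
edges = [(1, 0, 1), (2, 0, 1)]  # (producer, consumer, data volume)
for obj, mapping in sorted(pareto_mappings(work, edges, num_spes=2).items()):
    print(obj, mapping)
# (2, 2) (0, 1, 1)
# (3, 1) (0, 0, 1)
# (4, 0) (0, 0, 0)
```

In this toy example, no mapping is best on both objectives at once: spreading the tasks evenly balances load but forces both edges across SPEs, while placing everything on one SPE avoids inter-SPE traffic at the cost of load balance. The front makes that trade-off explicit, which is why an exact method only scales to smaller task graphs and heuristics are needed beyond that.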