Multiprocessors-on-chip, such as the Cell BE processor, regularly suffer from restricted bandwidth to off-chip main memory. We propose to reduce memory bandwidth requirements, and thus increase performance, by expressing the application as a task graph, running dependent tasks concurrently, and pipelining results directly from task to task where possible instead of buffering them in off-chip memory. To maximize bandwidth savings and balance load at the same time, we solve the problem of mapping tasks to SPEs on the Cell BE. We present three approaches: an integer linear programming formulation that computes Pareto-optimal mappings for smaller task graphs, general heuristics, and a problem-specific approximation algorithm. We validate the mappings for data-parallel computations and sorting.
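To make the two competing objectives concrete, the following is a minimal sketch, not the integer linear programming formulation from the paper. It enumerates every mapping of a tiny task graph by brute force and keeps the Pareto-optimal ones. The cost model (maximum per-SPE work as the load objective, total data volume on edges that cross SPEs as the communication objective), the function names `evaluate` and `pareto_mappings`, and the numbers in the example graph are all assumptions made for illustration.

```python
from itertools import product


def evaluate(mapping, work, edges):
    """Return (max_load, cut_traffic) for one task-to-SPE mapping.

    Assumed cost model: max_load is the largest summed work on any SPE,
    cut_traffic is the total volume on edges whose endpoints sit on
    different SPEs.
    """
    loads = {}
    for task, spe in mapping.items():
        loads[spe] = loads.get(spe, 0) + work[task]
    cut = sum(vol for (u, v), vol in edges.items() if mapping[u] != mapping[v])
    return max(loads.values()), cut


def pareto_mappings(work, edges, num_spes):
    """Enumerate all mappings and keep one mapping per non-dominated cost.

    Exhaustive search takes num_spes ** len(work) steps, so this is only
    feasible for very small task graphs.
    """
    tasks = sorted(work)
    candidates = []
    for assignment in product(range(num_spes), repeat=len(tasks)):
        mapping = dict(zip(tasks, assignment))
        candidates.append((evaluate(mapping, work, edges), mapping))

    front = []
    for cost, mapping in candidates:
        # A cost is dominated if some other cost is no worse in both
        # objectives and differs in at least one of them.
        dominated = any(
            other[0] <= cost[0] and other[1] <= cost[1] and other != cost
            for other, _ in candidates
        )
        if not dominated and all(cost != seen for seen, _ in front):
            front.append((cost, mapping))
    return sorted(front, key=lambda item: item[0])


# Hypothetical three-task merge tree: two leaves feed one root.
work = {"left": 2, "right": 2, "root": 4}
edges = {("left", "root"): 2, ("right", "root"): 2}

front = pareto_mappings(work, edges, num_spes=2)
for (max_load, cut_traffic), mapping in front:
    print(max_load, cut_traffic, mapping)
```

In this toy instance no single mapping wins on both objectives. Placing everything on one SPE removes all inter-SPE traffic but concentrates the load, while splitting the root from the leaves balances the load at the cost of the most traffic. This is the trade-off that a set of Pareto-optimal mappings captures.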