Dynamic Task Scheduling and Load Balancing on Cell Processors
PDP '10 Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing
Driven by increasing specialization, multicore integration will soon enable large-scale chip multiprocessors (CMPs) with many processing cores. To take advantage of such increasingly parallel hardware, independent tasks must be expressed at a fine granularity, maximizing the available parallelism and thus the potential speedup. The efficiency of this approach, however, depends on the runtime system responsible for managing and distributing the tasks. In this paper, we present a hierarchically distributed task pool for task-parallel programming on Cell processors. By storing subsets of the task pool in the local memories of the Synergistic Processing Elements (SPEs), access latency, and with it runtime overhead, is greatly reduced. Our experiments show that only a worker-centric runtime system, one that uses the SPEs for both task creation and task execution, is suitable for exploiting fine-grained parallelism.