Available task-level parallelism on the Cell BE

Authors:
Alejandro Rico; Alex Ramirez; Mateo Valero
Affiliations:
(Corresponding author: Alejandro Rico, Universitat Politecnica de Catalunya, Jordi Girona 1-3, D6-113, 08034 Barcelona, Spain. Tel.: +34 93 40 54097; E-mail: arico@ac.upc.edu) Universitat Politecn ...; Universitat Politecnica de Catalunya, Barcelona, Spain and Barcelona Supercomputing Center, Barcelona, Spain; Universitat Politecnica de Catalunya, Barcelona, Spain and Barcelona Supercomputing Center, Barcelona, Spain
Venue:
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Year:
2009

Abstract

There is a clear industrial trend towards chip multiprocessors (CMPs) as the most power-efficient way of further increasing performance. Heterogeneous CMP architectures take this trend one step further by using multiple types of processors, each tailored to the workloads it will execute. Programming CMP architectures has been identified as one of the main challenges of the near future, and programming heterogeneous systems is even more challenging. High-level programming models in which the programmer identifies parallel tasks and the runtime manages the inter-task dependencies have been identified as suitable for programming such heterogeneous CMP architectures. In this paper we analyze the performance of Cell Superscalar, a task-based programming model for the Cell Broadband Engine Architecture, in terms of its scalability to higher numbers of on-chip processors. Our results show that the low performance of the PPE (the Cell's PowerPC Processor Element) limits the scalability of some applications to fewer than 16 processors. Since the PPE is the limiting element, we perform a set of simulation studies evaluating the impact of out-of-order execution, branch prediction and larger caches on the task management overhead. We conclude that out-of-order execution is a very desirable feature, since it increases task management performance by 50%. We also identify memory latency as a fundamental factor in performance, even though the working set is not that large. We expect a significant performance impact if task management ran from a fast private memory holding the task dependency graph instead of relying on the cache hierarchy.
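
The scalability limit described in the abstract can be illustrated with a simple bottleneck model. The sketch below is not the paper's methodology: it assumes a single master core that spends a fixed time managing each task, and workers that each spend a fixed time executing one. The function names and all timing values are hypothetical, chosen only to show how a 50% gain in task management performance moves the saturation point.

```python
def max_useful_workers(task_time_us: float, mgmt_time_us: float) -> float:
    """Number of workers one master can keep busy.

    Assumption: the master needs mgmt_time_us per task and each worker
    needs task_time_us per task, so the master can feed at most
    task_time_us / mgmt_time_us workers before it becomes the bottleneck.
    """
    return task_time_us / mgmt_time_us


def speedup(workers: int, task_time_us: float, mgmt_time_us: float) -> float:
    """Speedup over one worker: linear until the master saturates."""
    return min(float(workers), max_useful_workers(task_time_us, mgmt_time_us))


if __name__ == "__main__":
    TASK_US = 30.0   # hypothetical worker time per task
    MGMT_US = 2.5    # hypothetical master time per task

    # Baseline master: saturates at 12 workers under these made-up numbers.
    print(max_useful_workers(TASK_US, MGMT_US))

    # Task management 50% faster, i.e. per-task management time divided by 1.5.
    print(max_useful_workers(TASK_US, MGMT_US / 1.5))

    # Adding workers beyond the saturation point yields no further speedup.
    for n in (4, 8, 16, 32):
        print(n, speedup(n, TASK_US, MGMT_US))
```

Under this model, the number of workers a master can feed is proportional to its task management speed, so making the master faster raises the saturation point by the same factor. Real behaviour also depends on task granularity, dependency patterns and memory latency, which the model ignores.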