SPENK: Adding Another Level of Parallelism on the Cell Broadband Engine

Authors:
Mohamed F. Ahmed;Reda A. Ammar;Sanguthevar Rajasekaran
Affiliations:
University of Connecticut, Storrs, CT;University of Connecticut, Storrs, CT;University of Connecticut, Storrs, CT
Venue:
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Year:
2008

Abstract

The Cell Broadband Engine (CBE) is a heterogeneous multi-core processor with unique design properties for high-performance computing. It consists of one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPEs) connected by the Element Interconnect Bus (EIB). It employs novel techniques, such as a software-managed cache, to hide memory latency and guarantee, by default, maximum utilization of the overall system resources. However, exploiting these facilities requires complex algorithm designs and implementations to achieve the best performance. In this paper we discuss our micro-threading model, realized by a nano-kernel implemented on top of each SPE. The SPE Nano-Kernel, or SPENK, employs the micro-threading model to increase the utilization of the CBE resources while simplifying the programming model. Our framework boosted the processor's overall performance by a factor of five compared to the current threading model. It allowed us to build a distributed model for SPE task management and automated Local Storage (LS) management. We further used the micro-threading model to build an event-based programming model on top of the CBE architecture. We tested our framework on two types of algorithms: (1) uniform memory access algorithms, such as parallel summation, and (2) non-uniform or irregular memory access algorithms, specifically spanning tree algorithms. For the first type we obtained up to a threefold performance improvement, and for the second type a fivefold improvement.
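The general idea behind micro-threading, switching to another lightweight task whenever the current one would stall on a memory transfer, can be illustrated with a minimal sketch. This is not the SPENK implementation and does not run on the CBE: it uses Python generators as stand-ins for micro-threads, and the names `summation_task` and `run_nano_kernel`, the round-robin policy, and the "dma_wait" marker are all assumptions made for illustration.

```python
from collections import deque


def summation_task(chunks, results):
    """Hypothetical micro-thread: sums its chunks one at a time."""
    total = 0
    for chunk in chunks:
        # Stand-in for issuing an asynchronous transfer into local storage.
        # Yielding hands the core to another micro-thread while the
        # (simulated) transfer would be in flight.
        yield "dma_wait"
        total += sum(chunk)
    results.append(total)


def run_nano_kernel(tasks):
    """Hypothetical round-robin scheduler over cooperative micro-threads.

    Returns the number of times a task yielded and was re-queued.
    """
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next wait point
            ready.append(task)  # still has work: back of the queue
            switches += 1
        except StopIteration:
            pass                # task finished: drop it
    return switches


# Parallel summation split across two micro-threads.
results = []
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
tasks = [
    summation_task(data[0:2], results),
    summation_task(data[2:4], results),
]
switches = run_nano_kernel(tasks)
total = sum(results)
print(results, total, switches)
```

In this sketch the two tasks interleave at every simulated transfer, so neither blocks the other while waiting. A real nano-kernel would additionally have to manage local storage and actual DMA completion, which the sketch leaves out.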