The Cell Broadband Engine (CBE) is a heterogeneous multi-core processor with unique design properties for high-performance computing. It consists of one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPEs) connected by the Element Interconnect Bus (EIB). It employs novel techniques, such as a software-managed cache, to hide memory latency and to maximize utilization of the overall system resources. However, exploiting these facilities requires complex algorithm designs and implementations to achieve the best performance. In this paper we discuss our micro-threading model, realized by a nano-kernel implemented on top of each SPE. The SPE nano-kernel, or SPENK, employs the micro-threading model to increase the utilization of CBE resources while simplifying the programming model. Our framework improved the processor's overall performance by up to a factor of five compared to the current threading model. It also allowed us to build a distributed model for SPE task management and to automate Local Store (LS) management. We further used the micro-threading model to build an event-based programming model on top of the CBE architecture. We tested our framework on two types of algorithms: (1) uniform memory access algorithms, such as parallel summation, and (2) non-uniform or irregular memory access algorithms, specifically spanning tree algorithms. For the first type we obtained up to a threefold performance improvement, and for the second type up to a fivefold improvement.
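As a rough illustration of the micro-threading idea applied to parallel summation, the sketch below runs several cooperative micro-threads under a round-robin scheduler. It is not SPENK and does not use any Cell SDK API: the names `sum_chunk`, `run_round_robin`, and `parallel_sum` are hypothetical, Python generators stand in for micro-threads, and a bare `yield` stands in for the point where a micro-thread would wait on a data transfer into local store and let another one run.

```python
from collections import deque


def sum_chunk(data, start, stop, block, results, slot):
    """Hypothetical micro-thread: sums data[start:stop] one block at a time."""
    total = 0
    for lo in range(start, stop, block):
        hi = min(lo + block, stop)
        # Stand-in for issuing an asynchronous transfer of the next block:
        # give up the processor so another micro-thread can run meanwhile.
        yield
        total += sum(data[lo:hi])
    results[slot] = total


def run_round_robin(tasks):
    """Cooperative scheduler: resume each micro-thread in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # still has work; put it back in the queue
        except StopIteration:
            pass                # task finished and stored its partial sum


def parallel_sum(data, workers=4, block=16):
    """Split data across `workers` micro-threads and combine their partial sums."""
    results = [0] * workers
    step = -(-len(data) // workers)  # ceiling division
    tasks = [
        sum_chunk(data, w * step, min((w + 1) * step, len(data)), block, results, w)
        for w in range(workers)
    ]
    run_round_robin(tasks)
    return sum(results)
```

Calling `parallel_sum(list(range(1000)))` interleaves four micro-threads, each switching out after every block request. On real hardware the benefit would come from overlapping those waits with computation, which this single-threaded sketch only models structurally.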