A synchronous mode MPI implementation on the Cell BE™ architecture
ISPA '07: Proceedings of the 5th International Conference on Parallel and Distributed Processing and Applications
The Cell Broadband Engine™ is a new heterogeneous multi-core processor from IBM, Sony, and Toshiba. It contains eight co-processors, called Synergistic Processing Elements (SPEs), which operate directly on distinct 256 KB local stores and also have access to a shared main memory of 512 MB to 2 GB. The combined peak speed of the SPEs is 204.8 Gflop/s in single precision and 14.64 Gflop/s in double precision. There is, therefore, much interest in using the Cell BE™ for high performance computing applications. However, the unconventional architecture of the SPEs, in particular their small local stores, poses some programming challenges. We describe our implementation of certain core features of MPI, such as blocking point-to-point calls and collective communication calls, which helps meet these challenges by enabling a large class of MPI applications to be ported to the Cell BE™ processor. This implementation treats each SPE as a node running an MPI process. We store the application data in main memory in order to avoid being limited by the local store size; the local store is abstracted within the library and thus hidden from the application with respect to MPI calls. We have achieved bandwidth up to 6.01 GB/s and latency as low as 0.41 μs on the ping-pong test. The contribution of this work lies in (i) demonstrating that the Cell BE™ has good potential for running intra-Cell BE™ MPI applications, (ii) enabling such applications to be ported to the Cell BE™ with minimal effort, and (iii) evaluating the performance impact of different design choices.
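For readers unfamiliar with the benchmark the abstract refers to, the sketch below shows a minimal MPI ping-pong microbenchmark of the kind typically used to report such bandwidth and latency numbers. It uses only the standard blocking point-to-point calls (MPI_Send/MPI_Recv) that the paper's library implements; the message size, iteration count, and output format are illustrative assumptions, not details taken from the paper.

/*
 * Minimal ping-pong sketch between ranks 0 and 1, using standard
 * blocking MPI calls. Message size and iteration count are assumed.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    const int iters = 1000;
    const int len = 1 << 20;                  /* 1 MB messages, assumed */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(len);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {                      /* rank 0: send, then receive */
            MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {               /* rank 1: receive, then send */
            MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / iters;       /* round-trip time per iteration */
        printf("one-way latency: %.3f us, bandwidth: %.2f GB/s\n",
               rtt / 2 * 1e6,                 /* half the round trip */
               2.0 * len / rtt / 1e9);        /* 2*len bytes move per round trip */
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Under the paper's design, each of the two ranks here would map to an SPE, with the message buffers resident in main memory rather than in the 256 KB local stores; latency figures like the reported 0.41 μs are usually measured with very small messages rather than the large size used in this sketch.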