A Buffered-Mode MPI Implementation for the Cell BE™ Processor

Authors:
Arun Kumar;Ganapathy Senthilkumar;Murali Krishna;Naresh Jayam;Pallav K. Baruah;Raghunath Sharma;Ashok Srinivasan;Shakti Kapoor
Affiliations:
Dept. of Mathematics and Computer Science, Sri Sathya Sai University, Prashanthi Nilayam, India;Dept. of Mathematics and Computer Science, Sri Sathya Sai University, Prashanthi Nilayam, India;Dept. of Mathematics and Computer Science, Sri Sathya Sai University, Prashanthi Nilayam, India;Dept. of Mathematics and Computer Science, Sri Sathya Sai University, Prashanthi Nilayam, India;Dept. of Mathematics and Computer Science, Sri Sathya Sai University, Prashanthi Nilayam, India;Dept. of Mathematics and Computer Science, Sri Sathya Sai University, Prashanthi Nilayam, India;Dept. of Computer Science, Florida State University;IBM, Austin
Venue:
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
Year:
2007

Abstract

The Cell Broadband Engine™ is a heterogeneous multi-core architecture developed by IBM, Sony and Toshiba. It has eight computation-intensive cores (SPEs), each with a small local memory, and a single PowerPC core. The SPEs have a total peak performance of 204.8 Gflop/s in single precision and 14.64 Gflop/s in double precision. The Cell therefore has good potential for high performance computing. However, its unconventional architecture makes it difficult to program. We propose an implementation of the core features of MPI as a solution to this problem. This can enable a large class of existing applications to be ported to the Cell. Our MPI implementation attains bandwidth up to 6.01 GB/s and latency as low as 0.41 μs. The significance of our work is in demonstrating the effectiveness of intra-Cell MPI, consequently enabling the porting of MPI applications to the Cell with minimal effort.
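
The peak figures quoted in the abstract can be checked with a short calculation. This is only a sketch: the 3.2 GHz clock and the 4-wide single-precision fused multiply-add issued every cycle are assumptions not stated in the abstract, and the symbols N_SPE, f and w are introduced here purely for illustration.

```latex
% Assumed (not from the abstract): f = 3.2 GHz clock,
% w = 4 single-precision SIMD lanes, 2 flops per fused multiply-add.
\begin{align*}
P_{\text{SP}} &= N_{\text{SPE}} \times f \times w \times 2 \\
              &= 8 \times 3.2\,\text{GHz} \times 4 \times 2
               = 204.8\ \text{Gflop/s}, \\
P_{\text{SP}} / N_{\text{SPE}} &= 204.8 / 8 = 25.6\ \text{Gflop/s per SPE}, \\
P_{\text{DP}} / N_{\text{SPE}} &= 14.64 / 8 = 1.83\ \text{Gflop/s per SPE}.
\end{align*}
```

Under these assumptions the single-precision total matches the abstract's 204.8 Gflop/s. The last line simply divides the quoted double-precision total across the eight SPEs, which gives a roughly 14-fold gap between single and double precision per core.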