Optimization of collective communication in intra-cell MPI

Authors:
M. K. Velamati;A. Kumar;N. Jayam;G. Senthilkumar;P. K. Baruah;R. Sharma;S. Kapoor;A. Srinivasan
Affiliations:
Dept. of Mathematics and Computer Science, Sri Sathya Sai University;Dept. of Mathematics and Computer Science, Sri Sathya Sai University;Dept. of Mathematics and Computer Science, Sri Sathya Sai University;Dept. of Mathematics and Computer Science, Sri Sathya Sai University;Dept. of Mathematics and Computer Science, Sri Sathya Sai University;Dept. of Mathematics and Computer Science, Sri Sathya Sai University;IBM, Austin;Dept. of Computer Science, Florida State University
Venue:
HiPC'07 Proceedings of the 14th international conference on High performance computing
Year:
2007

Abstract

The Cell is a heterogeneous multi-core processor, which has eight coprocessors, called SPEs. The SPEs can access a common shared main memory through DMA, and each SPE can directly operate on a small distinct local store. An MPI implementation can use each SPE as if it were a node for an MPI process. In this paper, we discuss the efficient implementation of collective communication operations for intra-Cell MPI, both for cores on a single chip, and for a Cell blade. While we have implemented all the collective operations, we describe in detail the following: barrier, broadcast, and reduce. The main contributions of this work are (i) describing our implementation, which achieves low latencies and high bandwidths using the unique features of the Cell, and (ii) comparing different algorithms, and evaluating the influence of the architectural features of the Cell processor on their effectiveness.
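
The abstract does not say which algorithms the authors compare, so the following is only a generic illustration of how a tree-structured broadcast and reduce among eight ranks (one per SPE) can be scheduled. It assumes a binomial tree, a common textbook choice rather than the paper's method; the function names are hypothetical, and the sketch models only who sends to whom in each round, not DMA transfers or local-store limits.

```python
def binomial_broadcast_schedule(num_ranks, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Assumption: a binomial tree, where the set of ranks holding the data
    doubles every round. This is not taken from the paper.
    """
    rounds = []
    step = 1
    while step < num_ranks:
        pairs = []
        # Relative ranks 0..step-1 already hold the data at this point.
        for rel in range(step):
            partner = rel + step
            if partner < num_ranks:
                sender = (rel + root) % num_ranks
                receiver = (partner + root) % num_ranks
                pairs.append((sender, receiver))
        rounds.append(pairs)
        step *= 2
    return rounds


def simulate_reduce(values, op, root=0):
    """Combine one value per rank at `root` by replaying the broadcast
    schedule in reverse (children send partial results to parents).

    Assumption: `op` is associative and commutative (e.g. sum or max).
    """
    partial = list(values)
    for pairs in reversed(binomial_broadcast_schedule(len(values), root)):
        for parent, child in pairs:
            partial[parent] = op(partial[parent], partial[child])
    return partial[root]
```

With eight ranks, `binomial_broadcast_schedule(8)` yields three rounds, and `simulate_reduce` visits the same pairs in the opposite order. A barrier can be pictured in the same terms as a reduce followed by a broadcast that carry no payload, although whether that is the approach taken in the paper is not stated in the abstract.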