Optimization of collective communication in intra-cell MPI

  • Authors:
  • M. K. Velamati; A. Kumar; N. Jayam; G. Senthilkumar; P. K. Baruah; R. Sharma; S. Kapoor; A. Srinivasan

  • Affiliations:
  • Dept. of Mathematics and Computer Science, Sri Sathya Sai University; Dept. of Mathematics and Computer Science, Sri Sathya Sai University; Dept. of Mathematics and Computer Science, Sri Sathya Sai University; Dept. of Mathematics and Computer Science, Sri Sathya Sai University; Dept. of Mathematics and Computer Science, Sri Sathya Sai University; Dept. of Mathematics and Computer Science, Sri Sathya Sai University; IBM, Austin; Dept. of Computer Science, Florida State University

  • Venue:
  • HiPC'07: Proceedings of the 14th International Conference on High Performance Computing
  • Year:
  • 2007

Abstract

The Cell is a heterogeneous multi-core processor with eight coprocessors, called SPEs (Synergistic Processing Elements). Each SPE operates directly on its own small local store and accesses the common shared main memory through DMA. An MPI implementation can treat each SPE as a node running one MPI process. In this paper, we discuss the efficient implementation of collective communication operations for intra-Cell MPI, both for cores on a single chip and for a Cell blade. Although we have implemented all the collective operations, we describe three of them in detail: barrier, broadcast, and reduce. The main contributions of this work are (i) a description of our implementation, which achieves low latency and high bandwidth by exploiting the unique features of the Cell, and (ii) a comparison of different algorithms, together with an evaluation of how the architectural features of the Cell processor influence their effectiveness.
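The abstract does not name the algorithms that were compared. As a purely hypothetical illustration of the kind of schedule a collective operation follows when each SPE acts as an MPI rank, the sketch below computes a binomial-tree broadcast, one standard algorithm among several. The function name `binomial_bcast_schedule` is invented here; the sketch models only which rank sends to which in each round and ignores DMA transfers, local-store limits, and message sizes. It assumes 8 ranks for one chip and, for a blade, 16 ranks (two Cell chips).

```python
from typing import List, Tuple


def binomial_bcast_schedule(p: int, root: int = 0) -> List[List[Tuple[int, int]]]:
    """Hypothetical helper: binomial-tree broadcast schedule for p ranks.

    Returns one list of (src, dst) rank pairs per communication round.
    Ranks are handled relative to the root, so in round k every rank
    that already holds the data (relative ranks 0 .. 2**k - 1) forwards
    it to the rank 2**k positions ahead. The number of rounds is
    ceil(log2(p)).
    """
    if p < 1 or not 0 <= root < p:
        raise ValueError("need p >= 1 and 0 <= root < p")
    rounds: List[List[Tuple[int, int]]] = []
    step = 1
    while step < p:
        pairs = []
        # Relative ranks 0 .. step-1 hold the data at the start of this round.
        for rel in range(step):
            dst = rel + step
            if dst < p:
                # Map relative ranks back to absolute ranks.
                pairs.append(((rel + root) % p, (dst + root) % p))
        rounds.append(pairs)
        step *= 2
    return rounds


if __name__ == "__main__":
    # Assumed 8 SPEs on a single chip, root at rank 0.
    for k, pairs in enumerate(binomial_bcast_schedule(8)):
        print(f"round {k}: {pairs}")
```

With 8 ranks the broadcast finishes in 3 rounds (round 0: `[(0, 1)]`, round 1: `[(0, 2), (1, 3)]`, round 2: `[(0, 4), (1, 5), (2, 6), (3, 7)]`); doubling to 16 ranks adds only one more round. A reduce can follow the same tree with the pairs reversed and the rounds run in the opposite order.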