Single-port and multi-port collective communication operations on single and dual Cell BE processor systems

Authors:
Farshad Khunjush;David Gong;Nikitas J. Dimopoulos
Affiliations:
Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, P.O. Box 71348-51154, Shiraz, Iran.;Department of Electrical and Computer Engineering, University of Victoria, Victoria, B.C. V8W 3P6, Canada.;Department of Electrical and Computer Engineering, University of Victoria, Victoria, B.C. V8W 3P6, Canada
Venue:
International Journal of Communication Networks and Distributed Systems
Year:
2011

Abstract

Several factors have recently pushed high-performance processor architectures toward designs with multiple processing cores on a single chip, known as chip multiprocessors (CMPs). The Cell Broadband Engine (Cell BE) shows potential to deliver high performance to parallel applications (e.g., MPI applications). Efficient implementation of collective communication operations is key to achieving high performance and scalability in such applications. In this work, we implement several collective communication operations, namely broadcast, all-gather and total-exchange, on the Cell BE processor and investigate their performance in terms of latency and its associated components.
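
For readers unfamiliar with the three collectives named above, the following minimal sketch illustrates only their data-movement semantics, as commonly defined in MPI (`MPI_Bcast`, `MPI_Allgather`, `MPI_Alltoall`). It is a sequential simulation over plain Python lists, not the Cell BE implementation studied in the paper; the function names and the list-of-buffers representation are assumptions made for illustration.

```python
# Illustrative, sequential model of collective semantics.
# buffers[i] stands for the data held by rank i; nothing here models
# SPEs, DMA transfers, or the single-port/multi-port distinction.

def broadcast(buffers, root):
    """After a broadcast, every rank holds a copy of the root's data."""
    return [list(buffers[root]) for _ in buffers]


def all_gather(buffers):
    """After an all-gather, every rank holds all ranks' blocks in rank order."""
    gathered = list(buffers)
    return [list(gathered) for _ in buffers]


def total_exchange(buffers):
    """After a total exchange (all-to-all), rank j holds block j from every rank.

    buffers[i][j] is the block that rank i addresses to rank j, so the
    result is the transpose of the input block matrix.
    """
    n = len(buffers)
    return [[buffers[src][dst] for src in range(n)] for dst in range(n)]


if __name__ == "__main__":
    print(broadcast([[1, 2], [0, 0], [0, 0]], root=0))
    print(all_gather(["a", "b", "c"]))
    print(total_exchange([["a0", "a1"], ["b0", "b1"]]))
```

In this model, total exchange is the only operation in which every source-destination pair carries a distinct block, which is why it is usually the most communication-intensive of the three.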