Collective communication: theory, practice, and experience: Research Articles

Authors:
Ernie Chan; Marcel Heimlich; Avi Purkayastha; Robert van de Geijn
Affiliations:
Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, U.S.A.; Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, U.S.A.; Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX 78712, U.S.A.; Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, U.S.A.
Venue:
Concurrency and Computation: Practice & Experience
Year:
2007

Citing 0
Cited by 12

Java for high performance computing: assessment of current research and practice
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java

A configurable algorithm for parallel image-compositing applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Optimal bucket algorithms for large MPI collectives on torus interconnects
Proceedings of the 24th ACM International Conference on Supercomputing

Programming the Linpack benchmark for Roadrunner
IBM Journal of Research and Development

A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers
Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology

Toward performance models of MPI implementations for understanding application scaling issues
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface

Design of efficient Java message-passing collectives on multi-core clusters
The Journal of Supercomputing

F-MPJ: scalable Java message-passing communications on parallel systems
The Journal of Supercomputing

Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor
Concurrency and Computation: Practice & Experience

On distributed file tree walk of parallel file systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Elemental: A New Framework for Distributed Memory Dense Matrix Computations
ACM Transactions on Mathematical Software (TOMS)

NUMA-aware image compositing on multi-GPU platform
The Visual Computer: International Journal of Computer Graphics

Quantified Score

Hi-index	0.02

Abstract

We discuss the design and high-performance implementation of collective communication operations on distributed-memory computer architectures. Using a combination of known techniques (many of which were first proposed in the 1980s and early 1990s) along with careful exploitation of communication modes supported by MPI, we have developed implementations that, in most situations, outperform those currently supported by public domain implementations of MPI such as MPICH. Performance results from a large Intel Xeon/Pentium 4® processor cluster are included. Copyright © 2007 John Wiley & Sons, Ltd.