Building a high-performance collective communication library

Authors:
Mike Barnett;Satya Gupta;David G. Payne;Lance Shuler;Robert van de Geijn;Jerrell Watts
Affiliations:
University of Idaho, Moscow, Idaho;Supercomputer Systems Division, Intel Corporation, Beaverton, Oregon;Supercomputer Systems Division, Intel Corporation, Beaverton, Oregon;Sandia National Laboratory, Albuquerque, New Mexico;The University of Texas at Austin, Austin, Texas;California Institute of Technology, Pasadena, California
Venue:
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Year:
1994

Citing 4
Cited 23

The design of a standard message passing interface for distributed memory concurrent computers

Parallel Computing - Special issue: message passing interfaces
Broadcasting on meshes with wormhole routing

Journal of Parallel and Distributed Computing
A Survey of Wormhole Routing Techniques in Direct Networks

Computer
Optimal Broadcasting in Mesh-Connected Architectures

Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks

IEEE Transactions on Parallel and Distributed Systems
An Estimation of Complexity and Computational Costs for Vertical Block-Cyclic Distributed Parallel LU Factorization

The Journal of Supercomputing
Supporting dynamic parallel object arrays

Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
ECO: Efficient Collective Operations for Communication on Heterogeneous Networks

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Scaling the unscalable: a case study on the AlphaServer SC

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Performance Analysis of a Myrinet-Based Cluster

Cluster Computing
Efficient implementation of reduce-scatter in MPI

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Scalable NIC-based Reduction on Large-scale Clusters

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Performance Modeling and Tuning Strategies of Mixed Mode Collective Communications

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
A Reconfigurable MPI Broadcast Function

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Self-adapting numerical software (SANS) effort

IBM Journal of Research and Development
The design and implementation of MPI collective operations for clusters in long-and-fast networks

Cluster Computing
Implications of application usage characteristics for collective communication offload

International Journal of High Performance Computing and Networking
NIC-based reduction algorithms for large-scale clusters

International Journal of High Performance Computing and Networking
Optimal broadcast for fully connected processor-node networks

Journal of Parallel and Distributed Computing
MPI Applications on Grids: A Topology Aware Approach

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
A configurable algorithm for parallel image-compositing applications

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Two-tree algorithms for full bandwidth broadcast, reduction and scan

Parallel Computing
Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers

Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology
Efficient implementation of reduce-scatter in MPI

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Cache injection for parallel applications

Proceedings of the 20th international symposium on High performance distributed computing
A proposal of reconfigurable MPI collective communication functions

ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications

Abstract

In this paper, we report on a project to develop a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with wormhole routing, but the techniques are more general. The approach differs from traditional library implementations in that we address the need for implementations that perform well for vectors of various sizes and for various grid dimensions, including non-power-of-two grids. We show how a general approach to hybrid algorithms yields performance across the entire range of vector lengths. Moreover, many scalable implementations of application libraries require collective communication within groups of nodes. Our approach yields the same kind of performance for group collective communication. Results from the Intel Paragon system are included. To obtain this library for Intel systems, contact intercom@cs.utexas.edu.
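
The abstract does not spell out the hybrid algorithms themselves, so the following is only an illustrative sketch of the general idea of switching strategies by vector length. It assumes the common latency/bandwidth cost model, in which sending n bytes costs alpha + n * beta, and compares two textbook broadcast strategies: a tree broadcast (few messages, full vector each time) and a scatter followed by a ring collect (more messages, less data per node). The function names, the parameter values in the usage note, and the choice of these two strategies are assumptions made for illustration; they are not taken from the paper or its library.

```python
def tree_steps(p):
    """Number of rounds in a binomial-tree broadcast over p nodes: ceil(log2 p)."""
    return (p - 1).bit_length()


def tree_broadcast_cost(n_bytes, p, alpha, beta):
    """Assumed model: every round forwards the whole vector."""
    return tree_steps(p) * (alpha + n_bytes * beta)


def scatter_collect_broadcast_cost(n_bytes, p, alpha, beta):
    """Assumed model: tree scatter of 1/p pieces, then a ring collect."""
    if p == 1:
        return 0.0
    moved = (p - 1) / p * n_bytes * beta           # data term of each phase
    scatter = tree_steps(p) * alpha + moved        # pieces halve every round
    collect = (p - 1) * alpha + moved              # p - 1 ring steps
    return scatter + collect


def choose_broadcast(n_bytes, p, alpha, beta):
    """Pick the cheaper strategy for this vector length and node count."""
    tree = tree_broadcast_cost(n_bytes, p, alpha, beta)
    sc = scatter_collect_broadcast_cost(n_bytes, p, alpha, beta)
    return ("tree", tree) if tree <= sc else ("scatter_collect", sc)
```

With hypothetical parameters alpha = 1e-4 s and beta = 1e-8 s/byte on 30 nodes (a non-power-of-two count, such as a 5 x 6 grid), `choose_broadcast(8, 30, 1e-4, 1e-8)` selects the tree strategy, while `choose_broadcast(10**7, 30, 1e-4, 1e-8)` selects scatter plus collect. Under this model, short vectors are dominated by the per-message latency and long vectors by the volume of data moved, which is why a single strategy is unlikely to perform well across the whole range.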